Lightweight Models and Cost Effective Scalability

May 12, 2010

Today, large-scale websites are amazingly complex systems. As a result of this (and other factors), many websites eventually run into problems with cost, performance and scalability. In fact, these problems are so common that websites often undergo major re-engineering once they reach a certain size, in order to keep up with growth and demand.

Fortunately, there are concepts and methodologies that, if followed correctly and consistently, will help minimize the aforementioned problems.

One such important concept is choosing the right Web Application Stack. The stack is made up of various layers (see diagram below). At each layer of the stack, there are several technologies to choose from. Often they are interchangeable, resulting in many different combinations depending on what suits the business needs best (cost-wise, performance-wise etc.).

As a demonstration, I will be talking about the popular video website, YouTube. I have picked YouTube because it is well known and widely used, and also because its Web Application Stack is one that is very widespread on the World Wide Web today. It’s called LAMP.

LAMP STACK

LAMP is an acronym that stands for Linux, Apache HTTP Server, MySQL and PHP/Perl/Python. As you can see, the name is made up of the technologies used at the various levels of the stack. As I mentioned earlier, the levels are quite interchangeable, which is why you have other competing stacks such as:

WAMP

Windows Server – Apache HTTP Server – MySQL – PHP/Perl/Python

WIMP

Windows Server – IIS – MSAccess – PHP/Perl/Python

WISA

Windows Server – IIS – SQL Server – ASP.NET

One of the most prominent reasons the LAMP stack has become popular is that it is free of cost and open source. Many of today’s big websites, such as Google, Facebook, YouTube, Gmail, Digg, Wikipedia and Flickr, use the LAMP stack or parts of it.

Before proceeding further, let’s define what scalability means. Scalability refers to how well a website copes with growth in the amount of data it holds and in the number of users it serves simultaneously. A scalable website is one that can easily support additional users and traffic by expanding hardware and bandwidth, without changes to the software or structure of the website. If a website’s structure cannot cope with additional users, it has reached its scalability threshold.

Where does LAMP come into all of this? Firstly, as mentioned above, LAMP consists of free software, which makes it a natural choice for cost-effective scalability. As far as scalability itself is concerned, the software in LAMP, such as PHP, allows for horizontal scalability: each layer in the stack can expand and grow on its own, without any unwanted effect on the other layers.

At this point, I’d like to mention the difference between scaling up and scaling out.

Scaling Up: This refers to upgrading existing equipment, e.g. adding more RAM to the database server

Scaling Out: This refers to increasing the number of machines working side by side, e.g. adding a second database server

In most cases scaling up is the more cost-effective option at first. Over time, however, scaling up reaches its limits (a single machine can only take so much RAM or so many CPUs), and scaling out becomes inevitable.

Software in the LAMP stack encourages and supports scalability.

Apache – Multiple Apache HTTP Servers can run side by side behind a load balancer (a load balancing product such as NetScaler is required)

MySQL – MySQL supports scalability by providing:

Scalability for read-intensive applications through multiple slaves

Scalability for read/write-intensive applications using multi-master replication and partitioning

Scalability for heavily used web sites using MySQL Cluster
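
To make the scaling-out idea a little more concrete, here is a minimal Python sketch of the two mechanisms listed above: spreading requests over several web servers, and sending database reads to replicas while writes go to a single master. Everything in it (the class `RoundRobinPool`, the function `route_query`, and server names like `web1` or `db-master`) is made up for illustration. It is not how NetScaler, MySQL or YouTube actually implement this; real load balancers also handle health checks and failover, and real replication has to deal with lag.

```python
import itertools


class RoundRobinPool:
    """Hands out servers from a fixed list in rotating order (hypothetical helper)."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("pool needs at least one server")
        self._cycle = itertools.cycle(servers)

    def next_server(self):
        # Each call returns the next server, wrapping around at the end.
        return next(self._cycle)


def route_query(sql, master, replicas):
    """Send SELECT statements to a replica and everything else to the master."""
    is_read = sql.lstrip().upper().startswith("SELECT")
    return replicas.next_server() if is_read else master


# Scaling out the web layer: add another name to the list, nothing else changes.
web_pool = RoundRobinPool(["web1", "web2", "web3"])

# Scaling out the database layer for read-heavy traffic.
replica_pool = RoundRobinPool(["db-replica1", "db-replica2"])
```

Calling `web_pool.next_server()` repeatedly walks through the web servers in turn, and `route_query("SELECT ...", "db-master", replica_pool)` picks a replica, while an `INSERT` or `UPDATE` is routed to `db-master`. The point is that adding capacity means adding an entry to a list rather than rewriting the application.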

In the case of YouTube, we know the following to be true today:

– Over 1 billion views per day

– Uses NetScaler for load balancing

– Uses a LAMP implementation with Python

YouTube Statistics

YouTube Logo

When YouTube first started off, it hosted only a handful of videos and did not require much hardware or bandwidth. Over the past 5 years, this has changed dramatically, and the LAMP software bundle has allowed YouTube to keep up with that growth without collapsing under the load.

Commoditization of hardware, bandwidth and software has driven prices down, which makes growth on this scale far more feasible than it used to be. Since LAMP is built on open source software, existing code libraries can be reused, which means we can *stand on the shoulders of giants*, so to speak. No time is wasted reinventing the wheel.

Apart from this, there is also the concept of having a lightweight model. In YouTube’s case, this can be seen by noticing how similar and uniform all the pages are. The use of templates in the scripting layer (PHP, or Python in YouTube’s case) means that the design of the entire website can be kept uniform, controlled from one central place and expanded without an increase in overhead. In other words, templates allow greater versatility and make editing content much easier.
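
As a rough illustration of the template idea, here is a small sketch using `string.Template` from Python’s standard library. The layout, the site name and the `render_page` function are all invented for this example; I am not claiming this is the templating system YouTube (or any PHP site) actually uses.

```python
from html import escape
from string import Template

# One central layout shared by every page (hypothetical markup).
PAGE_LAYOUT = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1>"
    '<div id="content">$body</div>'
    '<p class="footer">Example Video Site</p>'
    "</body></html>"
)


def render_page(title, body):
    """Fill the shared layout with page-specific text, escaping HTML characters."""
    return PAGE_LAYOUT.substitute(title=escape(title), body=escape(body))


home = render_page("Home", "Most viewed videos today")
watch = render_page("Watch", "Now playing: cats & dogs")
```

Both `home` and `watch` share the same header and footer, so changing the look of the whole site means editing `PAGE_LAYOUT` once rather than touching every page.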

LAMP Logo (Yes I was bored so I Photoshop'd it =D)

Leveraging “The Long Tail”

May 3, 2010

The Long Tail is a concept in marketing and economics that has been getting far greater attention recently than ever before. For those of you who don’t understand the concept, I’ll explain.

The related underlying concepts, such as the 80/20 rule, are not new (see Pareto Principle, Power Law), but what has given this concept widespread attention recently is its near-ubiquitous use in the world of Web 2.0.

Sticking to tradition, I shall quote Wikipedia for a concise definition:

[The Long Tail] refers to the statistical property that a larger share of population rests within the tail of a probability distribution than observed under a ‘normal’ or Gaussian distribution.

Translated to English in the context of Web 2.0, this basically means that for any given set of items (e.g. Music, Clothing etc.) there is a large market share for just the top items, but more importantly there is also a sizable demographic for the remaining items. In case you’re confused, look at the following graph I have prepared:

Leveraging The Long Tail of eBay

Needless to say, I will be using eBay as an example to demonstrate what The Long Tail is really about, hence the graph being tailored to eBay. The dark grey portion of the market refers to those items that are mainstream and are available from regular retail outlets such as brick-and-mortar stores. This comparatively small range of items makes up “the head” of the graph and essentially controls a large portion of the market. The light grey portion, which is referred to as “the long tail”, consists of all the other items which, although they do not sell much individually, together make up a sizeable market share. This situation can be summarized as follows:

Dark Grey: Few types of products, each selling a high number of units

Light Grey: Many types of products, each selling a low number of units

Now, if it isn’t obvious, the reason the Long Tail is of a comparable size is the sheer number of non-mainstream products out there. Although each individual product will not have a big market, it will have a niche market. These niche markets add up considerably, and together they form “The Long Tail”. As a business, eBay has done a rather excellent job of cultivating this.
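
The arithmetic behind this can be sketched in a few lines of Python. The numbers here are entirely hypothetical: I assume sales follow a simple power law where the product at rank r sells `top_sales / r` units, and that a physical store only has shelf space for the 100 best sellers. The function names `zipf_sales` and `tail_share` are my own. This is not eBay data, just a toy model of the shape of the graph.

```python
def zipf_sales(num_products, top_sales=10_000.0, exponent=1.0):
    """Hypothetical unit sales: the product at rank r sells top_sales / r**exponent."""
    return [top_sales / rank ** exponent for rank in range(1, num_products + 1)]


def tail_share(num_products, shelf_size=100):
    """Fraction of all units sold by products ranked below the first `shelf_size`."""
    sales = zipf_sales(num_products)
    total = sum(sales)
    head = sum(sales[:shelf_size])  # what a store with limited shelf space can sell
    return (total - head) / total


if __name__ == "__main__":
    for catalogue in (100, 10_000, 1_000_000):
        print(f"{catalogue:>9} products: tail share = {tail_share(catalogue):.0%}")
```

Under these assumptions the head stays the same size while the catalogue grows, so the share of sales sitting in the tail keeps increasing. A store limited to its shelf can only ever serve the head, while a marketplace with a practically unlimited catalogue gets the tail as well.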

Here is an example of a product that might be found in a retail store:

Harvey Norman – PS3 120GB Gaming Console

And the same product on eBay:

eBay – PS3 120GB Gaming Console

This would be a typical example from the dark grey section of the graph.

But what about a rarer product or something less known? I highly doubt the following items are easily available through regular brick-and-mortar stores (if at all):

eBay – Bizarre Chinese old animal shape LOCK *****Collectible

eBay – Tibet Man-Made coral engrave buddha snuff bottle+++++++

eBay – Xact XTR-1 Sirius Satellite Receiver Remote RARE

eBay – Haunted Powerful Utukku Demon Djinn Ring of Wealth

As you can see, there is not a big demand for these seemingly random (and sometimes bizarre) items, but their sheer number makes their combined market share a sizeable one.

This “potential” in the long tail had previously been left untapped. This was due to many reasons, such as the small physical reach of traditional brick-and-mortar stores and the cost of purchasing, storing and distributing inventory. Only recently have these costs become negligible, which has allowed the tail’s potential to be recognized and cultivated.