Lightweight Models and Cost Effective Scalability

May 12, 2010

Today, large-scale websites are amazingly complex systems. As a result of this (and other factors), many websites eventually run into problems with cost, performance and scalability. In fact, the problem is so ubiquitous that it is not uncommon for websites to undergo major re-engineering once they hit a certain size threshold, in order to keep up with growth and demand.

Fortunately, there are concepts and methodologies that, if followed correctly and consistently, will help minimize the aforementioned problems.

One such important concept is choosing the right Web Application Stack. The stack is made up of various layers (see diagram below). At each layer of the stack, there are several technologies to choose from. Often they are interchangeable, resulting in many different combinations depending on what suits the business needs best (cost-wise, performance-wise etc.).

As a demonstration, I will be talking about the popular video website, YouTube. The reason I have picked YouTube is because it is well known and widely used, and also because its Web Application Stack is one that is very widespread on the World Wide Web today. It's called LAMP.

The LAMP stack (diagram)

LAMP is an acronym that stands for Linux – Apache HTTP Server – MySQL – PHP/Perl/Python. As you can see, it's made up of all the technologies used at the various levels of the stack. As I mentioned earlier, the levels are quite interchangeable, which is why you have other competing stacks such as:

WAMP:

Windows Server – Apache HTTP Server – MySQL – PHP/Perl/Python

WIMP:

Windows Server – IIS – MSAccess – PHP/Perl/Python

WISA:

Windows Server – IIS – SQL Server – ASP.NET

One of the most prominent reasons this stack has become popular is its free, open-source status. Many of today's big websites, such as Google, Facebook, YouTube, Gmail, Digg, Wikipedia and Flickr, use the LAMP stack.

Before proceeding further, let's define what scalability means. Scalability refers to the amount of data a website can contain and the number of users it can simultaneously support. A scalable website is one that can easily support additional users and traffic by expanding hardware and bandwidth, without changes to the software or structure of the website. If a website's structure cannot cope with additional users, it has reached its scalability threshold.

Where does LAMP come into all of this? Firstly, as mentioned above, LAMP consists of free software, which makes it the perfect choice for cost-effective scalability. As far as scalability itself is concerned, the software in LAMP, such as PHP, allows for horizontal scalability: each layer in the stack can expand and grow on its own, without any unwanted effect on the other layers.
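
Part of the reason the PHP layer scales horizontally is the share-nothing model: any application server can handle any request, as long as no state lives on one particular machine. Below is a minimal sketch of this idea in Python (one of the LAMP "P" languages); the in-memory session store stands in for something like memcached or a shared database, and all the names are my own, for illustration only.

    # Minimal "share-nothing" sketch: any app server can serve any request,
    # because session state lives in a shared store, not on one machine.

    class SharedSessionStore:
        """Stand-in for a shared store such as memcached or a database."""
        def __init__(self):
            self._data = {}

        def get(self, session_id):
            return self._data.get(session_id, {})

        def put(self, session_id, session):
            self._data[session_id] = session

    class AppServer:
        def __init__(self, name, sessions):
            self.name = name
            self.sessions = sessions  # shared by every server in the pool

        def handle(self, session_id):
            session = self.sessions.get(session_id)
            session["hits"] = session.get("hits", 0) + 1
            self.sessions.put(session_id, session)
            return "%s served request #%d" % (self.name, session["hits"])

    store = SharedSessionStore()
    servers = [AppServer("app1", store), AppServer("app2", store)]

    # Requests for the same user can land on different servers:
    print(servers[0].handle("user-42"))  # app1 served request #1
    print(servers[1].handle("user-42"))  # app2 served request #2

Because no request depends on which machine served the previous one, adding capacity is largely just a matter of adding more servers.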

At this point, I’d like to mention the difference between scaling up and scaling out.

Scaling Up: This refers to upgrading existing equipment, e.g. adding more RAM to the database server.

Scaling Out: This refers to increasing the number of machines working simultaneously, e.g. adding a second database server.

Scaling up is often more cost-effective than scaling out; however, scaling up eventually reaches its limits, and at that point scaling out becomes inevitable.

Software in the LAMP stack encourages and supports scalability.

Apache – multiple Apache HTTP Servers can be set to run behind a load balancer (load-balancing software or hardware such as Citrix NetScaler is required).
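
To make the load-balancing idea concrete, here is a minimal round-robin sketch in Python. A real deployment would use dedicated software or hardware (NetScaler, or Apache's own mod_proxy_balancer); the server names here are made up for illustration.

    import itertools

    # Minimal round-robin balancer sketch: spread requests evenly
    # across a pool of identical web servers.

    class RoundRobinBalancer:
        def __init__(self, servers):
            self._cycle = itertools.cycle(servers)

        def pick(self):
            """Return the next server in rotation."""
            return next(self._cycle)

    balancer = RoundRobinBalancer(["web1", "web2", "web3"])
    for _ in range(5):
        print(balancer.pick())  # web1, web2, web3, web1, web2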

MySQL – MySQL supports scalability in several ways (a sketch of read/write splitting follows this list), for example by providing:

Scalability for read-intensive applications through multiple slaves

Scalability for read/write-intensive applications using multi-master replication and partitioning

Scalability for heavily used web sites using MySQL Cluster
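
To sketch the first two points, here is a toy Python example of read/write splitting over a replicated setup: writes go to the master, while reads rotate across the slaves. The host names are placeholders, and the print statement stands in for a real MySQL driver call.

    import itertools

    # Toy read/write splitting over master-slave replication:
    # writes must hit the master; reads can go to any slave.

    class ReplicatedDatabase:
        def __init__(self, master, slaves):
            self.master = master
            self.slaves = itertools.cycle(slaves)

        def execute(self, sql):
            """Route the statement based on whether it modifies data."""
            is_write = sql.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
            target = self.master if is_write else next(self.slaves)
            print("[%s] %s" % (target, sql))  # stand-in for a real driver call

    db = ReplicatedDatabase("db-master", ["db-slave1", "db-slave2"])
    db.execute("INSERT INTO videos (title) VALUES ('cats')")  # -> db-master
    db.execute("SELECT title FROM videos LIMIT 10")           # -> db-slave1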

In the case of YouTube, we know the following to be true today:

– Over 1 billion views per day

– Uses NetScaler for load balancing

– Uses a LAMP implementation with Python

YouTube statistics (image)

When YouTube first started out, it hosted only a handful of videos and did not require much hardware or bandwidth. Over the past five years this has changed dramatically, and the LAMP software bundle has allowed YouTube to keep up with that growth without falling over.

Commoditization of hardware, bandwidth and software has meant that prices have dropped, making large-scale deployments far more feasible. Since LAMP uses open-source software, reusing existing code libraries means we can “stand on the shoulders of giants”, so to speak. No time is wasted reinventing the wheel.

Apart from this, there is also the concept of having a lightweight model. In YouTube's case, this can be seen by noticing how similar and uniform all the pages are. The use of server-side templates (PHP in a classic LAMP setup; Python in YouTube's case) means that the design of the entire website can be uniform, centrally controlled and expanded, without an increase in overhead. In other words, templates allow greater versatility and much easier editing of content.
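
As a minimal sketch of the templating idea, here is an example using Python's standard-library string.Template. YouTube's real templating system is not public; the page layout and the "MyTube" name are invented purely for illustration.

    from string import Template

    # Minimal templating sketch: one central layout renders every page.
    # Changing the layout once changes the whole site, with no per-page edits.

    PAGE = Template("""<html>
      <head><title>$title - MyTube</title></head>
      <body>
        <div id="header">MyTube</div>
        <div id="content">$content</div>
      </body>
    </html>""")

    def render(title, content):
        return PAGE.substitute(title=title, content=content)

    print(render("Cat Video", "<video src='cat.flv'></video>"))
    print(render("Dog Video", "<video src='dog.flv'></video>"))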

LAMP Logo (yes, I was bored, so I Photoshopped it =D)

Leveraging “The Long Tail”

May 3, 2010

The Long Tail is a concept in marketing and economics that has been getting far greater attention recently than ever before. For those of you who don’t understand the concept, I’ll explain.

The related underlying concepts, such as the 80/20 rule, are not new (see the Pareto Principle and Power Laws), but what has given this concept widespread attention recently is its near-ubiquitous use in the world of Web 2.0.

Sticking to tradition, I shall quote Wikipedia for a concise definition:

[The Long Tail] refers to the statistical property that a larger share of population rests within the tail of a probability distribution than observed under a ‘normal’ or Gaussian distribution.

Translated into plain English in the context of Web 2.0, this basically means that for any given set of items (e.g. music, clothing etc.), a small number of top items commands a large market share, but, more importantly, there is also a sizeable demographic for the remaining items. In case you're confused, look at the following graph I have prepared:

Leveraging the Long Tail of eBay (graph)

Needless to say, I will be using eBay as an example to demonstrate what the Long Tail is really about, hence the graph being tailored to eBay. The dark grey portion of the market refers to those items that are mainstream and available from regular retail outlets such as brick-and-mortar stores. This comparatively small range of items comprises “the head” of the graph and essentially controls a large portion of the market. However, the light grey portion, referred to as “the long tail”, comprises all the other items which, although they do not individually sell much, as a whole make up a sizeable market share. This situation can be summarised below:

Dark grey: a few types of products, each selling a high number of units

Light grey: many types of products, each selling a low number of units

Now, if it isn't obvious already, the reason the Long Tail is of a comparable size is the sheer number of non-mainstream products out there. Although each individual product does not have a big market, it has a niche market, and the number of such niche markets adds up considerably to form “The Long Tail”. As a business, eBay has done a rather excellent job of cultivating this.
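
To see why the tail adds up, here is a small Python sketch that simulates sales following a Zipf-like power law, a common assumption in long-tail discussions. All the numbers are illustrative, not real eBay data.

    # Sketch: under a Zipf-like power law, the many small sellers in the
    # tail can add up to outweigh the head. Numbers are illustrative only.

    def zipf_sales(num_products, top_sales=10000, exponent=1.0):
        """Sales of the product ranked r fall off as top_sales / r**exponent."""
        return [top_sales / rank ** exponent for rank in range(1, num_products + 1)]

    sales = zipf_sales(100000)
    head = sum(sales[:100])   # the 100 mainstream hits
    tail = sum(sales[100:])   # everything else: the long tail

    print("head: %.0f units, tail: %.0f units" % (head, tail))
    # With these assumptions the tail outsells the head: 100,000 niche
    # products each selling a little add up to more than the hits.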

Here is an example of a product that might be found in a retail store:

Harvey Norman – PS3 120GB Gaming Console

And the same product on eBay

eBay – PS3 120GB Gaming Console

This would be a typical example from the dark grey section of the graph.

But what about a rarer product, or something less well known? I highly doubt the following items are easily available through regular brick-and-mortar stores (if at all):

eBay – Bizarre Chinese old animal shape LOCK *****Collectible

eBay – Tibet Man-Made coral engrave buddha snuff bottle+++++++

eBay – Xact XTR-1 Sirius Satellite Receiver Remote RARE

eBay – Haunted Powerful Utukku Demon Djinn Ring of Wealth

As you can see, there is not a big demand for these seemingly random (and sometimes bizarre) items, but their sheer number makes their combined market share a sizeable one.

This “potential” in the long tail had previously been left untapped, for many reasons: the small physical reach of traditional brick-and-mortar stores, and the cost of purchasing, storing and distributing inventory. Only recently have these factors become negligible in cost, allowing the tail's potential to be recognised and cultivated.


Perpetual Beta

April 26, 2010

This week’s discussion topic is about the concept of Perpetual Beta. As per usual, let’s start off with what that actually means. From Wikipedia:

Perpetual beta is a term used to describe software or a system which remains at the beta development stage for an extended or even indefinite period of time. It is often used by developers in order to allow them to constantly release new features that might not be fully tested. As a result, perpetual beta software is not recommended for mission critical machines.

So basically, what we're talking about here is a system which has never really reached full maturity. At this stage, I think it's important to note the difference between a final, mature system and a stable system. Many people associate the term “beta release” not just with an unfinished system, but also with an unstable one. This might be the case some of the time, but not always. An immature system is not necessarily unstable, just like a mature system isn't always stable. With this critical difference in mind, it's easy to see how platforms such as Facebook are well within the scope of perpetual beta. They are fully functional systems that also receive regular upgrades.

Note: There are several conflicting definitions for the concept of software maturity. For the purpose of this blog, I define it as:

“The condition a system reaches when development is nonexistent or at most minimal, and the majority of the work revolves around system maintenance.”

Now that we know what Perpetual Beta is all about, the next step is to understand why this model is used and what benefits it serves.

Let’s take Facebook as an example. What are the reasons for keeping it in a perpetual state of beta?

First and foremost is flexibility. Being unfinished allows Facebook to add, change and remove features as they deem appropriate (through various means such as Collective Intelligence). Facebook can use data gathered from research and user feedback to find out what its end users want changed, added or removed from the system. Keeping customers happy is the key to any successful business, and make no mistake, Facebook is an online business.

Being in a perpetual beta state also lets Facebook stay ahead of competitors, constantly throwing out new features or improving existing ones, to ensure they are the leading provider of online socialising.

Personally, I also think it's a way to let users know that they can always expect improvements and new features to be rolled out in the future. It keeps them keen and interested, and also assured that their ever-changing needs will be met.

Shravan.


Software above the level of a single device

April 18, 2010

So this week I shall be talking about the concept of ‘software above the level of a single device’.

What does this mean? Well, in a nutshell, it is any software service which you are able to access from more than one device, whether that be a mobile phone, a laptop or a PC. To better explain this concept, let's look at an example of software that runs above the level of a single device: Microsoft Windows Live Mesh.

(Video: a short advertisement for Microsoft Windows Live Mesh, featured at the Web 2.0 Expo.)

Windows Live Mesh is a free data sharing and synchronization service offered by Microsoft. It works in the following way: a user is able to add content (files, folders etc.) to “the mesh” (the collective term for all devices synchronized with one another). For a typical user this mesh may consist of mobile devices, laptops and home computers. Depending on the application, it may also contain devices belonging to friends or family. When content is uploaded to the mesh, e.g. a photo from a mobile phone, Live Mesh synchronizes the new content with the other devices in the mesh. To accomplish this, Live Mesh utilises the openly specified FeedSync system.
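
FeedSync itself exchanges feeds whose items carry version metadata; the following Python snippet is only a toy last-writer-wins sketch of the general idea of keeping devices in sync, not FeedSync's actual algorithm.

    # Toy device-sync sketch: each device stores (value, version) per item,
    # and the higher version wins. Real FeedSync conflict handling is richer.

    class Device:
        def __init__(self, name):
            self.name = name
            self.items = {}  # item_id -> (value, version)

        def update(self, item_id, value):
            _, version = self.items.get(item_id, (None, 0))
            self.items[item_id] = (value, version + 1)

        def sync_with(self, other):
            """Exchange items in both directions; newer versions win."""
            for target, source in ((self, other), (other, self)):
                for item_id, (value, version) in list(source.items.items()):
                    if version > target.items.get(item_id, (None, 0))[1]:
                        target.items[item_id] = (value, version)

    phone, laptop = Device("phone"), Device("laptop")
    phone.update("photo1", "beach.jpg")
    phone.sync_with(laptop)
    print(laptop.items["photo1"])  # ('beach.jpg', 1) -- now on both devices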

What is the use of such a service? Well, for starters, this is a perfect example of what a Web 2.0 application aims to achieve, namely information collaboration and sharing over the Internet. Previously this was not possible, owing to restrictions such as the cost of the service, the small number of users who would actually find it useful, and technical handicaps such as Internet speed. These days, devices that connect to the net are ubiquitous. Almost every person has a mobile phone and a computing device of some sort. Internet speeds and data allowances are more than capable of supporting this (in first-world countries at least; a significant technology gap still exists between first-world and third-world countries).

The most glaring benefit, however, of software that runs above the level of a single device is that the information being dealt with is not restricted to a single device. This means the information is readily accessible anywhere you go. The software itself is of little importance here, which is a key concept to understand. We do not access Windows Live Mesh to access “it”, meaning it is not an end point. It is simply a means to get to what's really important: the information. In the case of Live Mesh, family photos, videos, music and even work documents need not be carried around with you. They can all be retrieved from “the mesh”. Think of it as your online access to all your computing devices, whenever, wherever. It has a tremendous impact on mobility and also on efficiency. For example, working on a document at work that is due the next morning? No need to finish it then and there. Simply access the document from the mesh at home, finish it, and it will automatically be synchronized with your computer at work. Another example would be an overseas vacation. No need to take all your photographs and data overseas; simply access them over the web.

This is basically how other software of this kind, such as Facebook and Twitter, works too.

To sum up, I would say that after the ubiquity of devices, the only logical step from here is to make information ubiquitous as well. After all, it is really the information that counts in the end.



Rich User Experiences

March 28, 2010

As the Web 2.0 trend pulls us away from the traditional desktop-oriented software model towards a web-based one, one of the biggest problems we encounter is replicating online the same user experience we once had on the desktop.

Currently, a lot of online applications (e.g. Google Docs) are more or less lightweight versions of the existing desktop applications we are used to working with (e.g. Microsoft Office), because they don't provide the same level of functionality as the desktop software. However, if the online client-server model is to persist and become ubiquitous, then it MUST match the user experience currently offered by desktop applications. A rich user experience is a necessity for user acceptance over other solutions.

The phrase “rich user experience” encapsulates a multitude of different things including (but not limited to) functionality (what features the application provides), usability (how easy is it to use?) and presentation (is the GUI attractive?).

For example, let's take the current market-leading platform for providing the kind of rich experiences I have been talking about: Adobe Flash. Wikipedia describes Adobe Flash concisely, so I shall quote it instead of reinventing the wheel:

“Adobe Flash (formerly Macromedia Flash) is a multimedia platform that is popular for adding animation and interactivity to web pages. Originally acquired by Macromedia, Flash was introduced in 1996, and is currently developed and distributed by Adobe Systems.”

Today, Flash has become widespread, reaching a market penetration of 95–97%.¹

Why did Flash become so popular? Because Flash gave web developers a way to provide richer experiences than standard HTML pages could, including incorporating multimedia elements such as animation and video directly into web pages. I don't have exact statistics, but it is beyond doubt that companies which used Flash to add richness to their websites would have seen their popularity rise.

Over the years we saw Flash take a strong hold on the market, as many websites had some sort of Flash content on their pages, be it videos, advertisements, banners or other components.

In subsequent versions, Flash introduced its own scripting language, ActionScript, which allowed developers to build their own web applications using Flash. Users could interact using their computer's input devices, such as the keyboard, mouse, microphone and webcam. ActionScript itself evolved from a scripting language into an object-oriented programming language, allowing developers to build even more complex and richer applications using Flash.

In the end I think we can see that Adobe Flash is a perfect example of how rich internet applications can transform websites and provide richer user experiences which can only lead to greater user acceptance.

PS. Some other technologies that are interesting to look at are AJAX and Adobe Flex.

¹ http://www.statowl.com/custom_ria_market_penetration.php


Innovation in Assembly – Journey from Applications to Platforms

March 21, 2010

Not many years ago, as recently as the late 1990s, the most common architecture on the web was application-based. Many developers, including e-businesses, focused on developing standalone web applications to suit their business needs and purposes. This was at a time when applications weren't as complex in functionality and features as they are today. Then, as technology and business requirements grew more and more complex, we started to see an increase in the number of applications used by any particular business or web developer. The problem that surfaced was that interconnectivity between these applications was often anything but easy; there was no standard way for any number of applications to communicate with each other.

This is essentially where web platforms came in. Platforms made it easier for a company to build an extensible set of applications all based on the one platform. This allowed for easier management, scalability and, more recently, the ability to take advantage of third-party developers by offering public Application Programming Interfaces (APIs).
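
As a toy illustration of what consuming a public API looks like, here is a short Python sketch. The endpoint URL and response fields are hypothetical, invented for illustration, and do not belong to any real platform.

    import json
    import urllib.request

    # Sketch: a third-party app consuming a platform's public API.
    # The endpoint and fields below are hypothetical.

    def fetch_user_profile(user_id):
        url = "https://api.example-platform.com/v1/users/%s" % user_id
        with urllib.request.urlopen(url) as response:
            return json.load(response)

    # The third-party app can then compose platform data with its own features:
    # profile = fetch_user_profile("42")
    # print(profile["name"], profile["joined"])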

One of the most recent examples of such a platform is Google's Android operating system for smartphones. Android was unveiled in Q4 2007. This platform is similar to Apple's iPhone platform, with the key difference being that Android is open source and the iPhone is not.

One of the ways in which Google has used Android as a platform is by integrating other Google applications into it. These include Google Maps, Google Mail, Google Voice and Google Translate. As previously mentioned, interconnecting applications is one of the key benefits a common platform provides.

Apart from Google services, Android is also open to a variety of third-party applications developed using the Android Software Development Kit (SDK), which includes debuggers, emulators, sample code, documentation and an extensive set of APIs.

As of March 2010, there are over 30,000 applications in existence for the Android Operating System. These can be accessed and downloaded via the Android Market, a similar concept to the iPhone App Store.

All in all, I think that a focus on platforms as opposed to applications will be the future of software and services development, and not just for the web or for mobile devices. The paradigm shift has already taken place widely with the onset of Web 2.0.


Data – The next “Intel Inside”?

March 14, 2010

With the explosion of Web 2.0, web applications are increasingly becoming data driven. Examples of this include Facebook, MySpace, Twitter, YouTube, and the list goes on. In earlier times, the focus had always been on the application itself and the features it provides. The introduction of Web 2.0 has really brought about a paradigm shift on the Internet.

Creating a unique source of data that is hard to replicate has become a very important strategy for application developers. It has finally been realised that the data submitted by users is much more important than the application itself. This ties into my earlier post about Collective Intelligence, and how everyone's collected knowledge is worth so much.

Other aspects of CI, such as feedback mechanisms like reviews, comments and ratings, are all ways in which web application developers can increase the “data wealth” of their websites.
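
As a toy illustration of how such feedback accumulates into data the application owns, here is a small Python sketch of collecting ratings and deriving an aggregate score. The structure is invented for illustration, not any particular site's schema.

    from collections import defaultdict

    # Toy sketch: user feedback (star ratings) accumulating into "data wealth".

    ratings = defaultdict(list)  # item_id -> list of 1-5 star ratings

    def rate(item_id, stars):
        ratings[item_id].append(stars)

    def average_rating(item_id):
        scores = ratings[item_id]
        return sum(scores) / len(scores) if scores else None

    rate("video-123", 5)
    rate("video-123", 3)
    print(average_rating("video-123"))  # 4.0 -- data the site now owns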

In the end, it all boils down to the same question: what about the data is so important? Well, that depends on what type of data is being gathered. Sometimes the data benefits the application and thus, in turn, the company. One particular example of this is Google Earth. Users are able to upload their own data, such as information about landmarks and even photographs, into Google Earth's database, and this is integrated into the Google Earth user interface for easy access.

Other examples, such as Facebook and Twitter, are harder to speculate on. The nature of the data they collect is quite personal. There are undoubtedly third parties out there “in the wild” who would benefit in some way from having access to this data, such as companies wanting to promote their products and services through targeted advertisements. Even though the data may not identify any particular person by name, it is still accurate enough to include them and others in a group of “targets” for advertising. Of course, Facebook wouldn't just hand this information out for free; they too would profit from YOUR data, YOUR inputs into the website.

So next time you are posting something on your favourite social networking site, remember it is worth a lot more than face value.

Also, I wonder what people's thoughts are regarding what end users actually get in return for providing feedback and hence more value to the web applications in question. Or are we simply putting them on a pedestal? Should we be providing such “services” for free? Or should WE be getting remunerated for our input into making those applications the success that they otherwise might not have been?

– Shravan.