Choosing a platform

When creating a new online service, two aspects of the development are particularly important:

Developing something quickly - if you don’t release something, you’re going to have trouble making any money
Scalability and good design - if a million users show up overnight, can you fix the problem straight away by buying more servers, or do you have to spend six months redesigning the system from scratch?

In many ways these two requirements are opposites - a system that does well on one will tend to do badly on the other. Choose scalability and you never get around to launching. Choose quick development and you get all of Twitter’s scaling problems.

Fortunately it is possible to produce a reasonable compromise between these attributes. The key is flexibility - develop quickly with the easiest toolset, but make sure the system can change easily enough to add scalability when it is needed.

That is why we are starting development with a standard .NET web application stack - SQL Server Express, LINQ, WCF and ASP.NET AJAX.

For much of what we want to do, this is far from the ideal platform. The various cloud computing platforms offer massive scalability out of the box, and for the web search portion of the system this architecture is about as far from what Google uses as you can get.

What makes up for that is the ease of development - it’s relatively easy to find developers who know .NET, and once you install Visual Studio and download the source code it only takes a single keypress to get everything running.

Having chosen the platform based on ease of development, we need to work out how scalability can be built in. There are two parts to this.

First is the completely scalable design, which is what you would build if you had an unlimited development budget and the requirement to support a billion users. You don’t have to go into a lot of detail with this, but you need some idea of how you would handle the scenario where moving to a more powerful server is no longer an option. In particular, look for things that will cause trouble later and that could be removed from the design early on.

The other part is the interim solution that lets you grow the service quickly without requiring a complete redesign. This is the part that .NET does really well. Deploying a web application to a new server is trivial, and at the database level (usually the bottleneck when you get to large numbers of users) you can go from a single instance of SQL Server Express running on a development box to a high-performance cluster without needing to change any code.
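As a rough illustration of how "without needing to change any code" can work, the sketch below keeps the database location in configuration rather than in code. It is written in Python purely as a neutral sketch language; in a .NET application the same role is normally played by a connection string in the configuration file. The setting name APP_DB_CONNECTION, the database name AppDb and the server names are assumptions made up for this example.

```python
import os

# Hypothetical development default: a local SQL Server Express instance.
DEV_CONNECTION = r"Data Source=.\SQLEXPRESS;Initial Catalog=AppDb;Integrated Security=True"


def get_connection_string(env=None):
    """Return the connection string from configuration.

    Application code calls this and never hard-codes a server name, so
    moving to a bigger database server is a configuration change only.
    """
    env = os.environ if env is None else env
    return env.get("APP_DB_CONNECTION", DEV_CONNECTION)


def parse_connection_string(conn):
    """Split 'Key=Value;Key=Value' into a dict (simplified, no quoting rules)."""
    parts = (item.split("=", 1) for item in conn.split(";") if item.strip())
    return {key.strip(): value.strip() for key, value in parts}
```

Pointing the setting at a cluster changes where the data lives, while every caller of get_connection_string stays exactly as it was.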

In planning this part of scaling, you leave out the harder problem of splitting up the database, but there are still design decisions to be made so that the rest of the system can be split across servers easily. Examples in our system include:

Strong separation of application layers - Using as many distinct modules as reasonably possible means that the application is easily broken up across servers later. For ease of development we are taking advantage of the fact that a WCF service can also be referenced as a regular class.
As much as possible, operations are stateless - since operations only affect the database, the application server should be easy to duplicate.
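
The stateless idea can be sketched in a few lines. This is a Python illustration rather than the project's actual code, and the AppServer class, its method names and the dictionary standing in for the database are all hypothetical. The point is only that a server holding no data of its own can be duplicated freely.

```python
class AppServer:
    """Stand-in for one application server.

    It keeps no per-user state between calls; everything durable lives
    in the shared store that is passed in (a dict playing the database).
    """

    def __init__(self, name, store):
        self.name = name
        self._store = store  # shared by every server instance

    def save_item(self, user, item):
        # The only side effect is a write to the shared store.
        self._store.setdefault(user, []).append(item)

    def list_items(self, user):
        # Reads come from the shared store, never from local memory.
        return list(self._store.get(user, []))


database = {}
web1 = AppServer("web-1", database)
web2 = AppServer("web-2", database)
web1.save_item("alice", "first")
web2.save_item("alice", "second")
```

Because neither instance remembers anything between calls, a request can be sent to either one and see the same data.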

Any operations that may take a long time with larger datasets (mostly in the search part of the system) are designed to be asynchronous. To start with they actually run synchronously within a single process, as that is much easier to develop, but they have minimal connection to the rest of the code so they can be moved to a separate server if necessary.
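
One possible shape for "designed to be asynchronous, run synchronously for now" is sketched below in Python. The names SearchJobs, InProcessSearchJobs and toy_search are invented for illustration and are not taken from our code. Callers submit a job and ask for the result later, so they never depend on the work finishing inside the same call.

```python
import uuid
from abc import ABC, abstractmethod


class SearchJobs(ABC):
    """The only surface the rest of the application sees."""

    @abstractmethod
    def submit(self, query):
        """Start a search and return a job id."""

    @abstractmethod
    def result(self, job_id):
        """Return the results, or None if they are not available."""


class InProcessSearchJobs(SearchJobs):
    """First implementation: does the work immediately, in this process."""

    def __init__(self, search_fn):
        self._search_fn = search_fn
        self._results = {}

    def submit(self, query):
        job_id = str(uuid.uuid4())
        # Runs synchronously for now; callers cannot tell the difference.
        self._results[job_id] = self._search_fn(query)
        return job_id

    def result(self, job_id):
        return self._results.get(job_id)


def toy_search(query):
    """Placeholder search over a fixed list of strings."""
    documents = ["scaling sql server", "wcf services", "asp.net ajax"]
    return [doc for doc in documents if query.lower() in doc]
```

A later implementation of SearchJobs could hand the query to a separate server and have result return None until the work is done, without any change to the calling code.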