Sunday, December 21, 2008

Capacity Planning for Cloud Computing



At the recent CMG conference I volunteered to lead a panel session on Capacity Planning for Cloud Computing, and these slides were the result.

Now, this was a conference full of people who already know how to do capacity planning, and they were interested in what is new about Cloud Computing (or even what it's all about, from first principles). So the slides don't explain how to do capacity planning in the cloud; they talk about what changes for the capacity planner in this new world.

I'm going to be developing more material in this area over the coming year. I have a head start: in 2002 and 2003 I led projects at Sun researching how to do Capacity Planning in Virtualized Datacenters. I built some tools, filed a couple of patents, presented papers at conferences, and failed to get Sun's N1 project to implement any of it. Well, it seems I was a few years ahead of my time, so I'm going to start by digging out the papers I published back then.

Solid State Disks - time to give up that iron oxide habit

An interesting note at http://www.theregister.co.uk/2008/12/19/sun_micron_extended_flash_life/ talks about a 2009 technology that gives flash a million write cycles at the NAND cell level. That is two orders of magnitude beyond the "typical" 10,000-cycle lifetime that is often used as an argument that flash isn't ready for prime time.

In any case, the cycle limit applies to each block on the SSD independently: blocks are remapped by wear leveling, repeated writes to the same block are coalesced and cancelled in the controller's RAM cache, and when a block does reach its limit, only that block becomes unusable. Some people seem to think that as soon as you do 10,000 writes to an SSD the whole drive fails, like a head crash on a disk.
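To put that in perspective, here's a quick back-of-the-envelope lifetime estimate under ideal wear leveling. The capacity, write rate, and write amplification figures below are illustrative assumptions of mine, not vendor specs; only the million-cycle number comes from the article above.

```python
# Back-of-envelope SSD lifetime estimate under ideal wear leveling.
# All workload figures are illustrative assumptions, not vendor specs.

capacity_gb = 256              # assumed drive capacity
cycles_per_cell = 1_000_000    # the 2009 figure quoted above
host_writes_gb_per_day = 100   # assumed sustained host write rate
write_amplification = 2.0      # assumed controller overhead factor

total_write_budget_gb = capacity_gb * cycles_per_cell
daily_wear_gb = host_writes_gb_per_day * write_amplification
lifetime_years = total_write_budget_gb / daily_wear_gb / 365

print(f"Estimated lifetime: {lifetime_years:,.0f} years")
```

Even at the old 10,000-cycle figure the same assumed workload works out to about 35 years, which is why a worn-out block here and there is not the same thing as a dead drive.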

I also see that at least one vendor has announced a 512GB SSD in a 2.5" laptop disk form factor.

So to reiterate something I've been saying for a long time: spinning rust is dead, and a large number of basic assumptions about how computers behave, and the best ways to architect them, are now wrong. In 2009, SSDs will be faster for reads, faster for writes, faster for sequential access and much, much faster for random access than disks, as well as more reliable, more durable, lower power, and higher capacity. Give it another year or so and they will be cheaper as well.
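The random access point is where the gap is widest. Here's a rough comparison of random read rates; the seek and flash latency numbers are ballpark assumptions for a 7200rpm disk and a typical SSD, not measurements:

```python
# Rough random-read comparison: 7200rpm disk vs SSD.
# Latency figures are ballpark assumptions for illustration.

disk_seek_ms = 4.5     # assumed average seek time
disk_rotate_ms = 4.2   # half a rotation at 7200 rpm (60/7200/2 s)
ssd_read_ms = 0.1      # assumed flash read latency

disk_iops = 1000 / (disk_seek_ms + disk_rotate_ms)
ssd_iops = 1000 / ssd_read_ms

print(f"Disk: ~{disk_iops:.0f} random reads/sec")
print(f"SSD:  ~{ssd_iops:.0f} random reads/sec "
      f"({ssd_iops / disk_iops:.0f}x)")
```

On those assumptions a disk delivers roughly a hundred random reads per second while the SSD delivers ten thousand, and it is exactly that two-orders-of-magnitude gap that invalidates so many old architectural assumptions.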

SANs are now a complete waste of time. There is so much reliable I/O performance available in a single drive that it makes much more sense to put SSDs in the systems and access them directly; accessing an SSD over a SAN adds a huge latency and cost overhead. For critical data it makes much more sense to use node-to-node replication (log-shipping databases or cluster filesystems).
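To see why the fabric overhead matters so much more now, compare a local flash read with the same read routed through a SAN. The half-millisecond of HBA, fabric, and array-controller time below is my own rough assumption, just to illustrate the shape of the problem:

```python
# Illustrative latency comparison: local SSD vs the same SSD
# behind a SAN. The fabric overhead figure is a rough assumption.

ssd_read_ms = 0.1       # assumed flash read latency
san_overhead_ms = 0.5   # assumed HBA + fabric + array controller

local_ms = ssd_read_ms
san_ms = ssd_read_ms + san_overhead_ms

print(f"Local SSD read: {local_ms:.2f} ms")
print(f"SSD over SAN:   {san_ms:.2f} ms "
      f"({san_ms / local_ms:.0f}x the latency)")
```

For a 10ms disk read that same half millisecond of fabric overhead was lost in the noise; for a 0.1ms flash read it dominates the service time, which is the whole point.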