Monday, August 15, 2005

Solving storage tuning problems

I wrote a while ago about Dave Fisk's Ortera Atlas tool for storage analysis. I recently had a chance to use a beta release of Atlas on a real problem. They are about to do a GA release, and it's ready for prime time.

Like most tools, it can produce masses of numbers and graphs, but compared to other storage analysis tools I've seen, it goes further in three ways:

1) It collects critically important data that is not provided by the OS
2) It processes the data to tell you exactly what is wrong
3) It runs heuristics to tell you how to fix the problem

I wish more tools spent this much effort on solving the actual problem rather than making pretty graphs that only an expert would understand.

What we actually did was run the tool on a pre-production Oracle system using Veritas Filesystem and Volume Manager with Solaris on a SAN connected to a Hitachi storage array. Atlas starts off by looking at all the active processes on the system and ignoring any that are not doing any I/O. It collects data on which files are being read or written by which process, and what the access patterns and sizes are at the system call, filesystem and device levels. You can also set the tool to focus on a set of devices, and gather information on the processes that actually talk to those devices.

Atlas immediately pointed out that two volumes had been concatenated to form a filesystem, and that 98% of the accesses were to one of the volumes. It recommended that the volumes be striped together for better overall performance.
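
The kind of imbalance check that flags a lopsided concatenation can be sketched in a few lines of Python. This is only an illustration, not a description of how Atlas works internally: the volume names, the I/O counts and the 80% threshold are all made up, with the counts chosen to match the 98% figure.

```python
def access_shares(io_counts):
    """Return each volume's fraction of the total I/O count."""
    total = sum(io_counts.values())
    if total == 0:
        return {name: 0.0 for name in io_counts}
    return {name: count / total for name, count in io_counts.items()}


def hot_volumes(io_counts, threshold=0.8):
    """List volumes whose share of accesses exceeds the (assumed) threshold."""
    shares = access_shares(io_counts)
    return sorted(name for name, share in shares.items() if share > threshold)


# Hypothetical per-volume I/O counts for two concatenated volumes.
counts = {"vol01": 9800, "vol02": 200}

for name, share in access_shares(counts).items():
    print(f"{name}: {share:.0%} of accesses")
print("candidates for striping:", hot_volumes(counts))
```

Per-device counts like these are roughly what you would pull out of iostat by hand; the point of a striped layout is to spread those accesses evenly across both volumes.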

It also pointed out that some of the I/O accesses were taking two seconds to complete at the filesystem level, but only two milliseconds at the device level. We also saw this effect in the terminal window, where our typing would stop echoing for a few seconds every now and again. I guessed this was CPU starvation caused by fsflush running flat out on this machine, which had over 50GB of RAM. Adding set autoup=600 to /etc/system and rebooting made the problem go away. I've been told by Sun that the very latest patches finally fix fsflush so that it can't use a lot of CPU time, so large memory machines will finally work properly without needing this tweak.
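
A rough back-of-envelope calculation shows why a longer autoup helps on a large memory machine. The sketch below assumes that fsflush tries to scan all of physical memory once every autoup seconds, and that the default autoup is 30. Both are assumptions about Solaris of this era rather than anything measured on this system, and the RAM size is simply the 50GB lower bound mentioned above.

```python
def fsflush_scan_rate_gb_per_s(ram_gb, autoup_s):
    """Memory fsflush must examine per second to cover all RAM in autoup_s seconds."""
    if autoup_s <= 0:
        raise ValueError("autoup must be positive")
    return ram_gb / autoup_s


RAM_GB = 50          # "over 50GB" in the text; treated as exactly 50 here
DEFAULT_AUTOUP = 30  # assumed default value, in seconds
TUNED_AUTOUP = 600   # the value set in /etc/system

default_rate = fsflush_scan_rate_gb_per_s(RAM_GB, DEFAULT_AUTOUP)
tuned_rate = fsflush_scan_rate_gb_per_s(RAM_GB, TUNED_AUTOUP)

print(f"default autoup: {default_rate:.2f} GB/s to scan")
print(f"tuned autoup:   {tuned_rate:.3f} GB/s to scan")
print(f"reduction:      {default_rate / tuned_rate:.0f}x")
```

Under those assumptions the tuned setting cuts the scanning work by a factor of twenty, which is consistent with the stalls disappearing after the reboot.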

Finally, Atlas showed that the filesystem block size was set too small, so large reads issued by Oracle were being chopped into smaller reads by the filesystem layer before being sent to the device. It gave a specific recommendation for the block size that should be used. Reconfiguring the disks takes a long time, but we'll fix it before the system goes into production.
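
To see how much a small block size can multiply the number of device requests, here is a deliberately simplified model in Python. It assumes one device read per filesystem block touched, which ignores any coalescing a real filesystem may do, and the 1MB read size and the 1KB and 8KB block sizes are hypothetical values, not the ones from this system or from Atlas's recommendation.

```python
def device_reads(offset, length, block_size):
    """Count filesystem blocks touched by a read, assuming one device read per block."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if length <= 0:
        return 0
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return last_block - first_block + 1


READ_SIZE = 1024 * 1024  # hypothetical 1MB application read

for block_size in (1024, 8192):  # hypothetical filesystem block sizes
    n = device_reads(0, READ_SIZE, block_size)
    print(f"block size {block_size:>5}: {n} device reads")
```

An unaligned read touches one extra block, which is why the function works from the first and last block numbers instead of simply dividing the length by the block size.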

We could have figured out the concatenation problem using iostat data, but the other two problems are normally invisible. The topic of what filesystem block size to use can generate masses of discussion and confusion, so having "virtual Dave Fisk" tell you what block size to use can save a lot of time :-)