Sizing a large database

In general, it's a case of building a model of your development system through measurements, then scaling that model to your production system.

First, estimate your I/O requirements.

For the insertions, you need to decide what the peak is over a short time scale. Peak inserts per second are a useful metric. If you're lucky enough to have a fairly constant load, then it's easy. If you have massive peaks in the load, you should pick an acceptable maximum to design for.
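As a rough illustration of picking that number, suppose you have a log of insert timestamps from the application (the file name and its one-epoch-second-per-line format here are assumptions): you can bin them per second and look at a high percentile rather than the absolute freak spike.

```python
from collections import Counter

# Hypothetical input: one epoch-second timestamp per line, one line per insert.
with open("insert_timestamps.log") as f:
    per_second = Counter(int(float(line)) for line in f if line.strip())

rates = sorted(per_second.values())
peak = rates[-1]
# Using the 99th percentile as the "acceptable maximum" if the true peak is a one-off spike.
p99 = rates[int(0.99 * (len(rates) - 1))]
print(f"absolute peak: {peak} inserts/s, 99th percentile: {p99} inserts/s")
```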

Writes are relatively easy: run tests on your development system to determine things like the average number of I/O operations required for each insert. This can be scaled up reasonably safely, assuming your development and production systems are configured identically for things like block_size.
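A minimal sketch of that scaling step, assuming you've measured the average physical I/Os per insert on the development system (the figures below are placeholders):

```python
# Measured on the development system (placeholder value).
ios_per_insert_dev = 3.2      # average physical I/O operations per insert
# Required on production.
target_inserts_per_sec = 500  # the peak insert rate you decided on above

# Only valid if dev and production share block size, indexing, etc.
write_iops_needed = ios_per_insert_dev * target_inserts_per_sec
print(f"~{write_iops_needed:.0f} write I/Os per second at peak")
```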

Pay attention to the amount of redo you're generating and to your commit rate. These will scale too, and keeping the redo logs performing well is vital to maintaining a high insert rate. As a side note on application design: for heavy insert activity, don't commit for every row inserted.
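A sketch of that design point using the standard Python DB-API; `conn` is assumed to be an open connection, the table is made up, and the bind placeholder style depends on your driver.

```python
BATCH_SIZE = 1000  # commit once per batch, not once per row

def load_rows(conn, rows):
    """Insert rows, committing every BATCH_SIZE rows to limit per-row commit overhead."""
    cur = conn.cursor()
    for i, row in enumerate(rows, start=1):
        # :1, :2 is the Oracle numbered bind style; adjust for your driver's paramstyle.
        cur.execute("INSERT INTO measurements (id, val) VALUES (:1, :2)", row)
        if i % BATCH_SIZE == 0:
            conn.commit()
    conn.commit()  # commit the final partial batch
```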

For reads, the physical read rate is more difficult to scale up, as it's heavily influenced by the size of your buffer cache. If you have enough memory on your development system, you may be able to develop a model relating the average number of I/Os per query to the size of the buffer cache and the number of queries per second. You can then scale this model to your production environment, where you know the number of queries per second that you require, and work out where you want to be on the curve from in-memory database to everything on disk.
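One crude way to hold that model, assuming you've measured (buffer cache size, average physical reads per query) pairs on development; the numbers are invented, and linear interpolation stands in for a properly fitted curve.

```python
# (buffer cache in MB, average physical reads per query) measured on development.
measurements = [(128, 12.0), (256, 7.5), (512, 4.0), (1024, 2.2)]

def phys_reads_per_query(cache_mb):
    """Linearly interpolate between measured points (clamped at the ends)."""
    pts = sorted(measurements)
    if cache_mb <= pts[0][0]:
        return pts[0][1]
    if cache_mb >= pts[-1][0]:
        return pts[-1][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= cache_mb <= x1:
            return y0 + (y1 - y0) * (cache_mb - x0) / (x1 - x0)

target_queries_per_sec = 200   # known production requirement (placeholder)
for cache_mb in (256, 512, 768, 1024):
    read_iops = phys_reads_per_query(cache_mb) * target_queries_per_sec
    print(f"{cache_mb:5d} MB cache -> ~{read_iops:.0f} read I/Os per second")
```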

All this will give you a total number of I/Os per second that you'll need to perform, which in turn indicates how many spindles you'll need in the stripe sets for each tablespace. I really can't state strongly enough that for performance you buy spindles, not gigabytes, even if this means "wasting" some space on the drives. Buy many smaller disks rather than the biggest available ones. This is, of course, a performance specialist's fantasy, and in the real world more disks cost much more, both for the disks themselves and for the infrastructure. However, it's another trade-off you have to make: if you want high-performance I/O you buy four 18GB disks, not one 72GB disk. High-performance redo logs really eat up disks at an alarming rate; for example, you might want a 10-disk stripe with just 20GB of redo log on it.
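Putting the read and write figures together, the back-of-the-envelope spindle count might look like this; the per-disk IOPS figure is an assumption you'd take from the drive's specification or your own tests.

```python
import math

read_iops = 900            # from the buffer cache model above (placeholder)
write_iops = 1600          # from the insert scaling above (placeholder)
iops_per_spindle = 100     # rough figure for a single disk; measure your own

spindles = math.ceil((read_iops + write_iops) / iops_per_spindle)
print(f"at least {spindles} spindles, regardless of how many gigabytes that gives you")
```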

Right, since you've already started sizing your memory requirements, it may be time to think about the other memory requirements: most likely sort area and library cache sizes, but you may have others. You'll need enough log buffer, but on a large system this is trivial compared to everything else. If your queries are never going to fit their sorts in memory you'll need to make a compromise, but you may be able to size your sort area to keep all sorts in RAM.
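A crude check on whether keeping sorts in memory is affordable, assuming you know the largest sort a session performs and how many sessions may sort at once (all figures are placeholders):

```python
largest_sort_mb = 64        # biggest in-memory sort you want to allow per session
concurrent_sessions = 50    # sessions that may be sorting at the same time
other_memory_mb = 4096      # buffer cache, library cache, log buffer, etc.

sort_memory_mb = largest_sort_mb * concurrent_sessions
total_mb = sort_memory_mb + other_memory_mb
print(f"sort areas need ~{sort_memory_mb} MB; total memory budget ~{total_mb} MB")
```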

If nearly all of your SQL uses bind variables, you can get away with quite a small library cache, which will be much the same size on your development system as on the production server. Otherwise you may need to do the measure-and-scale exercise for this too.
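The difference, sketched with the Python DB-API (the table, connection, and bind style are assumptions): the first form produces one shared statement in the library cache, the second a new statement for every customer ID.

```python
def lookup_with_bind(conn, customer_id):
    """One statement shared for every value: small library cache footprint."""
    cur = conn.cursor()
    cur.execute("SELECT name FROM customers WHERE id = :1", (customer_id,))
    return cur.fetchone()

def lookup_with_literal(conn, customer_id):
    """A distinct statement per value: the library cache fills with one-off SQL."""
    cur = conn.cursor()
    cur.execute(f"SELECT name FROM customers WHERE id = {customer_id}")
    return cur.fetchone()
```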

CPU use is the usual scaling exercise. If your development system has different CPUs to the production one, I tend to use SPECint_rate benchmarks to compare them.
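The arithmetic is simple; the benchmark figures below are placeholders you'd look up for your actual development and production CPUs.

```python
dev_cpu_busy_fraction = 0.30   # measured CPU utilisation on dev under the test load
dev_specint_rate = 10.0        # SPECint_rate of the development box (placeholder)
prod_specint_rate = 60.0       # SPECint_rate of the production box (placeholder)
load_ratio = 8.0               # production load / development test load

prod_cpu_busy = dev_cpu_busy_fraction * load_ratio * (dev_specint_rate / prod_specint_rate)
print(f"predicted production CPU utilisation: {prod_cpu_busy:.0%}")
```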

Storage requirements may well just be satisfied by the I/O requirements. Remember, for performance you buy spindles, not gigabytes. Otherwise, simply find the number of bytes each row takes on average, and multiply by the number of rows.
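And the capacity check itself, with placeholder figures; the index overhead multiplier is an assumption, and you'd add your own allowance for growth.

```python
avg_row_bytes = 220            # average bytes per row, measured on development
rows = 500_000_000             # expected row count in production
index_overhead = 1.4           # assumed multiplier for indexes and block overhead

bytes_needed = avg_row_bytes * rows * index_overhead
print(f"~{bytes_needed / 2**30:.0f} GB of table and index data")
```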

© Andrew Mobbs 2001