суббота, 12 сентября 2009 г.

Postgresql's huge disadvantage

When building a database which has a big deal of ranged queries, it might be extremely helpful to have clustered index support.

Let say, you keep a table of messages. To query last 10 at inbox it takes about 1 disk seek in worst case with clustered on index by (receiver, timestamp). When there is no clustering, be ready to issue 10 disk seeks.

InnoDB and MS SQL Server both have clustered index support. Instead, Postgresql provides CLUSTER command, which must be explicitly issued to rebuild internal database structure to cluster rows according to specified index. In order to keep you DB more or less clustered, you have to cron the CLUSTER command.

But:

1) CLUSTER takes exclusive lock on table. It took 2 hours to cluster my 3 GB of data. Daily cron would render my application to have 10% downtime. Nice. You can try to aid this by using pg_reorg.

2) Clustering does not change any logical data, only physical storage layout. Nevertheless, it generates the amount of WAL equal to the size of the data. Again, daily CLUSTER would add 3 GB of backup traffic. Same with pg_reorg.

All this makes clustered indexing in Postgresql unusable.

Комментариев нет: