Saturday, November 13, 2010

Apache Cassandra experience

On one of my projects I switched from PostgreSQL to Cassandra. There were three reasons for the switch.

First. For each user I had to keep an inbox for storing incoming messages and events. What is an inbox? It is a sorted collection of items, and the items are accessed using range queries. This caused huge IO overhead on Postgres, because it does not maintain clustered indexes, so the rows of one inbox end up scattered across the table. All "tables" in Cassandra are effectively clustered, because they are stored as SSTables (sorted string tables).
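To make the inbox idea concrete, here is a minimal in-memory sketch of a sorted collection read with a range query. It is only an illustration of the access pattern, not Cassandra's API: the `Inbox` class and its `add` and `slice` methods are hypothetical names invented for this example.

```python
import bisect


class Inbox:
    """Toy per-user inbox: items are kept sorted by timestamp (hypothetical model)."""

    def __init__(self):
        self._keys = []   # timestamps, always kept in sorted order
        self._items = {}  # timestamp -> message

    def add(self, timestamp, message):
        # Insert the key at its sorted position; overwrite if it already exists.
        if timestamp not in self._items:
            bisect.insort(self._keys, timestamp)
        self._items[timestamp] = message

    def slice(self, start, finish, count=100):
        # Range query: binary-search both bounds, then read one contiguous run.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, finish)
        return [(k, self._items[k]) for k in self._keys[lo:hi][:count]]


inbox = Inbox()
inbox.add(30, "c")
inbox.add(10, "a")
inbox.add(20, "b")
print(inbox.slice(15, 30))
```

Because the keys are stored in order, a range read touches one contiguous run of items instead of jumping around, which is the property the clustered layout gives on disk.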

Second. My application had huge write throughput. Postgres is good at writes, with its write-ahead log and the absence of table locks on write. But even after write-oriented optimizations it was still not enough. Cassandra's write path is completely different, and it suits my needs better.
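For readers who have not seen a log-structured write path, here is a rough toy model of the general idea: a write is appended to a commit log, applied to an in-memory table, and later flushed to disk as an immutable sorted file. This is a simplified sketch under my own assumptions, not Cassandra's actual code; `ToyStore`, `memtable_limit` and the method names are made up for the example.

```python
class ToyStore:
    """Toy log-structured store: commit log + memtable + sorted flushed tables."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only record of writes not yet flushed
        self.memtable = {}     # in-memory table holding the latest values
        self.sstables = []     # flushed tables, each a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # A write is a sequential append plus an in-memory update: no random IO.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Dump the memtable as one sorted, immutable table and start fresh.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []

    def read(self, key):
        # Newest data wins: check the memtable first, then tables newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None


store = ToyStore(memtable_limit=2)
store.write("b", 1)
store.write("a", 2)   # reaches the limit and triggers a flush
store.write("b", 3)   # newer value stays in the memtable
print(store.read("b"), store.sstables)
```

The point of the sketch is that writes never update data in place, so the expensive part of a write is only a sequential append.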

Third. The application servers are Python Twisted applications. There is one Postgres binding for Twisted, and it is abandoned and buggy. The Cassandra API is available via Thrift, which in turn supports Twisted. I recommend the great Telephus wrapper for Thrift and Twisted.

On Cassandra's IRC channel people tell each other about their Cassandra clusters. I look a bit silly saying that I have a single node. But who cares? If it works better than Postgres for me, why not?

Disclaimer: I am not saying that Cassandra is better than PostgreSQL. It just suits this particular application better. I use PostgreSQL a lot in many other projects.
