Tuesday, May 15, 2012

Never use run_erl

In one of my projects I used run_erl to launch the Erlang VM in daemon mode, rotate logs, and so on. It seemed fine to use a standard Erlang tool: run_erl is part of the Erlang distribution.

I discovered an unexpected performance problem when running a production application under run_erl: the application caused high iowait for no apparent reason.

It was hard to pin down the exact cause, but after a few hours I found out that run_erl calls fsync on every log message, causing excessive IO load.
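
If you suspect the same problem, one quick way to confirm it (on Linux, assuming you can locate the run_erl process) is to watch its syscalls while the node is logging:

  # count fsync calls issued by run_erl; pgrep is just one way to find its PID
  strace -f -e trace=fsync -p $(pgrep -x run_erl)

In my case every log line showed up as an fsync.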

Thursday, December 29, 2011

gen_server antipattern

Two gen_servers should never gen_server:call each other:

  1. gen_server A calls gen_server B: it sends a request message and blocks in receive.
  2. gen_server B calls gen_server A: it sends a request message and blocks in receive.
  3. Neither A nor B can respond to the other's request, resulting in a deadlock.
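
A minimal sketch of the situation (the module and its ping call are made up for illustration): each server answers a call by synchronously calling its peer, so a single ping(A) makes A call B, B's handler calls back into A, which is still stuck inside its own handle_call, and nothing can be answered; after the default 5-second timeout everything crashes.

  %% deadlock_demo.erl -- hypothetical module, for illustration only
  -module(deadlock_demo).
  -behaviour(gen_server).
  -export([start/0, ping/1]).
  -export([init/1, handle_call/3, handle_cast/2]).

  start() ->
      {ok, A} = gen_server:start(?MODULE, [], []),
      {ok, B} = gen_server:start(?MODULE, [], []),
      gen_server:cast(A, {set_peer, B}),
      gen_server:cast(B, {set_peer, A}),
      {A, B}.

  ping(Pid) ->
      gen_server:call(Pid, ping).

  init([]) ->
      {ok, undefined}.

  %% remember who our peer is
  handle_cast({set_peer, Peer}, _State) ->
      {noreply, Peer}.

  %% To answer 'ping' we call the peer. If the peer is (directly or
  %% indirectly) calling us at the same time, neither call can be served.
  handle_call(ping, _From, Peer) ->
      Reply = gen_server:call(Peer, ping),
      {reply, Reply, Peer}.

The safe alternatives are to make one direction asynchronous (gen_server:cast or a plain message), or to have the callee return {noreply, State} and answer later with gen_server:reply/2.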

Wednesday, September 21, 2011

Cassandra migration from 0.6 to 0.7

Cassandra is not mature. I discovered data corruption errors in 0.6, found nothing that could help me fix them, and decided to migrate to 0.7, hoping the errors were fixed there.

All you have to do is follow the migration instructions in the NEWS file. But there are three pitfalls:

libjna problem in the DEB package. Ubuntu ships an earlier libjna version than Cassandra requires, but the package installs fine anyway (the dependency version numbers are wrong). This results in very strange effects and errors. To fix it you want to install libjna manually, as described here.

Saved caches problem. Before starting up 0.7 you have to manually delete the old saved_caches directory; otherwise you get "Negative array size" exceptions on startup.
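
With the Debian/Ubuntu package the directory is, if I remember correctly, under /var/lib/cassandra (check the saved_caches_directory setting in your cassandra.yaml to be sure):

  # with the node stopped, remove the stale caches before the first 0.7 start
  rm -rf /var/lib/cassandra/saved_caches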

Java heap size problem. After fixing the previous problems, I discovered a performance degradation in production. Analyzing it, I noticed that the Java process occupied 13 GB (of 24 GB) of RAM, while with 0.6 it took about 1-2 GB. In 0.7 the Cassandra init scripts set both the minimum and the maximum Java heap size (-Xms and -Xmx) to half of RAM. While that is fine as a maximum, setting -Xms to 12 GB means that this memory is never going to be used for your actual data: Cassandra accesses data via mmap, and mmap-ed data lives in the system page cache, which is now shrunk by the 12 GB taken by the Java heap. You have to manually edit /etc/cassandra/cassandra-env.sh and set the heap to 2 GB or so.
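
For reference, in my copy the fix boiled down to something like this (variable names as in my cassandra-env.sh; they may differ between package versions):

  # /etc/cassandra/cassandra-env.sh
  MAX_HEAP_SIZE="2G"    # instead of the auto-calculated RAM/2
  HEAP_NEWSIZE="200M"   # young generation, set alongside MAX_HEAP_SIZE

which ends up passing -Xms2G -Xmx2G to the JVM and leaves the rest of the RAM to the page cache.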

Sunday, May 22, 2011

ThinkPad X220

Finally got my X220. First, I replaced the HDD with an Intel SSD 160 GB G2. It required a small hardware tweak: the X220 takes a 7 mm HDD, and the SSD was about 12 mm high, so I had to remove the plastic frame from the SSD.

DisplayPort turned out to be a disadvantage. Only a few display models support DisplayPort, and very, very few of them ship the corresponding cable in the box.

All the DisplayPort-to-HDMI and DisplayPort-to-DVI cables seem to be half-working.

Tuesday, November 16, 2010

When to touch swappiness

There is a lot of discussion on the mailing lists about whether or not to touch the /proc/sys/vm/swappiness parameter, and there is no definitive answer. I ran into a situation where tuning this parameter really can improve performance.


On the machine:
  • RAID-1 of three HDDs
  • 12 GB RAM
  • Apache Cassandra instance with 25 GB of data
  • ejabberd instance
Most of the IO is created by Cassandra, which reads many random data pages and occasionally writes sequential 100-200 MB chunks of data. Some additional IO comes from swapping ejabberd memory in and out.

So most of the write load was created by swapping out random ejabberd memory pages, and we know that RAID-1 is N times better at reads than at writes. By decreasing the swappiness parameter from 60 to 20 I shifted the IO load from writes to reads: almost no random swap writes were left.
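
The change itself is a couple of commands (values from this particular box):

  # apply at runtime
  sysctl -w vm.swappiness=20
  # persist across reboots
  echo "vm.swappiness = 20" >> /etc/sysctl.conf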

The IO load has really decreased. Not a huge optimization, but worth doing.

Saturday, November 13, 2010

Apache Cassandra experience

In one of my projects I switched from Postgresql to Cassandra. There were reasons for the switch.

First. For each user I had to keep an inbox storing incoming messages and events. What is an inbox? It is a sorted collection of items, accessed with ranged queries. This caused huge IO overhead on Postgres because of the lack of clustered indexes. All "tables" in Cassandra are effectively clustered, because they are kept as SSTables (sorted string tables).

Second. My application had a huge write throughput. Postgres is good at writes, with its write-ahead log and the absence of table locks on write, but even after write-aware optimizations it still was not enough. Cassandra's write path is completely different, and it suits my needs better.

Third. The application servers are Python Twisted applications. There is one Postgres binding for Twisted, and it is abandoned and buggy. The Cassandra API is available via Thrift, which in turn supports Twisted. I recommend the great Telephus wrapper for Thrift and Twisted.

On Cassandra's IRC channel people tell each other about their Cassandra clusters. I look a bit stupid when I say I have a single node. But who cares? If it works better than Postgres for me, why not?

Disclaimer: I am not saying that Cassandra is better than Postgresql. It just suits this particular application better. I use Postgresql a lot in many other projects.

Tuesday, November 9, 2010

Google AppEngine Experience

At first glance, AppEngine is really nice, with all that cloud computing: pay only for what you use, scale indefinitely. Of course, you have some limitations, like a custom (Python or Java) environment with predefined APIs. But the APIs are really good and mostly sufficient.

At second glance, AppEngine is really, really nice! You'll find a great toolset in the SDK and the application management console: version management, quota settings, convenient shell scripts in the SDK for deployment and testing, plus log viewers, a kind of simple profiler, etc. I can't imagine how much effort was spent on the toolset.

At third glance you'll find AppEngine unusable.

  • Two years after release, there are still unexpected errors in the management console. Sometimes I cannot get into it for hours.
  • When you need to delete a table from the datastore, cross your fingers. Sometimes a table becomes corrupted and you cannot delete it. Only recreating the application helps.
  • AppEngine pricing claims 10 cents per CPU hour. Good. But you have to use that CPU through the API. When I tried to upload my 1 GB database to AppEngine, it took some hours of real time and some days of AppEngine CPU time. It cost me about $60 just to upload my database! I have to admit, this is the hard part. But Postgresql moves this database back and forth in minutes!
  • Finally, I managed to port my application and upload all the data. But the cost per pageview is tremendous. It would cost me hundreds of bucks a month instead of the current inexpensive dedicated server (which is about 10% busy at peaks).