Wednesday, December 23, 2009

flashpolicytwistd Ubuntu package

I've created an Ubuntu DEB package for flashpolicytwistd, a simple Flash policy server written in Python/Twisted. This simplifies the installation process a lot.

Find the deb at the project's download page.
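
Installation is then a one-liner (the exact file name depends on the version you download, so this is illustrative):

sudo dpkg -i flashpolicytwistd_<version>_all.deb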

Monday, December 21, 2009

CPU benchmark

I wrote a simple Python program that builds an R-tree index over 100k nodes.

The program runs in a single thread, which means only one CPU core is doing the work.

3m 31.322s Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz
3m 2.835s Quad-Core AMD Opteron(tm) Processor 2372 HE @ 2.1Ghz
1m 31.393s Intel(R) Core(TM) i7 CPU 975 @ 3.33GHz
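
The original program is not included here; a minimal sketch of the same kind of benchmark, assuming the third-party rtree package (a binding to libspatialindex) instead of my hand-rolled implementation, could look like this:

import random
import time

from rtree import index  # assumed dependency: the rtree package

def build_index(n=100000):
    idx = index.Index()
    for i in range(n):
        x, y = random.random(), random.random()
        # insert a point as a degenerate rectangle (left, bottom, right, top)
        idx.insert(i, (x, y, x, y))
    return idx

start = time.time()
build_index()
print('built in %.1f s' % (time.time() - start))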

Saturday, November 28, 2009

ssh hangup when working via router

I've discovered that my ssh connection occasionally hangs when I work through my Wi-Fi router, while ssh works fine when the PC is connected directly to the WAN.

This might happen because the router drops the connection from its NAT table due to inactivity. To work around this, edit (or create) ~/.ssh/config and add a few lines:

Host *
    ServerAliveInterval 60

Friday, October 16, 2009

Flash policy server

Flash player uses a policy server to check whether it is allowed to open sockets to certain ports of a certain server.

Adobe provides a sample Flash policy server, but it is unusable in production: it creates a thread per connection and shows strange virtual memory usage.

That is why I wrote the simple flashpolicytwistd using Python/Twisted.
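
For illustration, a minimal policy server along the same lines (a sketch, not the actual flashpolicytwistd code; port 843 and the shape of the policy XML are standard Flash conventions, the to-ports value is made up):

from twisted.internet import protocol, reactor

POLICY = ('<?xml version="1.0"?>'
          '<cross-domain-policy>'
          '<allow-access-from domain="*" to-ports="8080"/>'
          '</cross-domain-policy>\0')

class PolicyProtocol(protocol.Protocol):
    def dataReceived(self, data):
        # Flash player sends '<policy-file-request/>\0' and expects
        # the policy XML terminated by a NUL byte
        self.transport.write(POLICY)
        self.transport.loseConnection()

factory = protocol.ServerFactory()
factory.protocol = PolicyProtocol
reactor.listenTCP(843, factory)  # 843 is the standard policy port
reactor.run()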

Saturday, September 12, 2009

Postgresql's huge disadvantage

When building a database that serves a great deal of range queries, it can be extremely helpful to have clustered index support.

Let's say you keep a table of messages. Querying the last 10 messages of an inbox takes about 1 disk seek in the worst case when the table is clustered on an index over (receiver, timestamp), because the rows lie next to each other on disk. Without clustering, be ready to issue up to 10 disk seeks, one per row.

InnoDB and MS SQL Server both support clustered indexes. Postgresql instead provides the CLUSTER command, which must be issued explicitly to rewrite the table so that rows are physically ordered according to the specified index. To keep your DB more or less clustered, you have to run CLUSTER from cron.
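
For the hypothetical messages table above, that would look like this (Postgresql 8.3+ syntax):

CREATE INDEX messages_receiver_ts ON messages (receiver, timestamp);
CLUSTER messages USING messages_receiver_ts;  -- exclusive lock, full table rewrite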

But:

1) CLUSTER takes an exclusive lock on the table. It took 2 hours to cluster my 3 GB of data, so a daily cron run would give my application roughly 10% downtime. Nice. You can try to mitigate this by using pg_reorg.

2) Clustering does not change any logical data, only the physical storage layout. Nevertheless, it generates an amount of WAL equal to the size of the data, so a daily CLUSTER would add 3 GB of backup traffic. The same goes for pg_reorg.

All this makes clustered indexing in Postgresql unusable.

Sunday, September 6, 2009

Postgresql tuple/row

Q: What is the difference between a tuple and a row in Postgresql?
A: A tuple is one particular version of a row: MVCC keeps several physical versions (tuples) of the same logical row around until they are vacuumed away.

Monday, August 17, 2009

Serving static files with nginx and varnish

I used nginx as a reverse proxy in front of Amazon S3.
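
The scheme was roughly of this shape (a sketch of a common nginx mirroring setup, not my exact config; the paths and the bucket name are made up):

location / {
    root /var/www/static;            # serve the file if we already have it
    error_page 404 = @fetch;         # otherwise fetch it from S3
}

location @fetch {
    proxy_pass http://mybucket.s3.amazonaws.com;
    proxy_store on;                  # save the fetched file to disk
    root /var/www/static;
}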

A month ago I decided to try varnish. It is designed from the ground up as a reverse proxy, and I also thought that the nginx solution wasted a lot of resources by keeping lots of tiny images in separate files.

But after a month of experiments I discovered high iowait values and severe load on the hard disk, causing service problems. I rolled back to the previous nginx static scheme, and iowait dropped from a frightening 100-150 to an acceptable 25.

I used varnish 2.0.4 running with 3 GB of file storage; it consumed 0.5-1 GB of memory. Does anyone have a clue why varnish performed so much worse than nginx?

Monday, July 20, 2009

Twisted logging pitfall

I run my Twisted processes as

twistd --logfile /var/log/somelogfile.log --pidfile /var/run/somepidfile.pid -y sometacfile.tac

Twisted chops and rotates log files by itself; by default it produces 10 MB chunks.

When the current somelogfile.log grows larger than 10 MB, Twisted moves it to somelogfile.log.1 and continues logging to an empty file. If there are more than 2 chunks, they are named so that a larger number at the end corresponds to an older log. To achieve this, Twisted renames all N log files on every rotation, where N is the number of chunks.

In my system there were tens of thousands of chunks. I did not even realize that rotating them put a huge stress on the HDD, causing unexpected iowait peaks. Moving the chunks to a separate folder eliminated the problem and saved me from buying more hardware :)
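
The fix itself is trivial, something along these lines (using the paths from the example above):

mkdir -p /var/log/old-chunks
mv /var/log/somelogfile.log.* /var/log/old-chunks/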

I'll investigate whether it is possible to use logrotate or something similar to handle all this automatically.

Tuesday, June 23, 2009

Postgresql transaction counter (transactions per second)

There is a transaction counter for each database in a cluster.

If you want to find out how many transactions your system has generated so far, connect to any database as a superuser (postgres) and run:

select sum(xact_commit) from pg_stat_database;

Easy, but it took some time to find the recipe.
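
Note that the counter is cumulative, so to get actual transactions per second you have to sample it twice and divide by the interval. A minimal sketch, assuming the psycopg2 driver and a local cluster:

import time
import psycopg2  # assumed driver; any Postgresql client library will do

QUERY = 'select sum(xact_commit) from pg_stat_database;'

def sample(cur):
    cur.execute(QUERY)
    return cur.fetchone()[0]

conn = psycopg2.connect('dbname=postgres user=postgres')
cur = conn.cursor()

first = sample(cur)
conn.rollback()  # end the transaction: the stats snapshot is frozen within one
time.sleep(60)   # sampling interval in seconds
second = sample(cur)

print('%.1f commits/sec' % ((second - first) / 60.0))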

Saturday, June 13, 2009

Write-heavy setup for Postgresql

My project has a database that is updated almost as frequently as it is read. The main bottleneck for the database was disk speed. Here are some tips on how to tune the Postgresql configuration to avoid overusing disk IO. In my case they reduced iowait from ~150 to less than 50 on average.

synchronous_commit. Since user scores are not critical data in my application, it is safe to set synchronous_commit to off. The worst thing that can happen is that you lose the last several transactions.

checkpoint_segments, checkpoint_timeout. A checkpoint flushes all modified data into the actual table structures; until it happens, the WAL guarantees durability. If you have a frequently modified row, it is written out at every checkpoint, so checkpoints that are too frequent mean significant overhead. Increase these parameters to make checkpoints happen less often.

Background writer. It writes dirty pages in the background to reduce the amount of work left for the checkpoint. Again, a frequently modified value might be written on every BW round, which is overkill for a write-heavy database because the value will be checkpointed anyway. I turned the BW off entirely by setting bgwriter_lru_maxpages = 0.
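
Put together, the relevant postgresql.conf lines might look like this (the checkpoint values are illustrative, tune them against your own benchmarks):

synchronous_commit = off
checkpoint_segments = 32    # illustrative; the default is 3
checkpoint_timeout = 30min  # illustrative; the default is 5min
bgwriter_lru_maxpages = 0   # disable the background writer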

Hope it helps. Comments are extremely welcome.

Friday, April 17, 2009

Postgresql in Ubuntu distribution

Ubuntu limits shared memory to about 32 MB by default. This is why (I guess) the packaged Postgres ships with the shared_buffers parameter set to a modest 24 MB.

This is quite a low value for a large DB on modern hardware. There are numerous recommendations on how big this value should be; it makes sense to pick one, set it, and run your own benchmarks.

To increase the kernel shared memory limit, edit /etc/sysctl.conf and add or replace the following (about 110 MB in this example):

kernel.shmmax = 110000000

Then run

sudo sysctl -p

to make the settings take effect immediately.
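
After that, raise shared_buffers in postgresql.conf and restart Postgresql; with the 110 MB limit above, something like this fits (the value is illustrative):

shared_buffers = 96MB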

Wednesday, April 15, 2009

UTC datetime to UNIX timestamp

A UNIX timestamp is a fairly reliable value that does not depend on the timezone or daylight saving time.

There are a number of posts on the "Internets" about this conversion, but all of them are either mistaken or deal with converting a timestamp to a local-timezone datetime object.

The correct (and awfully awkward) way to convert a UTC datetime object to a timestamp:

from datetime import datetime
from time import mktime, time, timezone

def utcdatetime_to_ts(dt):
    # mktime() interprets the tuple as local time, so compensate with
    # time.timezone, the local non-DST offset from UTC in seconds
    return mktime(dt.utctimetuple()) - timezone

Then you can always check:

assert abs(utcdatetime_to_ts(datetime.utcnow()) - time()) <= 1

Check also a better and shorter version in the comments.
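
(The shorter version is presumably based on the standard library's calendar.timegm, which interprets a time tuple as UTC directly:)

from calendar import timegm

def utcdatetime_to_ts(dt):
    return timegm(dt.utctimetuple())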

Monday, April 13, 2009

Hosting migration story

As our code develops and our user base grows, we adjust our server hardware to stay cheap yet powerful enough to handle the load. Here is the timeline:
  • up to autumn 2008: Amazon EC2, small instance: 0.5 cores, 1.7 GB RAM
  • up to Jan 2009: Gandi.net: 1..2 cores, 1..2 GB RAM
  • up to Apr 2009: Serverloft.com L server: 4 cores, 4 GB RAM
  • since Apr 2009: Serverloft.com XL server: 4*2 cores, 8 GB RAM
While cloud solutions like EC2 and Gandi.net provide a great deal of flexibility, it is still cheaper for us to stick with a traditional dedicated server. Serverloft, while being a DS provider, still offers many features previously available only to VDS users: OS reinstall and hard reboot via a web interface.

Tuesday, January 13, 2009

md5 hexdigest in Erlang

Recently I got my hands dirty with Erlang.

Erlang's builtin md5 function returns a binary, while I needed a hexadecimal string. I found a solution somewhere on the internet, but it was buggy. Note the case statement: it fixes the bug in the original solution (sorry, I don't remember the link), which dropped the leading zero of single-digit hex values. Here is the snippet:


md5_hexdigest(String) ->
    string:to_lower(
        lists:flatten(
            lists:map(
                fun(V) ->
                    case httpd_util:integer_to_hexlist(V) of
                        [A, B] -> [A, B];
                        %% pad single-digit hex values with a leading zero
                        [B] -> [$0, B]
                    end
                end,
                binary_to_list(erlang:md5(String))))).
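
For example, md5_hexdigest("foo") yields "acbd18db4cc2f85cedef654fccc4a4d8".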