Flash player uses policy server to check its permission to open sockets to certain ports of certain server.
Adobe provides sample Flash policy server. But it is unusable for production. It creates a thread per connection. Also it shows strange virtual memory usage.
That is why I wrote simple flashpolicytwistd using Python/Twisted.
I'll share some bits of IT knowledge here: CScience, WEB, Networking, Python, C++, Unix, Linux, Ubuntu, etc.
пятница, 16 октября 2009 г.
суббота, 12 сентября 2009 г.
Postgresql's huge disadvantage
When building a database which has a big deal of ranged queries, it might be extremely helpful to have clustered index support.
Let say, you keep a table of messages. To query last 10 at inbox it takes about 1 disk seek in worst case with clustered on index by (receiver, timestamp). When there is no clustering, be ready to issue 10 disk seeks.
InnoDB and MS SQL Server both have clustered index support. Instead, Postgresql provides CLUSTER command, which must be explicitly issued to rebuild internal database structure to cluster rows according to specified index. In order to keep you DB more or less clustered, you have to cron the CLUSTER command.
But:
1) CLUSTER takes exclusive lock on table. It took 2 hours to cluster my 3 GB of data. Daily cron would render my application to have 10% downtime. Nice. You can try to aid this by using pg_reorg.
2) Clustering does not change any logical data, only physical storage layout. Nevertheless, it generates the amount of WAL equal to the size of the data. Again, daily CLUSTER would add 3 GB of backup traffic. Same with pg_reorg.
All this makes clustered indexing in Postgresql unusable.
Let say, you keep a table of messages. To query last 10 at inbox it takes about 1 disk seek in worst case with clustered on index by (receiver, timestamp). When there is no clustering, be ready to issue 10 disk seeks.
InnoDB and MS SQL Server both have clustered index support. Instead, Postgresql provides CLUSTER command, which must be explicitly issued to rebuild internal database structure to cluster rows according to specified index. In order to keep you DB more or less clustered, you have to cron the CLUSTER command.
But:
1) CLUSTER takes exclusive lock on table. It took 2 hours to cluster my 3 GB of data. Daily cron would render my application to have 10% downtime. Nice. You can try to aid this by using pg_reorg.
2) Clustering does not change any logical data, only physical storage layout. Nevertheless, it generates the amount of WAL equal to the size of the data. Again, daily CLUSTER would add 3 GB of backup traffic. Same with pg_reorg.
All this makes clustered indexing in Postgresql unusable.
воскресенье, 6 сентября 2009 г.
Postgresql tuple/row
Q: What is the difference between tuple and row in Postgresql?
A: Tuple is a some version of a row.
A: Tuple is a some version of a row.
понедельник, 17 августа 2009 г.
Serving static with nginx and varnish
I used nginx as reverse-proxy in front of amazon s3.
A month ago I decided to try varnish. It is designed from the ground up as a reverse-proxy. Also, I though that nginx solution wasted a lot of resources when keeping lots of tiny images in separate files.
But, after month of experiments, I discovered high iowait values and severe load on the hard disk, causing service problems. I rolled back to the previous nginx static scheme. Iowait dropped from frightening 100-150 to acceptable 25.
I used varnish 2.0.4 running with 3GB file storage. It consumed 0.5-1 GB of memory. Does anyone have a clue why varnish performed so much worser than nginx?
A month ago I decided to try varnish. It is designed from the ground up as a reverse-proxy. Also, I though that nginx solution wasted a lot of resources when keeping lots of tiny images in separate files.
But, after month of experiments, I discovered high iowait values and severe load on the hard disk, causing service problems. I rolled back to the previous nginx static scheme. Iowait dropped from frightening 100-150 to acceptable 25.
I used varnish 2.0.4 running with 3GB file storage. It consumed 0.5-1 GB of memory. Does anyone have a clue why varnish performed so much worser than nginx?
понедельник, 20 июля 2009 г.
Twisted logging pitfall
I run my Twisted processes as
Twisted chops and rotates log files by itself. By default it generates 10mb files chunks.
When current somelogfile.log becomes larger than 10mb, Twisted moves it to somelogfile.log.1 and continues logging to an empty file. If there are more than 2 chunks, they get their names so that larger number at the end corresponds to an older log. To achieve this, Twisted renames N log files, where N is the number of chunks.
In my system there were tens thousands of chunks. I did even realize that rotating them makes a huge stress for HDD, causing unexpected IOWAIT peaks. Moving the chunks to a separate folder eliminated the problem, preventing me from buying some more hardware :)
I'll investigate if it is possible to use logrotated or something similar to handle all this automatically.
twistd --logfile /var/log/somelogfile.log --pidfile /var/run/somepidfile.pid -y sometacfile.tac
Twisted chops and rotates log files by itself. By default it generates 10mb files chunks.
When current somelogfile.log becomes larger than 10mb, Twisted moves it to somelogfile.log.1 and continues logging to an empty file. If there are more than 2 chunks, they get their names so that larger number at the end corresponds to an older log. To achieve this, Twisted renames N log files, where N is the number of chunks.
In my system there were tens thousands of chunks. I did even realize that rotating them makes a huge stress for HDD, causing unexpected IOWAIT peaks. Moving the chunks to a separate folder eliminated the problem, preventing me from buying some more hardware :)
I'll investigate if it is possible to use logrotated or something similar to handle all this automatically.
вторник, 23 июня 2009 г.
Postgresql transaction counter (transactions per second)
There are transactions counters for each database in a cluster.
If you want to find out how many transactions has your system generated at the moment, you should connect to any database as a superuser (postgres) and
Easy, but took some time to find the recipe.
If you want to find out how many transactions has your system generated at the moment, you should connect to any database as a superuser (postgres) and
select sum(xact_commit) from pg_stat_database;
Easy, but took some time to find the recipe.
суббота, 13 июня 2009 г.
Write-heavy setup for Postgresql
My project has a database which is updated almost as frequently, as read. The main bottleneck for the database was disk speed. Here are some tips on how to optimize Postgresql configuration to avoid overusing disk IO. For my case it helped to reduce iowait from ~150 to less then 50 in average.
synchronous_commit. Since users score is not a critical parameter, it is safe to set sychronous_commit to off. The worst thing that can happen is that you loose several last transactions.
checkpoint_segments, checkpoint_timeout. Checkpoint causes all the modified data to be stored in actual table structures. Before the checkpoint happens, WAL guarantees data durability. If you have some frequently modified row, it is checkpointed each time. If checkpoints happen in your database too frequently it is a significant overhead. Increase the parameters to make checkpoint happen less frequently.
Background Writer. It writes dirty pages in background to lower amount of work for checkpoint. Again, some frequently modified value might be written each BW activity round. This is overkill for write-heavy database, because the value will be checkpointed anyways. I turned BW off at all, setting bgwriter_lru_maxpages = 0.
Hope it helps. Comments are extremely welcome.
synchronous_commit. Since users score is not a critical parameter, it is safe to set sychronous_commit to off. The worst thing that can happen is that you loose several last transactions.
checkpoint_segments, checkpoint_timeout. Checkpoint causes all the modified data to be stored in actual table structures. Before the checkpoint happens, WAL guarantees data durability. If you have some frequently modified row, it is checkpointed each time. If checkpoints happen in your database too frequently it is a significant overhead. Increase the parameters to make checkpoint happen less frequently.
Background Writer. It writes dirty pages in background to lower amount of work for checkpoint. Again, some frequently modified value might be written each BW activity round. This is overkill for write-heavy database, because the value will be checkpointed anyways. I turned BW off at all, setting bgwriter_lru_maxpages = 0.
Hope it helps. Comments are extremely welcome.
Подписаться на:
Сообщения (Atom)