Friday, April 17, 2009

PostgreSQL in the Ubuntu distribution

Ubuntu ships with a default kernel shared memory limit of about 32 MB. This is (I guess) why the packaged PostgreSQL has its shared_buffers parameter set to a modest 24 MB.

This is quite a low value for a large database on modern hardware. There are numerous recommendations on how big it should be, so it makes sense to try a larger setting and run your own benchmarks.

To raise the kernel shared memory limit, edit /etc/sysctl.conf and add or replace the following line (about 110 MB in this example):

kernel.shmmax = 110000000

Then run

sudo sysctl -p

to apply the setting immediately.
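To double-check the configured value, a small script can parse sysctl-style settings. This sketch is not part of the original recipe: the helper names parse_sysctl and shmmax_mb are made up for illustration, and it assumes plain "key = value" lines with "#" or ";" comments.

```python
def parse_sysctl(text):
    """Parse sysctl.conf-style text into a dict of key -> value strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments (sysctl.conf allows '#' and ';').
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def shmmax_mb(text):
    """Return kernel.shmmax in decimal megabytes, or None if it is not set."""
    value = parse_sysctl(text).get("kernel.shmmax")
    return int(value) / 1e6 if value is not None else None


if __name__ == "__main__":
    sample = "# raised for PostgreSQL\nkernel.shmmax = 110000000\n"
    print(shmmax_mb(sample))  # 110.0
```

On a running Linux system the limit currently in effect can also be read from /proc/sys/kernel/shmmax.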

Wednesday, April 15, 2009

UTC datetime to UNIX timestamp

A UNIX timestamp is a fairly reliable value: it does not depend on the timezone or on daylight saving time.

There are a number of posts on the "Internets" about converting between timestamps and datetime objects, but all of them are either mistaken or only deal with datetime objects in the local timezone.

The correct (and awfully awkward) way to convert a UTC datetime object to a timestamp:

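from datetime import datetime
from time import mktime, time, timezone

def utcdatetime_to_ts(dt):
    # utctimetuple() sets tm_isdst=0, so mktime() reads the tuple as local
    # standard time; subtracting time.timezone shifts the result back to UTC.
    return mktime(dt.utctimetuple()) - timezone

Then you can always:

assert abs(utcdatetime_to_ts(datetime.utcnow()) - time()) <= 1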

See also the better and shorter version in the comments.
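The comments are not reproduced here, so the following is only a guess at what a shorter version could look like. It is a sketch built on the standard library's calendar.timegm, which reads a time tuple as UTC, and it assumes dt is a naive datetime that already holds UTC.

```python
from calendar import timegm
from datetime import datetime


def utcdatetime_to_ts(dt):
    """Convert a naive datetime that holds UTC into a UNIX timestamp."""
    # timegm() is the UTC counterpart of time.mktime(): it reads the tuple
    # as UTC, so no local-timezone correction is needed.
    return timegm(dt.utctimetuple())


if __name__ == "__main__":
    print(utcdatetime_to_ts(datetime(1970, 1, 1)))  # 0
```

Note that the time tuple carries no microseconds, so this variant returns whole seconds.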

Monday, April 13, 2009

Hosting migration story

As our code evolves and our user base grows, we keep adjusting our server hardware so that it stays cheap yet powerful enough to handle the load. Here is the timeline:
  • up to autumn 2008: Amazon EC2, small instance: 0.5 cores, 1.7 GB RAM
  • up to Jan 2009: Gandi.net: 1..2 cores, 1..2 GB RAM
  • up to Apr 2009: Serverloft.com L server: 4 cores, 4 GB RAM
  • since Apr 2009: Serverloft.com XL server: 4*2 cores, 8 GB RAM
While cloud solutions like EC2 and Gandi.net provide a great deal of flexibility, for us it is still cheaper to stick with a traditional dedicated server. Serverloft, although a dedicated server provider, offers many features previously available only to VDS users: OS reinstall and hard reboot via a web interface.

Tuesday, January 13, 2009

md5 hexdigest in Erlang

Recently I got my hands dirty with Erlang.

Erlang's built-in md5 function returns a binary, while I needed a hexadecimal string. I found a solution somewhere on the internet, but it was buggy. Note the case statement: it fixes the bug in the original solution (sorry, I don't remember the link) by left-padding single-digit hex values with a zero. Here is the snippet:


md5_hexdigest(String) ->
    string:to_lower(
        lists:flatten(
            lists:map(
                fun(V) ->
                    case httpd_util:integer_to_hexlist(V) of
                        [A, B] -> [A, B];
                        [B] -> [$0, B]
                    end
                end,
                binary_to_list(erlang:md5(String))
            ))).
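The bug that the case statement guards against is easy to reproduce in other languages too. As an illustration only (Python 3 rather than Erlang, with made-up function names), this sketch formats each digest byte separately: without zero padding, bytes below 0x10 come out as a single digit and the result is shorter than 32 characters.

```python
import hashlib


def md5_hex_naive(data):
    """Buggy variant: bytes below 0x10 yield one hex digit instead of two."""
    return "".join("%x" % byte for byte in hashlib.md5(data).digest())


def md5_hex_padded(data):
    """Fixed variant: every byte is left-padded to two hex digits."""
    return "".join("%02x" % byte for byte in hashlib.md5(data).digest())


if __name__ == "__main__":
    data = b""
    # The digest of the empty input contains bytes below 0x10.
    print(len(md5_hex_naive(data)) < 32)  # True
    print(md5_hex_padded(data) == hashlib.md5(data).hexdigest())  # True
```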

Monday, December 15, 2008

C:

char buf[1024]
strcpy(buf, user_data)

Python:

buf = user_data[:1024]
if len(user_data) > 1024: security_hole(user_data[1024:])

actually, the translation is not difficult, so long as you implement security_hole() properly.

(c) Twisted Quotes

Saturday, December 6, 2008

Python syslog as it should be

First, the code:

from logging.handlers import SysLogHandler, SYSLOG_UDP_PORT
from logging import Handler
import socket

class UdpSysLogHandler(SysLogHandler):
    def createLock(self): pass
    def acquire(self): pass
    def release(self): pass

    def __init__(self, address=('127.0.0.1', SYSLOG_UDP_PORT), facility=SysLogHandler.LOG_USER):
        Handler.__init__(self)
        assert type(address) == tuple
        self.address = address
        self.facility = facility
        self.socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.formatter = None

    def emit(self, record):
        msg = self.format(record)
        msg = self.log_format_string % (
            self.encodePriority(self.facility,
                                self.mapPriority(record.levelname)),
            msg)
        try:
            self.socket.sendto(msg, self.address)
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            self.handleError(record)

    def close(self):
        Handler.close(self)
        self.socket.close()

How is it different from the standard SysLogHandler in the Python logging package?

First, the lock is eliminated. The original version takes this lock for every logging operation and performs I/O while holding it, which is very bad.

def createLock(self): pass
def acquire(self): pass
def release(self): pass

Second, there is no support for logging through a UNIX domain socket (/dev/log). This slightly simplifies the emit method.
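To make the emit method easier to follow, here is a minimal sketch of the datagram it builds: a priority in angle brackets followed by the message. The function names are invented for this example, it targets Python 3 (where the payload must be bytes), and it leaves out the trailing NUL byte that the standard handler may append.

```python
import socket
from logging.handlers import SysLogHandler, SYSLOG_UDP_PORT


def encode_priority(facility, severity):
    """Combine facility and severity into the syslog PRI value."""
    return (facility << 3) | severity


def build_packet(message, facility=SysLogHandler.LOG_USER,
                 severity=SysLogHandler.LOG_INFO):
    """Return the bytes of one syslog datagram: b'<PRI>message'."""
    pri = encode_priority(facility, severity)
    return ("<%d>%s" % (pri, message)).encode("utf-8")


def send(message, address=("127.0.0.1", SYSLOG_UDP_PORT)):
    """Fire-and-forget a single message over UDP, with no lock involved."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(build_packet(message), address)
    finally:
        sock.close()


if __name__ == "__main__":
    print(build_packet("hello"))  # b'<14>hello'
```

Since UDP is connectionless, send() does not need a syslog daemon to be listening; the datagram is simply dropped if nobody receives it.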

One might say that UNIX domain sockets should be faster than actual UDP network sockets, because they don't involve the whole networking stack. In practice, however, I noticed that UNIX domain sockets perform much worse. I don't know why. If someone has a clue, please let me know.

Thursday, December 4, 2008

How to save $60 a month with 20 lines of code

We use the Amazon S3 service to store users' pictures. There is a huge number of small pictures.

As the traffic to our web service increased, the cost of serving the pictures from S3 increased too. Last month we had about 250 GB of picture traffic, which cost around $60.

We could move all the pictures to a separate static server, but this would mean some downtime, complicate further migrations of our servers, and reduce reliability.

I modified a few lines in the Nginx configuration to make it work as a proxy to Amazon S3:

location @s3 {
    internal;
    proxy_pass http://your-bucket-name.s3.amazonaws.com;
    proxy_store on;
    proxy_store_access user:rw group:rw all:r;
    proxy_temp_path /var/static_data/tmp;
    root /var/static_data;
}

location ~* \.(jpg|jpeg|gif|png|ico|css|bmp|js|swf|mp3)$ {
    access_log off;
    error_log /var/log/nginx/static_cache_miss.log;
    expires max;
    root /var/static_data;
    error_page 404 = @s3;
}
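The idea behind this configuration is "serve from local disk, and on a miss fetch from S3 and keep a copy". The following Python sketch mimics that flow for illustration only; it is not how Nginx implements proxy_store, and fetch_origin is a hypothetical callable standing in for the request to S3.

```python
import os
import tempfile


def serve(path, root, fetch_origin):
    """Return file bytes from the local root, filling it from the origin on a miss."""
    local = os.path.join(root, path.lstrip("/"))
    if os.path.isfile(local):
        # Cache hit: no request to the origin at all.
        with open(local, "rb") as f:
            return f.read()
    # Cache miss: fetch once, store under the same path, then serve.
    data = fetch_origin(path)
    os.makedirs(os.path.dirname(local), exist_ok=True)
    with open(local, "wb") as f:
        f.write(data)
    return data


if __name__ == "__main__":
    calls = []

    def fake_origin(path):
        # Stand-in for S3; records how often the origin is contacted.
        calls.append(path)
        return b"image-bytes"

    with tempfile.TemporaryDirectory() as root:
        serve("/pics/a.png", root, fake_origin)
        serve("/pics/a.png", root, fake_origin)
        print(len(calls))  # 1
```

Only the first request for a given picture reaches the origin, which is why the S3 traffic drops once the local copy is warm.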

Now the S3 traffic has dropped from ~10 GB a day to less than 400 MB.