Using a Python profiler, I found that my threads spend more than half of their time blocked on a lock inside the logging module:
logging/__init__.py

class Handler(Filterer):
    ...
    def acquire(self):
        """
        Acquire the I/O thread lock.
        """
        if self.lock:
            self.lock.acquire()
    ...
    def handle(self, record):
        """
        Conditionally emit the specified logging record.

        Emission depends on filters which may have been added to the handler.
        Wrap the actual emission of the record with acquisition/release of
        the I/O thread lock. Returns whether the filter passed the record for
        emission.
        """
        rv = self.filter(record)
        if rv:
            self.acquire()
            try:
                self.emit(record)
            finally:
                self.release()
        return rv
As you can see, the lock is taken for every log message, and the Handler instance is created as a singleton for the whole application. So a bad thing happens: all the threads serialize on a single lock and perform an I/O operation while holding it!
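Here is a minimal illustration of the pattern (a toy setup I wrote for this post, not my real application): many threads share one handler, and every call queues up on its lock.

import logging
import threading

handler = logging.FileHandler('/tmp/contention_demo.log')  # one shared handler
logger = logging.getLogger('demo')
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def worker():
    for i in range(10000):
        # Each call ends up in Handler.handle(), which takes handler.lock
        # around the emit() I/O, so all threads serialize here.
        logger.info('message %d from %s', i, threading.current_thread().name)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()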
What I've done to solve this: I created my own SmartSysLogHandler class, which uses a separate socket per thread to write to /dev/log. It does not contain a lock at all.
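The real class is too long to post, but the sketch below shows the idea on top of logging.Handler; the facility and severity handling are illustrative assumptions, not my exact code.

import logging
import socket
import threading

class SmartSysLogHandler(logging.Handler):
    """Keeps one datagram socket to /dev/log per thread, so emit() never
    needs the shared I/O lock (a sketch of the idea, not the exact class)."""

    def __init__(self, address='/dev/log', facility=1):  # facility 1 = "user"
        logging.Handler.__init__(self)
        self.address = address
        self.facility = facility
        self._local = threading.local()  # per-thread storage for the socket

    def _socket(self):
        # Lazily create one AF_UNIX datagram socket per thread (Linux only).
        sock = getattr(self._local, 'sock', None)
        if sock is None:
            sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
            sock.connect(self.address)
            self._local.sock = sock
        return sock

    def handle(self, record):
        # Override Handler.handle() so acquire()/release() are never called.
        rv = self.filter(record)
        if rv:
            self.emit(record)
        return rv

    def emit(self, record):
        try:
            # Syslog priority = facility * 8 + severity; severity is fixed
            # at 6 (informational) here to keep the sketch short.
            msg = '<%d>%s' % (self.facility * 8 + 6, self.format(record))
            self._socket().send(msg.encode('utf-8'))
        except Exception:
            self.handleError(record)

Since each thread talks to /dev/log over its own socket, there is no shared mutable state to protect; the kernel serializes the datagrams for us.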
It is strange that such a thing happens with the standard library's logging. I guess the standard library is designed to be rock solid, not rocket fast. As Guido van Rossum said, if you don't like threads and the GIL in Python, just spawn several Python processes for your scalable application.
Discussion is welcome.