To thread or not to thread? That is the question!

Posted On Mar-27

Every so often, I find myself pondering this same question. Sometimes I write my thoughts and conclusions down on a piece of paper so that I don’t have to rethink them later. This beautiful piece of paper always seems to get lost in the clutter (which I call an organized stack) of papers on my desk. So: threads or processes?

Agreed, it comes down to the language of choice, but these days I use Python and Perl heavily in my scripts. And the GIL (global interpreter lock) in Python, which lets only one thread execute Python bytecode at a time, always seems to make multithreading a daunting task on my multi-core machine.

Of course, we all know that multithreading has its advantages over multiprocessing: a smaller footprint, smaller stacks, much lighter overall, and so on. But what happens with threading behind the scenes is what drives new programmers nuts and makes them conclude that threading arose from the Dead Sea.

The multithreading-multiprocessing debate…

In the particular project I was working on, the ease of spinning up a thread pool lured me down the path of threads. The fact that my underlying Net-SNMP C library was asynchronous turned out to be an added bonus.
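To give a feel for how little code a thread pool takes, here is a minimal sketch using the standard library's `concurrent.futures`. The `poll_device` function and the host names are hypothetical stand-ins for a blocking network call (it is not the Net-SNMP API); the sleep just simulates waiting on I/O.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def poll_device(host):
    # Hypothetical stand-in for a blocking network call (e.g. an SNMP GET).
    # time.sleep releases the GIL, much like waiting on a real socket does.
    time.sleep(0.05)
    return host, "ok"

def poll_all(hosts, workers=8):
    # pool.map returns results in the same order as the input hosts.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(poll_device, hosts))

hosts = ["host-%d" % i for i in range(16)]
results = poll_all(hosts)
print(results[:2])
```

For I/O-bound work like this, the threads spend most of their time waiting, so the GIL is rarely the bottleneck.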

Yes, I started threading at first…

A little tidbit here: the scheduling of threads is actually done by the OS, and this sometimes turns out to be a disaster.

Yes, Python relies on the OS to schedule threads. The interpreter itself just releases and re-acquires the GIL. Why not? After all, the underlying OS is all about multithreading, and the people who made it should be experts at scheduling, right? Actually yes, the OS does a damn good job at this.

However, on a multicore machine, the OS is aware of all available cores, and chances are that it schedules a waiting thread on another core… boom! Now threads on different cores are fighting over a single GIL: a thread wakes up, fails to get the lock, and goes back to waiting. That adds context switches and increases the overall time before the thread actually acquires the GIL. This is usually what causes multithreading to give poorer-than-expected results for CPU-bound Python code.
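The effect is easy to observe with a rough timing sketch like the one below. It assumes a standard GIL-enabled CPython build; the function names and the loop size are arbitrary choices for illustration, and the actual numbers depend heavily on the Python version and the machine.

```python
import threading
import time

def count_down(n):
    # Pure-Python CPU-bound loop: it needs the GIL for every iteration.
    while n > 0:
        n -= 1

def run_serial(n, jobs):
    # Run the jobs one after another in the main thread.
    start = time.perf_counter()
    for _ in range(jobs):
        count_down(n)
    return time.perf_counter() - start

def run_threaded(n, jobs):
    # Run the same jobs in parallel threads that all compete for the GIL.
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(jobs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

serial_time = run_serial(500_000, 4)
threaded_time = run_threaded(500_000, 4)
print(f"serial:   {serial_time:.3f}s")
print(f"threaded: {threaded_time:.3f}s")
```

On a GIL build, the threaded run typically finishes no faster than the serial one, and can be slower, even though four cores could in principle do the work in parallel.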

So I went back to multiprocessing for this. Yes, I know it has higher startup costs, uses significantly more memory, etc. But in the end, there isn’t a lot of crazy context switching. And in this particular instance, I had 32 cores. I was just planning to start the processes once and then pipe their output back through a subprocess stream. They worked fine!
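The "start the processes once and read their output through a pipe" pattern might look roughly like this. It is a sketch under assumptions, not my actual script: the worker code, the sum-of-squares task, and the `run_workers` helper are all hypothetical, chosen only so the example is self-contained.

```python
import subprocess
import sys

# Hypothetical worker: sums squares over its slice and prints one line.
WORKER_CODE = """
import sys
start, stop = int(sys.argv[1]), int(sys.argv[2])
print(sum(i * i for i in range(start, stop)))
"""

def run_workers(total, nprocs):
    step = total // nprocs
    procs = []
    for k in range(nprocs):
        lo = k * step
        # The last worker picks up any remainder.
        hi = total if k == nprocs - 1 else lo + step
        procs.append(subprocess.Popen(
            [sys.executable, "-c", WORKER_CODE, str(lo), str(hi)],
            stdout=subprocess.PIPE, text=True))
    # All workers are already running; now collect each one's piped output.
    partials = []
    for p in procs:
        out, _ = p.communicate()
        partials.append(int(out))
    return sum(partials)

total = run_workers(100_000, 4)
print(total)
```

Each worker is a separate interpreter with its own GIL, so the OS is free to spread them across cores without them contending for a shared lock.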

…and then went back to processes

I’m not against threads, and I don’t always favor processes either. But often we have to decide between the two based on the environment, the hardware, the throughput requirement, the desired speed of execution (and a little bit of personal choice). It’s called “using the right tool for the job.”