Between "async event based" and "multi-threading" programming models, which one gives better performance and why?
11 18 Jun 2016 01:57 by u/prahladyeri
We are presently living in a world where programming can be broadly categorized into two paradigms as far as parallelism is concerned (there are many others like OOP, procedural, functional, etc. in other departments): async event-based and multi-threading based.
In the category of multi-threading, we have traditional languages like Java, C#, Python, PHP, etc., which all support multi-threading to some extent, either cleanly using threads, mutexes, semaphores, etc., or through some hacks.
In the category of the asynchronous event-based model, OTOH, we have client-side JavaScript and server-side Node.JS, which are also time-tested and proven to be fast.
Besides, the recent advice is that if your app is I/O bound, go for Node.JS; otherwise, if the app is more CPU bound, go with more traditional languages like PHP.
But apps in today's world are much more complex than that and rarely can they be classified as I/O bound or CPU bound, can they? My question is that in general, which programming model gives a better performance and why?
I don't know much about system programming, but the trade-offs generally go something like this:
With multi-threading, you have the overhead of more threads to handle. For example, Apache creates a new thread for each and every request from a user. If a million users come in suddenly, then a million threads are created. Now, keeping track of what each thread is doing, collecting results from each of them one by one, etc. is going to cause a lot of overhead, isn't it? Plus, each thread consumes its own memory space, so there is a chance that your RAM gets flooded, right?
But in the case of Node.JS, there is only a single thread running, whether there is one user or a million. This means you don't have the thread context-switching overhead that you have with Apache, but if one single thread handles a million requests in a single event loop, is that going to cause any problems? I don't have any idea, but that's what I want to ask you experts out there.
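To make concrete what I mean by the two models, here's a rough Python sketch I put together (the ports and handlers are made up for illustration; this is obviously not how Apache or Node is actually implemented):

```python
import asyncio
import socket
import threading

# Model 1: thread-per-connection (the "Apache-style" picture above).
# Every accepted connection gets its own thread and its own stack memory.
def threaded_server(host="127.0.0.1", port=8001):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen()
    while True:
        conn, _ = srv.accept()
        threading.Thread(target=handle_blocking, args=(conn,), daemon=True).start()

def handle_blocking(conn):
    with conn:
        data = conn.recv(1024)   # blocks, but only this one thread
        conn.sendall(data)

# Model 2: single-threaded event loop (the "Node.JS-style" picture above).
# One thread multiplexes every connection; no per-request thread or stack.
async def handle_async(reader, writer):
    data = await reader.read(1024)  # yields back to the loop instead of blocking
    writer.write(data)
    await writer.drain()
    writer.close()

async def event_loop_server(host="127.0.0.1", port=8002):
    server = await asyncio.start_server(handle_async, host, port)
    async with server:
        await server.serve_forever()

# To try them out:
#   threading.Thread(target=threaded_server, daemon=True).start()
#   asyncio.run(event_loop_server())
```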
10 comments
5 u/roznak 18 Jun 2016 02:26
In software you choose the tools and technology depending on the program you want to build. What matters even more is not async versus multi-threaded but how you design your application.
Now how do you know which is better? You can't deduce it, you must measure it. And you can only measure it by building different variations of your code.
Edit: I want to expand on that. You build your code in such a way that you have small functions that can be used in both async and multi-threaded solutions. If you make these functions thread-safe (but avoid locks as much as possible), then you can use them in both systems depending on your needs.
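For example, something along these lines (a minimal Python sketch with made-up names, just to show the shape of it):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# A small, self-contained function: it only touches its own arguments,
# shares no mutable state and takes no locks, so it is safe to call from
# worker threads or from inside an event loop.
def checksum(chunk: bytes) -> int:
    return sum(chunk) % 65536

# Multi-threaded variation: fan the chunks out across a thread pool.
def threaded_variation(chunks):
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(checksum, chunks))

# Async variation: the very same function, driven from an event loop.
async def async_variation(chunks):
    return [checksum(c) for c in chunks]

chunks = [b"hello", b"world"]
print(threaded_variation(chunks))
print(asyncio.get_event_loop().run_until_complete(async_variation(chunks)))
```

The interesting work lives in the small function, and the threading or async plumbing stays at the edges, so you can swap one for the other without rewriting the core.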
0 u/SlappyHo 18 Jun 2016 12:29
Locks are fine if you know how to use them, i.e. limiting them to the critical sections only.
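Something like this, for instance (a quick Python sketch, names invented):

```python
import threading

hits = {}                      # shared state
hits_lock = threading.Lock()

def expensive_parse(payload):  # stand-in for real per-request work
    return payload.upper()

def record_hit(url, payload):
    parsed = expensive_parse(payload)   # touches nothing shared, so no lock held here
    with hits_lock:                     # the lock covers only the shared-dict update
        hits[url] = hits.get(url, 0) + 1
    return parsed                       # post-processing again happens outside the lock

threads = [threading.Thread(target=record_hit, args=("/home", "abc")) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(hits)                    # {'/home': 8}
```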
1 u/roznak 18 Jun 2016 12:55
But every lock is a potential bottleneck. Writing your code in such a way that you have almost no locks is preferable.
0 u/SlappyHo 18 Jun 2016 13:01
Yes, hence critical sections only.
0 u/roznak 18 Jun 2016 13:11
Even critical sections should be avoided if possible. Every lock acquisition that doesn't succeed immediately adds stalling time and increases the chance that the next thread that wants to lock this part will end up waiting. It all depends on how long that code is locked.
Every lock in your code creates an exponential stall curve; not having locks in the first place keeps the stall curve linear. Not having locks means that you have to rewrite your code and maybe use a bit more memory.
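To illustrate the trade (a rough Python sketch, names made up): instead of one shared counter behind a lock, give every thread its own private counter and merge once at the end, spending a little extra memory to avoid the lock entirely.

```python
import threading
from collections import Counter

NUM_THREADS = 4
ITEMS = range(100000)

# One private Counter per thread: a bit more memory, but no lock anywhere.
partials = [Counter() for _ in range(NUM_THREADS)]

def worker(idx):
    mine = partials[idx]
    for item in ITEMS:
        mine[item % 10] += 1   # only this thread ever touches this Counter

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_THREADS)]
for t in threads: t.start()
for t in threads: t.join()

# Single merge step after all workers have finished.
total = sum(partials, Counter())
print(total.most_common(3))
```

The partial counters cost a few extra dictionaries, but no thread ever waits on another; that is the memory-for-locks trade.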
0 u/SlappyHo 18 Jun 2016 13:25
Sounds good in theory, but in practice it's not always possible, because:
> maybe use a bit more memory
is a constraint in and of itself.
5 u/dickvomit 18 Jun 2016 05:53
Systems programmer here who's worked on both for more than a decade. I'll try my best to describe the trade-offs I've learned and used, though my background is entirely C and C++.
First, "async event based" is common and necessary in threaded environments, so lets call the two competing solutions "thread based" and "event loop" based. Event loop is I think what you are really referring to, where a single event is processed from beginning to end, where flow is explicitly yielded back to a dispatcher for processing of another event.
Second, let's restrict this to single-core programming for a second. In any multi-core environment you obviously need either a multi-threaded or a multi-process design to get full utilization of your CPU. When possible, as in when your threads don't really need to share data, I advise multi-process over multi-threaded, but that's for simplicity and safety. But when thinking about each single core in that multi-core solution...
Event-loop programming is way faster (and actually easier to write) when your items of work (the things to be accomplished by the event) are all relatively the same length in terms of execution time AND have few IO operations. You don't need to worry about concurrency/races and there's no context switching. The moment either of those two conditions is violated, threading starts becoming the better solution.
In a single-threaded environment, IO means either blocking (shitty for perf) or making that IO a new event and hand-writing state to store so you can continue processing the overall event later. It's a hand-optimized version of threading. You can do it and the payoffs are large, but as the complexity of each item of work grows, or the amount of IO you have to do grows, you will be writing more and more state structures and the efficiency of your coding (not the code execution) just keeps slowing down. In the worst case, you may have non-IO operations that are just really CPU intensive and make you save state as well, just to avoid holding up everything else and creating large latencies. I remember having to write state off for a large signal-processing routine that ground the CPU for seconds, gritting my teeth and wishing we had threading for that particular project.
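A crude illustration of what I mean by hand-writing state (a Python sketch with invented names, nothing like the real C code): the handler does a bounded slice of work, records where it got to, and re-queues itself instead of grinding through the whole job in one go.

```python
import collections

events = collections.deque()   # the dispatcher's queue

# State we must carry by hand, because there is no thread stack to keep it for us.
class ChecksumJob:
    def __init__(self, data):
        self.data = data
        self.offset = 0
        self.total = 0

CHUNK = 4096

def process_some(job):
    """Handle one event: do a bounded slice of work, then yield back to the loop."""
    end = min(job.offset + CHUNK, len(job.data))
    job.total += sum(job.data[job.offset:end])
    job.offset = end
    if job.offset < len(job.data):
        events.append(job)          # not finished: save state, run again later
    else:
        print("checksum:", job.total % 65536)

events.append(ChecksumJob(bytes(20000)))
while events:                       # the event loop / dispatcher
    process_some(events.popleft())
```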
So threading becomes increasingly tantalizing as those items of work grow in complexity. When you have lots of IO in one item of work, or some items of work take way longer than others, you probably want to separate those off into threads. When you find items of work that are similar in computational size and without vast amounts of IO, put those into state machines / event-loops.
The two can even co-exist in the same project, with some threads hosting event loops, and others dedicated to special long running tasks.
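A rough sketch of that co-existence, in Python terms rather than C/C++ (names invented; in CPython the GIL limits true CPU parallelism, so treat this as the shape of the pattern, not a benchmark): the event loop keeps serving the short, uniform events while a worker thread takes the long grind.

```python
import asyncio
import time

def long_running_task(seconds):
    """Stand-in for a long blocking job (big read, signal processing, etc.)."""
    time.sleep(seconds)            # blocks its own worker thread, not the loop
    return "heavy job done"

async def handle_short_event(i):
    await asyncio.sleep(0.01)      # short, uniform item of work on the loop
    print("handled event", i)

async def main():
    loop = asyncio.get_event_loop()
    # Ship the long task off to a worker thread; the loop keeps dispatching events.
    heavy = loop.run_in_executor(None, long_running_task, 0.5)
    small = [handle_short_event(i) for i in range(5)]
    results = await asyncio.gather(heavy, *small)
    print(results[0])

asyncio.get_event_loop().run_until_complete(main())
```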
2 u/RevanProdigalKnight 18 Jun 2016 03:02
Not an expert, but here's my two cents:
Really, it all comes down to the question of what kind of tasks you're running and how interdependent they may or may not be. If you're running a whole bunch of really big, independent tasks, multi-threading is the way to go because then you can run them all in parallel. If you're running a whole bunch of really small interdependent tasks, then an asynchronous event-based model is probably a better choice. Then you have your typical muddy grey area in the middle where it's kind of up to what you feel like programming as to what's best.
To be honest though, Node.JS isn't the best choice for a server because it all runs in a single thread. You would need a cluster of Node.JS processes just to get the equivalent of a single Apache server, since Apache is multi-threaded by nature. Regardless of whether you use a multi-threaded or an asynchronous programming model, you will get better performance out of a compiled language than you would with an interpreted language, and you'll get better performance across parallel sessions with a multi-threaded architecture.
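Roughly what that clustering looks like, sketched in Python instead of Node's cluster module (worker count and ports are made up; a real deployment would put a load balancer or a shared listening socket in front of the workers):

```python
import asyncio
import multiprocessing

async def handle(reader, writer):
    data = await reader.read(1024)
    writer.write(data)
    await writer.drain()
    writer.close()

def worker(port):
    # Each worker is its own single-threaded event loop, like one Node process.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(asyncio.start_server(handle, "127.0.0.1", port))
    loop.run_forever()

if __name__ == "__main__":
    base_port = 9000
    procs = [multiprocessing.Process(target=worker, args=(base_port + i,))
             for i in range(multiprocessing.cpu_count())]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```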
Personally, I would love it if JavaScript interpreters added multithreading capabilities, because then there would be pretty much no question as to what the go-to language for a given task should be; you'd get the best of both worlds. At the same time, though, JavaScript still has a performance peak that it can't surpass since it is an interpreted language. The interpreters may get faster, or better at predicting how to run the code, but it still has to be interpreted at runtime, and that adds overhead that can't be ignored.
1 u/SlappyHo 18 Jun 2016 12:39
I recently wrote a P2P VoIP app that used threading for the capture, playback and audio mixing, and async for the P2P mesh service. Each solves a different problem. Both are useful techniques.
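Very roughly, the shape of that split looked like this (a Python sketch with invented names; the real app was obviously more involved): the audio thread runs on its own schedule and hands finished frames to the async side through the loop's thread-safe entry point.

```python
import asyncio
import threading
import time

def capture_loop(loop, frames):
    """Audio thread: grabs frames at its own pace, never blocks the event loop."""
    for i in range(5):
        time.sleep(0.02)                       # pretend hardware pacing
        frame = ("frame-%d" % i).encode()      # stand-in for captured audio
        # Hand the frame to the event loop from outside it, thread-safely.
        loop.call_soon_threadsafe(frames.put_nowait, frame)

async def mesh_sender(frames):
    """Async side: ships frames to peers (here it just prints them)."""
    for _ in range(5):
        frame = await frames.get()
        print("sending to mesh:", frame)

async def main():
    loop = asyncio.get_event_loop()
    frames = asyncio.Queue()
    threading.Thread(target=capture_loop, args=(loop, frames), daemon=True).start()
    await mesh_sender(frames)

asyncio.get_event_loop().run_until_complete(main())
```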
0 u/goatsandbros 18 Jun 2016 15:17
You're not punting your comprehensives questions to Voat, are you? ;)