Developing for speed: Know your hard disk.
1 21 Jul 2015 20:03 by u/roznak
A lot of people focus on software technology for speed, like dictionaries, and parallelism... but rarely have I seen developers actually understand what they are doing.
Know the hardware your software will run on. That can speed things up far more than software solutions. Take the data on your hard disk: I am currently testing the read speeds of my hard disks (actually testing them for defects) and I see something interesting.
- One hard disk does not have a flat transfer curve. The drive is incredibly fast when the data is at the beginning of the hard disk, but slows down by 50% when the data is at the end of the hard disk.
- One hard drive has a flat curve. No matter where the data is read from, the speed stays constant.
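A rough way to take this kind of measurement yourself is sketched below in Python. It is only an illustration, not the tool behind the numbers above: `read_throughput` and `throughput_profile` are made-up names, and the sample count and read length are arbitrary defaults. Pointing it at a raw device usually needs administrator rights. On a regular file the offsets do not necessarily map to physical positions on the platter, and the operating system cache can inflate the results, so treat the output as indicative at best.

```python
import os
import time


def read_throughput(path, offset, length, block_size=1024 * 1024):
    """Read `length` bytes starting at `offset` and return bytes per second."""
    read_total = 0
    # buffering=0 skips Python's own buffer; the OS cache is still in play.
    with open(path, "rb", buffering=0) as f:
        f.seek(offset)
        start = time.perf_counter()
        while read_total < length:
            chunk = f.read(min(block_size, length - read_total))
            if not chunk:  # reached end of file or device
                break
            read_total += len(chunk)
        elapsed = time.perf_counter() - start
    return read_total / elapsed if elapsed > 0 else float("inf")


def throughput_profile(path, samples=10, length=8 * 1024 * 1024):
    """Measure read speed at `samples` evenly spaced offsets.

    Returns a list of (offset, bytes_per_second) tuples, from the start
    of the file or device to the last position where `length` bytes fit.
    """
    with open(path, "rb", buffering=0) as f:
        size = f.seek(0, os.SEEK_END)  # seek() returns the new position
    length = min(length, size)
    last_start = size - length
    profile = []
    for i in range(samples):
        offset = last_start * i // max(samples - 1, 1)
        profile.append((offset, read_throughput(path, offset, length)))
    return profile
```

A flat drive should give roughly the same number for every offset; a drive with a curve should show the speed dropping as the offset grows.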
My experience with developing for speed is this: make different versions of an optimization, and let the software select the one that works fastest by actually measuring which one is the fastest on that computer.
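The "measure, then pick" idea could look something like the Python sketch below. The name `pick_fastest`, the `repeats` parameter and the two example variants are all hypothetical; real variants would be different implementations of the same operation, and a real workload would be representative of what the program actually does.

```python
import time


def pick_fastest(variants, workload, repeats=3):
    """Time every variant on the same workload and return the quickest.

    `variants` maps a name to a callable taking the workload.
    Returns (name_of_fastest, {name: best_time_in_seconds}).
    """
    timings = {}
    for name, func in variants.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            func(workload)
            best = min(best, time.perf_counter() - start)
        timings[name] = best  # keep the best run to reduce noise
    fastest = min(timings, key=timings.get)
    return fastest, timings


def sum_with_loop(values):
    """Hypothetical variant 1: explicit loop."""
    total = 0
    for v in values:
        total += v
    return total


def sum_with_builtin(values):
    """Hypothetical variant 2: built-in sum."""
    return sum(values)


if __name__ == "__main__":
    candidates = {"loop": sum_with_loop, "builtin": sum_with_builtin}
    name, timings = pick_fastest(candidates, list(range(1_000_000)))
    print("selected:", name, timings)
```

The selection would normally run once, at install time or first start, with the result stored so the measurement cost is not paid on every run.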
E.g. when it detects a drive that does not have a flat curve, put the data where it is read fastest, and even distribute it over different hard disks first instead of filling up one hard disk and then continuing on the next. This could also mean that you have to write fake data to the hard disk to get the data you want optimized into the right spot.
Edit: Added some graphics about disk speed: https://slimgur.com/image/01y
The graphics clearly demonstrate that you lose 70% to 90% of your read speed on a hard disk depending on the location of the data on the hard disk. Not filling up your hard disk, so that your data is located closer to the start, can almost double your speed compared to data on a hard disk that is too full.
3 comments
0 u/leixiaotie 22 Jul 2015 09:05
Context is needed here.
For programming languages closer to the hardware, such as C/C++ and assembly, maybe your point is spot on.
For higher-level ones such as Java and C#, what you are describing is not applicable, since in those languages this functionality has been abstracted away and is handled behind the scenes.
And not every app uses the hard disk to operate. I think people would rather discuss more generally applicable cases than a specific one like this.
0 u/roznak [OP] 22 Jul 2015 21:41
No, it is not C++ only; every language is affected. The hard disk is the bottleneck no matter how fast the code the language generates. It must wait for the hard disk data to be read or written.
I intend to put some graphs up. I am testing my hard disks here and I notice that one hard disk has a completely flat curve, one basically loses half of its read speed, and another also has a curve (it loses read speed the further in it gets) but drops more slowly. So not filling your hard disk can actually double the processing speed or, from the other point of view, filling your hard disk might slow your processing down by 50%.
I do not know about Java, but C# can escape its abstraction and go screaming fast, getting almost as fast as C++. With the benefit that your project can be developed in weeks instead of months. C++ can't go any faster because it is bound to Windows functions. C# can use the same Windows functions that C++ can.
The C++ code can be better optimized for the processor; however, the microsecond it wins does not add up against the milliseconds it takes to read in 100 bytes.
In this comment I was focusing on hard disks. Understanding the hardware it has to work on can greatly improve performance.
0 u/leixiaotie 23 Jul 2015 03:57
In C#, I just found this. It uses WinAPI or a similar lower-level module to access the hard disk directly.
I still believe that the majority of programs in higher-level languages such as C# and Java don't do many hard disk operations. Most of them do in-memory processing or use a database for storage instead. File save operations are usually smaller than 10 MB. I don't know about C/C++, but I think it should be about the same.
So in apps that are not hard-disk intensive, direct hard disk access is overkill. A more general performance optimization such as a sorting algorithm can be used almost anywhere that involves sorting.
I am not saying that your tips aren't useful or significant. I am just stating why there is a lack of interest in direct hardware operations compared to generally applicable optimizations such as algorithms. Moreover, hardware differs from one machine to another (a NAS HDD, for example), which makes it harder to program against the hardware.