Optimal buffer size

This is essentially a question about computing performance. I am writing a C program that produces a large amount of output, far more than can be held in RAM at once. I intend to simply write the output to stdout, so it can either go to the screen or be redirected to a file. My problem is: how do I choose the optimal size for the buffer that holds the data in RAM?

The output itself is not very important, so let's say it creates a massive list of random integers.

I intend to have two threads: one that creates the data and writes it into a buffer, and another that writes that buffer to stdout. This way I can start filling the next output buffer while the previous one is still being written to stdout.

To be clear, my question is not how to use functions such as malloc() and pthread_create(), etc. My question is how to choose the number of bytes (512, 1024, 1048576, ...) for the buffer size that will give the best performance?

Ideally, I would like to find a way to choose the optimal buffer size dynamically, so that my program can tune itself to whatever hardware it is running on at the time. I have tried to find answers to this problem, and although I found several threads about buffer size, I couldn't find anything particularly relevant. So I wanted to post it as a question in the hope of getting a few different points of view and coming up with something better than I could on my own.
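To make it a bit more concrete, the only "dynamic" approach I could come up with so far is a rough sketch like the one below: ask fstat() for the preferred I/O block size of the output descriptor and scale it up by an arbitrary factor. Whether st_blksize (or the factor of 64) is anywhere near optimal is exactly what I don't know.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/stat.h>

    /* Rough sketch: take the preferred I/O block size the OS reports for
     * the output descriptor as a starting guess, with an arbitrary
     * fallback and an arbitrary scaling factor. */
    static size_t guess_buffer_size(int fd)
    {
        struct stat st;
        size_t size = 64 * 1024;            /* fallback: 64 KiB, arbitrary */

        if (fstat(fd, &st) == 0 && st.st_blksize > 0)
            size = (size_t)st.st_blksize;   /* "preferred" block size */

        return size * 64;                   /* factor of 64 is a guess */
    }

    int main(void)
    {
        size_t bufsize = guess_buffer_size(STDOUT_FILENO);
        fprintf(stderr, "using a %zu byte buffer\n", bufsize);
        /* ... allocate two buffers of this size for the producer and
         * writer threads ... */
        return 0;
    }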

+3


5 answers


Mixing design and optimization is a big waste of time; it is considered one of the canonical top mistakes. It can damage your design and not actually speed anything up.

Run your program, and if there is an indication of a performance problem, profile it and consider analyzing the part that is actually causing the problem.

I would say this is especially true for complex architectural optimizations such as multithreading your application. Multithreading a single process image is something you never want to do on a whim: it is hard to test, error prone, hard to reproduce, and it will behave differently on different runtime environments, among other problems. But some programs do require multithreaded parallel execution, either for functionality or as one way to get the required performance. It is widely supported and, in essence, it is a necessary evil from time to time.



It is not something you want in your initial design without convincing evidence that programs like yours need it.

Pretty much any other parallelism method (message passing, for example) will be easier to implement and debug, and you get a lot of it in your OS's I/O system already.

+6


Short answer: Measure it.

Long answer: in my experience, this depends on too many factors that are hard to predict in advance. On the other hand, you don't have to commit yourself before starting. Just implement a generic solution and, when you're done, run some performance tests and tune using the setting that gives the best results. A profiler can help you focus on the performance-critical parts of your program.

From what I've seen, the people who produce the fastest code are those who try the simplest, most straightforward approach first. What sets them apart from average programmers is that they are very good at writing meaningful performance tests, which is far from trivial.

Without experience, it's easy to fall into pitfalls such as ignoring caching effects or (perhaps relevant in your application?) underestimating the cost of I/O. In the worst case, you end up squeezing parts of the program that don't affect overall performance at all.

Back to the original question:

In the scenario you describe (one CPU-bound producer and one I/O-bound consumer), it is likely that one of the two will become the bottleneck (unless the rate at which the producer generates data varies greatly). Depending on which one is faster, the whole situation changes radically:

Let's first assume that the I/O-bound consumer is your bottleneck (it doesn't matter whether it writes to stdout or to a file). What are the likely consequences?

Optimizing the algorithm that generates the data will not yield any performance gain; instead, you should maximize write performance. I would assume, however, that write performance does not depend strongly on the buffer size (as long as the buffer is not too small).

Otherwise, if the producer is the limiting factor, the situation is reversed. Here you need to profile the generation code, improve the speed of the algorithm, and possibly the data transfer between the producer and the writer thread. The buffer size still doesn't matter much, though, because the buffer will be empty most of the time anyway.

Of course, the situation can be more complex than I have described, but unless you are really sure you are not in one of those extreme cases, I wouldn't invest in tuning the buffer size. Just keep it configurable and you should be fine, and I don't think it will need to be redone later for other hardware environments.
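For what it's worth, a minimal sketch of "keep it configurable" could look like the following. The BUFFER_BYTES environment variable and the 64 KiB default are made up for this example; the point is only that the size is read at run time instead of being baked in.

    #include <stdio.h>
    #include <stdlib.h>

    /* Sketch only: read the buffer size from an environment variable so
     * it can be tuned per machine without recompiling. */
    static size_t configured_buffer_size(void)
    {
        const char *s = getenv("BUFFER_BYTES");   /* name is arbitrary */
        if (s != NULL) {
            char *end = NULL;
            unsigned long v = strtoul(s, &end, 10);
            if (end != s && v > 0)
                return (size_t)v;
        }
        return 64 * 1024;                         /* arbitrary default */
    }

    int main(void)
    {
        size_t n = configured_buffer_size();

        /* Let stdio use the same size for its own buffering of stdout;
         * must be called before anything is written to the stream. */
        setvbuf(stdout, NULL, _IOFBF, n);

        char *buf = malloc(n);
        if (buf == NULL)
            return 1;

        /* ... producer fills buf, writer flushes it to stdout ... */

        free(buf);
        return 0;
    }

The performance tests then become a matter of re-running the same binary with different BUFFER_BYTES values and comparing the timings.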

+1


I personally feel that you are wasting your time.

First, run: time ./myprog > /dev/null

Now run: time dd if=/dev/zero of=myfile.data bs=1k count=12M

dd is about as simple a program as you can get, and it writes the file quickly. But it still takes a fair amount of time to write a few gigabytes. (12 GB takes about 4 minutes on my machine - probably not the fastest disk in the world - while the same amount written to /dev/null takes about 5 seconds.)

You can experiment with different numbers for bs=x count=y, choosing a combination that gives the same total size as your program's output for a test run. But I have found that VERY large blocks actually take longer (1 MB per write, for example) - perhaps because the OS has to copy a whole 1 MB before it can write the data, then write it, then copy the next 1 MB - whereas with smaller blocks (I tested 1k and 4k) it takes much less time to copy the data, so there is less "disk sitting around doing nothing before we write to it".

Compare both of these times with the run time of your program. Is the time dd takes to write the file much shorter than the time your program takes to write it?

If there isn't much of a difference, then look at the time your program takes when writing to /dev/null - does that account for some or all of the difference?
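If you would rather make the same comparison from inside C instead of with dd, a sketch like the one below times raw write() calls of different block sizes to wherever stdout is redirected (the block sizes and the 1 GiB total are arbitrary; timings go to stderr so they don't end up in the redirected output).

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    /* Sketch: write `total` bytes of zeros to stdout in chunks of `bs`
     * bytes and return the elapsed wall-clock time in seconds.  Redirect
     * stdout to a file or /dev/null when running it, as with dd above. */
    static double time_writes(size_t bs, size_t total)
    {
        char *buf = calloc(1, bs);
        struct timespec t0, t1;
        size_t written = 0;

        if (buf == NULL)
            return -1.0;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        while (written < total) {
            ssize_t n = write(STDOUT_FILENO, buf, bs);
            if (n <= 0)
                break;
            written += (size_t)n;
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        free(buf);

        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

    int main(void)
    {
        size_t sizes[] = { 512, 4096, 65536, 1048576 };   /* arbitrary */
        size_t total = (size_t)1 << 30;                   /* 1 GiB per run */

        for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++)
            fprintf(stderr, "bs=%zu: %.2f s\n",
                    sizes[i], time_writes(sizes[i], total));
        return 0;
    }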

+1


Most modern operating systems already make good use of the disk as backing store for RAM. I suggest you leave the heuristics to the OS and just ask for as much memory as you want, until you hit a performance bottleneck.

0


No explicit buffering is necessary; the OS will automatically swap to disk for you when needed, so you don't need to program that yourself. If you don't need to save the data, just leave it in RAM; otherwise you are probably better off saving it after it has been generated, since that is better for disk I/O.

0

