Why am I getting *worse* performance with a *larger* buffer in my BufferedReader?
I am getting weird results with BufferedReader that I cannot explain
when I change the buffer size.
I strongly expected performance to improve gradually as the buffer size increases, with diminishing returns setting in fairly quickly, and then to be more or less flat. But it seems that beyond a very modest buffer size, increasing it actually makes things slower.
Here's a minimal example. All it does is read through a text file and calculate the sum of the line lengths.
public int traverseFile(int bufSize) throws IOException {
    BufferedReader reader = new BufferedReader(new FileReader("words16"), bufSize * 1024);
    String line;
    int total = 0;
    while ((line = reader.readLine()) != null)
        total += line.length();
    reader.close();
    return total;
}
I benchmarked this with different buffer sizes and the results were rather strange. Performance improves up to a 256KB buffer; after that point it gets worse. I wondered whether it was the time taken to allocate the buffer, so I added a pad array so that the method always allocates the same total amount of memory regardless of buffer size (see the second line below):
public int traverseFile(int bufSize) throws IOException {
    byte[] pad = new byte[(65536 - bufSize) * 1024];
    BufferedReader reader = new BufferedReader(new FileReader("words16"), bufSize * 1024);
    String line;
    int total = 0;
    while ((line = reader.readLine()) != null)
        total += line.length();
    reader.close();
    return total;
}
This made no difference. I still get the same results on two different machines. Here are the full results:
Benchmark                                       Mode  Samples    Score    Error  Units
j.t.BufferSizeBenchmark.traverse_test1_4K       avgt      100  363.987  ± 1.901  ms/op
j.t.BufferSizeBenchmark.traverse_test2_16K      avgt      100  356.551  ± 0.330  ms/op
j.t.BufferSizeBenchmark.traverse_test3_64K      avgt      100  353.462  ± 0.557  ms/op
j.t.BufferSizeBenchmark.traverse_test4_256K     avgt      100  350.822  ± 0.562  ms/op
j.t.BufferSizeBenchmark.traverse_test5_1024K    avgt      100  356.949  ± 0.338  ms/op
j.t.BufferSizeBenchmark.traverse_test6_4096K    avgt      100  358.377  ± 0.388  ms/op
j.t.BufferSizeBenchmark.traverse_test7_16384K   avgt      100  367.890  ± 0.393  ms/op
j.t.BufferSizeBenchmark.traverse_test8_65536K   avgt      100  363.271  ± 0.228  ms/op
As you can see, the sweet spot is around 256KB. The difference is not huge, but it is definitely measurable.
All I can think of is that it might be related to the memory cache. Is it because the memory being written to is farther from the memory being read? But if the buffer is used circularly, I'm not even sure that's true: what was just written is exactly what is about to be read.
The file words16 is 80MB so I can't post it here, but it's the standard Fedora file /usr/share/dict/words, concatenated sixteen times. I can find a way to post a link if needed.
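For anyone who wants to reproduce this, the input file can be rebuilt with a few lines of Java. This is just a sketch: the dictionary path and output name come from the question above, and the exact file size will vary between Fedora versions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class BuildWords16 {
    // Append the contents of src onto dst the given number of times.
    static void concatenate(Path src, Path dst, int copies) throws IOException {
        byte[] data = Files.readAllBytes(src);
        Files.deleteIfExists(dst);
        for (int i = 0; i < copies; i++) {
            Files.write(dst, data,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    public static void main(String[] args) throws IOException {
        // Paths assumed from the question; adjust for your system.
        concatenate(Paths.get("/usr/share/dict/words"), Paths.get("words16"), 16);
    }
}
```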
Here's the benchmark code:
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(1)
@Warmup(iterations = 30, time = 100, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 100, time = 10000, timeUnit = TimeUnit.MILLISECONDS)
@State(Scope.Thread)
@Threads(1)
@Fork(1)
public class BufferSizeBenchmark {

    public int traverseFile(int bufSize) throws IOException {
        byte[] pad = new byte[(65536 - bufSize) * 1024];
        BufferedReader reader = new BufferedReader(new FileReader("words16"), bufSize * 1024);
        String line;
        int total = 0;
        while ((line = reader.readLine()) != null)
            total += line.length();
        reader.close();
        return total;
    }

    @Benchmark
    public int traverse_test1_4K() throws IOException {
        return traverseFile(4);
    }

    @Benchmark
    public int traverse_test2_16K() throws IOException {
        return traverseFile(16);
    }

    @Benchmark
    public int traverse_test3_64K() throws IOException {
        return traverseFile(64);
    }

    @Benchmark
    public int traverse_test4_256K() throws IOException {
        return traverseFile(256);
    }

    @Benchmark
    public int traverse_test5_1024K() throws IOException {
        return traverseFile(1024);
    }

    @Benchmark
    public int traverse_test6_4096K() throws IOException {
        return traverseFile(4096);
    }

    @Benchmark
    public int traverse_test7_16384K() throws IOException {
        return traverseFile(16384);
    }

    @Benchmark
    public int traverse_test8_65536K() throws IOException {
        return traverseFile(65536);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(".*" + BufferSizeBenchmark.class.getSimpleName() + ".*")
                .forks(1)
                .build();
        new Runner(opt).run();
    }
}
Why does performance get worse when I increase the buffer size?
256KB is a typical processor cache size! What processor did you run this on?
So here's what happens: if you read in chunks of 256KB or less, the content that was just written into the buffer is still in the CPU cache when the read accesses it. If the chunks are larger than 256KB, then only the last 256KB that were written are still in the CPU cache, so when reading resumes from the beginning of the buffer, the contents have to be fetched from main memory.
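This cache effect can be illustrated independently of file I/O. The sketch below (my own, not from the question) writes and then re-reads byte arrays of increasing size; once the working set exceeds the cache, the per-byte time typically rises, though the exact numbers depend entirely on the machine, so none are claimed here.

```java
public class CacheSweep {
    // Write every byte, then read them all back. Returning the sum keeps
    // the JIT from eliminating the loops as dead code.
    static long writeThenRead(byte[] buf) {
        for (int i = 0; i < buf.length; i++)
            buf[i] = (byte) i;
        long sum = 0;
        for (int i = 0; i < buf.length; i++)
            sum += buf[i];
        return sum;
    }

    public static void main(String[] args) {
        for (int kb = 64; kb <= 64 * 1024; kb *= 4) {
            byte[] buf = new byte[kb * 1024];
            writeThenRead(buf);                     // warm-up pass
            long t0 = System.nanoTime();
            long sum = writeThenRead(buf);
            long t1 = System.nanoTime();
            System.out.printf("%7d KB: %.3f ns/byte (sum=%d)%n",
                    kb, (double) (t1 - t0) / buf.length, sum);
        }
    }
}
```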
The second issue is buffer allocation. The pad-buffer trick is clever, but it doesn't really equalize the allocation cost. The reason is that the real cost of allocation is not reserving the memory but zeroing it, and the OS may defer mapping the pages to physical memory until they are first touched. But you never touch the pad buffer.
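The deferred part of that cost can be probed with a sketch like the following (again mine, not from the answer): it times allocating a large array and then times touching one byte per 4KB page. Whether the gap is visible depends on how eagerly the JVM zeroes the array versus relying on the OS's lazy page mapping, so no particular numbers are claimed.

```java
public class FirstTouch {
    // Touch one byte per 4KB page so every page must be backed by
    // physical memory; returns the number of pages touched.
    static int touchPages(byte[] buf) {
        int pages = 0;
        for (int i = 0; i < buf.length; i += 4096) {
            buf[i] = 1;
            pages++;
        }
        return pages;
    }

    public static void main(String[] args) {
        int n = 64 * 1024 * 1024;          // 64MB, like the largest pad buffer
        long t0 = System.nanoTime();
        byte[] pad = new byte[n];          // allocate, but never read/write
        long t1 = System.nanoTime();
        int pages = touchPages(pad);       // now force every page in
        long t2 = System.nanoTime();
        System.out.printf("allocate: %.1f ms, touch %d pages: %.1f ms%n",
                (t1 - t0) / 1e6, pages, (t2 - t1) / 1e6);
    }
}
```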