Why am I getting *worse* performance with a *larger* buffer in my BufferedReader?

I am getting some weird results with BufferedReader that I cannot explain when I change the buffer size.

I strongly expected performance to gradually improve as the buffer size increases, with diminishing returns setting in fairly quickly, after which performance would be more or less flat. But it seems that beyond a very modest buffer size, making the buffer larger actually makes it slower.

Here's a minimal example. All it does is read through a text file and compute the sum of the line lengths.

public int traverseFile(int bufSize) throws IOException {
    BufferedReader reader = new BufferedReader(new FileReader("words16"), bufSize*1024);
    String line;
    int total=0;
    while ((line=reader.readLine())!=null)
        total+=line.length();
    reader.close();
    return total;
}


I benchmarked this with different buffer sizes and the results were rather strange. Performance improves up to about 256 KB; after that point it gets worse. I wondered whether it was the time taken to allocate the buffer, so I added padding so that each run allocates the same total amount of memory (see the second line below):

public int traverseFile(int bufSize) throws IOException {
    byte[] pad = new byte[(65536-bufSize)*1024];
    BufferedReader reader = new BufferedReader(new FileReader("words16"), bufSize*1024);
    String line;
    int total=0;
    while ((line=reader.readLine())!=null)
        total+=line.length();
    reader.close();
    return total;
}


This makes no difference. I am still getting the same results, on two different machines. Here are the full results:

Benchmark                                        Mode  Samples    Score   Error  Units
j.t.BufferSizeBenchmark.traverse_test1_4K        avgt      100  363.987 ± 1.901  ms/op
j.t.BufferSizeBenchmark.traverse_test2_16K       avgt      100  356.551 ± 0.330  ms/op
j.t.BufferSizeBenchmark.traverse_test3_64K       avgt      100  353.462 ± 0.557  ms/op
j.t.BufferSizeBenchmark.traverse_test4_256K      avgt      100  350.822 ± 0.562  ms/op
j.t.BufferSizeBenchmark.traverse_test5_1024K     avgt      100  356.949 ± 0.338  ms/op
j.t.BufferSizeBenchmark.traverse_test6_4096K     avgt      100  358.377 ± 0.388  ms/op
j.t.BufferSizeBenchmark.traverse_test7_16384K    avgt      100  367.890 ± 0.393  ms/op
j.t.BufferSizeBenchmark.traverse_test8_65536K    avgt      100  363.271 ± 0.228  ms/op


As you can see, the sweet spot is around 256KB. The difference is not huge, but it is definitely measurable.

All I can think of is that it might be related to the memory cache. Is it because the memory that is being written to is further away from the memory that is being read? But if the buffer is used circularly, I'm not even sure that's true: what gets written should only ever be just behind what is being read.
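For what it's worth, this is my mental model of how the buffer gets used. It's a simplified sketch I wrote for this question, not the actual JDK code; the file name and buffer size are just the ones from my test:

import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

public class FillThenConsumeSketch {
    public static void main(String[] args) throws IOException {
        int bufSize = 256 * 1024;                   // buffer size under test, in chars
        char[] buf = new char[bufSize];
        long total = 0;
        try (Reader in = new FileReader("words16")) {
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {  // bulk fill of the buffer
                for (int i = 0; i < n; i++) {                   // consume it front to back
                    if (buf[i] != '\n') total++;                // roughly: sum of line lengths
                }
            }
        }
        System.out.println("total = " + total);
    }
}

I don't know whether the real implementation refills in bulk like this or wraps around circularly, which is part of why I'm unsure about the cache explanation.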

The file words16 is 80 MB, so I can't post it here, but it's the standard Fedora file /usr/share/dict/words, concatenated sixteen times. I can find a way to post a link if needed.

Here's the benchmark code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(1)
@Warmup(iterations = 30, time = 100, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 100, time = 10000, timeUnit = TimeUnit.MILLISECONDS)
@State(Scope.Thread)
@Threads(1)
@Fork(1)
public class BufferSizeBenchmark {

    public int traverseFile(int bufSize) throws IOException {
        byte[] pad = new byte[(65536-bufSize)*1024];
        BufferedReader reader = new BufferedReader(new FileReader("words16"), bufSize*1024);
        String line;
        int total=0;
        while ((line=reader.readLine())!=null)
            total+=line.length();
        reader.close();
        return total;
    }

    @Benchmark
    public int traverse_test1_4K() throws IOException {
        return traverseFile(4);
    }

    @Benchmark
    public int traverse_test2_16K() throws IOException {
        return traverseFile(16);
    }

    @Benchmark
    public int traverse_test3_64K() throws IOException {
        return traverseFile(64);
    }

    @Benchmark
    public int traverse_test4_256K() throws IOException {
        return traverseFile(256);
    }

    @Benchmark
    public int traverse_test5_1024K() throws IOException {
        return traverseFile(1024);
    }

    @Benchmark
    public int traverse_test6_4096K() throws IOException {
        return traverseFile(4096);
    }

    @Benchmark
    public int traverse_test7_16384K() throws IOException {
        return traverseFile(16384);
    }

    @Benchmark
    public int traverse_test8_65536K() throws IOException {
        return traverseFile(65536);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(
                        ".*" + BufferSizeBenchmark.class.getSimpleName() + ".*")
                .forks(1).build();

        new Runner(opt).run();
    }

}


Why do I get worse performance when I increase the buffer size?

2 answers


This most likely has to do with the size of the CPU cache. Since the cache uses an LRU eviction policy, with a buffer that is too large the data you wrote into the buffer has already been evicted by the time you get around to reading it.





256K is a typical processor cache size! What processor did you test this on?

So what happens is this: if you read in chunks of 256K or less, the content that was written into the buffer is still in the CPU cache when the read accesses it. If you use chunks larger than 256K, then only the last 256K that were written are still in the CPU cache, so when reading starts from the beginning of the buffer, the contents have to be fetched from main memory.
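You can see the same effect without any file I/O. Here is a rough sketch of what I mean (my own illustration, not your benchmark; the sizes and repetition counts are arbitrary, and a serious measurement would go through JMH like yours does): write a buffer sequentially, then read it back and sum it. Once the buffer no longer fits in the cache, the read pass slows down because the start of the buffer has already been evicted by the time the write pass finishes.

public class WriteThenReadSketch {
    public static void main(String[] args) {
        int[] sizesKb = {64, 256, 1024, 4096, 16384, 65536};
        for (int kb : sizesKb) {
            byte[] buf = new byte[kb * 1024];
            long bestNanos = Long.MAX_VALUE;
            long sink = 0;
            for (int rep = 0; rep < 20; rep++) {
                // "fill" pass: analogous to BufferedReader filling its buffer
                for (int i = 0; i < buf.length; i++) buf[i] = (byte) i;
                // "consume" pass: analogous to readLine() draining the buffer
                long t0 = System.nanoTime();
                long sum = 0;
                for (int i = 0; i < buf.length; i++) sum += buf[i];
                bestNanos = Math.min(bestNanos, System.nanoTime() - t0);
                sink += sum;     // keep the sum live so it isn't optimized away
            }
            System.out.printf("%6d KB: best consume pass %8.3f ms (sink=%d)%n",
                    kb, bestNanos / 1e6, sink);
        }
    }
}

Where exactly the knee shows up depends on the size of the last-level cache of the machine you run it on.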



The second issue is the buffer allocation. The padding trick is clever, but it doesn't really level out the allocation cost. The reason is that the real cost of allocation is not reserving the memory but zeroing it, and on top of that the OS may not map the pages to physical memory until they are first accessed. But you never access the padding buffer.
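To see where that cost actually lands, something like the following sketch separates the allocation call from the first touch of the pages (again my own illustration; how the time splits between the two depends on the JVM and the OS):

public class AllocTouchSketch {
    public static void main(String[] args) {
        final int n = 64 * 1024 * 1024;        // 64 MB, same order of magnitude as the padding
        for (int rep = 0; rep < 5; rep++) {
            long t0 = System.nanoTime();
            byte[] pad = new byte[n];          // allocate (and possibly zero) the array
            long allocNanos = System.nanoTime() - t0;

            long t1 = System.nanoTime();
            for (int i = 0; i < n; i += 4096) {
                pad[i] = 1;                    // touch one byte per 4 KB page
            }
            long touchNanos = System.nanoTime() - t1;

            System.out.printf("alloc: %7.2f ms, first touch of every page: %7.2f ms%n",
                    allocNanos / 1e6, touchNanos / 1e6);
        }
    }
}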


