What is __memset_sse2 and why does it execute so many instructions?

I have two C++ implementations of an algorithm, call them A and B. The only difference between A and B is that A uses std::unordered_map<int, int> hashmap; but B uses google::dense_hash_map<int, int> hashmap;.

I found an input for which A is much slower than B, and I cannot figure out why.

For the same input, I ran

sudo perf record -e instructions ./A input.txt

and got this result:

Overhead  Command  Shared Object        Symbol
  65.90%  A        libc-2.23.so         [.] __memset_sse2
   6.63%  A        libc-2.23.so         [.] _int_malloc
   3.44%  A        libc-2.23.so         [.] malloc
   2.61%  A        libc-2.23.so         [.] _int_free

      

When I do the same for B, which is faster, I get this:

Overhead  Command  Shared Object        Symbol
  15.17%  B        libc-2.23.so         [.] _int_malloc
  14.94%  B        B                    [.] B::func1()
   5.72%  B        B                    [.] B::func2()
   5.58%  B        B                    [.] B::func3()

      

What is __memset_sse2, and why does it execute so many instructions?



1 answer


__memset_sse2 is an implementation of the memset function optimized for architectures that support SSE2. When it accounts for about two-thirds of the profile, it means that most of the work goes into initializing memory, most likely a block that is relatively large. The hash table most likely uses memset to initialize its internal storage (such as its bucket array).
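To make that cost concrete, here is a minimal sketch of zero-filling a bucket array with memset. The helpers zero_buckets and bytes_zeroed are hypothetical names, and a plain array of pointers stands in for a real table's buckets; this is not how std::unordered_map is written, only an illustration that the work grows with the size of the array and the number of resets, not with the number of elements stored.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: zero every slot of a bucket array with memset
// and report how many bytes were written.
std::size_t zero_buckets(std::vector<void*>& buckets) {
    const std::size_t bytes = buckets.size() * sizeof(void*);
    if (bytes != 0) {
        std::memset(buckets.data(), 0, bytes);
    }
    return bytes;
}

// Hypothetical helper: total bytes written when a table with
// `bucket_count` slots is reset `resets` times.
std::size_t bytes_zeroed(std::size_t bucket_count, std::size_t resets) {
    std::vector<void*> buckets(bucket_count);
    std::size_t total = 0;
    for (std::size_t i = 0; i < resets; ++i) {
        total += zero_buckets(buckets);
    }
    return total;
}
```

Calling bytes_zeroed with a large bucket count and many resets yields a large total even though no element is ever inserted, which is the kind of pattern that would put a memset variant at the top of a profile.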

It looks like google::dense_hash_map manages its storage differently, so it doesn't need to initialize that much memory when running your example.



Note: seeing initialization account for two-thirds of the profile may mean that your test is not designed well. Perhaps the amount of data you put in the hash container is relatively small, or you keep rebuilding the container on each run.
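One rough way to check for that situation is to compare how many bucket slots get set up with how many elements are actually stored. The sketch below is only an assumption about what such a test might look like: SetupCost and rebuild_each_round are hypothetical names, the reserve call stands in for whatever makes the table large in the real program, and bucket_count() is used as a proxy for the memory that has to be initialized.

```cpp
#include <cstddef>
#include <unordered_map>

struct SetupCost {
    std::size_t buckets_set_up;     // bucket slots allocated across all rounds
    std::size_t elements_inserted;  // key/value pairs actually stored
};

// Hypothetical workload: rebuild the map in every round, size it for a
// large input, but insert only a handful of keys.
SetupCost rebuild_each_round(std::size_t rounds, std::size_t per_round,
                             std::size_t reserved) {
    SetupCost cost{0, 0};
    for (std::size_t r = 0; r < rounds; ++r) {
        std::unordered_map<int, int> hashmap;  // rebuilt from scratch every round
        hashmap.reserve(reserved);             // bucket array sized for `reserved` keys
        for (std::size_t i = 0; i < per_round; ++i) {
            hashmap[static_cast<int>(i)] = static_cast<int>(r);
        }
        cost.buckets_set_up += hashmap.bucket_count();
        cost.elements_inserted += hashmap.size();
    }
    return cost;
}
```

With 100 rounds, 10 keys per round, and 100000 reserved slots, this reports 1000 stored elements against at least 10,000,000 bucket slots. A ratio like that suggests the program is mostly measuring table setup rather than lookups or inserts.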
