What is __memset_sse2 and why does it execute so many instructions?
I have two C++ implementations of an algorithm, call them A and B. The only difference between A and B is that A uses std::unordered_map<int, int> hashmap; while B uses google::dense_hash_map<int, int> hashmap;.
I found an input for which A is much slower than B, and I cannot figure out why.
For the same input, I ran sudo perf record -e instructions ./A input.txt and got this result:
Overhead  Command  Shared Object  Symbol
  65.90%  A        libc-2.23.so   [.] __memset_sse2
   6.63%  A        libc-2.23.so   [.] _int_malloc
   3.44%  A        libc-2.23.so   [.] malloc
   2.61%  A        libc-2.23.so   [.] _int_free
When I do the same for B, which is faster, I get this:
Overhead  Command  Shared Object  Symbol
  15.17%  B        libc-2.23.so   [.] _int_malloc
  14.94%  B        B              [.] B::func1()
   5.72%  B        B              [.] B::func2()
   5.58%  B        B              [.] B::func3()
What is __memset_sse2, and why does it execute so many instructions?
__memset_sse2 is glibc's implementation of memset optimized for processors that support SSE2. When it accounts for about two-thirds of the profile, most of the work is going into initializing memory, most likely one relatively large block. The hash table most likely uses memset to zero-initialize its bucket array.
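To make the mechanism concrete, here is a hypothetical toy map using separate chaining (ToyChainedMap is an illustration only, not how std::unordered_map is actually written). Its bucket array is zeroed with memset on construction and on every clear(), so that cost scales with the number of buckets rather than with the number of stored elements. It assumes a null pointer is all-bits-zero, as on mainstream platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical toy map (NOT std::unordered_map) using separate chaining.
// It keeps a flat array of bucket head pointers that is zeroed with memset
// on construction and on every clear().
class ToyChainedMap {
  struct Node {
    int key;
    int value;
    Node* next;
  };

 public:
  explicit ToyChainedMap(std::size_t bucket_count)
      : buckets_(new Node*[bucket_count]), bucket_count_(bucket_count) {
    assert(bucket_count > 0);
    zero_buckets();
  }

  void insert(int key, int value) {
    Node*& head = buckets_[index(key)];
    for (Node* n = head; n != nullptr; n = n->next) {
      if (n->key == key) { n->value = value; return; }  // overwrite existing key
    }
    nodes_.push_back(std::unique_ptr<Node>(new Node{key, value, head}));
    head = nodes_.back().get();  // new node becomes the bucket's list head
  }

  const int* find(int key) const {
    for (Node* n = buckets_[index(key)]; n != nullptr; n = n->next) {
      if (n->key == key) return &n->value;
    }
    return nullptr;
  }

  void clear() {
    nodes_.clear();  // cost proportional to the number of elements
    zero_buckets();  // cost proportional to bucket_count, however few elements there were
  }

  std::size_t size() const { return nodes_.size(); }
  std::size_t bytes_zeroed_per_clear() const { return bucket_count_ * sizeof(Node*); }

 private:
  // ASSUMPTION: a null pointer is all-bits-zero, as on mainstream platforms.
  void zero_buckets() { std::memset(buckets_.get(), 0, bytes_zeroed_per_clear()); }
  std::size_t index(int key) const { return std::hash<int>{}(key) % bucket_count_; }

  std::unique_ptr<Node*[]> buckets_;
  std::size_t bucket_count_;
  std::vector<std::unique_ptr<Node>> nodes_;
};
```

For example, with ToyChainedMap m(1 << 20) and 8-byte pointers, every m.clear() zeroes 8 MiB, even if the map held a single element.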
It looks like google::dense_hash_map is optimized for size, so it does not need to initialize as much memory when running your example.
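To check how much bucket memory A's std::unordered_map carries, one can query bucket_count(). The sketch below assumes one pointer per bucket, which holds for libstdc++ but is not required by the C++ standard, so treat the number as an estimate. The helper make_map is hypothetical; it shows that a map sized with reserve() for a large input keeps a large bucket array even when it stores only a few elements.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Rough estimate of the size of a std::unordered_map's bucket array, i.e.
// roughly what an implementation that zeroes its buckets has to clear.
// ASSUMPTION: one pointer per bucket (true of libstdc++); the standard does
// not specify the layout, so this is only an approximation.
std::size_t approx_bucket_array_bytes(const std::unordered_map<int, int>& m) {
  return m.bucket_count() * sizeof(void*);
}

// Hypothetical helper: a map holding only `n` keys, optionally after
// reserve(capacity) sized for a much larger input (capacity 0 = no reserve).
std::unordered_map<int, int> make_map(int n, std::size_t capacity) {
  assert(n >= 0);
  std::unordered_map<int, int> m;
  if (capacity > 0) m.reserve(capacity);  // guarantees bucket_count() >= capacity / max_load_factor()
  for (int i = 0; i < n; ++i) m[i] = i;
  return m;
}
```

For example, make_map(10, 1u << 20) holds ten elements but, with 8-byte pointers, has an estimated bucket array of at least 8 MiB.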
Note: seeing initialization take two-thirds of the profile may mean that your benchmark is not designed correctly. Perhaps the amount of data you put into your hash container is relatively small, or you keep rebuilding the container on each run.
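A minimal sketch of a benchmark that keeps setup out of the measurement, assuming the real workload can be reduced to inserting a set of keys once and then doing lookups. The names bench_lookups, BenchResult, keys and queries are hypothetical stand-ins for your code and input: the map is built and sized once, and only the lookup loop is timed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical result type: number of successful lookups and time spent.
struct BenchResult {
  std::size_t hits;
  std::chrono::nanoseconds elapsed;
};

// Build the container once, outside the timed region, so one-off
// initialization (e.g. zeroing buckets) is not what gets measured.
BenchResult bench_lookups(const std::vector<int>& keys,
                          const std::vector<int>& queries, int repetitions) {
  assert(repetitions > 0);
  std::unordered_map<int, int> hashmap;
  hashmap.reserve(keys.size());       // size buckets for the actual data, once
  for (int k : keys) hashmap[k] = k;  // setup: not timed

  std::size_t hits = 0;
  const auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < repetitions; ++r) {
    for (int q : queries) hits += hashmap.count(q);  // only lookups are timed
  }
  const auto stop = std::chrono::steady_clock::now();
  return {hits, std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}
```

Returning the accumulated hits also keeps the compiler from discarding the lookup loop as dead code.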