C++ virtual function compiler optimization
class Base
{
public:
    virtual void fnc(size_t nm)
    {
        // do some work here
    }
    void process()
    {
        for(size_t i = 0; i < 1000; i++)
        {
            fnc(i);
        }
    }
};
Can the C++ compiler optimize the calls to fnc made from process, assuming it will be the same function every time it is called inside the loop? Or will it fetch the address of the function from the vtable on every call?
I checked an example on godbolt.org. The result: NO, none of the compilers optimize this.
Here's the test source:
class Base
{
public:
    // made it pure virtual to decrease clutter
    virtual void fnc(int nm) = 0;
    void process()
    {
        for(int i = 0; i < 1000; i++)
        {
            fnc(i);
        }
    }
};
void test(Base* b) {
    return b->process();
}
and the generated asm:
test(Base*):
push rbp ; setup function call
push rbx
mov rbp, rdi ; Base* rbp
xor ebx, ebx ; int ebx=0;
sub rsp, 8 ; advance stack ptr
.L2:
mov rax, QWORD PTR [rbp+0] ; read 8 bytes from our Base*
; rax now contains vtable ptr
mov esi, ebx ; int parameter for fnc
add ebx, 1 ; i++
mov rdi, rbp ; (Base*) this parameter for fnc
call [QWORD PTR [rax]] ; read vtable and call fnc
cmp ebx, 1000 ; back to the top of the loop
jne .L2
add rsp, 8 ; reset stack ptr and return
pop rbx
pop rbp
ret
As you can see, it re-reads the vtable for every call. I think this is because the compiler cannot prove that the function call does not modify the vtable pointer (for example, via placement new or something equally questionable), so technically the target of the virtual call can change between iterations.
In general, compilers are allowed to optimize anything that does not change the observable behavior of the program (the "as-if" rule). There are a few exceptions, such as eliding non-trivial copy constructors when returning from a function, but otherwise any change to the generated code that does not alter the result or side effects of the program on the abstract C++ machine may be performed by the compiler.
So, can devirtualizing a call change the observable behavior? According to this article, yes.
The relevant passage:
Optimizer[...] will have to assume that [virtual function] can change the vptr in the passed object. [...]
void A::foo() { // virtual
    static_assert(sizeof(A) == sizeof(Derived));
    new(this) Derived;
}
This is a placement new expression: it doesn't allocate new memory, it just constructs a new object at the location provided. So, by constructing a Derived object where an object of type A used to be, we change the vptr to point to Derived's vtable. Is this code even legal? The C++ standard says yes.
Therefore, if the compiler does not have access to the definition of the virtual function (and cannot prove the dynamic type of *this at compile time), this optimization is risky.
According to the same article, you can use -fstrict-vtable-pointers on Clang to enable this optimization, at the risk of making the code less standard-conforming.
I wrote a very small implementation and compiled it with g++ --save-temps opt.cpp. This flag keeps the temporary preprocessed file, the assembly file, and the object file. I compiled it once with the virtual keyword and once without it. Here's the program:
class Base
{
public:
    virtual int fnc(int nm)
    {
        int i = 0;
        i += 3;
        return i;
    }
    void process()
    {
        int x = 9;
        for(int i = 0; i < 1000; i++)
        {
            x += i;
        }
    }
};
int main(int argc, char* argv[]) {
Base b;
return 0;
}
When I compiled with the virtual keyword, the resulting assembly on an x86_64 Linux box was:
	.file	"opt.cpp"
	.section	.text._ZN4Base3fncEi,"axG",@progbits,_ZN4Base3fncEi,comdat
	.align 2
	.weak	_ZN4Base3fncEi
	.type	_ZN4Base3fncEi, @function
_ZN4Base3fncEi:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movq	%rdi, -24(%rbp)
	movl	%esi, -28(%rbp)
	movl	$0, -4(%rbp)
	addl	$3, -4(%rbp)
	movl	-4(%rbp), %eax
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size	_ZN4Base3fncEi, .-_ZN4Base3fncEi
	.text
	.globl	main
	.type	main, @function
main:
.LFB2:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$32, %rsp
	movl	%edi, -20(%rbp)
	movq	%rsi, -32(%rbp)
	movq	%fs:40, %rax
	movq	%rax, -8(%rbp)
	xorl	%eax, %eax
	leaq	16+_ZTV4Base(%rip), %rax
	movq	%rax, -16(%rbp)
	movl	$0, %eax
	movq	-8(%rbp), %rdx
	xorq	%fs:40, %rdx
	je	.L5
	call	__stack_chk_fail@PLT
.L5:
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE2:
	.size	main, .-main
	.weak	_ZTV4Base
	.section	.data.rel.ro.local._ZTV4Base,"awG",@progbits,_ZTV4Base,comdat
	.align 8
	.type	_ZTV4Base, @object
	.size	_ZTV4Base, 24
_ZTV4Base:
	.quad	0
	.quad	_ZTI4Base
	.quad	_ZN4Base3fncEi
	.weak	_ZTI4Base
	.section	.data.rel.ro._ZTI4Base,"awG",@progbits,_ZTI4Base,comdat
	.align 8
	.type	_ZTI4Base, @object
	.size	_ZTI4Base, 16
_ZTI4Base:
	.quad	_ZTVN10__cxxabiv117__class_type_infoE+16
	.quad	_ZTS4Base
	.weak	_ZTS4Base
	.section	.rodata._ZTS4Base,"aG",@progbits,_ZTS4Base,comdat
	.type	_ZTS4Base, @object
	.size	_ZTS4Base, 6
_ZTS4Base:
	.string	"4Base"
	.ident	"GCC: (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005"
	.section	.note.GNU-stack,"",@progbits
Without the virtual keyword, the resulting assembly was:
	.file	"opt.cpp"
	.text
	.globl	main
	.type	main, @function
main:
.LFB2:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movl	%edi, -20(%rbp)
	movq	%rsi, -32(%rbp)
	movl	$0, %eax
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE2:
	.size	main, .-main
	.ident	"GCC: (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005"
	.section	.note.GNU-stack,"",@progbits
Now, note that unlike the code in the question, this example never even calls the virtual method, yet the resulting assembly is already much larger. I haven't tried compiling with optimizations, but the difference should be clear.