C++ virtual function compiler optimization

class Base 
{
public:
    virtual void fnc(size_t nm) 
    {
        // do some work here
    }

    void process()
    {
        for(size_t i = 0; i < 1000; i++)
        {
            fnc(i);
        }
    }
};

      

Can a C++ compiler optimize the calls to fnc made from process(), assuming it will be the same function every time it is called inside the loop? Or will it fetch the function's address from the vtable on every call?

+3




3 answers


I checked an example on godbolt.org. The result: no, none of the compilers I tried optimized this.

Here's the test source:

class Base 
{
public:
// made it pure virtual to decrease clutter
    virtual void fnc(int nm) =0;
    void process()
    {
        for(int i = 0; i < 1000; i++)
        {
            fnc(i);
        }
    }
};

void test(Base* b ) {
    return b->process();
}

      



and the generated asm:

test(Base*):
        push    rbp       ; setup function call 
        push    rbx
        mov     rbp, rdi  ; Base* rbp 
        xor     ebx, ebx  ; int ebx=0;
        sub     rsp, 8    ; advance stack ptr
.L2:
        mov     rax, QWORD PTR [rbp+0]  ; read 8 bytes from our Base*
                                        ; rax now contains vtable ptr 
        mov     esi, ebx                ; int parameter for fnc
        add     ebx, 1                  ; i++
        mov     rdi, rbp                ; (Base*) this parameter for fnc
        call    [QWORD PTR [rax]]       ; read vtable and call fnc
        cmp     ebx, 1000               ; back to the top of the loop 
        jne     .L2
        add     rsp, 8                  ; reset stack ptr and return
        pop     rbx
        pop     rbp
        ret

      

As you can see, it reloads the vtable pointer on every call. I think this is because the compiler cannot prove that the object's vtable pointer is left unchanged by the function call (for example, if the function uses placement new on the object, or does something similarly silly), so technically the target of the virtual call can change between iterations.
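To make that reasoning concrete, here is a minimal sketch that models virtual dispatch by hand with a table of function pointers. Everything in it (Object, VTable, fnc_a, fnc_b, process_reload, process_hoisted) is a hypothetical name made up for illustration, not what any compiler actually emits. It only shows why caching the function address before the loop can give a different result from reloading it on every call, if the callee is allowed to change the vtable pointer.

```cpp
// Hand-rolled model of virtual dispatch. All names here are hypothetical.
struct Object;

struct VTable {
    void (*fnc)(Object*, int);
};

struct Object {
    const VTable* vptr;  // plays the role of the hidden vtable pointer
    int calls_a;         // how many times fnc_a ran
    int calls_b;         // how many times fnc_b ran
};

void fnc_a(Object* self, int i);
void fnc_b(Object* self, int i);

const VTable vtable_a{&fnc_a};
const VTable vtable_b{&fnc_b};

// On the call with i == 2, fnc_a swaps the vptr, mimicking what
// placement new of another type would do to a real object.
void fnc_a(Object* self, int i) {
    ++self->calls_a;
    if (i == 2) self->vptr = &vtable_b;
}

void fnc_b(Object* self, int) {
    ++self->calls_b;
}

// Reloads the function address on every iteration, like the asm above.
void process_reload(Object* obj, int n) {
    for (int i = 0; i < n; ++i) obj->vptr->fnc(obj, i);
}

// Looks the function address up once, before the loop.
void process_hoisted(Object* obj, int n) {
    void (*target)(Object*, int) = obj->vptr->fnc;
    for (int i = 0; i < n; ++i) target(obj, i);
}
```

Starting from an Object whose vptr is &vtable_a and running 10 iterations, process_reload ends with calls_a == 3 and calls_b == 7, while process_hoisted ends with calls_a == 10 and calls_b == 0. The two loops are only interchangeable if the swap can be ruled out.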

+1




Compilers are generally allowed to make any optimization that does not change the observable behavior of the program. There are some exceptions, such as the elision of non-trivial copy constructors when returning from a function, but you can assume that any change to the expected code generation that does not change the output or the side effects of the program on the C++ abstract machine may be made by the compiler.

So, can devirtualizing a function call change the observable behavior? According to this article, yes.

Relevant passage:



Optimizer [...] will have to assume that [virtual function] can change the vptr in the passed object. [...]

void A::foo() { // virtual 
 static_assert(sizeof(A) == sizeof(Derived)); 
 new(this) Derived; 
}

      

This is a call to the placement new operator: it doesn't allocate new memory, it just creates a new object at the location provided. So, by constructing a Derived object where the object of type A was living, we change the vptr to point to Derived's vtable. Is this code even legal? The C++ standard says yes.
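For illustration, here is a small sketch of the same idea that stays on uncontroversial ground: instead of replacing *this inside a virtual call as the article's snippet does, it destroys the A object, constructs a Derived in the same storage, and only uses the pointer returned by placement new. The types A and Derived mirror the article's names; id(), Ids, and replace_in_place() are hypothetical helpers added for this example.

```cpp
#include <new>  // placement new

struct A {
    virtual int id() const { return 1; }
    virtual ~A() {}
};

struct Derived : A {
    int id() const override { return 2; }
};

// Same requirement as in the article's snippet, plus an alignment check.
static_assert(sizeof(A) == sizeof(Derived), "Derived must fit in A's storage");
static_assert(alignof(A) == alignof(Derived), "alignment must match");

struct Ids {
    int before;  // id() of the first object in the storage
    int after;   // id() of the object that replaced it
};

// Hypothetical helper: reuses one buffer for two objects of different
// dynamic type and reports which override each virtual call reached.
Ids replace_in_place() {
    alignas(A) unsigned char storage[sizeof(A)];

    A* first = new (storage) A;        // no allocation, just construction
    int before = first->id();
    first->~A();                       // end the lifetime of the A object

    A* second = new (storage) Derived; // same address, different vtable
    int after = second->id();
    second->~A();                      // virtual destructor runs ~Derived

    return Ids{before, after};
}
```

The same bytes first hold an object whose vptr refers to A's vtable and then one whose vptr refers to Derived's, so replace_in_place() reports 1 for the first call and 2 for the second.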

Therefore, if the compiler does not have access to the definition of the virtual function (and does not know the concrete type of *this at compile time), then this optimization is risky.
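One way to put the compiler in the safe situation described above is to call through a type it knows to be the most derived one. The sketch below assumes a hypothetical Impl class marked final; with that, a compiler may (but is not required to) call Impl::fnc directly or inline it instead of going through the vtable. The names Impl and sum_calls and the doubling body are made up for this example.

```cpp
// Interface as in the question, but returning a value so the result
// of the loop is observable.
struct Base {
    virtual int fnc(int nm) = 0;
    virtual ~Base() {}
};

// Hypothetical concrete type. 'final' means no class can derive from
// Impl, so an Impl& can only refer to an object of exactly this type.
struct Impl final : Base {
    int fnc(int nm) override { return nm * 2; }
};

// The static type of obj is Impl&, so the target of obj.fnc(i) is known
// at compile time and no vtable lookup is needed to find it.
long sum_calls(Impl& obj, int n) {
    long sum = 0;
    for (int i = 0; i < n; ++i) sum += obj.fnc(i);
    return sum;
}
```

Whether the call is actually devirtualized still depends on the compiler and the optimization level, so it is worth checking the generated assembly the same way as in the first answer.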

According to the same article, you can use -fstrict-vtable-pointers on Clang to enable this optimization, at the risk of making your code less conformant to the C++ standard.

+1




I wrote a very small program and compiled it with g++ --save-temps opt.cpp. This flag keeps the temporary preprocessed file, the assembly file, and the object file. I compiled it once with the virtual keyword and once without it. Here's the program.

class Base
{
public:
    virtual int fnc(int nm)
    {
        int i = 0;
        i += 3;
        return i;
    }

    void process()
    {
        int x = 9;
        for(int i = 0; i < 1000; i++)
        {
            x += i;
        }
    }
};

int main(int argc, char* argv[]) {
    Base b;

    return 0;
}

      

When I compiled with the virtual keyword, the resulting assembly on an x86_64 Linux box was:

    .file "opt.cpp"
    .section .text._ZN4Base3fncEi,"axG",@progbits,_ZN4Base3fncEi,comdat
    .align 2
    .weak _ZN4Base3fncEi
    .type _ZN4Base3fncEi, @function
_ZN4Base3fncEi:
.LFB0:
    .cfi_startproc
    pushq %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq %rsp, %rbp
    .cfi_def_cfa_register 6
    movq %rdi, -24(%rbp)
    movl %esi, -28(%rbp)
    movl $0, -4(%rbp)
    addl $3, -4(%rbp)
    movl -4(%rbp), %eax
    popq %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size _ZN4Base3fncEi, .-_ZN4Base3fncEi
    .text
    .globl main
    .type main, @function
main:
.LFB2:
    .cfi_startproc
    pushq %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq %rsp, %rbp
    .cfi_def_cfa_register 6
    subq $32, %rsp
    movl %edi, -20(%rbp)
    movq %rsi, -32(%rbp)
    movq %fs:40, %rax
    movq %rax, -8(%rbp)
    xorl %eax, %eax
    leaq 16+_ZTV4Base(%rip), %rax
    movq %rax, -16(%rbp)
    movl $0, %eax
    movq -8(%rbp), %rdx
    xorq %fs:40, %rdx
    je .L5
    call __stack_chk_fail@PLT
.L5:
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE2:
    .size main, .-main
    .weak _ZTV4Base
    .section .data.rel.ro.local._ZTV4Base,"awG",@progbits,_ZTV4Base,comdat
    .align 8
    .type _ZTV4Base, @object
    .size _ZTV4Base, 24
_ZTV4Base:
    .quad 0
    .quad _ZTI4Base
    .quad _ZN4Base3fncEi
    .weak _ZTI4Base
    .section .data.rel.ro._ZTI4Base,"awG",@progbits,_ZTI4Base,comdat
    .align 8
    .type _ZTI4Base, @object
    .size _ZTI4Base, 16
_ZTI4Base:
    .quad _ZTVN10__cxxabiv117__class_type_infoE+16
    .quad _ZTS4Base
    .weak _ZTS4Base
    .section .rodata._ZTS4Base,"aG",@progbits,_ZTS4Base,comdat
    .type _ZTS4Base, @object
    .size _ZTS4Base, 6
_ZTS4Base:
    .string "4Base"
    .ident "GCC: (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005"
    .section .note.GNU-stack,"",@progbits

Without the virtual keyword, the final assembly was:

    .file "opt.cpp"
    .text
    .globl main
    .type main, @function
main:
.LFB2:
    .cfi_startproc
    pushq %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq %rsp, %rbp
    .cfi_def_cfa_register 6
    movl %edi, -20(%rbp)
    movq %rsi, -32(%rbp)
    movl $0, %eax
    popq %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE2:
    .size main, .-main
    .ident "GCC: (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005"
    .section .note.GNU-stack,"",@progbits

Now, unlike the code in the question, this example never even calls the virtual method, and yet the resulting assembly is much larger. I haven't tried compiling with optimizations, but feel free to try it.

0

