How do I keep C and C++ libraries in sync with minimal performance penalty?

I have a C library with many math routines for working with vectors, matrices, quaternions, etc. It needs to stay in C because I often use it for embedded work and as a Lua extension. I also have C++ class wrappers on top of the C API to allow for better object management and operator overloading for the math operations. The wrapper consists only of a header file and uses inline as much as possible.

Is there a noticeable penalty for wrapping the C code this way, compared to porting it and nesting the implementation directly in a C++ class? This library is used in time-critical applications. So, is eliminating the indirection worth the headache of maintaining two ports?

Example C interface:

typedef float VECTOR3[3];

void v3_add(VECTOR3 *out, VECTOR3 lhs, VECTOR3 rhs);


C++ wrapper example:

class Vector3
{
private:
    VECTOR3 v_;

public:
    // copy constructors, etc...

    Vector3& operator+=(const Vector3& rhs)
    {
        v3_add(&this->v_, this->v_, const_cast<float*>(rhs.v_));
        return *this;
    }

    Vector3 operator+(const Vector3& rhs) const
    {
        Vector3 tmp(*this);
        tmp += rhs;
        return tmp;
    }

    // more methods...
};


+1




6 answers


Your wrapper itself will be inlined, but the calls into the C library usually will not be. (That would require link-time optimization; it is technically possible, but AFAIK support for it in today's tools is rudimentary at best.)

Generally, a function call by itself is not very expensive. Its cost in cycles has decreased significantly over the years, and it is easily predicted, so the penalty for the call as such is negligible.

However, inlining opens the door to further optimizations: if you compute v = a + b + c, the wrapper class forces temporaries onto the stack, whereas with inlined calls most of the data can be kept on the FPU stack. In addition, inlined code allows instructions to be reordered and simplified, constants to be folded, etc.
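
For illustration, here is roughly what the wrapper makes the compiler do for such an expression (a sketch; it assumes Vector3 also has a default constructor, which the question only hints at):

Vector3 a, b, c;           // assumes a default constructor exists
// With the wrapper, this expression...
Vector3 v = a + b + c;

// ...expands (after the inline wrapper code is substituted) roughly to:
//
//   Vector3 t1(a);  v3_add(&t1.v_, t1.v_, b.v_);   // temporary for a + b
//   Vector3 t2(t1); v3_add(&t2.v_, t2.v_, c.v_);   // temporary for (a + b) + c
//   Vector3 v(t2);
//
// Both v3_add calls are opaque to the compiler, so t1 and t2 have to live in
// memory. If v3_add were inlined as well, the compiler could keep the whole
// computation in registers / on the FPU stack and fold the adds together.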

So, while the usual rule is to measure before you invest in optimization, I would expect there are some gains to be had here.




A typical solution is to put the C implementation into a form that can be included either as inline functions or as the bodies of the C functions:

// V3impl.inl
void V3DECL v3_add(VECTOR3 *out, VECTOR3 lhs, VECTOR3 rhs)
{
    // here you maintain the actual implementations, e.g.:
    (*out)[0] = lhs[0] + rhs[0];
    (*out)[1] = lhs[1] + rhs[1];
    (*out)[2] = lhs[2] + rhs[2];
}

// C header
#define V3DECL 
void V3DECL v3_add(VECTOR3 *out, VECTOR3 lhs, VECTOR3 rhs);

// C body
#include "V3impl.inl"


// CPP Header
#define V3DECL inline
namespace v3core {
  #include "V3impl.inl"
} // namespace

class Vector3D { /* ... */ };


This probably makes sense only for selected methods with relatively simple bodies. In the C++ build I would move the functions into a separate namespace, since you usually don't need them directly there.
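
For completeness, a sketch of how the Vector3D wrapper might then forward to the inlined copies (the forwarding body is an assumption, not part of the answer):

class Vector3D
{
    VECTOR3 v_;
public:
    Vector3D& operator+=(const Vector3D& rhs)
    {
        // calls the inline copy from namespace v3core, so the whole body is
        // visible to the compiler and can be flattened into the caller
        v3core::v3_add(&v_, v_, const_cast<float*>(rhs.v_));
        return *this;
    }
    // more methods...
};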

(Note that inline is only a hint to the compiler; it does not force the function to be inlined. But that is a good thing: if the code of an inner loop grows beyond the instruction cache, inlining can easily hurt performance.)

Whether you can afford to pass and return by value depends on the strength of your compiler; I've seen many cases where foo(X * out) forces stack variables whereas X foo() keeps the values in registers.
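
As a sketch of that last point (the cross-product function is illustrative, not part of the question's API):

// Out-parameter style: the result's address is taken, so it usually has to
// live in memory.
void v3_cross(VECTOR3 *out, VECTOR3 lhs, VECTOR3 rhs);

// Return-by-value style: with the body visible and inlined, the compiler is
// free to keep the result entirely in registers.
struct Vec3f { float x, y, z; };

inline Vec3f cross(Vec3f a, Vec3f b)
{
    Vec3f r;
    r.x = a.y * b.z - a.z * b.y;
    r.y = a.z * b.x - a.x * b.z;
    r.z = a.x * b.y - a.y * b.x;
    return r;
}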

+2




If you just wrap your C library calls in member functions of a C++ class (in other words, the C++ functions do nothing but call the C functions), then the compiler will optimize those calls so that there is no performance penalty.



+4




As with any performance question, you will be told to measure to get your answer (and that is strictly the correct answer).

But generally, for simple inline methods that can actually be inlined, you won't see a performance penalty. An inline method that does nothing but forward a call to another function is a prime candidate for inlining.

However, even if your wrapper methods weren't inlined, I suspect you wouldn't notice any performance penalty - it probably wouldn't even be measurable - unless the wrapper method is called in a critical loop. Even then, it would probably only be measurable if the wrapped function itself does very little work.

This kind of thing is about the last performance issue to worry about. First, think about making the code correct and maintainable, and about using appropriate algorithms.

+3




As usual with anything optimization-related, the answer is that you have to measure the performance yourself before you know whether it is worth optimizing.

  • Define two different functions: one calls the C-style functions directly, the other calls through the wrapper. See which one is faster, or whether the difference is within your measurement error (which would mean there is no difference you can measure). A minimal sketch of such a comparison follows this list.
  • Look at the assembly code generated for the two functions from the previous step (on gcc, use -S or -save-temps). See whether the compiler did something stupid, or whether your wrappers have a performance bug.
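
A minimal timing sketch for the first bullet (it assumes the question's C header and Vector3 wrapper are available, and that Vector3 has a constructor taking a VECTOR3 - both assumptions):

// #include "vector3.h"   /* assumed header providing VECTOR3, v3_add and Vector3 */
#include <chrono>
#include <cstdio>

int main()
{
    const long N = 100000000L;
    VECTOR3 a = {1.0f, 2.0f, 3.0f};
    VECTOR3 b = {0.5f, 0.25f, 0.125f};

    auto t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < N; ++i)
        v3_add(&a, a, b);                       // direct C calls
    auto t1 = std::chrono::steady_clock::now();

    Vector3 va(a), vb(b);                       // assumes a Vector3(const VECTOR3&) constructor
    for (long i = 0; i < N; ++i)
        va += vb;                               // the same work through the C++ wrapper
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::milliseconds;
    std::printf("direct : %lld ms\n", (long long)std::chrono::duration_cast<ms>(t1 - t0).count());
    std::printf("wrapper: %lld ms\n", (long long)std::chrono::duration_cast<ms>(t2 - t1).count());
    std::printf("%f\n", a[0]);                  // keep the first result observable
    return 0;
}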

If the performance difference is big enough that you don't want to use the wrapper, rewriting the library is still not a good idea, as you risk introducing bugs (which might even produce results that look plausible but are wrong). Even if the difference is large, it would be easier and less risky to simply remember that C++ is largely compatible with C and use your C-style library directly, even from C++ code.

+2




I don't think you will notice much of a difference in performance, assuming your target platform natively supports all of your data types.

I code on the DS and several other ARM devices, and floats are evil... I had to typedef float to a FixedPoint<16,8> class.
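
For reference, a minimal sketch of such a fixed-point type (illustrative only, not the poster's actual class):

#include <cstdint>

// 16.8 fixed point: 16 integer bits, 8 fractional bits, stored as value * 256.
struct FixedPoint16_8
{
    int32_t raw;

    FixedPoint16_8() : raw(0) {}
    explicit FixedPoint16_8(float f) : raw(static_cast<int32_t>(f * 256.0f)) {}

    FixedPoint16_8 operator+(FixedPoint16_8 o) const
    {
        FixedPoint16_8 r; r.raw = raw + o.raw; return r;
    }

    FixedPoint16_8 operator*(FixedPoint16_8 o) const
    {
        FixedPoint16_8 r;
        r.raw = static_cast<int32_t>((static_cast<int64_t>(raw) * o.raw) >> 8);
        return r;
    }

    float toFloat() const { return raw / 256.0f; }
};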

+1




If you are concerned that function-call overhead is slowing things down, why not try inlining the C code, or turning it into macros?
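
For instance, a macro version of v3_add could look like this (a sketch; note that it takes the destination array directly rather than a pointer to it, and evaluates its arguments more than once):

/* component-wise add as a macro; use only with simple lvalue arguments */
#define V3_ADD(out, lhs, rhs)               \
    do {                                    \
        (out)[0] = (lhs)[0] + (rhs)[0];     \
        (out)[1] = (lhs)[1] + (rhs)[1];     \
        (out)[2] = (lhs)[2] + (rhs)[2];     \
    } while (0)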

Also, why not improve the const-correctness of your C code while you are at it? const_cast should really be used sparingly, especially on interfaces you control.
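
For example (a sketch that only adds const to the inputs of the question's interface):

/* C header: inputs are now const */
void v3_add(VECTOR3 *out, const VECTOR3 lhs, const VECTOR3 rhs);

// C++ wrapper (inside Vector3): no cast needed any more
Vector3& operator+=(const Vector3& rhs)
{
    v3_add(&v_, v_, rhs.v_);
    return *this;
}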

+1



