Mixing the hard-float ABI with code that must run without VFP/NEON


Scenario

My target platform is an ARM Cortex-A7 with VFP/NEON available. The system software running on the target does not keep VFP/NEON enabled all the time; it enables VFP/NEON only on demand (by calling a system API).
My toolchain is a bare-metal GCC 4.7.4 cross compiler (arm-none-eabi) hosted on PC/Cygwin.

We have a C source file, a.c, whose code may run while the system software has VFP/NEON disabled. The code in a.c is pure integer, with intensive long long (64-bit integer) operations. Compiling a.c with the GCC option -mfloat-abi=hard or -mfloat-abi=softfp generates instructions that use VFP/NEON registers (e.g. vldr d7,[..], vstr d7,[sp,..], etc.). If this generated code runs while VFP/NEON is off, an undefined-instruction exception occurs on the target platform. Compiling a.c with -mfloat-abi=soft keeps the compiler from using VFP/NEON registers and resolves the undefined-instruction exception.
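To make the scenario concrete, here is a minimal sketch of the kind of code a.c is described as containing. The function names (mix64, checksum64) and the arithmetic are hypothetical and not taken from the real a.c. The compile commands in the comments assume -mcpu=cortex-a7 and -mfpu=neon-vfpv4, which the question does not state. Whether GCC 4.7.4 emits vldr/vstr for this exact code is not guaranteed; the sketch only shows how the generated assembly could be inspected.

```c
#include <stdint.h>

/* Hypothetical stand-in for a.c: pure integer code, heavy on 64-bit
 * arithmetic, with no float or double anywhere.
 *
 * Assumed commands for comparing the generated assembly:
 *   arm-none-eabi-gcc -mcpu=cortex-a7 -mfpu=neon-vfpv4 \
 *       -mfloat-abi=hard -O2 -S a.c -o a_hard.s
 *   arm-none-eabi-gcc -mcpu=cortex-a7 \
 *       -mfloat-abi=soft -O2 -S a.c -o a_soft.s
 * Then search a_hard.s for instructions such as vldr, vstr or vmov.
 */
uint64_t mix64(uint64_t x, uint64_t key)
{
    x ^= key;
    x = (x << 13) | (x >> 51);            /* 64-bit rotate left by 13 */
    x += key * 0x9E3779B97F4A7C15ULL;     /* unsigned multiply, wraps mod 2^64 */
    return x ^ (x >> 29);
}

uint64_t checksum64(const uint64_t *buf, unsigned n, uint64_t key)
{
    uint64_t acc = 0;
    unsigned i;

    for (i = 0; i < n; ++i)
        acc = mix64(acc ^ buf[i], key);
    return acc;
}
```

If the -mfloat-abi=soft output contains no v-prefixed instructions while the hard or softfp output does, that matches the behavior described above.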

On the other hand, there is another C source file, b.c, that contains floating-point/vector operations (VFP/NEON) and only runs when the system software has VFP/NEON enabled. I would like to compile b.c with the GCC option -mfloat-abi=hard instead of -mfloat-abi=softfp for better performance.

If b.c is compiled with -mfloat-abi=hard and a.c is compiled with -mfloat-abi=soft, they cannot be linked together; the linker complains, "... uses VFP register arguments, but XXX does not."

Question

Is there any way (GCC options, toolchain build configuration, a different GCC version) to compile the pure-integer a.c so that it conforms to the hard-float ABI (passing floating-point arguments in VFP/NEON registers) but without generating any VFP/NEON instructions or register use for the pure integer operations?

Notes

As a workaround, a.c compiled with -mfloat-abi=soft and b.c compiled with -mfloat-abi=softfp can be linked together, but the lower performance is undesirable.

The GCC 4.9 release notes say: "Use of Advanced SIMD (Neon) for 64-bit scalar computations has been disabled by default. This was found to generate better code in only a small number of cases. It can be turned back on with the -mneon-for-64bits option." This seems related to my problem, but I am using GCC 4.7.4.


gcc arm neon

anmin Jul 24 15 at 7:06


1 answer

Jake 'Alquimista' LEE · Answer 1 · 2015-07-24T13:23:04+0000

You should stick with the softfp option regardless.

In fact, there is no difference at all between hard and softfp when no parameter of type float is passed.

Even when one is, softfp causes very little overhead, and it is paid once at function entry, outside of any loop (moving the contents of ARM registers to VFP registers, and possibly some loads from the stack when there are more than four parameters).
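The distinction above can be sketched with two functions. The names scale_i64 and blend are hypothetical; the register assignments in the comments follow the ARM procedure call standard (AAPCS) as I understand it and are worth confirming against the compiler's assembly output.

```c
/* No floating-point parameters or return value: the calling convention
 * is the same under -mfloat-abi=hard and -mfloat-abi=softfp.
 * value arrives in r0:r1, shift in r2, the result is returned in r0:r1. */
long long scale_i64(long long value, int shift)
{
    return value << shift;
}

/* Floating-point parameters: here the two ABIs differ.
 *   hard:   x in s0, y in s1, result returned in s0.
 *   softfp: x in r0, y in r1, result returned in r0; the function body
 *           moves the values into VFP registers (vmov) once on entry
 *           and back once on return. */
float blend(float x, float y)
{
    return 0.5f * (x + y);
}
```

Only calls to functions like blend pay the extra register moves under softfp; calls to functions like scale_i64 are unaffected by the choice between the two.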

Since VFP does not support int64, the instructions starting with v must be NEON. You should try -fno-tree-vectorize to disable the dumb auto-vectorization that is automatically enabled by the -O3 option.
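As a rough way to try this suggestion, the sketch below shows a loop of the kind the auto-vectorizer considers. The function add_i64 is hypothetical, and the commands in the comment assume the same -mcpu and -mfpu settings as a typical Cortex-A7 build. Whether vectorization is actually the source of the vldr/vstr instructions in a.c is not established here; comparing the two assembly outputs is how one would find out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical candidate for auto-vectorization: an element-wise
 * 64-bit integer add over two arrays.
 *
 * Assumed commands for comparing the output with and without the
 * vectorizer (GCC enables -ftree-vectorize at -O3):
 *   arm-none-eabi-gcc -mcpu=cortex-a7 -mfpu=neon-vfpv4 \
 *       -mfloat-abi=softfp -O3 -S a.c -o a_vec.s
 *   arm-none-eabi-gcc -mcpu=cortex-a7 -mfpu=neon-vfpv4 \
 *       -mfloat-abi=softfp -O3 -fno-tree-vectorize -S a.c -o a_novec.s
 */
void add_i64(int64_t *dst, const int64_t *a, const int64_t *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

If a_novec.s still contains v-prefixed instructions, the vectorizer is not what introduces them and the flag alone will not make the code safe to run with VFP/NEON disabled.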

PS: If you care so much about performance, why not write optimized NEON code by hand? Properly written NEON code makes a jaw-dropping difference, as opposed to the annoying auto-vectorized code, which is completely useless.
