What happens to delayed-result instructions on a branch in ARM assembly?
I am optimizing an algorithm in ARM assembly and need to figure out in what order to place instructions to minimize pipeline stalls. The cycle counter at http://pulsar.webshaker.net/ccc/index.php?lng=us is very helpful for this, but it is unaware of what happens when functions are called or branches are taken. What I want to do is basically (this is just an example):
mul r4, r0, r1
mov r0, #0
mov r1, #12
mov r4, r4, ASR #14
str r4, [r5]
bl foo
The pipeline stall between the mul and mov instructions is pretty terrible, and there is nothing stopping me from making the function call between them. But what exactly happens to the pipeline when I take a branch? I know that foo does push {r4-r12, lr} as its first instruction. I see two possible outcomes:
- The branch instruction takes several cycles, which lets the mul deliver its result before the push executes, thereby reducing pipeline stalls.
- The pipeline stall gets worse, because the push needs r4 several cycles before it is executed (this was the case before ARMv7, IIRC; the cycle counter in the link doesn't seem to think it is necessary).
In short:
What happens to delayed-result instructions (mul is the main example) when you make a function call (which is going to push the register on the stack) or even a normal branch?
If I understand correctly, you don't need to do
mov r4, r4, ASR #14
str r4, [r5]
before the call. Making the call before the mov:
bl foo
mov r4, r4, ASR #14
str r4, [r5]
is a good idea.
The mul will have more time to finish while the call is taken. The STM (the push) will be the problem, of course: R4 cannot be pushed before it has been computed.
If foo is an asm function, you can save R4 later in foo (or you can probably try not to use r4 in foo at all, and then not save it).
If foo is a C function (or if you can't change the push instruction), use r12 instead of r4 as the MUL destination register.
R12 is needed later by the STM instruction than r4 is, so it is possible that the mul has enough time to complete before the STM needs the destination register (R12)!
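As a rough sketch of that suggestion, the example from the question could be rearranged as below. This assumes that foo really does start with push {r4-r12, lr} and restores the same registers on return, as stated in the question. Under the standard ARM procedure call standard, r12 is a scratch register that a callee (or a linker-inserted veneer on the bl) is allowed to overwrite, so this only holds when you control foo and how it is linked. Whether the stall actually disappears depends on the core and should be checked against its documented instruction timings.

```asm
    mul   r12, r0, r1         @ result goes to r12 instead of r4
    mov   r0, #0              @ set up the arguments for foo
    mov   r1, #12
    bl    foo                 @ foo begins with push {r4-r12, lr};
                              @ r12 sits near the end of that register list
    mov   r12, r12, ASR #14   @ valid only because foo saves and restores r12
    str   r12, [r5]           @ r5 is also preserved by foo's push/pop
```

If foo is ever changed so that it no longer saves r12, the result must be moved back to a callee-saved register before the call.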
I'm not sure what the answer is, but if the answer is public, it will be in the Cortex-A8 Technical Reference Manual.