Assembler on 64-bit iOS (A64)

I am trying to replace certain methods with asm implementations. The target is arm64 on iOS (iPhone 5S or newer). I want to use a dedicated assembler file, because inline assembler comes with additional overhead and is quite cumbersome to use with A64 memory offsets.

There isn't much documentation online, so I'm not sure whether I'm doing this correctly. I will therefore describe the process I followed to move a function to assembly.


The candidate function for this question is a 256-bit integer comparison function.

UInt256.h

@import Foundation;

typedef struct {
    uint64_t value[4];
} UInt256;

bool eq256(const UInt256 *lhs, const UInt256 *rhs);

      

Bridging-Header.h

#import "UInt256.h"

      

Reference implementation (Swift)

let result = x.value.0 == y.value.0
          && x.value.1 == y.value.1
          && x.value.2 == y.value.2
          && x.value.3 == y.value.3

      

UInt256.s

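.globl _eq256                      // make the symbol visible to the linker
.align 2                           // 2^2 = 4-byte alignment for A64 instructions
_eq256:
    ldp        x9, x10, [x0]       // lhs words 0 and 1
    ldp       x11, x12, [x1]       // rhs words 0 and 1
    cmp        x9, x11             // Z is set if word 0 matches
    ccmp      x10, x12, 0, eq      // if still equal, compare word 1; else clear the flags
    ldp        x9, x10, [x0, 16]   // lhs words 2 and 3 (ldp leaves the flags untouched)
    ldp       x11, x12, [x1, 16]   // rhs words 2 and 3
    ccmp       x9, x11, 0, eq      // compare word 2 only if all earlier words matched
    ccmp      x10, x12, 0, eq      // compare word 3 only if all earlier words matched
    cset       x0, eq              // return 1 if every word matched, otherwise 0
    ret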
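
To make the cmp/ccmp chain easier to follow, here is a minimal C sketch that models only the Z flag of that sequence. The names Flags, model_cmp, model_ccmp_eq and model_eq256 are hypothetical and exist only for illustration; the real instructions also update N, C and V, which this model ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models only the Z ("equal") flag, which is all the eq condition reads. */
typedef struct { bool z; } Flags;

/* cmp a, b: Z is set when a == b. */
static Flags model_cmp(uint64_t a, uint64_t b) {
    return (Flags){ .z = (a == b) };
}

/* ccmp a, b, #0, eq: if the previous result was "equal", compare a and b;
   otherwise load the immediate flag value 0, which clears Z ("not equal"). */
static Flags model_ccmp_eq(Flags prev, uint64_t a, uint64_t b) {
    return prev.z ? model_cmp(a, b) : (Flags){ .z = false };
}

/* Hypothetical helper that mirrors the instruction order of _eq256. */
static bool model_eq256(const uint64_t lhs[4], const uint64_t rhs[4]) {
    Flags f = model_cmp(lhs[0], rhs[0]);
    f = model_ccmp_eq(f, lhs[1], rhs[1]);
    f = model_ccmp_eq(f, lhs[2], rhs[2]);
    f = model_ccmp_eq(f, lhs[3], rhs[3]);
    return f.z;   /* corresponds to: cset x0, eq */
}
```

Once one comparison fails, every later ccmp keeps Z cleared, so the final cset only yields 1 when all four words matched.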

      


Resources I Found


Questions

I tested the code with XCTest, generating two random numbers, running Swift and Asm implementations on them, and checking that both report the same result. The code seems to be correct.
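
Two random 256-bit numbers are practically never equal, so a purely random test mostly exercises the "not equal" path. A possible complement, sketched here in plain C, checks the equal case and a single flipped bit in every position against a reference. UInt256 is redefined locally to keep the sketch self-contained, and eq256_ref, Eq256Fn and count_eq256_mismatches are hypothetical names, not part of the project above.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t value[4];
} UInt256;

/* Hypothetical portable reference, mirroring the Swift expression. */
static bool eq256_ref(const UInt256 *lhs, const UInt256 *rhs) {
    return lhs->value[0] == rhs->value[0]
        && lhs->value[1] == rhs->value[1]
        && lhs->value[2] == rhs->value[2]
        && lhs->value[3] == rhs->value[3];
}

typedef bool (*Eq256Fn)(const UInt256 *, const UInt256 *);

/* Returns how many cases `candidate` answers differently from the reference.
   Covers the equal case plus one flipped bit in each of the 256 positions. */
static int count_eq256_mismatches(Eq256Fn candidate) {
    const UInt256 base = {{ 0x0123456789abcdefULL, 0xfedcba9876543210ULL,
                            0x0f0f0f0f0f0f0f0fULL, 0xf0f0f0f0f0f0f0f0ULL }};
    int mismatches = 0;

    UInt256 same = base;
    if (candidate(&base, &same) != eq256_ref(&base, &same)) mismatches++;

    for (int limb = 0; limb < 4; limb++) {
        for (int bit = 0; bit < 64; bit++) {
            UInt256 other = base;
            other.value[limb] ^= (uint64_t)1 << bit;   /* differ in one bit */
            if (candidate(&base, &other) != eq256_ref(&base, &other)) mismatches++;
        }
    }
    return mismatches;
}
```

Passing the assembly routine as the candidate, for example count_eq256_mismatches(eq256), should return 0 if it agrees with the reference on all of these inputs.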

  • In the asm file, .align seems to be for optimization. Is it really necessary, and if so, what is the correct alignment value?

  • Is there a source that clearly explains the calling convention for my particular function signature?

    a. How can I know that the inputs are actually passed in x0 and x1?

    b. How can I know that x0 is the correct way to pass the output?

    c. How can I know that it is safe to clobber x9 - x12 and the status registers?

    d. Is the function called the same way when I call it from C instead of Swift?

  • What does "indirect result location register" mean as the description of the r8 register in the ARM document?

  • Do I need any other assembler directives besides .globl?

  • When I set breakpoints, the debugger seems to get confused about where it really is, showing the wrong lines, etc. Am I doing something wrong?


1 answer


  • The .align 2 directive is required for the program to be correct. A64 instructions must be aligned on 32-bit boundaries, and .align 2 requests 2^2 = 4 bytes.
  • The documentation you linked seems clear to me, and unfortunately this is not the place to ask for recommendations.
    • You can determine that lhs and rhs are passed in X0 and X1 by following the instructions in Section 5.4.2 (Parameter Passing Rules) of the Procedure Call Standard for the ARM 64-bit Architecture (AArch64) that you linked. Since both parameters are pointers, the only applicable rule is C.7.
    • You can determine which register is used to return the value by following the instructions in Section 5.5 (Result Return). It simply tells you to follow the same rules as for parameters. Since the function returns an integer, only rule C.7 applies, so the value is returned in X0.
    • It is safe to change the values stored in registers X9 through X12 because they are listed as temporary registers in the table in Section 5.1.1 (General-purpose Registers).
    • The question is whether a function is actually called the same way in Swift as in C. Both the procedure call standard and Apple's document describing its exceptions to it are defined in terms of C and C++. Presumably Swift follows the same conventions, but I don't know whether Apple has made this explicit.
  • The purpose of R8 is described in Section 5.5 (Result Return). It is used when the return value is too large to fit into the registers used to return values. In that case, the caller creates a buffer for the return value and places its address in R8. The function then copies the return value into this buffer.
  • I don't believe you need anything else in your example assembly program.
  • You have asked too many questions. You should post a separate, more detailed question describing your problem with the debugger.
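
As an illustration of the R8 point, consider the following C sketch. The functions xor256 and is_zero256 are hypothetical and not part of the question's code. Under the AArch64 procedure call standard, a 32-byte struct returned by value is too large for the result registers, so the caller passes the address of a result buffer in X8, while a scalar result such as a bool comes back in X0.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t value[4];
} UInt256;

/* Hypothetical function returning a 32-byte struct by value. The caller
   reserves storage for the result and passes its address in x8; the callee
   writes the result there instead of returning it in x0. */
UInt256 xor256(const UInt256 *lhs, const UInt256 *rhs) {
    UInt256 out;
    for (int i = 0; i < 4; i++) {
        out.value[i] = lhs->value[i] ^ rhs->value[i];
    }
    return out;
}

/* Returns a scalar, so the result is delivered in x0 and x8 is not used. */
bool is_zero256(const UInt256 *v) {
    return (v->value[0] | v->value[1] | v->value[2] | v->value[3]) == 0;
}
```

Because eq256 takes pointers and returns a bool, it never touches X8; the register only matters for signatures like that of xor256.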

I should add that one of the benefits of writing the code with inline assembly is that you don't have to worry about any of this. Something like the following untested C code shouldn't be too cumbersome:



bool eq256(const UInt256 *lhs, const UInt256 *rhs) {
     const __int128 *lv = (__int128 const *) lhs->value;
     const __int128 *rv = (__int128 const *) rhs->value;

     uint64_t l1, l2, r1, r2, ret;

     /* "=&r" marks the scratch outputs as early-clobber: they are written
        before the later memory operands have been read. */
     asm("ldp       %1, %2, %5\n\t"
         "ldp       %3, %4, %6\n\t"
         "cmp       %1, %3\n\t"
         "ccmp      %2, %4, 0, eq\n\t"
         "ldp       %1, %2, %7\n\t"
         "ldp       %3, %4, %8\n\t"
         "ccmp      %1, %3, 0, eq\n\t"
         "ccmp      %2, %4, 0, eq\n\t"
         "cset      %0, eq"
         : "=r" (ret), "=&r" (l1), "=&r" (l2), "=&r" (r1), "=&r" (r2)
         : "Ump" (lv[0]), "Ump" (rv[0]), "Ump" (lv[1]), "Ump" (rv[1])
         : "cc");

     return ret;
}

      

Ok, maybe this is a little cumbersome.
