ARM64 Assembly
What You Will Learn
- How ARM64 registers work (X and W variants)
- Core instructions: mov, add, mul, ldr/str, ldp/stp, branches
- How function prologue and epilogue work on ARM64
- How syscalls work on Linux and macOS ARM64
- What PAC (Pointer Authentication Codes) is
What Is It?
ARM64 (also called AArch64) is the 64-bit ARM instruction set. It is used on Apple Silicon (M1/M2/M3), iOS devices, Android devices, and modern Linux servers. Understanding ARM64 assembly is essential for iOS/macOS security research and embedded exploitation.
Registers
ARM64 registers are either 64-bit X registers (X0..X30) or 32-bit W registers (W0..W30). W registers are the lower 32 bits of the corresponding X register.
| Register | Purpose |
|---|---|
| X0–X7 | Arguments and return values |
| X8 | Indirect result register |
| X9–X15 | Caller-saved (temporary) |
| X19–X28 | Callee-saved |
| X29 (FP) | Frame pointer |
| X30 (LR) | Link register (return address) |
| SP | Stack pointer |
| PC | Program counter (not directly accessible) |
Equivalent x86-64 Registers
RIP → PC
RAX → X0 (first integer/return arg)
RBX → X19 (callee-saved)
RCX → X1
RDX → X2
RSP → SP (stack pointer)
RBP → X29 (frame pointer)
LR (return address) → X30 (link register)
Instructions
mov — Load Immediate
mov x1, #0xbeef
movk x1, #0xdead, lsl 16 ; load upper 16 bits
; result: x1 = 0xdeadbeef
Arithmetic
add X0, X1, X2 ; X0 = X1 + X2
mul X1, X0, X1 ; X1 = X0 * X1
udiv X2, X0, X1 ; X2 = X0 / X1 (unsigned)
madd X3, X0, X1, X2 ; X3 = X2 + (X0 * X1)
msub X3, X0, X1, X2 ; X3 = X2 - (X0 * X1)
Modulus
ARM64 has no modulus instruction. Compute it manually:
5 / 2 = 2 remainder 1
remainder = dividend - (quotient * divisor)
= 5 - (2 * 2) = 1
udiv X2, X0, X1 ; X2 = X0 / X1 (quotient)
msub X3, X2, X1, X0 ; X3 = X0 - (X2 * X1) = remainder
Shifts
lsl X3, X2, #3 ; X3 = X2 << 3 (multiply by 8)
lsr X3, X2, #1 ; X3 = X2 >> 1 (divide by 2)
Load and Store
ldr x0, [x1] ; x0 = *x1 (load 8 bytes from address x1)
ldr x0, [x1, #8] ; x0 = *(x1 + 8)
str x0, [x1] ; *x1 = x0 (store x0 to address x1)
str X4, [X3, #0x10] ; *(X3 + 0x10) = X4
Load/Store Pair
stp X0, X1, [X3] ; stores X0 at [X3], X1 at [X3+8]
ldp X0, X1, [X3] ; loads X0 from [X3], X1 from [X3+8]
; Push/pop on stack
stp x29, x30, [sp, #-16]! ; push X29, X30 (pre-decrement SP)
ldp x29, x30, [sp], #16 ; pop X29, X30 (post-increment SP)
Note: stp x29, x30 saves x29 first, then x30. ldp x29, x30 loads x29 first, then x30.
Indexing Modes
[SP, #-16]! ; pre-indexing: SP -= 16, then access [SP]
[SP], #16 ; post-indexing: access [SP], then SP += 16
[SP, #offset] ; offset: access [SP + offset], SP unchanged
Branches
b #0x40 ; unconditional branch to PC + 0x40
br X0 ; branch to address in X0
cbz x0, label ; branch to label if x0 == 0
cbnz x1, label ; branch to label if x1 != 0
PC-Relative Address
adr x0, my_label ; x0 = address of my_label (PC-relative, like lea rax, [rip+offset])
Function Prologue and Epilogue
; Prologue — save frame pointer and link register
stp x29, x30, [sp, #-16]!
mov x29, sp
; Epilogue — restore and return
ldp x29, x30, [sp], #16
ret
Fibonacci in ARM64
fib:
cmp X0, #1
b.le finish
stp x29, x30, [sp, #-0x20]!
mov x29, sp
sub X1, X0, #1
str X1, [sp, #0x10]
sub X2, X0, #2
str X2, [sp, #0x18]
mov X0, X1
bl fib
str X0, [sp, #0x10]
ldr X2, [sp, #0x18]
mov X0, X2
bl fib
ldr X3, [sp, #0x10]
add X0, X3, X0
ldp x29, x30, [sp], #0x20
finish:
ret
Syscalls
Linux ARM64
mov x8, <syscall_number> ; syscall number in x8
svc #0 ; invoke syscall
Reference: https://arm64.syscall.sh/
macOS ARM64
mov x16, 0x2000000 | <syscall_number> ; macOS uses BSD syscall number with 0x2000000 offset
svc #0x80
Reference: https://github.com/opensource-apple/xnu/blob/master/bsd/kern/syscalls.master
macOS chmod Example
from pwn import *
context.arch = 'aarch64'
asm_bytes = asm("""
mov x0, #-100 ; AT_FDCWD
adr x1, flag ; path
mov x2, #0o777 ; mode
movz x16, #0x1d3
movk x16, #0x2000, lsl #16 ; chmod syscall
svc #0x80
flag:
.ascii "/flag\\0"
""")
PAC (Pointer Authentication Codes)
PAC is a hardware security feature on ARMv8.3-A+ that prevents pointer corruption (ROP/JOP exploits). A cryptographic tag is embedded in unused pointer bits. If the tag does not match when the pointer is used, the CPU faults.
64-bit addresses use only the lower 48 bits (or 39 bits on XNU). The upper bits hold the PAC tag.
PAC Keys
| Key | Used for |
|---|---|
| IA, IB | Instruction pointers (return addresses, code pointers) |
| DA, DB | Data pointers |
| GA | Generic (less common) |
PAC keys are per-process but shared across threads.
PAC Operations
PACIA X8, X9 ; sign X8 using IA key with context X9
PACIZA X8 ; sign X8 using IA key with context 0
AUTIA X8, X9 ; authenticate X8 using IA key with context X9 — faults if invalid
XPACD x1 ; strip PAC from data pointer
BLRAA X8, X9 ; authenticate X8 using IA key with X9 context, then branch
LDRAA X8, [X9] ; authenticate X9 using DA key, load result into X8
RETAB ; authenticate LR using IB key with SP context, then return