wasmtime/cranelift/filetests/filetests/isa/x64/amode-opt.clif
Chris Fallin, 1dddba649a: x64 regalloc register order: put caller-saves (volatiles) first.
The x64 backend currently builds the `RealRegUniverse` in a way that
generates somewhat suboptimal code. In many blocks, we see callee-save
(non-volatile) registers (r12, r13, r14, rbx) used first, even in very
short leaf functions where there are plenty of volatiles to use. This
leads to unnecessary spills/reloads.
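
As a minimal sketch of the idea (this is not the backend's actual
`RealRegUniverse` construction; `GPR_ORDER` and `assign` are hypothetical
names made up for illustration), a priority-ordered allocator prefers
registers that appear earlier in its list, so listing the caller-saved
(volatile) GPRs ahead of the callee-saved ones lets short leaf functions
stay entirely in volatiles:

    // Sketch only: a toy preference order and first-fit assignment,
    // not the real regalloc.rs API.

    /// Allocatable x64 GPRs in preference order: caller-saved (volatile)
    /// registers first, callee-saved last (rsp/rbp are reserved and omitted).
    const GPR_ORDER: &[&str] = &[
        // caller-saved (SysV ABI): usable without prologue/epilogue saves
        "rsi", "rdi", "rax", "rcx", "rdx", "r8", "r9", "r10", "r11",
        // callee-saved: touching any of these forces a clobber-save
        "rbx", "r12", "r13", "r14", "r15",
    ];

    /// Toy first-fit assignment: each live value gets the earliest free register.
    fn assign(n_live: usize) -> Vec<&'static str> {
        GPR_ORDER.iter().copied().take(n_live).collect()
    }

    fn main() {
        // A short leaf function with three live values stays in volatiles,
        // so no callee-save clobber-saves are needed in the prologue/epilogue.
        println!("{:?}", assign(3)); // ["rsi", "rdi", "rax"]

        // Under high register pressure we reach into the callee-saves anyway,
        // which is why the win shrinks for register-heavy code.
        println!("{:?}", assign(12));
    }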

On one (local) test program, a medium-sized C benchmark compiled to Wasm
and run on Wasmtime, I am seeing a ~10% performance improvement with
this change; it will be less pronounced in programs with high register
pressure (there we are likely to use all registers regardless, so the
prologue/epilogue will save/restore all callee-saves), or in programs
with fewer calls, but this is a clear win for small functions and in
many cases removes prologue/epilogue clobber-saves altogether.

Separately, I think the RA's coalescing is tripping up a bit in some
cases; see e.g. the filetest touched by this commit that loads a value
into %rsi then moves to %rax and returns immediately. This is an
orthogonal issue, though, and should be addressed (if worthwhile) in
regalloc.rs.
2020-12-06 22:37:43 -08:00


test compile
target x86_64
feature "experimental_x64"

function %amode_add(i64, i64) -> i64 {
block0(v0: i64, v1: i64):
    v2 = iadd v0, v1
    v3 = load.i64 v2
    return v3
    ; check: movq 0(%rdi,%rsi,1), %rsi
}

function %amode_add_imm(i64) -> i64 {
block0(v0: i64):
    v1 = iconst.i64 42
    v2 = iadd v0, v1
    v3 = load.i64 v2
    return v3
    ; check: movq 42(%rdi), %rsi
}

;; Same as above, but the add operands have been reversed.
function %amode_add_imm_order(i64) -> i64 {
block0(v0: i64):
    v1 = iconst.i64 42
    v2 = iadd v1, v0
    v3 = load.i64 v2
    return v3
    ; check: movq 42(%rdi), %rsi
}

;; Make sure that uextend(cst) is ignored when the cst will naturally sign-extend.
function %amode_add_uext_imm(i64) -> i64 {
block0(v0: i64):
    v1 = iconst.i32 42
    v2 = uextend.i64 v1
    v3 = iadd v2, v0
    v4 = load.i64 v3
    return v4
    ; check: movq 42(%rdi), %rsi
}