The x64 backend currently builds the `RealRegUniverse` in a way that generates somewhat suboptimal code: in many blocks we see callee-saved (non-volatile) registers (r12, r13, r14, rbx) used first, even in very short leaf functions where there are plenty of volatiles available. This leads to unnecessary spills/reloads. This change reorders the universe so that caller-saved (volatile) registers are preferred.

On one (local) test program, a medium-sized C benchmark compiled to Wasm and run on Wasmtime, I am seeing a ~10% performance improvement from this change. The win will be less pronounced in programs with high register pressure (there we are likely to use all registers regardless, so the prologue/epilogue will save/restore all callee-saves anyway) or in programs with fewer calls, but it is a clear win for small functions and in many cases removes the prologue/epilogue clobber-saves altogether.

Separately, I think the RA's move coalescing is tripping up in some cases; see e.g. the filetest touched by this commit, which loads a value into %rsi, moves it to %rax, and returns immediately. That is an orthogonal issue, though, and should be addressed (if worthwhile) in regalloc.rs.
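For illustration, here is a minimal, self-contained Rust sketch of the ordering principle. The `GprInfo` type and `allocation_order` function are hypothetical stand-ins, not the regalloc.rs API (the real `RealRegUniverse` carries register classes and more metadata); the point is only that the allocator prefers lower-indexed entries, so listing the volatiles first keeps small functions off the callee-saves.

// Hypothetical, simplified model of one integer register; not the
// regalloc.rs types.
#[derive(Clone, Copy, Debug)]
struct GprInfo {
    name: &'static str,
    callee_saved: bool, // true for rbx, r12-r15 in the SysV x64 ABI
}

/// Order registers for allocation: all caller-saved GPRs first, then
/// callee-saved ones, so the latter are touched only under real
/// register pressure.
fn allocation_order(regs: &[GprInfo]) -> Vec<GprInfo> {
    let mut order: Vec<GprInfo> =
        regs.iter().copied().filter(|r| !r.callee_saved).collect();
    order.extend(regs.iter().copied().filter(|r| r.callee_saved));
    order
}

fn main() {
    // Allocatable integer registers in hardware-encoding order (rsp and
    // rbp omitted for simplicity); note rbx and r12-r15 interleaved with
    // the volatiles.
    let regs = [
        GprInfo { name: "rax", callee_saved: false },
        GprInfo { name: "rcx", callee_saved: false },
        GprInfo { name: "rdx", callee_saved: false },
        GprInfo { name: "rbx", callee_saved: true },
        GprInfo { name: "rsi", callee_saved: false },
        GprInfo { name: "rdi", callee_saved: false },
        GprInfo { name: "r8",  callee_saved: false },
        GprInfo { name: "r9",  callee_saved: false },
        GprInfo { name: "r10", callee_saved: false },
        GprInfo { name: "r11", callee_saved: false },
        GprInfo { name: "r12", callee_saved: true },
        GprInfo { name: "r13", callee_saved: true },
        GprInfo { name: "r14", callee_saved: true },
        GprInfo { name: "r15", callee_saved: true },
    ];
    for r in allocation_order(&regs) {
        println!("{}", r.name);
    }
}

Compiled and run, this prints the volatiles (rax, rcx, rdx, rsi, rdi, r8-r11) before the callee-saves (rbx, r12-r15), which is the preference order this change aims for.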
The filetest touched by this commit:
test compile
target x86_64
feature "experimental_x64"

function %f(i32, i64 vmctx) -> i64 {
    gv0 = vmctx
    gv1 = load.i64 notrap aligned gv0+0
    gv2 = load.i32 notrap aligned gv0+8
    heap0 = dynamic gv1, bound gv2, offset_guard 0x1000, index_type i32

block0(v0: i32, v1: i64):

    v2 = heap_addr.i64 heap0, v0, 0x8000
    ; check: movl 8(%rsi), %ecx
    ; nextln: movq %rdi, %rax
    ; nextln: addl $$32768, %eax
    ; nextln: jnb ; ud2 heap_oob ;
    ; nextln: cmpl %ecx, %eax
    ; nextln: jbe label1; j label2
    ; check: Block 1:

    return v2
}