x64 regalloc register order: put caller-saves (volatiles) first.

The x64 backend currently builds the `RealRegUniverse` in a way that
is generating somewhat suboptimal code. In many blocks, we see uses of
callee-save (non-volatile) registers (r12, r13, r14, rbx) first, even in
very short leaf functions where there are plenty of volatiles to use.
This is leading to unnecessary spills/reloads.

On one (local) test program, a medium-sized C benchmark compiled to Wasm
and run on Wasmtime, I am seeing a ~10% performance improvement with
this change. It will be less pronounced in programs with high register
pressure (there we are likely to use all registers regardless, so the
prologue/epilogue will save/restore all callee-saves) or in programs
with fewer calls. Still, this is a clear win for small functions, and in
many cases it removes prologue/epilogue clobber-saves altogether.
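
As a rough illustration of why the ordering matters, here is a toy model.
It is not regalloc.rs's actual algorithm: it assumes the allocator simply
hands out the first N registers in universe order, and it reports which
callee-saves would then need prologue/epilogue saves. The names `universe`
and `clobbered_callee_saves` are hypothetical; the register lists are the
allocatable GPRs pushed into the universe in this commit.

```rust
// Toy model only: real allocation is driven by live ranges and cost
// heuristics, not by a plain "take the first N registers" rule.
const CALLEE_SAVED: [&str; 4] = ["r12", "r13", "r14", "rbx"];
const CALLER_SAVED: [&str; 9] = [
    "rsi", "rdi", "rax", "rcx", "rdx", "r8", "r9", "r10", "r11",
];

/// Build the allocatable GPR order, with either group placed first.
fn universe(callee_saved_first: bool) -> Vec<&'static str> {
    let (first, second) = if callee_saved_first {
        (&CALLEE_SAVED[..], &CALLER_SAVED[..])
    } else {
        (&CALLER_SAVED[..], &CALLEE_SAVED[..])
    };
    first.iter().chain(second.iter()).copied().collect()
}

/// Callee-saved registers touched when the first `live_values` registers
/// of `order` are used; each one would need a save/restore pair.
fn clobbered_callee_saves(order: &[&'static str], live_values: usize) -> Vec<&'static str> {
    order
        .iter()
        .take(live_values)
        .copied()
        .filter(|r| CALLEE_SAVED.contains(r))
        .collect()
}

fn main() {
    let old = universe(true); // ordering before this commit
    let new = universe(false); // ordering after this commit
    for n in [3, 13] {
        println!(
            "{} live values: old order saves {:?}, new order saves {:?}",
            n,
            clobbered_callee_saves(&old, n),
            clobbered_callee_saves(&new, n)
        );
    }
}
```

Under this model, a small function needing three registers touches three
callee-saves with the old order and none with the new one, while a function
needing all thirteen touches all four callee-saves either way.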

Separately, I think the RA's coalescing is tripping up a bit in some
cases; see e.g. the filetest touched by this commit that loads a value
into %rsi then moves to %rax and returns immediately. This is an
orthogonal issue, though, and should be addressed (if worthwhile) in
regalloc.rs.
This commit is contained in:
Chris Fallin
2020-12-06 22:26:42 -08:00
parent fc752efa89
commit 1dddba649a
7 changed files with 74 additions and 68 deletions

@@ -1,14 +1,20 @@
 //! Registers, the Universe thereof, and printing.
 //!
-//! These are ordered by sequence number, as required in the Universe. The strange ordering is
-//! intended to make callee-save registers available before caller-saved ones. This is a net win
-//! provided that each function makes at least one onward call. It'll be a net loss for leaf
-//! functions, and we should change the ordering in that case, so as to make caller-save regs
-//! available first.
+//! These are ordered by sequence number, as required in the Universe.
 //!
-//! TODO Maybe have two different universes, one for leaf functions and one for non-leaf functions?
-//! Also, they will have to be ABI dependent. Need to find a way to avoid constructing a universe
-//! for each function we compile.
+//! The caller-saved registers are placed first in order to prefer not to clobber (requiring
+//! saves/restores in prologue/epilogue code) when possible. Note that there is no other heuristic
+//! in the backend that will apply such pressure; the register allocator's cost heuristics are not
+//! aware of the cost of clobber-save/restore code.
+//!
+//! One might worry that this pessimizes code with many callsites, where using caller-saves causes
+//! us to have to save them (as we are the caller) frequently. However, the register allocator
+//! *should be* aware of *this* cost, because it sees that the call instruction modifies all of the
+//! caller-saved (i.e., callee-clobbered) registers.
+//!
+//! Hence, this ordering encodes pressure in one direction (prefer not to clobber registers that we
+//! ourselves have to save) and this is balanced against the RA's pressure in the other direction
+//! at callsites.
 use crate::settings;
 use alloc::vec::Vec;
@@ -31,44 +37,44 @@ fn gpr(enc: u8, index: u8) -> Reg {
     Reg::new_real(RegClass::I64, enc, index)
 }
-pub(crate) fn r12() -> Reg {
-    gpr(ENC_R12, 16)
-}
-pub(crate) fn r13() -> Reg {
-    gpr(ENC_R13, 17)
-}
-pub(crate) fn r14() -> Reg {
-    gpr(ENC_R14, 18)
-}
-pub(crate) fn rbx() -> Reg {
-    gpr(ENC_RBX, 19)
-}
 pub(crate) fn rsi() -> Reg {
-    gpr(6, 20)
+    gpr(6, 16)
 }
 pub(crate) fn rdi() -> Reg {
-    gpr(7, 21)
+    gpr(7, 17)
 }
 pub(crate) fn rax() -> Reg {
-    gpr(0, 22)
+    gpr(0, 18)
 }
 pub(crate) fn rcx() -> Reg {
-    gpr(1, 23)
+    gpr(1, 19)
 }
 pub(crate) fn rdx() -> Reg {
-    gpr(2, 24)
+    gpr(2, 20)
 }
 pub(crate) fn r8() -> Reg {
-    gpr(8, 25)
+    gpr(8, 21)
 }
 pub(crate) fn r9() -> Reg {
-    gpr(9, 26)
+    gpr(9, 22)
 }
 pub(crate) fn r10() -> Reg {
-    gpr(10, 27)
+    gpr(10, 23)
 }
 pub(crate) fn r11() -> Reg {
-    gpr(11, 28)
+    gpr(11, 24)
 }
+pub(crate) fn r12() -> Reg {
+    gpr(ENC_R12, 25)
+}
+pub(crate) fn r13() -> Reg {
+    gpr(ENC_R13, 26)
+}
+pub(crate) fn r14() -> Reg {
+    gpr(ENC_R14, 27)
+}
+pub(crate) fn rbx() -> Reg {
+    gpr(ENC_RBX, 28)
+}
 pub(crate) fn r15() -> Reg {
@@ -176,13 +182,6 @@ pub(crate) fn create_reg_universe_systemv(flags: &settings::Flags) -> RealRegUni
     // Integer regs.
     let first_gpr = regs.len();
-    // Callee-saved, in the SystemV x86_64 ABI.
-    regs.push((r12().to_real_reg(), "%r12".into()));
-    regs.push((r13().to_real_reg(), "%r13".into()));
-    regs.push((r14().to_real_reg(), "%r14".into()));
-    regs.push((rbx().to_real_reg(), "%rbx".into()));
     // Caller-saved, in the SystemV x86_64 ABI.
     regs.push((rsi().to_real_reg(), "%rsi".into()));
     regs.push((rdi().to_real_reg(), "%rdi".into()));
@@ -194,6 +193,13 @@ pub(crate) fn create_reg_universe_systemv(flags: &settings::Flags) -> RealRegUni
     regs.push((r10().to_real_reg(), "%r10".into()));
     regs.push((r11().to_real_reg(), "%r11".into()));
+    // Callee-saved, in the SystemV x86_64 ABI.
+    regs.push((r12().to_real_reg(), "%r12".into()));
+    regs.push((r13().to_real_reg(), "%r13".into()));
+    regs.push((r14().to_real_reg(), "%r14".into()));
+    regs.push((rbx().to_real_reg(), "%rbx".into()));
     // Other regs, not available to the allocator.
     debug_assert_eq!(r15(), pinned_reg());
     let allocable = if use_pinned_reg {