Pull in regalloc2 v0.2.0, with no more separate scratch registers. (#4182)
regalloc2 (RA2) recently removed the need for a dedicated scratch register for cyclic moves (bytecodealliance/regalloc2#51). This has a moderate positive performance impact on function bodies that were register-constrained, because one more register is now available. In Sightglass, I measured +5-8% on `blake3-scalar`, at least among current benchmarks.
@@ -209,24 +209,9 @@ pub fn create_reg_env(flags: &settings::Flags) -> MachineEnv {
            preg(vreg(28)),
            preg(vreg(29)),
            preg(vreg(30)),
            // v31 is the scratch reg, to allow for parallel moves.
            preg(vreg(31)),
        ],
    ],
    scratch_by_class: [
        // We use tmp2 (x17) as the regalloc scratch register,
        // used to resolve cyclic parallel moves. This is valid
        // because tmp2 is never live between regalloc-visible
        // instructions, only within them (i.e. in expansion into
        // multiple machine instructions when that
        // occurs). spilltmp is used for moves to/from spillslots,
        // but tmp2 never is, so it is available for this
        // purpose. (Its only other use is in prologue stack
        // checks, and the prologue is prepended after regalloc
        // runs.)
        preg(tmp2_reg()),
        // We use v31 for Float/Vec-class parallel moves.
        preg(vreg(31)),
    ],
    fixed_stack_slots: vec![],
};
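
For readers unfamiliar with why cyclic parallel moves ever needed a scratch location, here is a minimal sketch of the general technique. It is not regalloc2's implementation, and it does not show how regalloc2#51 avoids the reserved register. The names `Loc`, `sequentialize`, and `run` are hypothetical and exist only for this illustration; locations are plain indices into a small register file, and the caller supplies the scratch index.

```rust
/// A location is just an index into a small register file in this sketch.
type Loc = usize;

/// Turn a parallel move (all `(dst, src)` pairs happen at once) into an
/// equivalent sequence of ordinary moves, using `scratch` to break cycles.
/// Assumes each destination appears at most once and that `scratch` is not
/// mentioned anywhere in `moves`.
fn sequentialize(moves: &[(Loc, Loc)], scratch: Loc) -> Vec<(Loc, Loc)> {
    // Self-moves are no-ops and can be dropped up front.
    let mut pending: Vec<(Loc, Loc)> =
        moves.iter().copied().filter(|(d, s)| d != s).collect();
    let mut out = Vec::new();

    while !pending.is_empty() {
        // A move is safe to emit if no other pending move still reads its dst.
        let safe = (0..pending.len()).find(|&i| {
            let dst = pending[i].0;
            pending
                .iter()
                .enumerate()
                .all(|(j, &(_, src))| j == i || src != dst)
        });

        match safe {
            Some(i) => out.push(pending.remove(i)),
            None => {
                // Only cycles remain. Save the value that is about to be
                // overwritten into the scratch location and redirect its
                // readers there; the first pending move then becomes safe.
                let dst = pending[0].0;
                out.push((scratch, dst));
                for m in pending.iter_mut() {
                    if m.1 == dst {
                        m.1 = scratch;
                    }
                }
            }
        }
    }
    out
}

/// Execute `(dst, src)` moves one after another on a register file.
fn run(regs: &mut [u64], seq: &[(Loc, Loc)]) {
    for &(dst, src) in seq {
        regs[dst] = regs[src];
    }
}

fn main() {
    // Three-way rotation: r0 <- r1, r1 <- r2, r2 <- r0. Index 3 is the scratch.
    let seq = sequentialize(&[(0, 1), (1, 2), (2, 0)], 3);
    let mut regs = [10, 20, 30, 0];
    run(&mut regs, &seq);
    assert_eq!(&regs[..3], &[20, 30, 10]);
    println!("{:?}", seq);
}
```

In this sketch the scratch location is only touched when a cycle is present, which is why permanently reserving a register for it costs more than it usually returns.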