Allow hoisting vconst instructions out of loops (#5909)

* Allow hoisting `vconst` instructions out of loops

Looking at some SIMD code and at what both LLVM and V8 generate for it,
a common technique for SIMD loops appears to be hoisting constants out
of loops, since unlike integer constants they're nontrivial to
rematerialize. This commit updates the `loop_hoist_level` calculation in
egraph elaboration to have a nonzero default for instructions that have
no arguments (e.g. consts), which enables hoisting these instructions
out of loops.

Note, though, that for now the maximum is hoisting out of one loop, not
all of them. While vconsts could theoretically move up to the top of the
function, I'd be worried about their impact on register pressure and
about having to save/restore them around calls or similar. Hopefully, if
the hot part of a program is a single loop, hoisting out of one loop is
a reasonable-enough heuristic for now.
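
As a rough illustration of the heuristic described above, here is a minimal, simplified sketch. It is not the elaborator's actual code: the function signature and the `arg_loop_depths` slice are hypothetical stand-ins for the per-argument hoist levels the elaborator computes, and `cur_loop_depth` stands in for `self.loop_stack.len()`.

```rust
/// Simplified model of the hoist-level choice (hypothetical helper, not the
/// real `Elaborator` method). `arg_loop_depths` holds, for each argument, the
/// loop depth that argument pins the instruction to; `cur_loop_depth` is the
/// number of loops enclosing the instruction's current position.
fn loop_hoist_level(arg_loop_depths: &[usize], cur_loop_depth: usize) -> usize {
    arg_loop_depths
        .iter()
        .copied()
        // An instruction cannot move above its deepest-defined argument.
        .max()
        // With no arguments there is no such limit, so fall back to
        // "one loop out", saturating at 0 when not inside any loop.
        .unwrap_or(cur_loop_depth.saturating_sub(1))
}

fn main() {
    // A no-argument instruction (e.g. `vconst`) inside two nested loops
    // is placed at depth 1: out of the inner loop, still in the outer one.
    assert_eq!(loop_hoist_level(&[], 2), 1);
    // Outside any loop there is nothing to hoist out of.
    assert_eq!(loop_hoist_level(&[], 0), 0);
    // With arguments, the deepest argument decides the level.
    assert_eq!(loop_hoist_level(&[0, 2], 3), 2);
}
```

The `saturating_sub(1)` is what keeps the default from underflowing when the instruction is not inside a loop at all.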

Locally on x64, with a benchmark that just encodes binary to hex, this saw
a 15% performance improvement, taking hex encoding from ~6G/s to ~6.7G/s.

* Test vconst is only hoisted one loop out
Alex Crichton
2023-03-06 09:29:43 -06:00
committed by GitHub
parent 354bc48015
commit 18ee645ebe
2 changed files with 81 additions and 1 deletions

@@ -401,6 +401,17 @@ impl<'a> Elaborator<'a> {
let arg_values = &self.elab_result_stack[arg_idx..];
// Compute max loop depth.
+//
+// Note that if there are no arguments then this instruction
+// is allowed to get hoisted up one loop. This is not
+// usually used since no-argument values are things like
+// constants which are typically rematerialized, but for the
+// `vconst` instruction 128-bit constants aren't as easily
+// rematerialized. They're hoisted out of inner loops but
+// not to the function entry which may run the risk of
+// placing too much register pressure on the entire
+// function. This is modeled with the `.saturating_sub(1)`
+// as the default if there's otherwise no maximum.
let loop_hoist_level = arg_values
.iter()
.map(|&value| {
@@ -423,7 +434,7 @@ impl<'a> Elaborator<'a> {
hoist_level
})
.max()
-.unwrap_or(self.loop_stack.len());
+.unwrap_or(self.loop_stack.len().saturating_sub(1));
trace!(
" -> loop hoist level: {:?}; cur loop depth: {:?}, loop_stack: {:?}",
loop_hoist_level,