Resolve aliases before checking for unique values (#4966)

At control-flow join points, cranelift-frontend's SSA builder currently
checks to see if only one definition of a variable reaches the current
block. If so, it can eliminate the corresponding block parameter and use
the original def directly. It implements this by turning the block
parameter into an alias for the original value.

However, it didn't resolve aliases during this check, except after it
had already determined that there was only one definition.

Resolving aliases first instead allows it to detect that more block
parameters are redundant. And as more block parameters get converted to
aliases, later blocks can see common definitions from further away, so
this has a compounding effect.

This also merges a special case, where there's exactly one unique
non-sentinel definition but it's actually an alias for the sentinel,
into the general case where all definitions are from the sentinel. As a
result there's only one case that has to introduce a definition of the
variable to zero.

According to `valgrind --tool=dhat`, this is a significant memory
savings. On the pulldown-cmark benchmark from Sightglass:

- 15.3% (1.9MiB) less memory allocated at maximum heap
- 4.1% (6.7MiB) less memory allocated in total
- 9.8% (57MiB) fewer bytes read
- 12.6% (36MiB) fewer bytes written
- 5.4% fewer instructions retired
- 1.04x faster by instructions retired (per Sightglass/perf)
- 1.03x to 1.04x faster by CPU cycles (per Sightglass/perf)
- 1.03 ± 0.01 times faster by CPU time (per hyperfine)
- 1.04x faster by cache accesses (per Sightglass/perf)

On the bz2 benchmark:

- 1.06x faster by instructions retired (per Sightglass/perf)
- 1.05x faster by CPU cycles (per Sightglass/perf)
- 1.04 ± 0.01 times faster by CPU time (per hyperfine)
- 1.02x to 1.03x faster by cache accesses (per Sightglass/perf)

Even on the largest benchmark in Sightglass (spidermonkey.wasm), this is
a measurable improvement:

- 1.03x faster by instructions retired (per Sightglass/perf)
- 1.02x faster by CPU cycles (per Sightglass/perf)
- 1.02 ± 0.00 times faster by CPU time (per hyperfine)

There was no significant difference in cache misses for any benchmark,
according to Sightglass/perf.
This commit is contained in:
Jamey Sharp
2022-09-27 13:59:37 -07:00
committed by GitHub
parent 29c7de7340
commit 9715d91c50

View File

@@ -600,9 +600,20 @@ impl SSABuilder {
) { ) {
let mut pred_values: ZeroOneOrMore<Value> = ZeroOneOrMore::Zero; let mut pred_values: ZeroOneOrMore<Value> = ZeroOneOrMore::Zero;
// Determine how many predecessors are yielding unique, non-temporary Values. // Determine how many predecessors are yielding unique, non-temporary Values. If a variable
// is live and unmodified across several control-flow join points, earlier blocks will
// introduce aliases for that variable's definition, so we resolve aliases eagerly here to
// ensure that we can tell when the same definition has reached this block via multiple
// paths. Doing so also detects cyclic references to the sentinel, which can occur in
// unreachable code.
let num_predecessors = self.predecessors(dest_block).len(); let num_predecessors = self.predecessors(dest_block).len();
for &pred_val in self.results.iter().rev().take(num_predecessors) { for pred_val in self
.results
.iter()
.rev()
.take(num_predecessors)
.map(|&val| func.dfg.resolve_aliases(val))
{
match pred_values { match pred_values {
ZeroOneOrMore::Zero => { ZeroOneOrMore::Zero => {
if pred_val != sentinel { if pred_val != sentinel {
@@ -648,19 +659,9 @@ impl SSABuilder {
// so we don't need to have it as a block argument. // so we don't need to have it as a block argument.
// We need to replace all the occurrences of val with pred_val but since // We need to replace all the occurrences of val with pred_val but since
// we can't afford a re-writing pass right now we just declare an alias. // we can't afford a re-writing pass right now we just declare an alias.
// Resolve aliases eagerly so that we can check for cyclic aliasing,
// which can occur in unreachable code.
let mut resolved = func.dfg.resolve_aliases(pred_val);
if sentinel == resolved {
// Cycle detected. Break it by creating a zero value.
resolved = emit_zero(
func.dfg.value_type(sentinel),
FuncCursor::new(func).at_first_insertion_point(dest_block),
);
}
func.dfg.remove_block_param(sentinel); func.dfg.remove_block_param(sentinel);
func.dfg.change_to_alias(sentinel, resolved); func.dfg.change_to_alias(sentinel, pred_val);
resolved pred_val
} }
ZeroOneOrMore::More => { ZeroOneOrMore::More => {
// There is disagreement in the predecessors on which value to use so we have // There is disagreement in the predecessors on which value to use so we have