wasmtime/cranelift/frontend
Jamey Sharp 9715d91c50 Resolve aliases before checking for unique values (#4966)
At control-flow join points, cranelift-frontend's SSA builder currently
checks to see if only one definition of a variable reaches the current
block. If so, it can eliminate the corresponding block parameter and use
the original def directly. It implements this by turning the block
parameter into an alias for the original value.

However, it didn't resolve aliases during this check, except after it
had already determined that there was only one definition.

Resolving aliases first instead allows it to detect that more block
parameters are redundant. And as more block parameters get converted to
aliases, later blocks can see common definitions from further away, so
this has a compounding effect.
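
The difference between the two orderings can be illustrated with a small standalone model. This is only a sketch, not the cranelift-frontend code: `Value`, `resolve`, `unique_def`, and the `resolve_first` flag are hypothetical names invented for this illustration, aliases are assumed to form no cycles, and the real builder tracks more state than a simple map.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an SSA value id.
type Value = u32;

/// Follow alias links until reaching a value that is not itself an alias.
/// Assumes the alias map contains no cycles.
fn resolve(aliases: &HashMap<Value, Value>, mut v: Value) -> Value {
    while let Some(&next) = aliases.get(&v) {
        v = next;
    }
    v
}

/// Return the single definition reaching a join point, if there is exactly one.
/// With `resolve_first == false` the incoming values are compared as-is,
/// modelling the check before this change; with `true` each value is
/// resolved through its aliases before being compared.
fn unique_def(
    aliases: &HashMap<Value, Value>,
    incoming: &[Value],
    resolve_first: bool,
) -> Option<Value> {
    let mut unique = None;
    for &v in incoming {
        let v = if resolve_first { resolve(aliases, v) } else { v };
        match unique {
            None => unique = Some(v),
            Some(u) if u == v => {}       // same definition seen again
            Some(_) => return None,       // two distinct definitions
        }
    }
    unique
}
```

For example, if value 1 is already an alias for value 0 and a block receives `[0, 1]` from its predecessors, the unresolved check sees two distinct values and keeps the block parameter, whereas the resolved check finds the single definition 0.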

This also merges a special case, where there's exactly one unique
non-sentinel definition but it's actually an alias for the sentinel,
into the general case where all definitions are from the sentinel. As a
result, only one case has to introduce a zero-valued definition of the
variable.
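
A rough model of the merged case analysis follows. It is a sketch under assumptions rather than the actual implementation: `SENTINEL`, `Outcome`, and `classify` are hypothetical names, and the sentinel is modelled here simply as a reserved value that contributes no independent definition.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an SSA value id.
type Value = u32;

/// Hypothetical reserved value playing the role of the sentinel.
const SENTINEL: Value = u32::MAX;

#[derive(Debug, PartialEq)]
enum Outcome {
    /// Every incoming value resolves to the sentinel: define the variable as zero.
    InitToZero,
    /// Exactly one real definition reaches the block: alias the parameter to it.
    AliasTo(Value),
    /// Several distinct definitions reach the block: keep the block parameter.
    KeepParam,
}

/// Follow alias links to the underlying value (assumes no alias cycles).
fn resolve(aliases: &HashMap<Value, Value>, mut v: Value) -> Value {
    while let Some(&next) = aliases.get(&v) {
        v = next;
    }
    v
}

/// Classify a join point after resolving every incoming value first.
fn classify(aliases: &HashMap<Value, Value>, incoming: &[Value]) -> Outcome {
    let mut unique = None;
    for &v in incoming {
        let v = resolve(aliases, v);
        if v == SENTINEL {
            continue; // contributes no definition of its own
        }
        match unique {
            None => unique = Some(v),
            Some(u) if u == v => {}
            Some(_) => return Outcome::KeepParam,
        }
    }
    match unique {
        None => Outcome::InitToZero,
        Some(v) => Outcome::AliasTo(v),
    }
}
```

Because values are resolved before the comparison, an incoming value that is merely an alias for the sentinel is skipped like the sentinel itself, so it falls into the `InitToZero` arm without a separate special case.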

According to `valgrind --tool=dhat`, this is a significant memory
savings. On the pulldown-cmark benchmark from Sightglass:

- 15.3% (1.9MiB) less memory allocated at maximum heap
- 4.1% (6.7MiB) less memory allocated in total
- 9.8% (57MiB) fewer bytes read
- 12.6% (36MiB) fewer bytes written
- 5.4% fewer instructions retired
- 1.04x faster by instructions retired (per Sightglass/perf)
- 1.03x to 1.04x faster by CPU cycles (per Sightglass/perf)
- 1.03 ± 0.01 times faster by CPU time (per hyperfine)
- 1.04x faster by cache accesses (per Sightglass/perf)

On the bz2 benchmark:

- 1.06x faster by instructions retired (per Sightglass/perf)
- 1.05x faster by CPU cycles (per Sightglass/perf)
- 1.04 ± 0.01 times faster by CPU time (per hyperfine)
- 1.02x to 1.03x faster by cache accesses (per Sightglass/perf)

Even on the largest benchmark in Sightglass (spidermonkey.wasm), this is
a measurable improvement:

- 1.03x faster by instructions retired (per Sightglass/perf)
- 1.02x faster by CPU cycles (per Sightglass/perf)
- 1.02 ± 0.00 times faster by CPU time (per hyperfine)

There was no significant difference in cache misses for any benchmark,
according to Sightglass/perf.
2022-09-27 13:59:37 -07:00

This crate provides a straightforward way to create a Cranelift IR function and fill it with instructions translated from another language. It contains an SSA construction module that provides convenient methods for translating non-SSA variables into SSA Cranelift IR values via `use_var` and `def_var` calls.