* Do conflict-set hash lookups once, not twice

  This makes the small wasmtime bz2 benchmark 1% faster, per Hyperfine and Sightglass. The effect disappears into the noise on larger benchmarks.

* Inline PosWithPrio::key

  When compiling the pulldown-cmark benchmark from Sightglass, this is the single most frequently called function: it's invoked 2.5 million times. Inlining it reduces instructions retired by 1.5% on that benchmark, according to `valgrind --tool=callgrind`.

  This patch is "1.01 ± 0.01 times faster" according to Hyperfine for the bz2, pulldown-cmark, and spidermonkey benchmarks from Sightglass. Sightglass, in turn, agrees that all three benchmarks are 1.01x faster by instructions retired, and the first two are around 1.01x faster by CPU cycles as well.

* Inline and simplify AdaptiveMap::expand

  Previously, `get_or_insert` would iterate over the keys to find one that matched; then, if none did, iterate over the values to check if any are 0; then iterate again to remove all zero values and compact the map.

  This commit instead focuses on picking an index to use: preferably one where the key already exists; but if it's not in the map, then an unused index; but if there aren't any, then an index where the value is zero. As a result this iterates the two arrays at most once each, and both iterations can stop early.

  The downside is that keys whose value is zero are not removed as aggressively. It might be worth pruning such keys in `IndexSet::set`.

  Also:
  - `#[inline]` both implementations of `Iterator::next`
  - Replace `set_bits` with using the `SetBitsIter` constructor directly

  These changes together reduce instructions retired when compiling the pulldown-cmark benchmark by 0.9%.
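The index-picking order described for `get_or_insert` can be sketched roughly as follows. This is only an illustration under stated assumptions, not the crate's actual code: `SmallMap`, its field layout, and the `SMALL_ELEMS` capacity of 4 are hypothetical stand-ins, and the sketch returns `None` when no slot is available rather than modelling whatever the real map does in that case.

```rust
// Hypothetical capacity of the small, array-backed map (an assumption).
const SMALL_ELEMS: usize = 4;

// Hypothetical stand-in for the small variant of an adaptive map:
// parallel arrays of keys and values, with `len` slots in use.
struct SmallMap {
    len: usize,
    keys: [u32; SMALL_ELEMS],
    values: [u64; SMALL_ELEMS],
}

impl SmallMap {
    fn new() -> Self {
        SmallMap {
            len: 0,
            keys: [0; SMALL_ELEMS],
            values: [0; SMALL_ELEMS],
        }
    }

    /// Returns a mutable reference to the value slot for `key`, or `None`
    /// if every slot holds a different key with a nonzero value.
    #[inline]
    fn get_or_insert(&mut self, key: u32) -> Option<&mut u64> {
        // 1. Prefer a slot where the key already exists.
        //    `position` stops at the first match.
        let idx = if let Some(i) = self.keys[..self.len].iter().position(|&k| k == key) {
            i
        } else if self.len < SMALL_ELEMS {
            // 2. Otherwise take an unused slot at the end.
            let i = self.len;
            self.len += 1;
            self.keys[i] = key;
            self.values[i] = 0;
            i
        } else if let Some(i) = self.values.iter().position(|&v| v == 0) {
            // 3. Otherwise reuse a slot whose value is zero; its old key
            //    carries no information, so it can be overwritten.
            self.keys[i] = key;
            i
        } else {
            // No slot available; a caller would need a bigger structure.
            return None;
        };
        Some(&mut self.values[idx])
    }
}
```

Each of the two arrays is scanned at most once per call, and both scans stop at the first hit. As in the commit's description, keys whose value is zero linger until step 3 happens to reuse their slot.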
regalloc2: another register allocator
This is a register allocator that started life as, and is about 50% still, a port of IonMonkey's backtracking register allocator to Rust. In many regards, it has been generalized, optimized, and improved since the initial port, and now supports both SSA and non-SSA use-cases. (However, non-SSA should be considered deprecated; we want to move to SSA-only in the future, to enable some performance improvements. See #4.)
In addition, it contains substantial amounts of testing infrastructure (fuzzing harnesses and checkers) that does not exist in the original IonMonkey allocator.
See the design overview for (much!) more detail on how the allocator works.
License
This crate is licensed under the Apache 2.0 License with LLVM
Exception. This license text can be found in the file LICENSE.
Parts of the code are derived from regalloc.rs: in particular,
src/checker.rs and src/domtree.rs. This crate has the same license
as regalloc.rs, so the license on these files does not differ.