* Make regalloc2 `#![no_std]`
This crate doesn't require any features from the standard library, so it
can be made `no_std` to allow it to be used in environments that can't
use the Rust standard library.
This PR mainly performs the following mechanical changes:
- `std::collections` is replaced with `alloc::collections`.
- `std::*` is replaced with `core::*`.
- `Vec`, `vec!`, `format!` and `ToString` are imported when needed since
they are no longer in the prelude.
- `HashSet` and `HashMap` are taken from the `hashbrown` crate, which is
the same implementation that the standard library uses.
- `FxHashSet` and `FxHashMap` are typedefs in `lib.rs` that are based on
the `hashbrown` types.
The only functional change is that `RegAllocError` no longer implements
the `Error` trait since that is not available in `core`.
Dependencies were adjusted to not require `std` and this is tested in CI
by building against the `thumbv6m-none-eabi` target that doesn't have
`std`.
* Add the Error trait impl back under a "std" feature
* Do conflict-set hash lookups once, not twice
This makes the small wasmtime bz2 benchmark 1% faster, per Hyperfine and
Sightglass. The effect disappears into the noise on larger benchmarks.
* Inline PosWithPrio::key
When compiling the pulldown-cmark benchmark from Sightglass, this is the
single most frequently called function: it's invoked 2.5 million times.
Inlining it reduces instructions retired by 1.5% on that benchmark,
according to `valgrind --tool=callgrind`.
This patch is "1.01 ± 0.01 times faster" according to Hyperfine for the
bz2, pulldown-cmark, and spidermonkey benchmarks from Sightglass.
Sightglass, in turn, agrees that all three benchmarks are 1.01x faster
by instructions retired, and the first two are around 1.01x faster by
CPU cycles as well.
* Inline and simplify AdaptiveMap::expand
Previously, `get_or_insert` would iterate over the keys to find one that
matched; then, if none did, iterate over the values to check if any are
0; then iterate again to remove all zero values and compact the map.
This commit instead focuses on picking an index to use: preferably one
where the key already exists; but if it's not in the map, then an unused
index; but if there aren't any, then an index where the value is zero.
As a result this iterates the two arrays at most once each, and both
iterations can stop early.
The downside is that keys whose value is zero are not removed as
aggressively. It might be worth pruning such keys in `IndexSet::set`.
Also:
- `#[inline]` both implementations of `Iterator::next`
- Replace `set_bits` with using the `SetBitsIter` constructor directly
These changes together reduce instructions retired when compiling the
pulldown-cmark benchmark by 0.9%.