Commit Graph

288 Commits

Author SHA1 Message Date
T0b1
d31dbaaa16 calculate use positions 2023-04-16 02:03:50 +02:00
T0b1
9d1dbadd04 fix stackmaps 2023-04-16 01:17:49 +02:00
T0b1
c4a0d85b72 dont calc cfginfo 2023-04-15 03:11:32 +02:00
T0b1
dcb95541a7 only do clobber string computation if tracing 2023-04-14 19:36:21 +02:00
T0b1
2c8b9a680f change regs_allocated to PRegSet 2023-04-14 19:33:54 +02:00
T0b1
e2061d2e04 first impl 2023-04-14 18:18:15 +02:00
T0b1
993074a974 something 2023-04-13 03:38:59 +02:00
T0b1
706c44513e init 2023-04-12 03:49:50 +02:00
Trevor Elliott
f0e9cde328 Use drain instead of clear (#123) 2023-04-06 14:21:42 -07:00
Johan Milanov
9c6d6dc9aa Slightly more efficient vec initialization (#120)
This will, at least on x86_64, compile down to simpler, shorter assembly that uses a zeroed allocation instead of a regular allocation, a memset and various `raw_vec` methods.
2023-04-03 13:49:36 -07:00
Amanieu d'Antras
2bd03256b3 Make regalloc2 #![no_std] (#119)
* Make regalloc2 `#![no_std]`

This crate doesn't require any features from the standard library, so it
can be made `no_std` to allow it to be used in environments that can't
use the Rust standard library.

This PR mainly performs the following mechanical changes:
- `std::collections` is replaced with `alloc::collections`.
- `std::*` is replaced with `core::*`.
- `Vec`, `vec!`, `format!` and `ToString` are imported when needed since
  they are no longer in the prelude.
- `HashSet` and `HashMap` are taken from the `hashbrown` crate, which is
  the same implementation that the standard library uses.
- `FxHashSet` and `FxHashMap` are typedefs in `lib.rs` that are based on
  the `hashbrown` types.

The only functional change is that `RegAllocError` no longer implements
the `Error` trait since that is not available in `core`.

Dependencies were adjusted to not require `std` and this is tested in CI
by building against the `thumbv6m-none-eabi` target that doesn't have
`std`.

* Add the Error trait impl back under a "std" feature
2023-03-09 11:25:59 -08:00
Amanieu d'Antras
7354cfedde Remove support for program moves (#118) 2023-03-04 16:38:05 -08:00
Amanieu d'Antras
54f074e507 Re-introduce optional dedicated scratch registers (#117)
* Re-introduce optional dedicated scratch registers

Dedicated scratch registers used for resolving move cycles were removed
in #51 and replaced with an algorithm to automatically allocate a
scratch register as needed.

However in many cases, a client will already have a non-allocatable
scratch register available for things like extended jumps (see #91). It
makes sense to re-use this register for regalloc than potentially
spilling an existing register.

* Clarify comment
2023-03-04 14:49:10 -08:00
Trevor Elliott
34a9ae7379 Misc refactorings (#116)
* Use a while-let instead of checking is_empty and popping

* This conditional should always be true, as we expect the input is in ssa

* Use iter_mut instead of iterating the index

* We don't support multiple defs of the same vreg anymore

* Drain instead of clear
2023-02-28 10:42:13 -08:00
Chris Fallin
1e8da4f99b Bump version to 0.6.1. (#115) 2023-02-16 02:06:40 +00:00
Jamey Sharp
7bb83a3361 checker: Use a couple of Rust idioms (#114)
This should have no functional change, just makes the source slightly
easier to read and reason about.
2023-02-16 01:59:20 +00:00
Chris Fallin
c3e513c4cb Fix checker when empty blocks result in unchanged-from-Top entry state. (#113)
The checker works by keeping a worklist of blocks to process, and adds a
block to the worklist when its entry state changes. Every entry state is
initially `Top` (in a lattice). The entry block is explicitly added to
the worklist to kick off the processing.

In ordinary cases, the entry block has some instructions that change
state from `Top` to something else (lower in the lattice), and this is
propagated to its successors; its successors are added to the worklist;
and so on. No other state is `Top` from then on (because of
monotonicity) so every reachable block is processed.

However, if the entry block is completely empty except for the
terminating branch, the state remains `Top`; then the entry state of its
successors, even when updated, is still `Top`; and the state didn't
change so the blocks are not added to the worklist. (Nevermind that they
were not processed in the first place!) The bug is that the invariant
"has been processed already with current state" is not true initially,
when the current state is set to `Top` but nothing has been processed.

This PR makes a simple fix: it adds every block to the worklist
initially to be processed, in input order (which is usually RPO order in
practice) as a good first heuristic; then if after processing the input
state changes again, it can be reprocessed until fixpoint as always.

Fixes bytecodealliance/wasmtime#5791.
2023-02-15 17:42:51 -08:00
Trevor Elliott
50b9cf8fe2 Bump to version 0.6.0 (#112)
Bump the version to 0.6.0 for the next publish to crates.io.

This version removes pinned registers and mod operands, which requires the bump to 0.6.0.
2023-02-07 14:57:22 -08:00
Chris Fallin
376294e828 Fix indent/conditional removal in #108 that removed wrong brace. (#111)
In #108, we had removed a conditional that became constant-false with
the removal of pinned vregs. Unfortunately, in a pretty embarrassing
text-editing error, I removed the wrong end-brace and re-indented. This
had the effect of moving some of the move-resolution code around in the
regalloc backend, skipping moves in some cases and generating incorrect
results.

Caught as a fuzzbug by OSS-Fuzz in [0] (public after this merges and is
detected as fixed).

[0]: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=55391
2023-01-25 13:20:26 -08:00
Chris Fallin
0edb11d3a7 Remove mod operands. (#109)
* Remove Mod operands.

* Typo fix.
2023-01-24 17:41:46 -08:00
Chris Fallin
e09f6519a6 Remove pinned VRegs. (#108) 2023-01-24 17:31:41 -08:00
Trevor Elliott
e41c6140de Bump to 0.5.1 (#106) 2022-12-06 12:51:08 -08:00
Trevor Elliott
bc398bd19b SSA Validator Fixes (#105)
* Don't check for defs on a use of a fixed-nonallocatable operand

* Add traces to the ssa validator to aid debugging

* Format
2022-11-30 16:31:27 -08:00
Trevor Elliott
346b7f38a3 Bump to 0.5.0 (#104) 2022-11-29 10:03:24 -08:00
Trevor Elliott
7f0d0b98d0 Expose ssa verification as a regalloc2 option (#102)
Adds the validate_ssa flag to the RegallocOptions struct, enabling ssa validation of inputs before register allocation takes place.
2022-11-29 09:30:59 -08:00
Trevor Elliott
25b08c6cff Bump the checkout action to v3 to avoid node12 deprecation (#103) 2022-11-28 17:55:01 -08:00
Trevor Elliott
51561285d3 Add a From<&MachineEnv> impl for PRegSet (#101) 2022-11-09 11:31:56 -08:00
Nick Fitzgerald
b41b1f9a3c Use maximum inline capacity available for SmallVec<VRegIndex> in SpillSet (#100)
* Use maximum inline capacity available for `SmallVec<VRegIndex>` in `SpillSet`

We were using 2, which is the maximum for 32-bit architectures, but on 64-bit
architectures we can get 4 inline elements without growing the size of the
`SmallVec`.

This is a statistically significant speed up, but is so small that our
formatting of floats truncates it (so less than 1%).

```
compilation :: instructions-retired :: benchmarks/bz2/benchmark.wasm

  Δ = 3360297.85 ± 40136.18 (confidence = 99%)

  more-inline-capacity.so is 1.00x to 1.00x faster than main.so!

  [945563401 945906690.73 946043245] main.so
  [942192473 942546392.88 942729104] more-inline-capacity.so

compilation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 1780540.13 ± 39362.84 (confidence = 99%)

  more-inline-capacity.so is 1.00x to 1.00x faster than main.so!

  [1544990595 1545359408.41 1545626251] main.so
  [1543269057 1543578868.28 1543851201] more-inline-capacity.so

compilation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 36577153.54 ± 243753.54 (confidence = 99%)

  more-inline-capacity.so is 1.00x to 1.00x faster than main.so!

  [33956158997 33957780594.50 33959538220] main.so
  [33919762415 33921203440.96 33923023358] more-inline-capacity.so
```

* Use a `const fn` to calculate number of inline elements
2022-11-02 12:16:22 -07:00
Nick Fitzgerald
eb0a8fd22f Bump to version 0.4.2 (#99)
* Bump to version 0.4.2

* Remove take-and-replace dance

There are no conflicting borrows of `self` anymore.
2022-11-01 10:30:30 -07:00
Nick Fitzgerald
fdcf7b694f Avoid indexing and use iteration (#98)
...while still appeasing the borrow checker by taking spillsets out of `self`
and then putting them back in again when we're done.

I was just doing this to enable playing with some data structures in follow up
commits, but decided to benchmark this commit as-is and found 2-4% speed ups to
Cranelift compilation!

```
compilation :: instructions-retired :: benchmarks/bz2/benchmark.wasm

  Δ = 39946528.13 ± 38398.29 (confidence = 99%)

  no-index.so is 1.04x to 1.04x faster than main.so!

  [985704952 985984130.24 986180413] main.so
  [945649144 946037602.11 946262076] no-index.so

compilation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 48413802.56 ± 34288.05 (confidence = 99%)

  no-index.so is 1.03x to 1.03x faster than main.so!

  [1593663899 1593926801.92 1594246604] main.so
  [1545196678 1545512999.36 1545802144] no-index.so

compilation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 841028066.56 ± 253404.59 (confidence = 99%)

  no-index.so is 1.02x to 1.02x faster than main.so!

  [34798712681 34801346430.28 34802786661] main.so
  [33958847844 33960318363.72 33962177143] no-index.so
```
2022-11-01 10:00:09 -07:00
Chris Fallin
b4eedf3f32 Fix the moves fuzz target. (#97)
* Fix iteration bug in move fuzz-target driver.

* Remove panic no longer needed: stack-to-stack moves are now properly handled.
2022-10-21 09:08:27 -07:00
Alex Crichton
d6b15095c5 Add explicit dependency on libfuzzer-sys (#96)
This fixes a build issue with the most recent version of `libfuzzer-sys`
where the prior version of the `fuzz_target!` macro didn't need the
crate as an explicit dependency but the latest version does.
2022-10-21 08:33:50 -07:00
Jamey Sharp
eb259e8aba Some small perf improvements (#95)
* Do conflict-set hash lookups once, not twice

This makes the small wasmtime bz2 benchmark 1% faster, per Hyperfine and
Sightglass. The effect disappears into the noise on larger benchmarks.

* Inline PosWithPrio::key

When compiling the pulldown-cmark benchmark from Sightglass, this is the
single most frequently called function: it's invoked 2.5 million times.
Inlining it reduces instructions retired by 1.5% on that benchmark,
according to `valgrind --tool=callgrind`.

This patch is "1.01 ± 0.01 times faster" according to Hyperfine for the
bz2, pulldown-cmark, and spidermonkey benchmarks from Sightglass.
Sightglass, in turn, agrees that all three benchmarks are 1.01x faster
by instructions retired, and the first two are around 1.01x faster by
CPU cycles as well.

* Inline and simplify AdaptiveMap::expand

Previously, `get_or_insert` would iterate over the keys to find one that
matched; then, if none did, iterate over the values to check if any are
0; then iterate again to remove all zero values and compact the map.

This commit instead focuses on picking an index to use: preferably one
where the key already exists; but if it's not in the map, then an unused
index; but if there aren't any, then an index where the value is zero.

As a result this iterates the two arrays at most once each, and both
iterations can stop early.

The downside is that keys whose value is zero are not removed as
aggressively. It might be worth pruning such keys in `IndexSet::set`.

Also:

- `#[inline]` both implementations of `Iterator::next`
- Replace `set_bits` with using the `SetBitsIter` constructor directly

These changes together reduce instructions retired when compiling the
pulldown-cmark benchmark by 0.9%.
2022-10-11 08:23:02 -07:00
Chris Fallin
1efaa73943 Modify a SmallVec inline size for UseList to be slightly larger. (#93)
This PR updates the `UseList` type alias to a `SmallVec` with 4
`Use`s (which are 4 bytes each) rather than 2, because we get 16 bytes
of space "for free" in a `SmallVec` on a 64-bit machine.

This PR improves the compilation performance of Cranelift by 1% on
SpiderMonkey.wasm (measured on a Linux desktop with pinned CPU
frequency, and pinned to one core).

It's worth noting also that before making these changes, I explored
whether it would be possible to put the lists of uses and liveranges
in single large backing `Vec`s; the basic reason why we can't do this
is that during liverange construction, we append to many lists
concurrently. One could use a linked-list arrangement, and in fact RA2
did this early in its development; the separate `SmallVec`s were
better for performance overall because the cache locality wins when we
traverse the lists many times. It may still be worth investigating use
of an arena to allocate the vecs rather than the default heap allocator.
2022-10-07 13:37:27 -07:00
Chris Fallin
6394913c1d Add README note that non-SSA support is deprecated. (#94)
As noted in #4.
2022-10-05 10:43:15 -07:00
Amanieu d'Antras
227a9fde91 Cache HashSet in try_to_allocate_bundle_to_reg (#90)
Keep `conflict_set` allocated in `Env` instead of allocating a new one
on every call. This improves register allocation performance by about
2%.
2022-09-26 16:14:43 -07:00
Amanieu d'Antras
67f5c167a8 Fix documentation for inst_clobbers (#89)
Clobbers don't conflict with early uses, so they effectively act as a
late def, not an early def.
2022-09-26 16:14:08 -07:00
Chris Fallin
061963f469 Version 0.4.1. (#86) 2022-09-22 15:20:16 -07:00
Chris Fallin
bcfc10c44e Fix fallback-split behavior: trim start of minimal bundle wrt start of original LR. (#85)
When a liverange starts at a *late* point of an instruction, and it
undergoes the fallback "split into all minimal pieces" transform, we end
up creating one minimal bundle that starts at the *early* point of the
instruction at the start of the original LR. This can create
impossible-to-allocate situations where a fixed-constraint LR overlaps
another constrained to the same register (e.g. at calls). We fix this by
ensuring the minimal bundle is trimmed only to the half of the
instruction that overlaps the original LR.

This is analogous to the third fix in #74, but on the other end (start
of LR rather than end of it).
2022-09-22 15:09:49 -07:00
Chris Fallin
0fb9c03e09 Version 0.4.0. (#84)
This is identical to v0.3.3, but it turns out that the removal of the
register class from `SpillSlot` (#80) is an API change. I missed this
and published 0.3.3 but yanked it after it was causing build issues for
folks.
2022-09-20 17:27:38 -07:00
Chris Fallin
b98e8ad517 Version 0.3.3. (#83) 2022-09-20 16:08:01 -07:00
Chris Fallin
1b38a71e38 Some fixes to allow for call instructions to name args, returns, and clobbers with constraints. (#74)
* Some fixes to allow for call instructions to name args, returns, and clobbers with constraints.

- Allow early-pos uses with fixed regs that conflict with
  clobbers (which happen at late-pos), in addition to the
  existing logic for conflicts with late-pos defs with fixed
  regs.

  This is a pretty subtle issue that was uncovered in #53 for the def
  case, and the fix here is the mirror of that fix for clobbers. The
  root cause for all this complexity is that we can't split in the
  middle of an instruction (because there's no way to insert a move
  there!) so if a use is live-downward, we can't let it live in preg A
  at early-pos and preg B != A at late-pos; instead we need to rewrite
  the constraints and use a fixup move.

  The earlier change to fix #53 was actually a bit too conservative in
  that it always applied when such conflicts existed, even if the
  downward arg was not live. This PR fixes that (it's fine for the
  early-use and late-def to be fixed to the same reg if the use's
  liverange ends after early-pos) and adapts the same flexibility to
  the clobbers case as well.

- Reworks the fixups for the def case mentioned above to not shift the
  def to the Early point. Doing so causes issues when the def is to a
  reffy vreg: it can then be falsely included in a stackmap if the
  instruction containing this operand is a safepoint.

- Fixes the last-resort split-bundle-into-minimal-pieces logic from
  #59 to properly limit a minimal bundle piece to end after the
  early-pos, rather than cover the entire instruction. This was causing
  artificial overlaps between args that end after early-pos and defs
  that start at late-pos when one of the vregs hit the fallback split
  behavior.

* Fix fuzzbug: do not merge when a liverange has a fixed-reg def.

This can create impossible situations: e.g., if a vreg is constrained
to p0 as a late-def, and another, completely different vreg is
constrained to p0 as an early-use on the same instruction, and the
instruction also has a third vreg (early-use), we do not want to merge
the liverange for the third vreg with the first, because it would
result in an unsolveable conflict for p0 at the early-point.

* Review comments.
2022-09-20 15:58:20 -07:00
Amanieu d'Antras
906a053208 Remove register class from SpillSlot (#80)
* Remove register class from `SpillSlot`

The register allocator was already allowing moves between spillslots and
registers of different classes, so this PR formalizes this by making
spillslots independent of register class.

This also fixes #79 by properly tracking the register class of an
`InsertedMove` with the `to_vreg` field which turns out to never
be `None` in practice. Removing the `Option` allows the register
class of the `VReg` to be used when building the per-class move lists.

Fixes #79

* Address review feedback
2022-09-20 14:05:23 -07:00
Jeffrey Crocker
3db1b7199b Fix design doc typo (#82) 2022-09-20 14:04:56 -07:00
Amanieu d'Antras
520cafa129 Handle fixed stack slots in the move resolver (#78)
Fixed stack slots are treated as `PReg`s by most of the register
allocator, but need some additional handling the move resolver to avoid
generating stack-to-stack moves.
2022-09-19 12:27:24 -07:00
Amanieu d'Antras
1495c1e342 Add fixed-non-allocatable operand support (#77)
This allows a non-allocatable `PReg` to be passed on directly to the
allocations vector without any liverange tracking from the register
allocator. The main intended use case is to support ISA-specific special
registers such as a fixed zero register.
2022-09-19 12:23:53 -07:00
Amanieu d'Antras
aeef47a06b Make more PReg & VReg methods const (#76)
Also remove the previous workaround for const-assert now that it is
available on stable Rust.
2022-09-12 10:03:29 -07:00
Chris Fallin
be47ac39e8 Fuzzing function generator: bound the debug-labels size. (#73)
Currently there is a loop that takes a variable step toward an end point
with an integer from `Arbitrary`; if this integer is always zero (for
example due to end-of-input?) then we add debug labels to a particular
input SSA value forever. This eventually causes an OOM crash.  This PR
bounds the loop at a reasonable count (10) instead.
2022-08-31 10:10:38 -07:00
Teymour Aldridge
ad39c66fe7 Fix typo. (#71) 2022-08-04 20:24:36 -07:00
Jamey Sharp
81c11d37d6 Version 0.3.2 (#70)
Includes a modest improvement in memory usage and performance by
removing analysis that was only used during fuzzing (#66).
2022-08-03 15:25:45 -07:00