regalloc2

Author	SHA1	Message	Date
Amanieu d'Antras	2bd03256b3	Make regalloc2 `#![no_std]` (#119) * Make regalloc2 `#![no_std]` This crate doesn't require any features from the standard library, so it can be made `no_std` to allow it to be used in environments that can't use the Rust standard library. This PR mainly performs the following mechanical changes: - `std::collections` is replaced with `alloc::collections`. - `std::` is replaced with `core::`. - `Vec`, `vec!`, `format!` and `ToString` are imported when needed since they are no longer in the prelude. - `HashSet` and `HashMap` are taken from the `hashbrown` crate, which is the same implementation that the standard library uses. - `FxHashSet` and `FxHashMap` are typedefs in `lib.rs` that are based on the `hashbrown` types. The only functional change is that `RegAllocError` no longer implements the `Error` trait since that is not available in `core`. Dependencies were adjusted to not require `std` and this is tested in CI by building against the `thumbv6m-none-eabi` target that doesn't have `std`. * Add the Error trait impl back under a "std" feature	2023-03-09 11:25:59 -08:00
Amanieu d'Antras	7354cfedde	Remove support for program moves (#118)	2023-03-04 16:38:05 -08:00
Chris Fallin	e09f6519a6	Remove pinned VRegs. (#108)	2023-01-24 17:31:41 -08:00
Chris Fallin	1b38a71e38	Some fixes to allow for call instructions to name args, returns, and clobbers with constraints. (#74) * Some fixes to allow for call instructions to name args, returns, and clobbers with constraints. - Allow early-pos uses with fixed regs that conflict with clobbers (which happen at late-pos), in addition to the existing logic for conflicts with late-pos defs with fixed regs. This is a pretty subtle issue that was uncovered in #53 for the def case, and the fix here is the mirror of that fix for clobbers. The root cause for all this complexity is that we can't split in the middle of an instruction (because there's no way to insert a move there!), so if a use is live-downward, we can't let it live in preg A at early-pos and preg B != A at late-pos; instead we need to rewrite the constraints and use a fixup move. The earlier change to fix #53 was actually a bit too conservative in that it always applied when such conflicts existed, even if the downward arg was not live. This PR fixes that (it's fine for the early-use and late-def to be fixed to the same reg if the use's liverange ends after early-pos) and adapts the same flexibility to the clobbers case as well. - Reworks the fixups for the def case mentioned above to not shift the def to the Early point. Doing so causes issues when the def is to a reffy vreg: it can then be falsely included in a stackmap if the instruction containing this operand is a safepoint. - Fixes the last-resort split-bundle-into-minimal-pieces logic from #59 to properly limit a minimal bundle piece to end after the early-pos, rather than cover the entire instruction. This was causing artificial overlaps between args that end after early-pos and defs that start at late-pos when one of the vregs hit the fallback split behavior. * Fix fuzzbug: do not merge when a liverange has a fixed-reg def. 
This can create impossible situations: e.g., if a vreg is constrained to p0 as a late-def, and another, completely different vreg is constrained to p0 as an early-use on the same instruction, and the instruction also has a third vreg (early-use), we do not want to merge the liverange for the third vreg with the first, because it would result in an unsolvable conflict for p0 at the early-point. * Review comments.	2022-09-20 15:58:20 -07:00
Chris Fallin	4eb2a2528b	Limit split count per original bundle with fallback 1-to-N split. (#59) * Limit split count per original bundle with fallback 1-to-N split. Right now, splitting a bundle produces two halves. Furthermore, it has cost linear in the length of the bundle, because the resulting half-bundles have their requirements recomputed with a new scan, and because we copy half the use-list over to the tail end sub-bundle. This works fine when a bundle has a handful of splits overall, but not when an input has a systematic pattern of conflicts that will require O(|bundle|) splits (e.g., every Use is constrained to a different fixed register than the last one). In such a case, we get quadratic behavior. This PR adds a per-spillset (so, per-original-bundle) counter for splits, and when it reaches a preset threshold (10 for now), we instead split directly into minimal bundles along the whole length of the bundle, putting the regions without uses in the spill bundle. This basically approximates what a non-splitting allocator would do: it "spills" the whole bundle to possibly a stackslot, or a second-chance register allocation at best, via the spill bundle; and then does minimal reservations of registers just at uses/defs and moves the "spilled" value into/out of them immediately. Together with another small optimization, this PR results in a 4x compilation speedup and 24x memory use reduction on one particularly bad case with alternating conflicting requirements on a vreg (see bytecodealliance/wasmtime#4291 for details). * Review comments.	2022-06-27 13:23:09 -07:00
Chris Fallin	ad41f8a7a5	Record vreg classes explicitly during liverange pass. (#35) This resolves an issue seen when the source program uses multiple regclasses (Int and Float): in some cases, the logic that grabs the vregs and retains them (with class) in `vreg_regs` missed a register and we had a class mismatch. This occurred because data structures were initialized assuming `Int` regclass at first. This PR instead removes the `vreg_regs` array, stores the class explicitly as an `Option<RegClass>` in the `VRegData`, and provides an `Env::vreg()` method that reconstitutes a `VReg` given its index and its observed class. We "observe" the class of every vreg seen during the liveness pass (and we assert that every occurrence of the vreg index has the same class). In this way, we still have a single source-of-truth for the vreg class (the mention of the vreg itself) and we explicitly represent the "not observed yet" state (and panic on attempting to use such a vreg) rather than implicitly taking the wrong class.	2022-03-29 14:00:14 -07:00
Chris Fallin	fe021ad6d4	Simplify pinned-vreg API: don't require slice of all pinned vregs. (#28) Simplify pinned-vreg API: don't require slice of all pinned vregs. Previously, we kept a bool flag `is_pinned` in the `VRegData`, and we required a `&[VReg]` of all pinned vregs to be provided by `Function::pinned_vregs()`. This was (I think) done for convenience, but it turns out not to really be necessary, as we can just query `is_pinned_vreg` where needed (and in the likely implementation, e.g. in Cranelift, this will be a `< NUM_PINNED_VREGS` check that can be inlined). This adds convenience for the embedder (the main benefit), and also reduces complexity, removes some state, and avoids some work initializing the regalloc state for a run.	2022-03-04 15:12:16 -08:00
Amanieu d'Antras	ee4de54240	Guard trace! behind cfg!(debug_assertions). Even if the trace log level is disabled, the presence of the trace! macro still has a significant impact on performance because it is present in the inner loops of the allocator. Removing the trace! calls at compile time reduces instruction count by ~7%.	2022-01-11 13:30:13 +00:00
Amanieu d'Antras	2d9d5dd82b	Rearrange some struct fields to work better with u64_key/u128_key. This allows the compiler to load the whole key with 1 or 2 64-bit accesses, assuming little-endian ordering. Improves instruction count by ~1%.	2022-01-11 13:24:51 +00:00
Amanieu d'Antras	74928b83fa	Replace all assert! with debug_assert!. This results in a ~6% reduction in instruction count.	2022-01-11 03:54:08 +00:00
Amanieu d'Antras	51493ab03a	Apply review feedback	2021-12-12 00:33:30 +00:00
Amanieu d'Antras	4f8e115115	Refactor requirement computation	2021-12-11 22:31:58 +00:00
Amanieu d'Antras	77e6a9e0d7	Add support for fixed stack slots. This works by allowing a PReg to be marked as being a stack location instead of a physical register.	2021-12-11 22:31:58 +00:00
Chris Fallin	cf0d515709	Relicense fully to Apache-2.0 WITH LLVM-exception. Large parts of the code in regalloc2 are currently licensed under the Mozilla Public License (MPL) 2.0, because they derive in meaningful ways from the register allocator in IonMonkey, which is part of Firefox. The relevant source files are marked as such, with references to the files in the Firefox source tree. The intent of the regalloc2 project was to port the register allocator from Firefox to use in Cranelift, borrowing good technology and improving on it in the spirit of open source. However, several use-cases of Cranelift require, or at least strongly prefer, the Apache-2.0 license with the LLVM exception (matching the license of Cranelift itself, and Bytecode Alliance projects generally). While using this license is not strictly necessary for regalloc2 to be usable (The MPL is an excellent open-source license!), relicensing fully under this license to harmonize with the rest of Cranelift and Bytecode Alliance codebases significantly widens possibilities and reduces friction; then regalloc2 is "just another part of Cranelift" and doesn't have to be treated specially. The source in `src/ion/` specifically began as a fairly direct port of the algorithms in the following files in the `mozilla-central` repository (Firefox codebase): * The bulk of the "backtracking allocator" algorithm: * `js/src/jit/BacktrackingAllocator.{cpp,h}` * Helpers and definitions in the surrounding infrastructure: * `js/src/jit/RegisterAllocator.h` * `js/src/jit/RegisterAllocator.cpp` * `js/src/jit/StackSlotAllocator.h` * `js/src/jit/LIR.h` * A few data structure implementations: * `js/src/ds/SplayTree.h` * `js/src/ds/PriorityQueue.h` Subsequent work in improving regalloc2 has caused it to drift from the direct port -- for example, it no longer uses splay trees or the direct port of the priority queue above -- but it is of course very clearly still a derivative work. 
Analysis of the contributors to these files indicates that we need signoff from the following folks: * Mozilla Corp, for contributions made by Mozilla employees (the majority of the code). Communications with Mozilla (thanks @tschneidereit and @bholley for doing the work here!) indicate that @ekr is able to sign off when ready here. * Andy Wingo, specifically for the work done in [Bug 1620197](https://bugzilla.mozilla.org/show_bug.cgi?id=1620197) and [Bug 1609057](https://bugzilla.mozilla.org/show_bug.cgi?id=1609057) to generalize the stack allocator for a Wasm feature (multiple returns). Additionally, since the initial port, we have had three contributions from @Amanieu: [#9](https://github.com/bytecodealliance/regalloc2/pull/9), [#11](https://github.com/bytecodealliance/regalloc2/pull/11), [#13](https://github.com/bytecodealliance/regalloc2/pull/13). So, if everyone applicable is happy with this relicensing, this PR removes the MPL-2.0 license in `src/ion/` and marks all files as covered under `Apache-2.0 WITH LLVM-exception`. Please let us know if this is OK! Signoffs: - [ ] @ekr, for Mozilla's contributions - [ ] @wingo, for contributions to original code in `mozilla-central` - [ ] @Amanieu, for the three PRs linked above Thanks!	2021-11-10 10:54:28 -08:00
Chris Fallin	6f0893d69d	Address review comments.	2021-08-31 17:56:06 -07:00
Chris Fallin	6d313f2b56	Address review comments: more doc comments and some minor refactorings.	2021-08-30 17:15:37 -07:00
Chris Fallin	69ad31f013	Replace remaining instances of use of `debug` feature with `debug_assertions`. Also fix some code that did not build in debug mode anymore (d'oh!) in `src/ion/merges.rs`, as exposed by this change.	2021-08-12 17:35:55 -07:00
Chris Fallin	3e1e0f39b6	Convert all log::debug to log::trace.	2021-08-12 12:05:19 -07:00
Chris Fallin	84285c26fb	Rename OperandPolicy to OperandConstraint as per feedback from @julian-seward1.	2021-08-12 11:17:52 -07:00
Chris Fallin	b36a563d69	Cleanup: split allocator implementation into 11 files of more reasonable size.	2021-06-18 16:51:41 -07:00

20 Commits
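The split-count fallback described in 4eb2a2528b (#59) can be modeled in a few lines. This is a simplified sketch, not regalloc2's code: `Bundle`, `SpillSet`, `split`, and `split_into_minimal` are hypothetical names, program points are plain `u32`s rather than early/late positions, and use points are assumed sorted and distinct. Only the threshold of 10 comes from the commit message.

```rust
/// Threshold from #59 ("10 for now"); everything else here is a hypothetical model.
const MAX_SPLITS_PER_SPILLSET: u32 = 10;

/// A bundle covering program points [start, end) with sorted, distinct use points.
#[derive(Clone, Debug, PartialEq)]
struct Bundle {
    start: u32,
    end: u32,
    uses: Vec<u32>,
}

/// Per-original-bundle state shared by every piece split off from it.
#[derive(Default, Debug)]
struct SpillSet {
    splits: u32,
    /// Use-free regions handed to the spill bundle.
    spill_ranges: Vec<(u32, u32)>,
}

/// Split `bundle` at `at`, or fall back to minimal pieces once this
/// spillset has already been split `MAX_SPLITS_PER_SPILLSET` times.
fn split(bundle: Bundle, at: u32, ss: &mut SpillSet) -> Vec<Bundle> {
    if ss.splits >= MAX_SPLITS_PER_SPILLSET {
        return split_into_minimal(bundle, ss);
    }
    debug_assert!(bundle.start < at && at < bundle.end);
    ss.splits += 1;
    // Ordinary two-way split: one linear pass over the uses.
    let (head, tail): (Vec<u32>, Vec<u32>) = bundle.uses.into_iter().partition(|&u| u < at);
    vec![
        Bundle { start: bundle.start, end: at, uses: head },
        Bundle { start: at, end: bundle.end, uses: tail },
    ]
}

/// One single-point bundle per use; the gaps between uses go to the spill bundle.
fn split_into_minimal(bundle: Bundle, ss: &mut SpillSet) -> Vec<Bundle> {
    let mut pieces = Vec::new();
    let mut cursor = bundle.start;
    for &u in &bundle.uses {
        if cursor < u {
            ss.spill_ranges.push((cursor, u));
        }
        pieces.push(Bundle { start: u, end: u + 1, uses: vec![u] });
        cursor = u + 1;
    }
    if cursor < bundle.end {
        ss.spill_ranges.push((cursor, bundle.end));
    }
    pieces
}

fn main() {
    let mut ss = SpillSet::default();
    let b = Bundle { start: 0, end: 20, uses: vec![2, 5, 9] };
    println!("{:?}", split(b.clone(), 5, &mut ss));
    ss.splits = MAX_SPLITS_PER_SPILLSET; // pretend the split budget is exhausted
    let pieces = split(b, 5, &mut ss);
    println!("{:?} spill: {:?}", pieces, ss.spill_ranges);
}
```

In this model, once the counter hits the threshold, each further request costs one pass over the uses and yields pieces that each cover a single use, so they never need splitting again; that is the property the commit relies on to avoid quadratic behavior.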
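The class-observation scheme in ad41f8a7a5 (#35), where each vreg's class is stored as an `Option<RegClass>` and filled in during the liveness pass, might look roughly like the following. `VRegData`, `Env::vreg()`, `RegClass`, and `VReg` are named in the commit message, but their fields here and the `observe_vreg_class` helper are assumptions made for illustration.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RegClass {
    Int,
    Float,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct VReg {
    index: usize,
    class: RegClass,
}

/// Per-vreg state; `None` explicitly means "class not observed yet".
#[derive(Clone, Debug, Default)]
struct VRegData {
    class: Option<RegClass>,
}

struct Env {
    vregs: Vec<VRegData>,
}

impl Env {
    fn new(num_vregs: usize) -> Self {
        Env { vregs: vec![VRegData::default(); num_vregs] }
    }

    /// Hypothetical helper, called for every vreg mention during the liveness pass.
    fn observe_vreg_class(&mut self, vreg: VReg) {
        let slot = &mut self.vregs[vreg.index].class;
        if let Some(seen) = *slot {
            // Every mention of the same index must agree on the class.
            assert_eq!(seen, vreg.class, "vreg {} used with two classes", vreg.index);
        } else {
            *slot = Some(vreg.class);
        }
    }

    /// Rebuild a `VReg` from its index; panics if its class was never observed.
    fn vreg(&self, index: usize) -> VReg {
        let class = self.vregs[index].class.expect("vreg class not observed yet");
        VReg { index, class }
    }
}

fn main() {
    let mut env = Env::new(2);
    env.observe_vreg_class(VReg { index: 0, class: RegClass::Int });
    env.observe_vreg_class(VReg { index: 1, class: RegClass::Float });
    println!("{:?} {:?}", env.vreg(0), env.vreg(1));
}
```

The point of the `Option` is that an unobserved vreg fails loudly in `vreg()` instead of silently defaulting to `Int`, which was the source of the mismatch the commit fixes.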
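The compile-time guard from ee4de54240 can be sketched as a `macro_rules!` wrapper. To keep the example dependency-free, `eprintln!` and a hypothetical `EMITTED` counter stand in for the `log` crate; the macro's shape is an assumption, not the crate's exact definition.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Stand-in for a logging backend in this sketch: counts emitted trace lines.
static EMITTED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical `trace!` wrapper. `cfg!(debug_assertions)` expands to the
/// literal `true` or `false`, so in release builds the body sits behind
/// `if false` and the optimizer can drop it, while it is still type-checked.
macro_rules! trace {
    ($($arg:tt)*) => {
        if cfg!(debug_assertions) {
            EMITTED.fetch_add(1, Ordering::Relaxed);
            eprintln!($($arg)*);
        }
    };
}

/// A stand-in for an allocator inner loop containing a trace call.
fn sum_with_trace(xs: &[u32]) -> u32 {
    let mut total = 0;
    for &x in xs {
        trace!("adding {}", x);
        total += x;
    }
    total
}

fn main() {
    let total = sum_with_trace(&[1, 2, 3]);
    println!("total = {}, trace lines = {}", total, EMITTED.load(Ordering::Relaxed));
}
```

Using `if cfg!(...)` rather than a `#[cfg(debug_assertions)]` attribute keeps the trace arguments compiling (and avoids unused-variable warnings) in release builds, at no runtime cost once the constant branch is eliminated.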
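The idea behind 2d9d5dd82b is that a key built from two 32-bit fields can be compared as one `u64`, and loaded in one access, only if the fields sit in memory in the same order as the halves of that integer. The sketch below assumes a little-endian target and `#[repr(C)]` layout; `Key` and this `u64_key` signature are illustrative, not the crate's definitions.

```rust
/// Pack (hi, lo) into one u64 whose integer order matches the tuple order.
#[inline]
fn u64_key(hi: u32, lo: u32) -> u64 {
    ((hi as u64) << 32) | lo as u64
}

/// Hypothetical struct: `lo` is declared first so that, under `#[repr(C)]`
/// on a little-endian target, its bytes form the low half of the u64.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Key {
    lo: u32, // secondary field (tie-breaker)
    hi: u32, // primary field
}

impl Key {
    fn to_u64(self) -> u64 {
        u64_key(self.hi, self.lo)
    }
}

fn main() {
    let mut keys = vec![Key { lo: 1, hi: 2 }, Key { lo: 9, hi: 1 }, Key { lo: 0, hi: 2 }];
    // One integer comparison per pair instead of a two-field comparison.
    keys.sort_by_key(|k| k.to_u64());
    println!("{:?}", keys);
}
```

With `lo` first, the struct's eight bytes on a little-endian machine already equal the packed key, so the compiler can in principle replace the shift-and-or with a single 64-bit load; declaring `hi` first would force it to assemble the key from two separate loads.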