regalloc2

Author	SHA1	Message	Date
Amanieu d'Antras	7354cfedde	Remove support for program moves (#118 )	2023-03-04 16:38:05 -08:00
Trevor Elliott	7f0d0b98d0	Expose ssa verification as a regalloc2 option (#102 ) Adds the validate_ssa flag to the RegallocOptions struct, enabling ssa validation of inputs before register allocation takes place.	2022-11-29 09:30:59 -08:00
Amanieu d'Antras	227a9fde91	Cache `HashSet` in `try_to_allocate_bundle_to_reg` (#90 ) Keep `conflict_set` allocated in `Env` instead of allocating a new one on every call. This improves register allocation performance by about 2%.	2022-09-26 16:14:43 -07:00
Benjamin Bouvier	a33b044d6c	Streamline log enablement (#64 ) * Remove log_enabled statements around annotate calls, which are already guarded against annotations_enabled * Use a trace_enabled!() macro that follows the same logic as trace!() to find if additional traces have been enabled or not	2022-07-20 10:44:31 -07:00
Chris Fallin	869c21e79c	Remove an explicitly-set-aside scratch register per class. (#51 ) Currently, regalloc2 sets aside one register per class, unconditionally, to make move resolution possible. To solve the "parallel moves problem", we sometimes need to conjure a cyclic permutation of data among registers or stack slots (this can result, for example, from blockparam flow that swaps two values on a loop backedge). This set-aside scratch register is used when a cycle exists. regalloc2 also uses the scratch register when needed to break down a stack-to-stack move (which could happen due to blockparam moves on edges when source and destination are both spilled) into a stack-to-reg move followed by reg-to-stack, because most machines have loads and stores but not memory-to-memory moves. A set-aside register is certainly the simplest solution, but it is not optimal: it means that we have one fewer register available for use by the program, and this can be costly especially on machines with fewer registers (e.g., 16 GPRs/XMMs on x86-64) and especially when some registers may be set aside by our embedder for other purposes too. Every register we can reclaim is some nontrivial performance in large function bodies! This PR removes this restriction and allows regalloc2 to use all available physical registers. It then solves the two problems above, cyclic moves and stack-to-stack moves, with a two-stage approach: - First, it finds a location to use to resolve cycles, if any exist. If a register is unallocated at the location of the move, we can use it. Often we get lucky and this is the case. Otherwise, we allocate a stackslot to use as the temp. This is perfectly fine at this stage, even if it means that we have more stack-to-stack moves. - Then, it resolves stack-to-stack moves into stack-to-reg / reg-to-stack. There are two subcases here. If there is another available free physical register, we opportunistically use it for this decomposition. 
If not, we fall back to our last-ditch option: we pick a victim register of the appropriate class, we allocate another temporary stackslot, we spill the victim to that slot just for this move, we do the move in the above way (stack-to-reg / reg-to-stack) with the victim, then we reload the victim. So one move (original stack-to-stack) becomes four moves, but no state is clobbered. This PR extends the `moves` fuzz-target to exercise this functionality as well, randomly choosing for some spare registers to exist or not, and randomly generating {stack,reg}-to-{stack,reg} moves in the initial parallel-move input set. The target does a simple symbolic simulation of the sequential move sequence and ensures that the final state is equivalent to the parallel-move semantics. I fuzzed both the `moves` target, focusing on the new logic; as well as the `ion_checker` target, checking the whole register allocator, and both seem clean (~150M cases on the former, ~1M cases on the latter).	2022-05-23 10:48:37 -07:00
Chris Fallin	ad41f8a7a5	Record vreg classes explicitly during liverange pass. (#35 ) This resolves an issue seen when the source program uses multiple regclasses (Int and Float): in some cases, the logic that grabs the vregs and retains them (with class) in `vreg_regs` missed a register and we had a class mismatch. This occurred because data structures were initialized assuming `Int` regclass at first. This PR instead removes the `vreg_regs` array, stores the class explicitly as an `Option<RegClass>` in the `VRegData`, and provides a `Env::vreg()` method that reconstitutes a `VReg` given its index and its observed class. We "observe" the class of every vreg seen during the liveness pass (and we assert that every occurrence of the vreg index has the same class). In this way, we still have a single source-of-truth for the vreg class (the mention of the vreg itself) and we explicitly represent the "not observed yet" state (and panic on attempting to use such a vreg) rather than implicitly taking the wrong class.	2022-03-29 14:00:14 -07:00
Chris Fallin	14442df3fc	Support for debug-labels. (#27 ) Support for debug-labels. If the client adds labels to vregs across ranges of instructions in the input program, the regalloc will provide metadata in the `Output` that describes the `Allocation`s in which each such vreg is stored for those ranges. This allows the client to emit debug metadata telling a debugger where to find program values at each point in the program.	2022-03-03 16:58:33 -08:00
Amanieu d'Antras	2d9d5dd82b	Rearrange some struct fields to work better with u64_key/u128_key This allows the compiler to load the whole key with 1 or 2 64-bit accesses, assuming little-endian ordering. Improves instruction count by ~1%.	2022-01-11 13:24:51 +00:00
Amanieu d'Antras	707aacd818	Split up functions in liverange.rs This helps with profiling even if they are inlined since perf with DWARF callgraph profiling can attribute execution time to inlined functions.	2021-12-11 22:31:58 +00:00
Chris Fallin	c7bc6c941c	Merge pull request #15 from cfallin/relicensing Relicense fully to Apache-2.0 WITH LLVM-exception.	2021-11-18 12:40:54 -08:00
Amanieu d'Antras	a527a6d25a	Remove unused clobbers vector	2021-11-16 00:46:05 +00:00
Chris Fallin	cf0d515709	Relicense fully to Apache-2.0 WITH LLVM-exception. Large parts of the code in regalloc2 are currently licensed under the Mozilla Public License (MPL) 2.0, because they derive in meaningful ways from the register allocator in IonMonkey, which is part of Firefox. The relevant source files are marked as such, with references to the files in the Firefox source tree. The intent of the regalloc2 project was to port the register allocator from Firefox to use in Cranelift, borrowing good technology and improving on it in the spirit of open source. However, several use-cases of Cranelift require, or at least strongly prefer, the Apache-2.0 license with the LLVM exception (matching the license of Cranelift itself, and Bytecode Alliance projects generally). While using this license is not strictly necessary for regalloc2 to be usable (the MPL is an excellent open-source license!), relicensing fully under this license to harmonize with the rest of Cranelift and Bytecode Alliance codebases significantly widens possibilities and reduces friction; then regalloc2 is "just another part of Cranelift" and doesn't have to be treated specially. The source in `src/ion/` specifically began as a fairly direct port of the algorithms in the following files in the `mozilla-central` repository (Firefox codebase): * The bulk of the "backtracking allocator" algorithm: * `js/src/jit/BacktrackingAllocator.{cpp,h}` * Helpers and definitions in the surrounding infrastructure: * `js/src/jit/RegisterAllocator.h` * `js/src/jit/RegisterAllocator.cpp` * `js/src/jit/StackSlotAllocator.h` * `js/src/jit/LIR.h` * A few data structure implementations: * `js/src/ds/SplayTree.h` * `js/src/ds/PriorityQueue.h` Subsequent work in improving regalloc2 has caused it to drift from the direct port -- for example, it no longer uses splay trees or the direct port of the priority queue above -- but it is of course very clearly still a derivative work. 
Analysis of the contributors to these files indicates that we need signoff from the following folks: * Mozilla Corp, for contributions made by Mozilla employees (the majority of the code). Communications with Mozilla (thanks @tschneidereit and @bholley for doing the work here!) indicate that @ekr is able to sign off when ready here. * Andy Wingo, specifically for the work done in [Bug 1620197](https://bugzilla.mozilla.org/show_bug.cgi?id=1620197) and [Bug 1609057](https://bugzilla.mozilla.org/show_bug.cgi?id=1609057) to generalize the stack allocator for a Wasm feature (multiple returns). Additionally, since the initial port, we have had three contributions from @Amanieu: [#9](https://github.com/bytecodealliance/regalloc2/pull/9), [#11](https://github.com/bytecodealliance/regalloc2/pull/11), [#13](https://github.com/bytecodealliance/regalloc2/pull/13). So, if everyone applicable is happy with this relicensing, this PR removes the MPL-2.0 license in `src/ion/` and marks all files as covered under `Apache-2.0 WITH LLVM-exception`. Please let us know if this is OK! Signoffs: - [ ] @ekr, for Mozilla's contributions - [ ] @wingo, for contributions to original code in `mozilla-central` - [ ] @Amanieu, for the three PRs linked above Thanks!	2021-11-10 10:54:28 -08:00
Chris Fallin	6d313f2b56	Address review comments: more doc comments and some minor refactorings.	2021-08-30 17:15:37 -07:00
Chris Fallin	2f856435f4	Review feedback.	2021-08-12 14:08:10 -07:00
Chris Fallin	f27abc9c48	Remove infinite-loop check: it is not a high enough bound in some pathological cases (e.g., `gc::many_live_refs` test in wasmtime), and it has served its purpose in testing. We can rely on more detailed assertions, e.g. that splits actually shrink bundles and that bundles evict only lower-priority bundles, instead.	2021-06-22 12:06:12 -07:00
Chris Fallin	b36a563d69	Cleanup: split allocator implementation into 11 files of more reasonable size.	2021-06-18 16:51:41 -07:00
Chris Fallin	a686d5a513	Always recompute prio when recomputing bundle properties; otherwise prio is zero on new bundles, which causes the spill weight to be zeroed	2021-06-11 13:21:50 -07:00
Chris Fallin	ea814225a2	Update TODO list	2021-06-10 23:34:18 -07:00
Chris Fallin	1bd1248cb5	Avoid stack-to-stack moves by allocating an extra spillslot and re-using the scratch reg instead.	2021-06-10 22:36:02 -07:00
Chris Fallin	09b2dd4e73	TODO list update	2021-06-10 17:34:04 -07:00
Chris Fallin	fcbf384d74	Use hashset to avoid linear scan in conflict-bundle-set deduplication	2021-06-09 23:53:54 -07:00
Chris Fallin	4ba7b2f57e	Improve redundant-move elimination: don't skip the case where we don't have vreg metadata within a BB. Instead, keep the checker happy by feeding more metadata to it on every move.	2021-06-09 23:29:48 -07:00
Chris Fallin	2851ac80c7	Working redundant-move elimination	2021-06-09 23:03:16 -07:00
Chris Fallin	f898b8dcbd	Some fuzzbug fixes	2021-06-09 20:58:54 -07:00
Chris Fallin	e33790d8e7	do not remove redundant move if we don't have local (within-basic-block) vreg tags, as this might throw off the checker	2021-06-08 01:30:29 -07:00
Chris Fallin	0f270e5bcc	WIP.	2021-06-08 01:14:10 -07:00
Chris Fallin	940bc40fae	Redundant move eliminator.	2021-06-08 00:08:28 -07:00
Chris Fallin	c6bcd3c941	WIP: redundant-move elimination.	2021-06-07 21:15:32 -07:00
Chris Fallin	2be7bdbc22	Split-at-first-conflict: first conflict is first of (start of our range), (start of conflict range), not just the latter; otherwise we have a too-early split sometimes	2021-06-07 12:27:58 -07:00
Chris Fallin	30f42a8717	Fix fuzzbug: properly detect too-many-live-regs condition on fuzzing input. Must be careful in how we probe the BTree when we have multiple "equal" (overlapping) keys.	2021-06-03 23:48:33 -07:00
Chris Fallin	5560499b80	Adaptive commitment-map scanning: re-probe from root if we skip too many entries in linear BTree scan	2021-06-03 18:17:28 -07:00
Chris Fallin	00e4240c93	merge bundles much faster by just concatenating range-lists and unstable-sorting, rather than a merge-sort-like traversal. Rust stdlib sort is very optimized. clang.wasm 9.1s -> 6.8s now.	2021-06-03 17:34:19 -07:00
Chris Fallin	6a0739b62a	Implement spill-bundle: move all empty ranges, and empty leading/trailing pieces surrounding split points, to a single spill bundle, in an attempt to avoid excessive movement	2021-06-03 00:18:27 -07:00
Chris Fallin	dc2b0d1913	Add a perf idea to TODO list	2021-06-01 23:13:08 -07:00
Chris Fallin	2fe276ca04	BTreeMap probe fix (fuzzbug): BTree does not interact nicely with LiveRangeKey definition of equality; need to probe with one-less-than start to get proper range	2021-06-01 23:10:34 -07:00
Chris Fallin	a2a770ec50	Fuzzbug fix	2021-06-01 18:57:07 -07:00
Chris Fallin	2614eac21e	fuzzbug fix: restore clean error exit required by regalloc.rs fuzzer on too-many-live-regs error	2021-06-01 16:31:33 -07:00
Chris Fallin	e49727dc75	Fuzzbug fix: fix some weirdness with BTree iteration inner loop	2021-06-01 15:32:12 -07:00
Chris Fallin	44ca1893c3	Fuzzbug fix: properly check for conflicting reqs before merging bundles (cached values are not computed yet)	2021-06-01 14:52:59 -07:00
Chris Fallin	f49167e0fe	emit annotations at Info level, for easier selective perf-debugging	2021-05-28 18:40:39 -07:00
Chris Fallin	789651f947	Rework inner allocation-loop code: choose more wisely between splitting and evicting based on costs (and unify the fixed-reg-constraint case)	2021-05-28 17:36:06 -07:00
Chris Fallin	43d7095cbd	Properly split when we hit a fixed conflict	2021-05-28 16:49:32 -07:00
Chris Fallin	7171624750	Don't generate r1->scratch,scratch->r1 sequence for cyclic moves of r1->r1 that are generated to change vreg ownership and keep the checker happy. Seems to eliminate a bit of braindeadness and improve bz2 by ~5-10%.	2021-05-26 21:35:43 -07:00
Chris Fallin	13bde99d7d	bugfix with clean-spill opt: avoid if liverange starts at start of block (this is like a def) or if has starts-at-def flag.	2021-05-26 18:08:41 -07:00
Chris Fallin	e521811b88	Avoid re-spilling to spillslot when still clean: intra-block edition (inter-block needs more analysis and careful thought)	2021-05-26 17:08:14 -07:00
Chris Fallin	dcf6f473ca	inline some things	2021-05-26 00:48:41 -07:00
Chris Fallin	4e0dd1f296	little tweak to avoid a div/mod on every iter of a PReg alloc loop	2021-05-26 00:38:53 -07:00
Chris Fallin	b3dc2b25a5	Alloc spillsets for whole vreg, not just spilled LRs. This is a prerequisite to allowing a "clean" value to remain in spillslot while also in reg and avoiding the re-spill. It should also reduce stack-to-stack moves (though they can still come from progmoves).	2021-05-25 18:19:25 -07:00
Chris Fallin	ca5f24f6b7	Hint the same PReg for both halves of a split	2021-05-24 23:49:47 -07:00
Chris Fallin	7cdcb2031e	Split heuristic: split before entering deeper loop nest	2021-05-24 23:09:05 -07:00


121 Commits
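
The "parallel moves problem" described in commit 869c21e79c (#51) can be illustrated with a small sketch. This is not regalloc2's code: `Loc`, `sequentialize`, and `simulate` are hypothetical names chosen for illustration. The sketch assumes that all destinations are distinct and that the caller supplies a temporary location (a free register or a stack slot) that is not itself a destination of any move. It covers only the cycle-breaking stage, not the later decomposition of stack-to-stack moves.

```rust
use std::collections::HashMap;

/// A storage location: a physical register or a stack slot (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Loc {
    Reg(u8),
    Stack(u8),
}

/// Turn a set of parallel moves `(src, dst)` into an equivalent sequence.
/// Assumes distinct destinations and that `scratch` is never a destination.
fn sequentialize(moves: &[(Loc, Loc)], scratch: Loc) -> Vec<(Loc, Loc)> {
    // Self-moves are no-ops and can be dropped up front.
    let mut pending: Vec<(Loc, Loc)> =
        moves.iter().copied().filter(|&(s, d)| s != d).collect();
    let mut out = Vec::new();

    while !pending.is_empty() {
        // A move is safe to emit once no pending move still reads its destination.
        let ready = pending
            .iter()
            .position(|&(_, d)| !pending.iter().any(|&(s, _)| s == d));

        if let Some(i) = ready {
            out.push(pending.remove(i));
        } else {
            // Only cycles remain. Save one destination's old value in the
            // scratch location and redirect its readers there.
            let (_, d) = pending[0];
            out.push((d, scratch));
            for m in pending.iter_mut() {
                if m.0 == d {
                    m.0 = scratch;
                }
            }
        }
    }
    out
}

/// Symbolically run a move sequence. Every location starts out holding a
/// value named after itself; the result maps each written location to the
/// value it holds at the end.
fn simulate(seq: &[(Loc, Loc)]) -> HashMap<Loc, Loc> {
    let mut state: HashMap<Loc, Loc> = HashMap::new();
    for &(src, dst) in seq {
        let value = *state.get(&src).unwrap_or(&src);
        state.insert(dst, value);
    }
    state
}
```

For a swap such as `Reg(0) -> Reg(1)` together with `Reg(1) -> Reg(0)`, `sequentialize` emits three moves through the scratch location, and `simulate` can be used to check that each destination ends up with its source's original value. This is the same kind of check the commit message describes for the `moves` fuzz target.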