regalloc2

Author	SHA1	Message	Date
Amanieu d'Antras	7354cfedde	Remove support for program moves (#118)	2023-03-04 16:38:05 -08:00
Amanieu d'Antras	54f074e507	Re-introduce optional dedicated scratch registers (#117) * Re-introduce optional dedicated scratch registers Dedicated scratch registers used for resolving move cycles were removed in #51 and replaced with an algorithm to automatically allocate a scratch register as needed. However, in many cases a client will already have a non-allocatable scratch register available for things like extended jumps (see #91). It makes sense to re-use this register for regalloc rather than potentially spilling an existing register. * Clarify comment	2023-03-04 14:49:10 -08:00
Chris Fallin	0edb11d3a7	Remove mod operands. (#109) * Remove Mod operands. * Typo fix.	2023-01-24 17:41:46 -08:00
Chris Fallin	e09f6519a6	Remove pinned VRegs. (#108)	2023-01-24 17:31:41 -08:00
Trevor Elliott	7f0d0b98d0	Expose ssa verification as a regalloc2 option (#102) Adds the validate_ssa flag to the RegallocOptions struct, enabling ssa validation of inputs before register allocation takes place.	2022-11-29 09:30:59 -08:00
Trevor Elliott	51561285d3	Add a From<&MachineEnv> impl for PRegSet (#101)	2022-11-09 11:31:56 -08:00
Amanieu d'Antras	67f5c167a8	Fix documentation for `inst_clobbers` (#89) Clobbers don't conflict with early uses, so they effectively act as a late def, not an early def.	2022-09-26 16:14:08 -07:00
Amanieu d'Antras	906a053208	Remove register class from `SpillSlot` (#80) * Remove register class from `SpillSlot` The register allocator was already allowing moves between spillslots and registers of different classes, so this PR formalizes this by making spillslots independent of register class. This also fixes #79 by properly tracking the register class of an `InsertedMove` with the `to_vreg` field which turns out to never be `None` in practice. Removing the `Option` allows the register class of the `VReg` to be used when building the per-class move lists. Fixes #79 * Address review feedback	2022-09-20 14:05:23 -07:00
Amanieu d'Antras	520cafa129	Handle fixed stack slots in the move resolver (#78) Fixed stack slots are treated as `PReg`s by most of the register allocator, but need some additional handling in the move resolver to avoid generating stack-to-stack moves.	2022-09-19 12:27:24 -07:00
Amanieu d'Antras	1495c1e342	Add fixed-non-allocatable operand support (#77) This allows a non-allocatable `PReg` to be passed on directly to the allocations vector without any liverange tracking from the register allocator. The main intended use case is to support ISA-specific special registers such as a fixed zero register.	2022-09-19 12:23:53 -07:00
Amanieu d'Antras	aeef47a06b	Make more `PReg` & `VReg` methods const (#76) Also remove the previous workaround for const-assert now that it is available on stable Rust.	2022-09-12 10:03:29 -07:00
Teymour Aldridge	ad39c66fe7	Fix typo. (#71)	2022-08-04 20:24:36 -07:00
Jamey Sharp	1955c6dfb5	Only record vreg definitions when fuzzing (#66) The `vreg_def_blockparam` and `vreg_def_inst` fields on CFGInfo are only used in `validate_ssa`, which in turn is only used in the ssagen fuzz target. Since these fields are never read in normal usage, initializing them is entirely wasted effort. According to valgrind/DHAT, when running `wasmtime compile` on the Sightglass Spidermonkey benchmark, removing these fields saves about 100M instructions, 23k heap allocations totalling 40MiB, and 47MiB of writes to the heap.	2022-07-29 10:25:27 -07:00
Benjamin Bouvier	a33b044d6c	Streamline log enablement (#64) * Remove log_enabled statements around annotate calls, which are already guarded against annotations_enabled * Use a trace_enabled!() macro that follows the same logic as trace!() to find if additional traces have been enabled or not	2022-07-20 10:44:31 -07:00
Chris Fallin	9733cb2227	Clobbers: use a more efficient bitmask representation in API. (#58) * Clobbers: use a more efficient bitmask representation in API. Currently, the `Function` trait requires a `&[PReg]` for the clobber-list for a given instruction. In most cases where clobbers are used, the list may be long: e.g., ABIs specify a fixed set of registers that are clobbered and there may be ~half of all registers in this list. What's more, the list can't be shared for e.g. all calls of a given ABI, because actual return-values (defs) can't be clobbers. So we need to allocate space for long, sometimes-slightly-different lists; this is inefficient for the embedder and for us. It's much more efficient to use a bitmask to represent a set of physical registers. With current data structure bitpacking limitations, we can support at most 128 physical registers; this means we can use a `u128` bitmask. This also allows e.g. an embedder to start with a constant for a given ABI, and mask out bits for actual return-value registers on call instructions. This PR makes that change, for minor but positive performance impact. * Review comments.	2022-06-27 12:27:19 -07:00
Chris Fallin	869c21e79c	Remove an explicitly-set-aside scratch register per class. (#51) Currently, regalloc2 sets aside one register per class, unconditionally, to make move resolution possible. To solve the "parallel moves problem", we sometimes need to conjure a cyclic permutation of data among registers or stack slots (this can result, for example, from blockparam flow that swaps two values on a loop backedge). This set-aside scratch register is used when a cycle exists. regalloc2 also uses the scratch register when needed to break down a stack-to-stack move (which could happen due to blockparam moves on edges when source and destination are both spilled) into a stack-to-reg move followed by reg-to-stack, because most machines have loads and stores but not memory-to-memory moves. A set-aside register is certainly the simplest solution, but it is not optimal: it means that we have one fewer register available for use by the program, and this can be costly especially on machines with fewer registers (e.g., 16 GPRs/XMMs on x86-64) and especially when some registers may be set aside by our embedder for other purposes too. Every register we can reclaim is some nontrivial performance in large function bodies! This PR removes this restriction and allows regalloc2 to use all available physical registers. It then solves the two problems above, cyclic moves and stack-to-stack moves, with a two-stage approach: - First, it finds a location to use to resolve cycles, if any exist. If a register is unallocated at the location of the move, we can use it. Often we get lucky and this is the case. Otherwise, we allocate a stackslot to use as the temp. This is perfectly fine at this stage, even if it means that we have more stack-to-stack moves. - Then, it resolves stack-to-stack moves into stack-to-reg / reg-to-stack. There are two subcases here. If there is another available free physical register, we opportunistically use it for this decomposition. If not, we fall back to our last-ditch option: we pick a victim register of the appropriate class, we allocate another temporary stackslot, we spill the victim to that slot just for this move, we do the move in the above way (stack-to-reg / reg-to-stack) with the victim, then we reload the victim. So one move (original stack-to-stack) becomes four moves, but no state is clobbered. This PR extends the `moves` fuzz-target to exercise this functionality as well, randomly choosing for some spare registers to exist or not, and randomly generating {stack,reg}-to-{stack,reg} moves in the initial parallel-move input set. The target does a simple symbolic simulation of the sequential move sequence and ensures that the final state is equivalent to the parallel-move semantics. I fuzzed both the `moves` target, focusing on the new logic; as well as the `ion_checker` target, checking the whole register allocator, and both seem clean (~150M cases on the former, ~1M cases on the latter).	2022-05-23 10:48:37 -07:00
Chris Fallin	a5c48fda8a	Support program moves, including pinned vregs, in the checker. (#43) The checker was built to validate programs produced by the fuzzing testcase generator, which was built before regalloc2 supported special handling of moves. (In a pure-SSA world, move elision is not needed, because moves are not needed, and blockparams are the only way of tying together vregs.) Due to this, the checker works great for our independent regalloc2 fuzzing setup, but when used on regalloc inputs produced by Cranelift, cannot prove correctness. This PR extends the checker's analysis to properly handle "program moves", which are distinct from regalloc-inserted moves in that they are present in the original program and hence are semantically relevant. A program move edits all sets of symbolic vregs at all allocs, and where the source vreg appears, it inserts the dest vreg as well. (It also removes the dest vreg from all other sets, since the old value becomes stale, as is done for other defs.) Given this, and given some additional checking for moves to/from pinned vregs, the checker can now be used to fully validate Cranelift-sourced regalloc2 invocations.	2022-04-18 10:36:26 -07:00
Chris Fallin	4cac1614bf	Add serde support for exposed types. (#40) This adds derived `Serialize` and `Deserialize` implementations for exposed types that describe registers, operands, and related program inputs; entity indices; and regalloc output types. This allows serialization of any of the embedder's IR data types that may embed or build upon regalloc2 types. These implementations (and the dependency on the `serde` crate itself) are enabled only when the non-default `enable-serde` feature is specified.	2022-04-13 10:14:00 -07:00
Chris Fallin	0cb08095e6	Make multiple defs of one vreg possible on one instruction. (#38) Currently, if this is done, overlapping liveranges are created, and we hit an assert that ensures our non-overlapping and built-in-reverse-order invariants during liverange construction. One could argue that multiple defs of a single vreg don't make a ton of sense -- which def's value is valid after the instruction? -- but if they all get the same alloc, then the answer is "whatever was put in that alloc", and this is just a case of an instruction being a bit over-eager when listing its registers. This can arise in practice when operand lists come from combinations or concatenations: for example, in Cranelift's s390x backend, there is a "Loop" pseudo-instruction, and the operands of the Loop are the operands of all the sub-instructions. It seems more logically cohesive overall to say that one can state an operand as many times as one likes; so this PR makes it so.	2022-04-04 16:22:05 -07:00
Chris Fallin	433e8b3776	Early defs reserve a register for whole instruction. (#32) The `Operand` abstraction allows a def to be positioned at the "early" point of an instruction, before its effect and alongside its normal uses. This is intended to allow the embedder to express that a def may be written before all uses are read, so it should not conflict with the uses. It's also convenient to use early defs to express temporaries, which should be available throughout a regalloc-level instruction's emitted sequence. In such a case, the register should not be used again after the instruction, so it is dead following the instruction. Strictly speaking, and according to regalloc2 prior to this PR, the temp will only conflict with the uses at the early-point, and not the defs at the late-point (after the instruction), because it's dead past its point of definition. But for a temp we really want it to register conflicts not just with the normal uses but with the normal defs as well. This PR changes the semantics so that an early def builds a liverange that spans the early- and late-point of an instruction when the vreg is dead flowing down from the instruction, giving the semantics we want for temps.	2022-03-18 10:32:49 -07:00
Chris Fallin	4f1161d9e4	Generalize debug-info support a bit. (#34) * Generalize debug-info support a bit. Previously, debug value-label support required each vreg to have a disjoint sequence of instruction ranges, each with one label. Unfortunately, it's entirely possible for multiple values at the program level to map to one vreg at the IR level, leading to multiple labels. This PR generalizes the debug-info generation support to allow for arbitrary (label, range, vreg) tuples, as long as they are sorted by vreg, with no other requirements. The lookup is a little more costly when we generate the debuginfo, but we shouldn't have more than a few debug value labels per vreg, so in practice the constants should be small. * Typo fix from Amanieu Co-authored-by: Amanieu d'Antras <amanieu@gmail.com>	2022-03-18 10:32:27 -07:00
Chris Fallin	bf92e7c02f	Allow pinned-vregs to be implicit liveins. (#30) Previously, the regalloc required all liveins to be defined by a pseudoinstruction at the start of the function body. The regalloc.rs compatibility shim did this, but it's slightly inconvenient when using the API directly. This change allows pinned vregs to be implicit liveins to the function body instead.	2022-03-18 10:13:56 -07:00
Chris Fallin	fe021ad6d4	Simplify pinned-vreg API: don't require slice of all pinned vregs. (#28) Previously, we kept a bool flag `is_pinned` in the `VRegData`, and we required a `&[VReg]` of all pinned vregs to be provided by `Function::pinned_vregs()`. This was (I think) done for convenience, but it turns out not to really be necessary, as we can just query `is_pinned_vreg` where needed (and in the likely implementation, e.g. in Cranelift, this will be a `< NUM_PINNED_VREGS` check that can be inlined). This adds convenience for the embedder (the main benefit), and also reduces complexity, removes some state, and avoids some work initializing the regalloc state for a run.	2022-03-04 15:12:16 -08:00
Chris Fallin	14442df3fc	Support for debug-labels. (#27) If the client adds labels to vregs across ranges of instructions in the input program, the regalloc will provide metadata in the `Output` that describes the `Allocation`s in which each such vreg is stored for those ranges. This allows the client to emit debug metadata telling a debugger where to find program values at each point in the program.	2022-03-03 16:58:33 -08:00
Chris Fallin	ccd6b4fc2c	Remove DefAlloc -- no longer needed.	2022-01-19 23:57:31 -08:00
Amanieu d'Antras	6b1a5e8b1b	Address review feedback	2022-01-11 22:27:15 +00:00
Amanieu d'Antras	ee4de54240	Guard trace! behind cfg!(debug_assertions) Even if the trace log level is disabled, the presence of the trace! macro still has a significant impact on performance because it is present in the inner loops of the allocator. Removing the trace! calls at compile-time reduces instruction count by ~7%.	2022-01-11 13:30:13 +00:00
Amanieu d'Antras	693fb6a975	Only emit DefAlloc edits when the "checker" feature is enabled. This reduces instruction counts by ~2% when disabled.	2022-01-11 13:03:24 +00:00
Amanieu d'Antras	74928b83fa	Replace all assert! with debug_assert! This results in a ~6% reduction in instruction count.	2022-01-11 03:54:08 +00:00
Amanieu d'Antras	8ab44c383e	Add a helper to iterate over insts and edits of a block in order	2021-12-27 22:08:36 +01:00
Amanieu d'Antras	51493ab03a	Apply review feedback	2021-12-12 00:33:30 +00:00
Amanieu d'Antras	38ffc479c2	Simplify the internal representation of PReg	2021-12-11 22:39:19 +00:00
Amanieu d'Antras	8f435243e0	Properly handle fixed stack slots during multi-fixed-reg fixup	2021-12-11 22:39:14 +00:00
Amanieu d'Antras	77e6a9e0d7	Add support for fixed stack slots This works by allowing a PReg to be marked as being a stack location instead of a physical register.	2021-12-11 22:31:58 +00:00
Amanieu d'Antras	0cb3a8019f	Rework the API for outgoing blockparams	2021-12-01 01:43:20 +00:00
Amanieu d'Antras	a516e6d6f3	Return safepoint_slots as Allocations instead of SpillSlots This enables us to support reftype vregs in register locations in the future.	2021-11-16 00:47:43 +00:00
Amanieu d'Antras	358c831b31	Remove regs from MachineEnv It isn't exactly clear what purpose it serves.	2021-09-19 16:40:27 +01:00
Amanieu d'Antras	af527aca88	Fix PReg indexing with >32 pregs	2021-09-19 16:39:56 +01:00
Amanieu d'Antras	9e2ab3d5f7	Address review feedback	2021-09-14 13:12:52 +01:00
Amanieu d'Antras	35ed2109b1	Adjust Operand encoding The encoding for OperandConstraint is adjusted to free up 2 bits which allows for 2^21 vregs and 2^6 pregs.	2021-09-13 08:33:17 +01:00
Amanieu d'Antras	a243c4e575	Remove Function::is_call The documentation says that this is only used for heuristics, but it is never actually called. This should be removed for now and perhaps added back later if we find an actual use for it.	2021-09-09 11:16:11 +01:00
Chris Fallin	6f0893d69d	Address review comments.	2021-08-31 17:56:06 -07:00
Chris Fallin	6389071e09	Address review comments.	2021-08-31 17:42:50 -07:00
Chris Fallin	b19fa4857f	Rename operand positions to Early and Late, and make weights f16/f32 values.	2021-08-31 17:31:23 -07:00
Chris Fallin	3a18564e98	Addressed more review comments.	2021-08-30 17:51:55 -07:00
Chris Fallin	6d313f2b56	Address review comments: more doc comments and some minor refactorings.	2021-08-30 17:15:37 -07:00
Chris Fallin	ffc06b2099	Debug output for Operands: omit default/most common positions.	2021-08-12 14:49:42 -07:00
Chris Fallin	c071e44fc0	Derive PartialOrd/Ord/Hash for Operand.	2021-08-12 14:43:13 -07:00
Chris Fallin	7652b4b109	Review feedback.	2021-08-12 14:27:20 -07:00
Chris Fallin	2f856435f4	Review feedback.	2021-08-12 14:08:10 -07:00
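
The parallel-move problem described in commit 869c21e79c can be illustrated with a small sketch. This is not regalloc2's move resolver: `Loc`, `sequentialize`, and `run_sequential` are hypothetical names, the scratch location is simply passed in by the caller (the commit describes finding a free register or falling back to a stack slot), and the sketch assumes every destination is written by at most one move and that the scratch location does not appear in the input.

```rust
use std::collections::HashMap;

/// Hypothetical location type: a register or a stack slot, identified by index.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Loc {
    Reg(u8),
    Stack(u8),
}

/// Turn a set of parallel moves `(src, dst)` into an equivalent sequence.
/// Assumes destinations are unique and `scratch` is not used by any move.
fn sequentialize(parallel: &[(Loc, Loc)], scratch: Loc) -> Vec<(Loc, Loc)> {
    // Self-moves are no-ops and can be dropped up front.
    let mut pending: Vec<(Loc, Loc)> =
        parallel.iter().copied().filter(|(s, d)| s != d).collect();
    let mut out = Vec::with_capacity(pending.len() + 1);

    while !pending.is_empty() {
        // A move is safe to emit once no pending move still reads its destination.
        let ready = pending
            .iter()
            .position(|&(_, dst)| pending.iter().all(|&(src, _)| src != dst));
        match ready {
            Some(i) => out.push(pending.remove(i)),
            None => {
                // Only cycles remain: save one source in the scratch location
                // and redirect its readers there, which unblocks the cycle.
                let (src, _) = pending[0];
                out.push((src, scratch));
                for m in pending.iter_mut() {
                    if m.0 == src {
                        m.0 = scratch;
                    }
                }
            }
        }
    }
    out
}

/// Apply moves one after another to a symbolic machine state.
fn run_sequential(state: &mut HashMap<Loc, u32>, moves: &[(Loc, Loc)]) {
    for &(src, dst) in moves {
        let value = state[&src];
        state.insert(dst, value);
    }
}
```

A swap of two registers is the smallest cycle: the sketch emits one extra move through the scratch location, so the final state matches what the parallel moves would have produced. Checking the result by simulating the sequence, as `run_sequential` does, is in the same spirit as the symbolic simulation the commit message describes for the `moves` fuzz target.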

82 Commits
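
The `u128` clobber bitmask motivated in commit 9733cb2227 can be sketched as follows. `RegMask` and its methods are hypothetical names chosen for illustration and are not regalloc2's actual API; the only facts taken from the commit message are the one-bit-per-physical-register layout, the 128-register limit, and the idea of starting from a per-ABI set and masking out return-value registers.

```rust
/// Hypothetical register set: bit `i` is set when physical register `i` is a member.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct RegMask(u128);

impl RegMask {
    /// The empty set.
    fn empty() -> Self {
        RegMask(0)
    }

    /// Return a copy of the set with register `reg` added.
    fn with(self, reg: u8) -> Self {
        assert!(reg < 128, "at most 128 physical registers fit in a u128");
        RegMask(self.0 | (1u128 << reg))
    }

    /// Return a copy of the set with register `reg` removed.
    fn without(self, reg: u8) -> Self {
        assert!(reg < 128, "at most 128 physical registers fit in a u128");
        RegMask(self.0 & !(1u128 << reg))
    }

    /// Membership test; out-of-range indices are never members.
    fn contains(self, reg: u8) -> bool {
        reg < 128 && (self.0 >> reg) & 1 == 1
    }

    /// Number of registers in the set.
    fn len(self) -> u32 {
        self.0.count_ones()
    }
}
```

With a representation like this, an embedder could build one set per ABI and derive the clobbers of a particular call by removing the registers that carry its return values, for example `abi_clobbers.without(0)`, without allocating a fresh list per instruction.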