regalloc2

Author	SHA1	Message	Date
Chris Fallin	0eb3deb384	Version 0.3.1. (#65 )	2022-07-20 10:57:15 -07:00
Benjamin Bouvier	a33b044d6c	Streamline log enablement (#64 ) * Remove log_enabled statements around annotate calls, which are already guarded against annotations_enabled * Use a trace_enabled!() macro that follows the same logic as trace!() to find if additional traces have been enabled or not	2022-07-20 10:44:31 -07:00
Chris Fallin	8bede950d0	Release 0.3.0. (#63 ) Includes improvements in splitting performance (#59) and more efficient handling of clobber lists (#58). Semver break because of API change in #58.	2022-06-27 14:09:15 -07:00
Chris Fallin	4eb2a2528b	Limit split count per original bundle with fallback 1-to-N split. (#59 ) * Limit split count per original bundle with fallback 1-to-N split. Right now, splitting a bundle produces two halves. Furthermore, it has cost linear in the length of the bundle, because the resulting half-bundles have their requirements recomputed with a new scan, and because we copy half the use-list over to the tail end sub-bundle. This works fine when a bundle has a handful of splits overall, but not when an input has a systematic pattern of conflicts that will require O(|bundle|) splits (e.g., every Use is constrained to a different fixed register than the last one). In such a case, we get quadratic behavior. This PR adds a per-spillset (so, per-original-bundle) counter for splits, and when it reaches a preset threshold (10 for now), we instead split directly into minimal bundles along the whole length of the bundle, putting the regions without uses in the spill bundle. This basically approximates what a non-splitting allocator would do: it "spills" the whole bundle to possibly a stackslot, or a second-chance register allocation at best, via the spill bundle; and then does minimal reservations of registers just at uses/defs and moves the "spilled" value into/out of them immediately. Together with another small optimization, this PR results in a 4x compilation speedup and 24x memory use reduction on one particularly bad case with alternating conflicting requirements on a vreg (see bytecodealliance/wasmtime#4291 for details). * Review comments.	2022-06-27 13:23:09 -07:00
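A minimal sketch of the split-count fallback described in #59, not the allocator's actual code. The names `plan_split`, `SplitPlan`, and `MAX_SPLITS_PER_SPILLSET` are hypothetical, and uses are modeled as bare program-point numbers. Only the threshold value (10) comes from the commit message.

```rust
/// Assumed threshold; the commit message says "10 for now".
pub const MAX_SPLITS_PER_SPILLSET: u32 = 10;

/// Hypothetical description of how a bundle should be split.
#[derive(Debug, PartialEq, Eq)]
pub enum SplitPlan {
    /// Ordinary split into two halves at the conflict point.
    Halves { at: u32 },
    /// Fallback: one minimal piece per use; the gaps between pieces
    /// are left to the spill bundle.
    Minimal { pieces: Vec<(u32, u32)> },
}

/// Decide how to split, given the per-spillset counter and the use positions.
pub fn plan_split(splits_so_far: &mut u32, uses: &[u32], conflict_at: u32) -> SplitPlan {
    if *splits_so_far >= MAX_SPLITS_PER_SPILLSET {
        // Too many splits already: stop halving and cover each use with
        // a one-point range instead.
        SplitPlan::Minimal {
            pieces: uses.iter().map(|&u| (u, u + 1)).collect(),
        }
    } else {
        *splits_so_far += 1;
        SplitPlan::Halves { at: conflict_at }
    }
}
```

Because the counter lives on the spillset rather than on each half-bundle, repeated halving of the same original bundle eventually trips the fallback, which bounds the total work.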
Chris Fallin	9733cb2227	Clobbers: use a more efficient bitmask representation in API. (#58 ) * Clobbers: use a more efficient bitmask representation in API. Currently, the `Function` trait requires a `&[PReg]` for the clobber-list for a given instruction. In most cases where clobbers are used, the list may be long: e.g., ABIs specify a fixed set of registers that are clobbered and there may be ~half of all registers in this list. What's more, the list can't be shared for e.g. all calls of a given ABI, because actual return-values (defs) can't be clobbers. So we need to allocate space for long, sometimes-slightly-different lists; this is inefficient for the embedder and for us. It's much more efficient to use a bitmask to represent a set of physical registers. With current data structure bitpacking limitations, we can support at most 128 physical registers; this means we can use a `u128` bitmask. This also allows e.g. an embedder to start with a constant for a given ABI, and mask out bits for actual return-value registers on call instructions. This PR makes that change, for minor but positive performance impact. * Review comments.	2022-06-27 12:27:19 -07:00
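The bitmask idea from #58 can be sketched as follows. This is an illustration only: `PRegIndex`, `ClobberMask`, `ABI_CLOBBERS`, and `call_clobbers` are hypothetical names, not regalloc2's API, and the made-up ABI clobbers registers 0 through 7.

```rust
/// Hypothetical physical-register index; must be below 128.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PRegIndex(pub u8);

/// Hypothetical clobber set: one bit per physical register in a `u128`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ClobberMask(pub u128);

fn bit(r: PRegIndex) -> u128 {
    debug_assert!(r.0 < 128);
    1u128 << r.0
}

impl ClobberMask {
    pub fn empty() -> Self {
        ClobberMask(0)
    }
    pub fn with(self, r: PRegIndex) -> Self {
        ClobberMask(self.0 | bit(r))
    }
    pub fn without(self, r: PRegIndex) -> Self {
        ClobberMask(self.0 & !bit(r))
    }
    pub fn contains(self, r: PRegIndex) -> bool {
        self.0 & bit(r) != 0
    }
    pub fn count(self) -> u32 {
        self.0.count_ones()
    }
}

/// Constant for a made-up ABI in which registers 0..=7 are clobbered by calls.
pub const ABI_CLOBBERS: ClobberMask = ClobberMask(0xff);

/// Clobbers for one call: start from the ABI constant and mask out the
/// registers that actually carry return values (those are defs, not clobbers).
pub fn call_clobbers(ret_regs: &[PRegIndex]) -> ClobberMask {
    ret_regs.iter().fold(ABI_CLOBBERS, |m, &r| m.without(r))
}
```

The per-call set costs one 16-byte value and a few bit operations, with no per-instruction list allocation.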
Chris Fallin	06b3baf9f9	Version 0.2.3. (#61 )	2022-06-27 11:37:57 -07:00
Chris Fallin	68aa61571b	Bugfix: no hole in liveranges for pinned vreg move src. (#60 ) Right now, pinned vregs are a way of naming real registers (a compatibility shim of sorts for Cranelift's `RealReg`s) and can be used as sources and dests of moves. When the input program does so, regalloc2 converts these into "ghost" uses and defs on the other vreg of the move (dest or source, respectively) with a fixed-register constraint. So `move v128, v0` where `v0` is pinned to `p0` turns into a "ghost def" on `v128` with constraint `fixed p0`. There is some fancy manipulation of liveranges to make this all work while properly recording where the preg's value must be preserved. Unfortunately, there was an off-by-one in the location of the move and transition of live-ranges which interacts poorly with the "implicit live-in" of pinned vregs at function start. As a result, a function body that starts like: ``` move v128, v0 def v9000 move v129, v1 ``` might allocate `p1` (to which `v1` is pinned) for `v9000`. This clobbers the original value. Fortunately this only impacts the implicit live-in, and Cranelift's use of regalloc2 is such that it will always copy all values out of pinned vregs (creating ghost defs) without intervening defs, except in the case of `sret` ("structure return") arguments. If a program does not use `sret` arguments (and the `cranelift-wasm` frontend does not), then this bug should not be reachable. Long-term, we really need to kill pinned vregs with fire (#3); the special cases that arise from these, and from special handling of moves, are too much incidental complexity. All of this can go away once Cranelift migrates all fixed-register cases to operand constraints instead. That will be a happy day. Thanks to @bjorn3 for finding and reporting this issue!	2022-06-27 11:21:19 -07:00
Chris Fallin	b78ccbce6e	Version 0.2.2. (#57 )	2022-06-03 16:15:54 -07:00
Chris Fallin	427e041f1c	Fix spillslot allocation to actually reuse spillslots. (#56 ) * Fix spillslot allocation to actually reuse spillslots. The old logic, which did some linked-list rearranging to try to probe more-likely-to-be-free slots first and which was inherited straight from the original IonMonkey allocator, was slightly broken (error in translation and not in IonMonkey, to be clear): it did not get the list-splicing right, so quite often dropped a slot on the floor and failed to consider it for further reuse. On some experimentation, it seems to work just as well to keep a SmallVec of spillslot indices per size class instead, and save the last probe-point in order to spread load throughout the allocated slots while limiting the number of probes (to bound quadratic behavior). This change moves the maximum slot count from 285 to 92 in `python.wasm` from bytecodealliance/wasmtime#4214, and the maximum frame size from 2384 bytes to 752 bytes.	2022-06-03 16:01:10 -07:00
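A rough sketch of the per-size-class slot list with a saved probe point, as described in #56. It uses a plain `Vec` rather than a `SmallVec`, and `SizeClass`, `Range`, and `MAX_PROBES` (with its value) are assumptions made for illustration.

```rust
/// Half-open range of program points during which a slot is occupied.
#[derive(Clone, Copy, Debug)]
pub struct Range {
    pub from: u32,
    pub to: u32,
}

impl Range {
    fn overlaps(&self, other: &Range) -> bool {
        self.from < other.to && other.from < self.to
    }
}

/// Assumed bound on probes per allocation (limits quadratic behavior).
const MAX_PROBES: usize = 10;

/// Hypothetical state for one spillslot size class.
#[derive(Default)]
pub struct SizeClass {
    /// Occupied ranges for each slot in this class.
    slots: Vec<Vec<Range>>,
    /// Where the next search begins, to spread load across slots.
    probe_start: usize,
}

impl SizeClass {
    /// Return the index of a slot that is free over `r`, reusing an
    /// existing slot when a bounded search finds one.
    pub fn allocate(&mut self, r: Range) -> usize {
        let n = self.slots.len();
        for i in 0..n.min(MAX_PROBES) {
            let idx = (self.probe_start + i) % n;
            if self.slots[idx].iter().all(|o| !o.overlaps(&r)) {
                self.slots[idx].push(r);
                self.probe_start = (idx + 1) % n;
                return idx;
            }
        }
        // No reusable slot found within the probe budget: make a new one.
        self.slots.push(vec![r]);
        self.slots.len() - 1
    }

    pub fn slot_count(&self) -> usize {
        self.slots.len()
    }
}
```

No slot is ever unlinked from the list, so none can be "dropped on the floor" the way the linked-list splicing could.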
Chris Fallin	257c5ccc18	Bump version to 0.2.1. (#55 )	2022-05-31 14:33:55 -07:00
Chris Fallin	52818a7ed6	Handle conflicting Before and After fixed-reg constraints with a copy. (#54 ) * Extend fuzzer to generate cases like #53. Currently, the fuzz testcase generator will add at most one fixed-register constraint to an instruction per physical register. This avoids impossible situations, such as specifying that both `v0` and `v1` must be placed into the same `p0`. However, it should be possible to say that `v0` is in `p0` before the instruction, and `v1` is in `p0` after the instruction (i.e., at `Early` and `Late` operand positions). This in fact exposes a limitation in the current allocator design: when `v0` is live downward, with the above constraints, it will result in an impossible allocation situation because we cannot split in the middle of an instruction. A subsequent fix will rectify this by using the multi-fixed-reg fixup mechanism. * Handle conflicting Before and After fixed-reg constraints with a copy. This fixes #53. Previously, if two operands on an instruction specified different vregs constrained to the same physical register at the Before (Early) and After (Late) points of the instruction, and the Before was live downward as well, we would panic: we can't insert a move into the middle of an instruction, so putting the first vreg in the preg at Early implies we have an unsolveable conflict at Late. We can solve this issue by adding some new logic to insert a copy, and rewrite the constraint. This reuses the multi-fixed-reg-constraint fixup logic. While that logic handles the case where the same vreg has multiple different fixed-reg constraints, this new logic handles different vregs with the same fixed-reg constraints, but at different program points; so the two are complementary. This addresses the specific test case in #53, and also fuzzes cleanly with the change to the fuzz testcase generator to generate these cases (which also immediately found the bug). 
* Add a reservation to the PReg when rewriting constraint so it is not doubly-allocated. * Distinguish initial fixup moves from secondary moves. * Use `trace` macro, not `log::trace`, to avoid trace output when feature is disabled. * Rework operand rewriting to properly handle bundle-merging edge case. When the liverange for the defined vreg with fixed constraint at Late is merged with the liverange for the used vreg with fixed constraint at Early, the strategy of putting a fixed reservation on the preg at Early fails, because the whole bundle is minimal (if it spans just the instruction's Early and Late and nothing else). This could happen if e.g. the def flows into a blockparam arg that merges with a blockparam defining the used value. Instead we move the def one halfstep earlier, to the Early point, with its fixed-reg constraint still in place. This has the same effect but works when the two are merged. * Fix checker issue: make more flexible in the presence of victim-register saves.	2022-05-31 14:01:27 -07:00
Chris Fallin	0395614545	Bump version to 0.2.0. (#52 ) The change in #51 was API-visible (removal of the "scratch register" field in the `Env`) so this is a semver bump.	2022-05-23 11:13:16 -07:00
Chris Fallin	869c21e79c	Remove an explicitly-set-aside scratch register per class. (#51 ) Currently, regalloc2 sets aside one register per class, unconditionally, to make move resolution possible. To solve the "parallel moves problem", we sometimes need to conjure a cyclic permutation of data among registers or stack slots (this can result, for example, from blockparam flow that swaps two values on a loop backedge). This set-aside scratch register is used when a cycle exists. regalloc2 also uses the scratch register when needed to break down a stack-to-stack move (which could happen due to blockparam moves on edges when source and destination are both spilled) into a stack-to-reg move followed by reg-to-stack, because most machines have loads and stores but not memory-to-memory moves. A set-aside register is certainly the simplest solution, but it is not optimal: it means that we have one fewer register available for use by the program, and this can be costly especially on machines with fewer registers (e.g., 16 GPRs/XMMs on x86-64) and especially when some registers may be set aside by our embedder for other purposes too. Every register we can reclaim is some nontrivial performance in large function bodies! This PR removes this restriction and allows regalloc2 to use all available physical registers. It then solves the two problems above, cyclic moves and stack-to-stack moves, with a two-stage approach: - First, it finds a location to use to resolve cycles, if any exist. If a register is unallocated at the location of the move, we can use it. Often we get lucky and this is the case. Otherwise, we allocate a stackslot to use as the temp. This is perfectly fine at this stage, even if it means that we have more stack-to-stack moves. - Then, it resolves stack-to-stack moves into stack-to-reg / reg-to-stack. There are two subcases here. If there is another available free physical register, we opportunistically use it for this decomposition. 
If not, we fall back to our last-ditch option: we pick a victim register of the appropriate class, we allocate another temporary stackslot, we spill the victim to that slot just for this move, we do the move in the above way (stack-to-reg / reg-to-stack) with the victim, then we reload the victim. So one move (original stack-to-stack) becomes four moves, but no state is clobbered. This PR extends the `moves` fuzz-target to exercise this functionality as well, randomly choosing for some spare registers to exist or not, and randomly generating {stack,reg}-to-{stack,reg} moves in the initial parallel-move input set. The target does a simple symbolic simulation of the sequential move sequence and ensures that the final state is equivalent to the parallel-move semantics. I fuzzed both the `moves` target, focusing on the new logic; as well as the `ion_checker` target, checking the whole register allocator, and both seem clean (~150M cases on the former, ~1M cases on the latter).	2022-05-23 10:48:37 -07:00
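The second stage in #51 (decomposing a stack-to-stack move) might look roughly like this. The sketch assumes the caller has already chosen a free register, if any, plus a victim register and a temporary save slot; `Loc`, `expand_stack_to_stack`, and `simulate` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical location: a register number or a stack slot number.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Loc {
    Reg(u8),
    Stack(u32),
}

/// A sequential move, written as (from, to).
pub type Move = (Loc, Loc);

/// Expand one stack-to-stack move into moves a load/store machine can do.
pub fn expand_stack_to_stack(
    from: u32,
    to: u32,
    free_reg: Option<u8>,
    victim: u8,
    victim_save_slot: u32,
) -> Vec<Move> {
    match free_reg {
        // A free register exists here: two moves suffice.
        Some(r) => vec![
            (Loc::Stack(from), Loc::Reg(r)),
            (Loc::Reg(r), Loc::Stack(to)),
        ],
        // Last-ditch option: save the victim, route through it, reload it.
        None => vec![
            (Loc::Reg(victim), Loc::Stack(victim_save_slot)),
            (Loc::Stack(from), Loc::Reg(victim)),
            (Loc::Reg(victim), Loc::Stack(to)),
            (Loc::Stack(victim_save_slot), Loc::Reg(victim)),
        ],
    }
}

/// Tiny symbolic simulation: apply moves in order to a location-to-value map.
pub fn simulate(state: &mut HashMap<Loc, u64>, moves: &[Move]) {
    for &(from, to) in moves {
        let v = state[&from];
        state.insert(to, v);
    }
}
```

Running `simulate` on the four-move sequence shows the property the commit message relies on: the destination slot receives the source value, and the victim register ends with its original contents.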
Chris Fallin	33611a68b9	Bump to version 0.1.3. (#50 )	2022-05-16 22:44:35 -07:00
Chris Fallin	1379c65a6a	Handle conflict-related liverange splits arising from stack constraints without falling back to spill bundle. (#49 ) Currently, we unconditionally trim the ends of liveranges around a split when we do a split, including splits due to conflicts in a liverange/bundle's requirements (e.g., a liverange with both a register and a stack use). These trimmed ends, if they exist, go to the spill bundle, and the spill bundle may receive a register during second-chance allocation or otherwise will receive a stack slot. This was previously measured to reduce contention significantly, because it reduces the sizes of liveranges that participate in the first-chance competition for allocations. When a split has to occur, we might as well relegate the "connecting pieces" to a process that comes later, with a hint to try to get the right register if possible but no hard connection to either end. However, in the case of a split arising from a reg-to-stack / stack-to-reg conflict, as happens when references are used or def'd as registers and then cross safepoints, this extra step in the connectivity (normal LR with register use, then spill bundle, then normal LR with stack use) can lead to extra moves. Additionally, when one of the LRs has a stack constraint, contention is far less important; so it doesn't hurt to skip the trimming step. In fact, it's likely much better to put the "connecting piece" together with the stack side of the conflict. Ideally we would handle this with the same move-cost logic we use for conflicts detected during backtracking, but the requirements-related splitting happens separately and that logic would need to be generalized further. For now, this is sufficient to eliminate redundant moves as seen in e.g. bytecodealliance/wasmtime#3785.	2022-05-16 22:36:51 -07:00
Chris Fallin	9b83635980	Bump version to 0.1.2. (#44 ) This will get #43 into a release to allow us to use the checker on Cranelift outputs.	2022-04-18 13:20:07 -07:00
Chris Fallin	a5c48fda8a	Support program moves, including pinned vregs, in the checker. (#43 ) The checker was built to validate programs produced by the fuzzing testcase generator, which was built before regalloc2 supported special handling of moves. (In a pure-SSA world, move elision is not needed, because moves are not needed, and blockparams are the only way of tying together vregs.) Due to this, the checker works great for our independent regalloc2 fuzzing setup, but when used on regalloc inputs produced by Cranelift, cannot prove correctness. This PR extends the checker's analysis to properly handle "program moves", which are distinct from regalloc-inserted moves in that they are present in the original program and hence are semantically relevant. A program move edits all sets of symbolic vregs at all allocs, and where the source vreg appears, it inserts the dest vreg as well. (It also removes the dest vreg from all other sets, since the old value becomes stale, as is done for other defs.) Given this, and given some additional checking for moves to/from pinned vregs, the checker can now be used to fully validate Cranelift-sourced regalloc2 invocations.	2022-04-18 10:36:26 -07:00
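As a small illustration of the program-move rule in #43, assuming each alloc is keyed by a plain `u32` and maps to a set of vreg numbers (both simplifications; `program_move` is a hypothetical name):

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Apply a program move `dst := src` to every alloc's label set:
/// wherever `src` is present, `dst` is now present too; everywhere
/// else, any old `dst` label is stale and is removed.
pub fn program_move(labels: &mut BTreeMap<u32, BTreeSet<u32>>, src: u32, dst: u32) {
    for set in labels.values_mut() {
        if set.contains(&src) {
            set.insert(dst);
        } else {
            set.remove(&dst);
        }
    }
}
```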
Chris Fallin	f307ed170c	Bump version to 0.1.1. (#41 )	2022-04-13 10:37:43 -07:00
Chris Fallin	4cac1614bf	Add serde support for exposed types. (#40 ) This adds derived `Serialize` and `Deserialize` implementations for exposed types that describe registers, operands, and related program inputs; entity indices; and regalloc output types. This allows serialization of any of the embedder's IR data types that may embed or build upon regalloc2 types. These implementations (and the dependency on the `serde` crate itself) are enabled only when the non-default `enable-serde` feature is specified.	2022-04-13 10:14:00 -07:00
Chris Fallin	94cd6c421c	Bump version to 0.1.0 for release. (#39 )	2022-04-04 16:37:01 -07:00
Chris Fallin	0cb08095e6	Make multiple defs of one vreg possible on one instruction. (#38 ) Currently, if this is done, overlapping liveranges are created, and we hit an assert that ensures our non-overlapping and built-in-reverse-order invariants during liverange construction. One could argue that multiple defs of a single vreg don't make a ton of sense -- which def's value is valid after the instruction? -- but if they all get the same alloc, then the answer is "whatever was put in that alloc", and this is just a case of an instruction being a bit over-eager when listing its registers. This can arise in practice when operand lists come from combinations or concatenations: for example, in Cranelift's s390x backend, there is a "Loop" pseudo-instruction, and the operands of the Loop are the operands of all the sub-instructions. It seems more logically cohesive overall to say that one can state an operand as many times as one likes; so this PR makes it so.	2022-04-04 16:22:05 -07:00
Chris Fallin	a369150213	Make some improvements to clarity of checker implementation. (#37 ) This PR makes two changes, both suggested by @fitzgen in #36: 1. It updates the top-level description of the analysis to more simply and accurately describe the analysis lattice. 2. It modifies both the `CheckerValue` and `CheckerState` types to be enums with separate arms for the top/universe value, and adds helpers as appropriate to update the values. There should be no functional change; this update just makes the meet-functions and updates more clear, and makes a bad state ("top" but with values) unrepresentable. Closes #36.	2022-03-30 11:23:56 -07:00
Chris Fallin	ad41f8a7a5	Record vreg classes explicitly during liverange pass. (#35 ) This resolves an issue seen when the source program uses multiple regclasses (Int and Float): in some cases, the logic that grabs the vregs and retains them (with class) in `vreg_regs` missed a register and we had a class mismatch. This occurred because data structures were initialized assuming `Int` regclass at first. This PR instead removes the `vreg_regs` array, stores the class explicitly as an `Option<RegClass>` in the `VRegData`, and provides a `Env::vreg()` method that reconstitutes a `VReg` given its index and its observed class. We "observe" the class of every vreg seen during the liveness pass (and we assert that every occurrence of the vreg index has the same class). In this way, we still have a single source-of-truth for the vreg class (the mention of the vreg itself) and we explicitly represent the "not observed yet" state (and panic on attempting to use such a vreg) rather than implicitly taking the wrong class.	2022-03-29 14:00:14 -07:00
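The "observe the class, or panic if it was never observed" scheme from #35 can be sketched like this. `VRegClasses` and its methods are hypothetical stand-ins for the `VRegData` field and `Env::vreg()` mentioned in the commit message.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RegClass {
    Int,
    Float,
}

/// Hypothetical per-vreg class table; `None` means "not observed yet".
pub struct VRegClasses {
    classes: Vec<Option<RegClass>>,
}

impl VRegClasses {
    pub fn new(num_vregs: usize) -> Self {
        VRegClasses { classes: vec![None; num_vregs] }
    }

    /// Record the class seen at one mention of the vreg, asserting that
    /// every mention agrees.
    pub fn observe(&mut self, vreg: usize, class: RegClass) {
        let seen = self.classes[vreg];
        match seen {
            None => self.classes[vreg] = Some(class),
            Some(prev) => assert_eq!(prev, class, "vreg {} seen with two classes", vreg),
        }
    }

    /// Look up the class; panics rather than silently defaulting to `Int`.
    pub fn class(&self, vreg: usize) -> RegClass {
        self.classes[vreg].expect("vreg class not observed yet")
    }
}
```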
Chris Fallin	433e8b3776	Early defs reserve a register for whole instruction. (#32 ) The `Operand` abstraction allows a def to be positioned at the "early" point of an instruction, before its effect and alongside its normal uses. This is intended to allow the embedder to express that a def may be written before all uses are read, so it should not conflict with the uses. It's also convenient to use early defs to express temporaries, which should be available throughout a regalloc-level instruction's emitted sequence. In such a case, the register should not be used again after the instruction, so it is dead following the instruction. Strictly speaking, and according to regalloc2 prior to this PR, then the temp will only conflict with the uses at the early-point, and not the defs at the late-point (after the instruction), because it's dead past its point of definition. But for a temp we really want it to register conflicts not just with the normal uses but with the normal defs as well. This PR changes the semantics so that an early def builds a liverange that spans the early- and late-point of an instruction when the vreg is dead flowing down from the instruction, giving the semantics we want for temps.	2022-03-18 10:32:49 -07:00
Chris Fallin	4f1161d9e4	Generalize debug-info support a bit. (#34 ) * Generalize debug-info support a bit. Previously, debug value-label support required each vreg to have a disjoint sequence of instruction ranges, each with one label. Unfortunately, it's entirely possible for multiple values at the program level to map to one vreg at the IR level, leading to multiple labels. This PR generalizes the debug-info generation support to allow for arbitrary (label, range, vreg) tuples, as long as they are sorted by vreg, with no other requirements. The lookup is a little more costly when we generate the debuginfo, but in practice we shouldn't have more than a few debug value labels per vreg, so the constants should be small. * Typo fix from Amanieu Co-authored-by: Amanieu d'Antras <amanieu@gmail.com>	2022-03-18 10:32:27 -07:00
Chris Fallin	00dc692489	Allow for reused inputs when the reused vreg is also used as other (normal) uses. (#33 ) The "reused input" operand constraint allows for an instruction to have a def-operand whose allocation is constrained to reuse the same allocation as one of the uses. This is useful to express constraints needed for some instruction sets, like x86, where at the ISA level, one register serves both as an input and the output. Unfortunately the way that we lower the constraints to liveranges does not work if we have the same vreg used both for the reused input and another input -- it results in impossible-to-solve constraints. For example, the instruction ``` alu_op v42 use, v42 use, v43 def reuse(0) ``` would result in an impossible allocation. This fixes liverange construction to properly handle all uses of the vreg whose operand is reused, rather than just the one reused operand.	2022-03-18 10:14:27 -07:00
Chris Fallin	bf92e7c02f	Allow pinned-vregs to be implicit liveins. (#30 ) Previously, the regalloc required all liveins to be defined by a pseudoinstruction at the start of the function body. The regalloc.rs compatibility shim did this, but it's slightly inconvenient when using the API directly. This change allows pinned vregs to be implicit liveins to the function body instead.	2022-03-18 10:13:56 -07:00
Chris Fallin	b1a512dbf6	Checker analysis: change order of block processing for better efficiency. (#29 ) After going through the checker with @fitzgen, we discussed the dataflow analysis and @fitzgen noted that it would likely be more efficient to, for example, process an inner cycle of blocks in a loop nest and converge before returning to the outer loop. I had written a BFS-style workqueue loop to converge the dataflow analysis without much thought, with a FIFO workqueue. Any workqueue ordering will work and will converge to the same fixpoint (as long as we are operating on a lattice), but indeed some orderings will be more efficient, and a DFS-style (LIFO stack) workqueue will give us this property of converging inner loops first. In measurements, there doesn't seem to be much of a difference for small fuzz testcases, but this will likely matter more if/when we try to run the checker to validate register allocation on large functions.	2022-03-10 10:35:08 -08:00
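A minimal sketch of a LIFO worklist for a forward dataflow fixpoint, in the spirit of #29. States are modeled as `u64` bitsets met by intersection, block 0 is assumed to be the entry, and `fixpoint` is a hypothetical name. None of this is the checker's real state type.

```rust
/// Converge a forward dataflow analysis with a stack-based (LIFO) worklist.
/// `succs[b]` lists the successors of block `b`; `transfer(b, s)` maps
/// block `b`'s in-state to its out-state. Returns each block's in-state.
pub fn fixpoint(
    succs: &[Vec<usize>],
    entry_state: u64,
    transfer: impl Fn(usize, u64) -> u64,
) -> Vec<u64> {
    // All-ones is "top": nothing is known yet.
    let mut input = vec![u64::MAX; succs.len()];
    if succs.is_empty() {
        return input;
    }
    input[0] = entry_state;
    let mut visited = vec![false; succs.len()];
    let mut queued = vec![false; succs.len()];
    let mut stack = vec![0usize];
    visited[0] = true;
    queued[0] = true;
    while let Some(b) = stack.pop() {
        queued[b] = false;
        let out = transfer(b, input[b]);
        for &s in &succs[b] {
            let met = input[s] & out; // meet = intersection
            if met != input[s] || !visited[s] {
                input[s] = met;
                visited[s] = true;
                if !queued[s] {
                    queued[s] = true;
                    // Most recently changed block is processed next,
                    // so an inner loop tends to settle before outer blocks.
                    stack.push(s);
                }
            }
        }
    }
    input
}
```

Swapping the `Vec` stack for a FIFO queue would reach the same fixpoint, only with a different visiting order.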
Chris Fallin	fe021ad6d4	Simplify pinned-vreg API: don't require slice of all pinned vregs. (#28 ) Previously, we kept a bool flag `is_pinned` in the `VRegData`, and we required a `&[VReg]` of all pinned vregs to be provided by `Function::pinned_vregs()`. This was (I think) done for convenience, but it turns out not to really be necessary, as we can just query `is_pinned_vreg` where needed (and in the likely implementation, e.g. in Cranelift, this will be a `< NUM_PINNED_VREGS` check that can be inlined). This adds convenience for the embedder (the main benefit), and also reduces complexity, removes some state, and avoids some work initializing the regalloc state for a run.	2022-03-04 15:12:16 -08:00
Chris Fallin	14442df3fc	Support for debug-labels. (#27 ) If the client adds labels to vregs across ranges of instructions in the input program, the regalloc will provide metadata in the `Output` that describes the `Allocation`s in which each such vreg is stored for those ranges. This allows the client to emit debug metadata telling a debugger where to find program values at each point in the program.	2022-03-03 16:58:33 -08:00
Chris Fallin	d9d97451f8	Merge pull request #26 from cfallin/new-checker Rework checker to not require DefAlloc by tracking all vregs on each alloc.	2022-01-25 16:18:05 -08:00
Chris Fallin	7ce69de5b0	Address review comments.	2022-01-20 19:45:41 -08:00
Chris Fallin	2133606366	Fix doc-tests: escape figure properly	2022-01-20 17:21:32 -08:00
Chris Fallin	ccd6b4fc2c	Remove DefAlloc -- no longer needed.	2022-01-19 23:57:31 -08:00
Chris Fallin	3b037f3c9e	Rework checker to not require DefAlloc by tracking all vregs on each alloc. The symbolic checker currently works by tracking a single symbolic vreg label for each "alloc" (physical register or stack slot). On definition of a vreg into an alloc, the label is updated to the new vreg's name. This worked quite well when the regalloc was simpler, but in the presence of redundant move elimination, it started to become apparent that the analysis has a shortcoming: when multiple vregs have the same value, and the regalloc has deduced this, it can make use of an alloc that is labeled with one vreg but use it as another vreg and the checker will not validate the use. In other words, the regalloc became smart enough to avoid emitting unnecessary moves, but the checker was relying on those moves to know the most up-to-date symbolic name for a value in a physical location. In a sense, a register or stackslot can contain both vreg1 and vreg2, and the regalloc can use it as either. The stopgap measure of emitting more DefAllocs as part of the redundant move elimination never quite sat right with me. It works, but it's asking too much of the regalloc to prove why its moves are correct. We should rely less on the regalloc and on complex built-just-for-the-checker plumbing; we should instead improve the checker so that it can prove on its own that the result is correct. This PR modifies the checker so that its basic abstraction for an alloc's value is a set of virtual register labels, rather than just one. The transfer function is a little more complex, but manageable: a move keeps the old label(s) and adds a new one; redefining a vreg into one alloc needs to remove that vreg label from all other alloc's sets. This completely removes the need for metadata from the regalloc (!); all we need is the original program (pre-alloc, with vregs), the set of allocations, and the set of inserted moves, and we can validate the result. 
This should mean that we trust our checker-validated allocation results more, and should result in less complexity and maintenance going forward if we improve the allocator further.	2022-01-19 23:50:35 -08:00
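The set-of-labels abstraction can be sketched as below. This is a simplified model rather than the checker itself: `Alloc`, `CheckerState`, and the method names are hypothetical, vregs are plain `u32`s, and an inserted move is modeled as giving the destination a copy of the source's label set.

```rust
use std::collections::{BTreeMap, BTreeSet};

pub type VReg = u32;

/// Hypothetical alloc: a physical register or a stack slot.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Alloc {
    Reg(u8),
    Stack(u32),
}

/// For each alloc, the set of vregs whose current value it holds.
#[derive(Default)]
pub struct CheckerState {
    labels: BTreeMap<Alloc, BTreeSet<VReg>>,
}

impl CheckerState {
    /// `v` is (re)defined into `dst`: every other copy of `v` is now stale.
    pub fn def(&mut self, v: VReg, dst: Alloc) {
        for set in self.labels.values_mut() {
            set.remove(&v);
        }
        self.labels.insert(dst, BTreeSet::from([v]));
    }

    /// Allocator-inserted move: `dst` now holds whatever `src` holds.
    pub fn mov(&mut self, src: Alloc, dst: Alloc) {
        let set = self.labels.get(&src).cloned().unwrap_or_default();
        self.labels.insert(dst, set);
    }

    /// A use of `v` at `at` is valid only if `at` is known to hold `v`.
    pub fn check_use(&self, v: VReg, at: Alloc) -> bool {
        self.labels.get(&at).map_or(false, |s| s.contains(&v))
    }
}
```

The inputs are just defs, moves, and uses, which matches the claim that no extra metadata from the allocator is required.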
Chris Fallin	56a8f844a8	Merge pull request #25 from Amanieu/perf Performance improvements	2022-01-11 14:35:10 -08:00
Amanieu d'Antras	6b1a5e8b1b	Address review feedback	2022-01-11 22:27:15 +00:00
Amanieu d'Antras	be61078e4e	Format Cargo.toml	2022-01-11 13:34:50 +00:00
Amanieu d'Antras	ee4de54240	Guard trace! behind cfg!(debug_assertions) Even if the trace log level is disabled, the presence of the trace! macro still has a significant impact on performance because it is present in the inner loops of the allocator. Removing the trace! calls at compile-time reduces instruction count by ~7%.	2022-01-11 13:30:13 +00:00
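One way to get this compile-time guard is a wrapper macro, sketched here with `eprintln!` standing in for the `log` crate's `trace!` so the example has no dependencies. The macro body and `sum_with_tracing` are illustrative assumptions, not the crate's actual definition.

```rust
/// Wrapper macro: `cfg!(debug_assertions)` is a compile-time constant,
/// so in release builds the branch is dead code and the optimizer can
/// remove the logging call (and its argument formatting) entirely.
macro_rules! trace {
    ($($arg:tt)*) => {
        if cfg!(debug_assertions) {
            eprintln!($($arg)*);
        }
    };
}

/// Hypothetical hot loop with a trace call in its body.
pub fn sum_with_tracing(xs: &[u32]) -> u32 {
    let mut total = 0;
    for &x in xs {
        trace!("adding {}", x);
        total += x;
    }
    total
}
```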
Amanieu d'Antras	2d9d5dd82b	Rearrange some struct fields to work better with u64_key/u128_key This allows the compiler to load the whole key with 1 or 2 64-bit accesses, assuming little-endian ordering. Improves instruction count by ~1%.	2022-01-11 13:24:51 +00:00
Amanieu d'Antras	693fb6a975	Only emit DefAlloc edits when the "checker" feature is enabled. This reduces instruction counts by ~2% when disabled.	2022-01-11 13:03:24 +00:00
Amanieu d'Antras	d95a9d9399	Combine sort keys into u64/u128 This allows the compiler to perform branch-less comparisons, which are more efficient. This results in ~5% fewer instructions executed.	2022-01-11 13:03:21 +00:00
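The key-packing trick can be illustrated with a two-field key. `MoveKey`, its fields, and `sort_moves` are hypothetical; the point is only that one integer comparison replaces a field-by-field comparison.

```rust
/// Hypothetical sort key with a major field (`pos`) and a minor field (`prio`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MoveKey {
    pub pos: u32,
    pub prio: u32,
}

impl MoveKey {
    /// Pack both fields into one `u64`, major field in the high bits, so
    /// integer order equals lexicographic order on `(pos, prio)`.
    #[inline]
    pub fn key(self) -> u64 {
        ((self.pos as u64) << 32) | self.prio as u64
    }
}

/// Sort using a single integer comparison per pair of elements.
pub fn sort_moves(v: &mut [MoveKey]) {
    v.sort_unstable_by_key(|m| m.key());
}
```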
Amanieu d'Antras	053375f049	Remove PRegData::reg and use PReg::from_index instead Performance impact is negligible but this is a good cleanup.	2022-01-11 13:02:08 +00:00
Amanieu d'Antras	74928b83fa	Replace all assert! with debug_assert! This results in a ~6% reduction in instruction count.	2022-01-11 03:54:08 +00:00
Chris Fallin	a27f93f01e	Merge pull request #23 from Amanieu/iter Add a helper to iterate over insts and edits of a block in order	2022-01-05 10:02:35 -08:00
Amanieu d'Antras	6f59cd407b	Use block_insts_and_edits in the checker	2021-12-27 22:09:07 +01:00
Amanieu d'Antras	8ab44c383e	Add a helper to iterate over insts and edits of a block in order	2021-12-27 22:08:36 +01:00
Chris Fallin	8752a8c5bd	Merge pull request #17 from Amanieu/fixed_stack Add support for fixed stack slots	2021-12-12 22:15:55 -08:00
Amanieu d'Antras	51493ab03a	Apply review feedback	2021-12-12 00:33:30 +00:00
Amanieu d'Antras	38ffc479c2	Simplify the internal representation of PReg	2021-12-11 22:39:19 +00:00


235 Commits