33 Commits

Author SHA1 Message Date
Alex Crichton
07518dfd36 Remove the Cranelift vselect instruction (#5918)
* Remove the Cranelift `vselect` instruction

This instruction is documented as selecting lanes based on the "truthy"
value of the condition lane, but the current status of the
implementation of this instruction is:

* x64 - uses the high bit for `f32x4` and `f64x2` and otherwise uses the
  high bit of each byte doing a byte-wise lane select rather than
  whatever the controlling type is.

* AArch64 - this is the same as `bitselect` which is a bit-wise
  selection rather than a lane-wise selection.

* s390x - this is the same as AArch64, a bit-wise selection rather than
  lane-wise.

* interpreter - the interpreter implements the documented semantics of
  selecting based on "truthy" values.

Coupled with the status of the implementation is the fact that this
instruction is not used by WebAssembly SIMD today either. The only use
of this instruction in Cranelift is the nan-canonicalization pass. By
moving nan-canonicalization to `bitselect`, since that has the desired
semantics, there's no longer any need for `vselect`.

Given this situation this commit subsequently removes `vselect` and all
usage of it throughout Cranelift.

Closes #5917
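
The difference between the two semantics described above can be sketched
as follows. This is an illustration only, not Cranelift code: the function
names and the choice of four 32-bit lanes are assumptions made for the
example.

```rust
/// Bit-wise select: every result bit comes from `x` where the matching bit
/// of `c` is 1, and from `y` where it is 0.
fn bitselect(c: [u32; 4], x: [u32; 4], y: [u32; 4]) -> [u32; 4] {
    std::array::from_fn(|i| (c[i] & x[i]) | (!c[i] & y[i]))
}

/// Lane-wise "truthy" select (the documented `vselect` behaviour): a whole
/// lane comes from `x` when the condition lane is nonzero, else from `y`.
fn lane_select(c: [u32; 4], x: [u32; 4], y: [u32; 4]) -> [u32; 4] {
    std::array::from_fn(|i| if c[i] != 0 { x[i] } else { y[i] })
}
```

When every condition lane is all-ones or all-zeros the two functions agree,
which is why a mask of that shape can be fed to `bitselect` instead. They
differ for a lane such as `1`, where only the lowest bit is set.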

* Review comments

* Bring back vselect opts as bitselect opts

* Clean up vselect usage in the interpreter

* Move bitcast in nan canonicalization

* Add a comment about float optimization
2023-03-08 00:42:05 +00:00
Alex Crichton
afde4ea4e3 Fix the original block for block params in egraphs (#5960)
This fixes an issue where block params were always listed as being
members of the current block in egraphs, even when the block param was
actually defined in a separate block. This then enables instructions
which depend on these parameters to get hoisted up out of inner loops at
least to the block that defined the argument.

Closes #5957
2023-03-07 23:58:03 +00:00
Alex Crichton
18ee645ebe Allow hoisting vconst instructions out of loops (#5909)
* Allow hoisting `vconst` instructions out of loops

Staring at some SIMD code and what LLVM and v8 both generate, it appears
that a common technique for SIMD-loops is to hoist constants outside of
loops since they're nontrivial to rematerialize unlike integer
constants. This commit updates the `loop_hoist_level` calculation with
egraphs to have a nonzero default for instructions that have no
arguments (e.g. consts) which enables hoisting these instructions out of
loops.

Note, though, that for now I've listed the maximum as hoisting outside
of one loop, but not all of them. While theoretically vconsts could move
up to the top of the function I'd be worried about their impact on
register pressure and having to save/restore around calls or similar, so
hopefully if the hot part of a program is a single loop then hoisting
out of one loop is a reasonable-enough heuristic for now.

Locally on x64 with a benchmark that just encodes binary to hex this saw
a 15% performance improvement taking hex encoding from ~6G/s to ~6.7G/s.

* Test vconst is only hoisted one loop out
2023-03-06 15:29:43 +00:00
Alex Crichton
3ff3994a12 Add egraph optimization for fneg's cancelling out (#5910)
This implements comments from #5895 to cancel out `fneg` operations in
`fma` instructions. Additional support for `fmul` is added as well.
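
A minimal sketch of the identity behind this, assuming the rewrite covers
the case where both multiplicands are negated. The function names are
hypothetical; `f64::mul_add` stands in for `fma`.

```rust
/// fma with both multiplicands negated: (-x) * (-y) + z.
fn fma_negated(x: f64, y: f64, z: f64) -> f64 {
    (-x).mul_add(-y, z)
}

/// The same computation with the two negations cancelled: x * y + z.
fn fma_plain(x: f64, y: f64, z: f64) -> f64 {
    x.mul_add(y, z)
}

/// The analogous pair for a plain multiply.
fn fmul_negated(x: f64, y: f64) -> f64 {
    (-x) * (-y)
}

fn fmul_plain(x: f64, y: f64) -> f64 {
    x * y
}
```

Negation only flips the sign bit, so the exact product is unchanged and the
two forms produce bit-identical results.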
2023-03-02 18:28:32 +00:00
Chris Fallin
7b8854f803 egraphs: fix handling of effectful-but-idempotent ops and GVN. (#5800)
* Revert "egraphs: disable GVN of effectful idempotent ops (temporarily). (#5808)"

This reverts commit c7e2571866.

* egraphs: fix handling of effectful-but-idempotent ops and GVN.

This PR addresses #5796: currently, ops that are effectful, i.e., remain
in the side-effecting skeleton (which we keep in the `Layout` while the
egraph exists), but are idempotent and thus mergeable by a GVN pass, are
not handled properly.

GVN is still possible on effectful but idempotent ops precisely because
our GVN does not create partial redundancies: it removes an instruction
only when it is dominated by an identical instruction. An instruction
will not be "hoisted" to a point where it could execute in the optimized
code but not in the original.

However, there are really two parts to the egraph implementation that
produce this effect: the deduplication on insertion into the egraph, and
the elaboration with a scoped hashmap. The deduplication lets us give a
single name (value ID) to all copies of an identical instruction, and
then elaboration will re-create duplicates if GVN should not hoist or
merge some of them.

Because deduplication need not worry about dominance or scopes, we use a
simple (non-scoped) hashmap to dedup/intern ops as "egraph nodes".

When we added support for GVN'ing effectful but idempotent ops (#5594),
we kept the use of this simple dedup'ing hashmap, but these ops do not
get elaborated; instead they stay in the side-effecting skeleton. Thus,
we inadvertently created potential for weird code-motion effects.

The proposal in #5796 would solve this in a clean way by treating these
ops as pure again, and keeping them out of the skeleton, instead putting
"force" pseudo-ops in the skeleton. However, this is a little more
complex than I would like, and I've realized that @jameysharp's earlier
suggestion is much simpler: we can keep an actual scoped hashmap
separately just for the effectful-but-idempotent ops, and use it to GVN
while we build the egraph. In effect, we're fusing a separate GVN pass
with the egraph pass (but letting it interact corecursively with
egraph rewrites). This is in principle similar to how we keep a separate
map for loads and fuse this pass with the egraph rewrite pass as well.

Note that we can use a `ScopedHashMap` here without the "context" (as
needed by `CtxHashMap`) because, as noted by @jameysharp, in practice
the ops we want to GVN have all their args inline. Equality on the
`InstructionData` itself is conservative: two insts whose struct
contents compare shallowly equal are definitely identical, but identical
insts in a deep-equality sense may not compare shallowly equal, due to
list indirection. This is fine for GVN, because it is still sound to
skip any given GVN opportunity (and keep the original instructions).
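
The scoped-map idea can be sketched roughly as below. This is not the
`ScopedHashMap` from Cranelift: `ScopedMap`, `InstKey` and `gvn` are
hypothetical names, and an instruction is reduced to an opcode plus two
inline arguments purely for illustration.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical stand-in for a scoped hash map: one layer per open scope
/// (think: one layer per block on the current dominator-tree path).
struct ScopedMap<K, V> {
    layers: Vec<HashMap<K, V>>,
}

impl<K: Hash + Eq, V: Copy> ScopedMap<K, V> {
    fn new() -> Self {
        Self { layers: vec![HashMap::new()] }
    }
    fn push_scope(&mut self) {
        self.layers.push(HashMap::new());
    }
    fn pop_scope(&mut self) {
        self.layers.pop();
    }
    /// Look through the innermost scope first, then the enclosing ones.
    fn get(&self, k: &K) -> Option<V> {
        self.layers.iter().rev().find_map(|layer| layer.get(k).copied())
    }
    fn insert(&mut self, k: K, v: V) {
        self.layers.last_mut().expect("at least one scope").insert(k, v);
    }
}

/// Illustrative instruction key: opcode plus two inline argument numbers.
type InstKey = (&'static str, u32, u32);

/// Returns the value to use for `inst`: a previously seen identical
/// instruction if one is visible in an enclosing scope, else `value`.
fn gvn(map: &mut ScopedMap<InstKey, u32>, inst: InstKey, value: u32) -> u32 {
    if let Some(prev) = map.get(&inst) {
        prev
    } else {
        map.insert(inst, value);
        value
    }
}
```

An entry inserted in one scope is only visible while that scope is open, so
a copy in a sibling scope is never merged with it; that is the "dominated by
an identical instruction" condition in miniature.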

Fixes #5796.

* Add comments from review.
2023-03-02 02:10:42 +00:00
Jamey Sharp
6cf7155052 Cranelift: Generalize (x << k) >> k optimization (#5746)
* Generalize unsigned `(x << k) >> k` optimization

Split the existing rule into three parts:
- A dual of the rule for `(x >> k) << k` that is only valid for unsigned
  shifts.
- Known-bits analysis for `(band (uextend x) k)`.
- A new rule for converting `sextend` to `uextend` if the sign-extended
  bits are masked out anyway.

The first two together cover the existing rule.

* Generalize signed `(x << k) >> k` optimization

* Review comments

* Generalize sign-extending shifts further

The shifts can be eliminated even if the shift amount isn't exactly
equal to the difference in bit-widths between the narrow and wide types.
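
For intuition, the underlying identities can be written out on 32-bit
integers as below. This is a sketch with hypothetical function names; the
actual rules operate on CLIF values and are not reproduced here. All shift
amounts are assumed to be less than 32.

```rust
/// Unsigned `(x << k) >> k`: the top `k` bits are shifted out and replaced
/// by zeros.
fn shl_ushr(x: u32, k: u32) -> u32 {
    (x << k) >> k
}

/// The equivalent mask: keep only the low `32 - k` bits.
fn mask_low(x: u32, k: u32) -> u32 {
    x & (u32::MAX >> k)
}

/// Signed `(x << k) >> k`: the arithmetic right shift copies bit
/// `31 - k` of `x` into the top `k` bits.
fn shl_sshr(x: i32, k: u32) -> i32 {
    (x << k) >> k
}

/// For `k == 24` the signed form is sign-extension from the low 8 bits.
fn sext_from_i8(x: i32) -> i32 {
    x as i8 as i32
}
```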

* Add filetests
2023-02-27 17:34:46 +00:00
Trevor Elliott
cc073593a4 Fix block label printing in precise-output tests (#5798)
As a follow-up to #5780, disassemble the regions identified by bb_starts, falling back on disassembling the whole buffer. This ensures that instructions like br_table that introduce a lot of constants don't throw off capstone for the remainder of the function.

---------

Co-authored-by: Jamey Sharp <jamey@minilop.net>
2023-02-16 02:35:26 +00:00
Trevor Elliott
f04decc4a1 Use capstone to validate precise-output tests (#5780)
Use the capstone library to disassemble precise-output tests, in addition to pretty-printing their vcode.
2023-02-15 16:35:10 -08:00
Afonso Bordado
76539ef9f2 cranelift: Optimize select+icmp into {s,u}{min,max} (#5546)
* cranelift: Optimize `select+icmp` into `{s,u}{min,max}`

* cranelift: Add generic egraph icmp reverse rule

* cranelift: Optimize `vselect+icmp` into `{s,u}{min,max}`

* cranelift: Optimize some `vselect+fcmp` into `f{min,max}_pseudo`

* cranelift: Add inverted forms of min/max rules
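
The scalar integer case can be illustrated as below. The function names are
hypothetical and the sketch does not cover the vector or floating-point
rules.

```rust
/// `select (icmp slt x, y), x, y` written out by hand.
fn select_slt(x: i32, y: i32) -> i32 {
    if x < y { x } else { y }
}

/// `select (icmp ult x, y), x, y` written out by hand.
fn select_ult(x: u32, y: u32) -> u32 {
    if x < y { x } else { y }
}

/// `select (icmp sgt x, y), x, y` written out by hand.
fn select_sgt(x: i32, y: i32) -> i32 {
    if x > y { x } else { y }
}
```

These compute the same results as `i32::min`, `u32::min` and `i32::max`
respectively, which is the shape of rewrite the commit titles describe.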
2023-02-15 15:06:21 -08:00
Nick Fitzgerald
6df3bbbe60 Cranelift: Collapse double extends into a single extend (#5772) 2023-02-13 22:43:17 +00:00
Trevor Elliott
15fe9c7c93 Inline jump tables in parsed br_table instructions (#5755)
As jump tables are used by at most one br_table instruction, inline their definition in those instructions instead of requiring them to be declared as function-level metadata.
2023-02-09 14:24:04 -08:00
Alex Crichton
72962c9f08 Add some minor souper-harvested optimizations (#5735)
I was playing around with souper recently on some wasms I had lying
around and these are some optimization opportunities that popped out
which seemed easy-enough to add to the egraph-based optimizations.
2023-02-07 14:06:24 -06:00
Chris Fallin
75ae976adc egraphs: fix accidental remat of call. (#5726)
In the provided test case in #5716, the result of a call was then
added to 0. We have a rewrite rule that sets the remat-bit on any add
of a value and a constant, because these frequently appear (e.g. from
address offset calculations) and this can frequently reduce register
pressure (one long-lived base vs. many long-lived base+offset values).
Separately, we have an algebraic rule that `x+0` rewrites to `x`.

The result of this was that we had an eclass with the remat bit set on
the add, but the add was also union'd into the call. We pick the
latter during extraction, because it's cheaper not to do the add at
all; but we still get the remat bit, and try to remat a call (!),
which blows up later.

This PR fixes the logic to look up the "best value" for a value (i.e.,
whatever extraction determined), and look up the remat bit on *that*
node, not the canonical node.

(Why did the canonical node become the iadd and not the call? Because
the former had a lower value-number, as an accident of IR
construction; we don't impose any requirements on the input CLIF's
value-number ordering, and I don't think this breaks any of the
important acyclic properties, even though there is technically a
dependence from a lower-numbered to a higher-numbered node. In essence
one can think of them as having "virtual numbers" in any true
topologically-sorted order, and the only place the actual integer
indices matter should be in choosing the "canonical ID", which is just
used for dedup'ing, modulo this bug.)

Fixes #5716.
2023-02-06 23:36:16 +00:00
Alex Crichton
de0e0bea3f Legalize b{and,or,xor}_not into component instructions (#5709)
* Remove trailing whitespace in `lower.isle` files

* Legalize the `band_not` instruction into simpler form

This commit legalizes the `band_not` instruction into `band`-of-`bnot`,
or two instructions. This is intended to assist with egraph-based
optimizations where the `band_not` instruction doesn't have to be
specifically included in other bit-operation-patterns.

Lowerings of the `band_not` instruction have been moved to a
specialization of the `band` instruction.

* Legalize `bor_not` into components

Same as prior commit, but for the `bor_not` instruction.

* Legalize bxor_not into bxor-of-bnot

Same as prior commits. I think this also ended up fixing a bug in the
s390x backend where `bxor_not x y` was actually translated as `bnot
(bxor x y)` by accident given the test update changes.

* Simplify not-fused operands for riscv64

Looks like some delegated-to rules have special-cases for "if this
feature is enabled use the fused instruction" so move the clause for
testing the feature up to the lowering phase to help trigger other rules
if the feature isn't enabled. This should make the riscv64 backend more
consistent with how other backends are implemented.

* Remove B{and,or,xor}Not from cost of egraph metrics

These shouldn't ever reach egraphs now that they're legalized away.

* Add an egraph optimization for `x^-1 => ~x`

This adds a simplification node to translate xor-against-minus-1 to a
`bnot` instruction. This helps trigger various other optimizations in
the egraph implementation and also various backend lowering rules for
instructions. This is chiefly useful as wasm doesn't have a `bnot`
equivalent, so it's encoded as `x^-1`.
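
The identity itself is easy to check; a tiny sketch with a hypothetical
function name:

```rust
/// How a bitwise-not typically arrives from wasm: xor against -1.
fn xor_minus_one(x: i32) -> i32 {
    x ^ -1
}
```

For every `x` this equals `!x`, because `-1` has all bits set in two's
complement.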

* Add a wasm test for end-to-end bitwise lowerings

Test that end-to-end various optimizations are being applied for input
wasm modules.

* Specifically don't self-update rustup on CI

I forget why this was here originally, but this is failing on Windows
CI. In general there's no need to update rustup, so leave it as-is.

* Cleanup some aarch64 lowering rules

Previously a 32/64 split was necessary due to the `ALUOp` being different
but that's been refactored away now so there's no longer any need for
duplicate rules.

* Narrow a x64 lowering rule

This previously made more sense when it was `band_not` and rarely used,
but be more specific in the type-filter on this rule that it's only
applicable to SIMD types with lanes.

* Simplify xor-against-minus-1 rule

No need to have the commutative version since constants are already
shuffled right for egraphs

* Optimize band-of-bnot when bnot is on the left

Use some more rules in the egraph algebraic optimizations to
canonicalize band/bor/bxor with a `bnot` operand to put the operand on
the right. That way the lowerings in the backends only have to list the
rule once, with the operand on the right, to optimize both styles of
input.

* Add commutative lowering rules

* Update cranelift/codegen/src/isa/x64/lower.isle

Co-authored-by: Jamey Sharp <jamey@minilop.net>

---------

Co-authored-by: Jamey Sharp <jamey@minilop.net>
2023-02-06 13:53:40 -06:00
Nick Fitzgerald
72c8513411 Cranelift: Correctly wrap shifts in constant propagation (#5695)
Fixes #5690
Fixes #5696

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
2023-02-03 00:12:57 +00:00
Jamey Sharp
ac4d28f4dd Constant-fold icmp instructions (#5666)
We found examples of icmp instructions with both operands constant in
spidermonkey.wasm.
2023-02-01 21:55:36 +00:00
Nick Fitzgerald
ffbbfbffce Cranelift: Rewrite or(and(x, y), not(y)) => or(x, not(y)) again (#5684)
This rewrite was introduced in #5676 and then reverted in #5682 due to a footgun
where we accidentally weren't actually checking the `y == !z` precondition. This
commit fixes the precondition check. It also fixes the arithmetic to be
correctly masked to the value type's width.
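
Both points can be sketched as below. The names are hypothetical, and
constants are modelled as `u64` immediates with an explicit bit width,
which is an assumption made for the example.

```rust
/// Left-hand side of the rewrite: or(and(x, y), not(y)).
fn lhs(x: u8, y: u8) -> u8 {
    (x & y) | !y
}

/// Right-hand side of the rewrite: or(x, not(y)).
fn rhs(x: u8, y: u8) -> u8 {
    x | !y
}

/// Exhaustively checks the identity over all 8-bit inputs.
fn holds_for_all_u8() -> bool {
    (0..=u8::MAX).all(|x| (0..=u8::MAX).all(|y| lhs(x, y) == rhs(x, y)))
}

/// Precondition check for two constants stored in 64-bit immediates:
/// `y == !z` must be evaluated at the value type's width (`bits` in
/// 1..=64), otherwise the flipped high bits of `!z` spoil the comparison.
fn is_not_of(y: u64, z: u64, bits: u32) -> bool {
    let mask = u64::MAX >> (64 - bits);
    (y & mask) == (!z & mask)
}
```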

This reverts commit 268f6bfc1d.
2023-02-01 20:53:22 +00:00
Trevor Elliott
268f6bfc1d Revert "Cranelift: Rewrite or(and(x, y), not(y)) => or(x, not(y)) (#5676)" (#5682)
This reverts commit 8c9eb9939b.

Fixes #5680
2023-02-01 02:53:23 +00:00
Nick Fitzgerald
8c9eb9939b Cranelift: Rewrite or(and(x, y), not(y)) => or(x, not(y)) (#5676)
Co-authored-by: Rainy Sinclair <844493+itsrainy@users.noreply.github.com>
2023-01-31 22:44:45 +00:00
Nick Fitzgerald
253e28ca4f Cranelift: Rewrite (x>>k)<<k into masking off the bottom k bits (#5673)
* Cranelift: Rewrite `(x>>k)<<k` into masking off the bottom `k` bits

* Add a runtest for exercising our rewrite of `(x >> k) << k` into masking
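
A sketch of the identity on 32-bit unsigned integers, with hypothetical
function names and `k` assumed to be less than 32:

```rust
/// `(x >> k) << k`: the bottom `k` bits are shifted out and come back as
/// zeros.
fn shr_shl(x: u32, k: u32) -> u32 {
    (x >> k) << k
}

/// The same result as a single mask that clears the bottom `k` bits.
fn clear_low_bits(x: u32, k: u32) -> u32 {
    x & (u32::MAX << k)
}
```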
2023-01-31 21:11:12 +00:00
Nick Fitzgerald
7aa240e0f2 Cranelift: constant propagate shifts (#5671)
Thanks to Souper for pointing out we weren't doing this!
2023-01-31 12:06:53 -08:00
Nick Fitzgerald
c9d1c068bc Cranelift: Add egraph rule to rewrite x * C ==> x << log2(C) when C is a power of two (#5647) 2023-01-31 18:04:17 +00:00
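
As a rough illustration of this rewrite (hypothetical function name,
wrapping 32-bit arithmetic assumed):

```rust
/// Returns `x << log2(c)` when `c` is a power of two, and `None` when the
/// rewrite does not apply.
fn mul_as_shift(x: u32, c: u32) -> Option<u32> {
    if c.is_power_of_two() {
        // For a power of two, the number of trailing zeros is log2(c).
        Some(x.wrapping_shl(c.trailing_zeros()))
    } else {
        None
    }
}
```

When the result is `Some`, it equals `x.wrapping_mul(c)`.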
Trevor Elliott
a5698cedf8 cranelift: Remove brz and brnz (#5630)
Remove the brz and brnz instructions, as their behavior is now redundant with brif.
2023-01-30 20:34:56 +00:00
Chris Fallin
1faff8c2ce Enable egraph-based optimization by default. (#5587)
This PR follows up on #5382 and #5391, which rebuilt the egraph-based optimization framework to be more performant, by enabling it by default.

Based on performance results in #5382 (my measurements on SpiderMonkey and bjorn3's independent confirmation with cg_clif), it seems that this is reasonable to enable. Now that we have been fuzzing compiler configurations with egraph opts (#5388) for 6 weeks, having fixed a few fuzzbugs that came up (#5409, #5420, #5438) and subsequently received no further reports from OSS-Fuzz, I believe it is stable enough to rely on.

This PR enables `use_egraphs`, and also normalizes its meaning: previously it forced optimization (it basically meant "turn on the egraph optimization machinery"), now it runs egraph opts if the opt level indicates (it means "use egraphs to optimize if we are going to optimize"). The conditionals in the top-level pass driver are a little subtle, but will get simpler once we can remove the non-egraph path (which we plan to do eventually!).

Fixes #5181.
2023-01-19 15:46:53 -08:00
Chris Fallin
8383e4b6bd egraph opt rules: do (icmp cc x x) == {0,1} only for integer types. (#5438)
We could do these for vectors too, in theory, but for now let's fix the
bug by applying the equivalence only for integer types.

Fixes #5437.
2022-12-14 19:50:42 +00:00
Chris Fallin
9397ea1abe Cranelift: implement general select_spectre_guard fallbacks. (#5420)
When adding some optimization rules for `icmp` in the egraph
infrastructure, we ended up creating a path to legal CLIF but with
patterns unsupported by three of our four backends: specifically,
`select_spectre_guard` with a general truthy input, rather than an
`icmp`.

In #5206 we discussed replacing `select_spectre_guard` with something
more specific, and that could still be a long-term solution here, but
doing so now would interfere with ongoing refactoring of heap access
lowering, so I've opted not to do so. (In that issue I was concerned
about complexity and didn't see the need but with this fuzzbug I'm
starting to feel a bit differently; maybe we should remove this
non-orthogonal op in the long run.)

Fixes #5417.
2022-12-12 17:13:34 -08:00
Chris Fallin
244dce93f6 Fix optimization rules for narrow types: wrap i8 results to 8 bits. (#5409)
* Fix optimization rules for narrow types: wrap i8 results to 8 bits.

This fixes #5405.

In the egraph mid-end's optimization rules, we were rewriting e.g. imuls
of two iconsts to an iconst of the result, but without masking off the
high bits (beyond the result type's width). This was producing iconsts
with set high bits beyond their types' width, which is not legal.
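
The masking step can be sketched as below, assuming constants are held in
a 64-bit immediate alongside a bit width in 1..=64. `fold_imul` is a
hypothetical helper, not the actual rule.

```rust
/// Constant-folds a multiply of two immediates for a type of `bits` bits,
/// masking off everything above the type's width.
fn fold_imul(a: u64, b: u64, bits: u32) -> u64 {
    let mask = u64::MAX >> (64 - bits);
    a.wrapping_mul(b) & mask
}
```

Without the final mask, `0x10 * 0x10` at 8 bits would yield `0x100`, a
constant with a bit set beyond the type's width.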

In addition, this PR adds some optimizations to the algebraic rules to
recognize e.g. `x == x` (and all other integer comparison operators) and
resolve to 1 or 0 as appropriate.

* Review feedback.

* Review feedback, again.
2022-12-09 22:29:25 +00:00
Chris Fallin
f980defe17 egraph support: rewrite to work in terms of CLIF data structures. (#5382)
* egraph support: rewrite to work in terms of CLIF data structures.

This work rewrites the "egraph"-based optimization framework in
Cranelift to operate on aegraphs (acyclic egraphs) represented in the
CLIF itself rather than as a separate data structure to which and from
which we translate the CLIF.

The basic idea is to add a new kind of value, a "union", that is like an
alias but refers to two other values rather than one.  This allows us to
represent an eclass of enodes (values) as a tree. The union node allows
for a value to have *multiple representations*: either constituent value
could be used, and (in well-formed CLIF produced by correct
optimization rules) they must be equivalent.

Like the old egraph infrastructure, we take advantage of acyclicity and
eager rule application to do optimization in a single pass. Like before,
we integrate GVN (during the optimization pass) and LICM (during
elaboration).

Unlike the old egraph infrastructure, everything stays in the
DataFlowGraph. "Pure" enodes are represented as instructions that have
values attached, but that are not placed into the function layout. When
entering "egraph" form, we remove them from the layout while optimizing.
When leaving "egraph" form, during elaboration, we can place an
instruction back into the layout the first time we elaborate the enode;
if we elaborate it more than once, we clone the instruction.

The implementation performs two passes overall:

- One, a forward pass in RPO (to see defs before uses), that (i) removes
  "pure" instructions from the layout and (ii) optimizes as it goes. As
  before, we eagerly optimize, so we form the entire union of optimized
  forms of a value before we see any uses of that value. This lets us
  rewrite uses to use the most "up-to-date" form of the value and
  canonicalize and optimize that form.

  The eager rewriting and acyclic representation make each other work
  (we could not eagerly rewrite if there were cycles; and acyclicity
  does not miss optimization opportunities only because the first time
  we introduce a value, we immediately produce its "best" form). This
  design choice is also what allows us to avoid the "parent pointers"
  and fixpoint loop of traditional egraphs.

  This forward optimization pass keeps a scoped hashmap to "intern"
  nodes (thus performing GVN), and also interleaves on a per-instruction
  level with alias analysis. The interleaving with alias analysis allows
  alias analysis to see the most optimized form of each address (so it
  can see equivalences), and allows the next value to see any
  equivalences (reuses of loads or stored values) that alias analysis
  uncovers.

- Two, a forward pass in domtree preorder, that "elaborates" pure enodes
  back into the layout, possibly in multiple places if needed. This
  tracks the loop nest and hoists nodes as needed, performing LICM as it
  goes. Note that by doing this in forward order, we avoid the
  "fixpoint" that traditional LICM needs: we hoist a def before its
  uses, so when we place a node, we place it in the right place the
  first time rather than moving later.

This PR replaces the old (a)egraph implementation. It removes both the
cranelift-egraph crate and the logic in cranelift-codegen that uses it.

On `spidermonkey.wasm` running a simple recursive Fibonacci
microbenchmark, this work shows 5.5% compile-time reduction and 7.7%
runtime improvement (speedup).

Most of this implementation was done in (very productive) pair
programming sessions with Jamey Sharp, thus:

Co-authored-by: Jamey Sharp <jsharp@fastly.com>

* Review feedback.

* Review feedback.

* Review feedback.

* Bugfix: cprop rule: `(x + k1) - k2` becomes `x - (k2 - k1)`, not `x - (k1 - k2)`.
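
The sign error is easy to see with concrete numbers. A sketch using
wrapping 64-bit arithmetic and hypothetical function names:

```rust
/// The original expression: (x + k1) - k2.
fn original(x: i64, k1: i64, k2: i64) -> i64 {
    x.wrapping_add(k1).wrapping_sub(k2)
}

/// The corrected folding: x - (k2 - k1).
fn fixed(x: i64, k1: i64, k2: i64) -> i64 {
    x.wrapping_sub(k2.wrapping_sub(k1))
}

/// The buggy folding: x - (k1 - k2).
fn buggy(x: i64, k1: i64, k2: i64) -> i64 {
    x.wrapping_sub(k1.wrapping_sub(k2))
}
```

With `x = 10`, `k1 = 3`, `k2 = 5` the original and fixed forms both give 8,
while the buggy form gives 12.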

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
2022-12-06 14:58:57 -08:00
Chris Fallin
e62e530b7c egraphs: fix fill-in-the-types logic for multiple projections of one value. (#5112)
In particular, this was found to happen in #5099 because a `Result`
projection node was not deduplicating across two separate `isplit`s that
created it. (This is a separate issue we should also fix; `needs_dedup`
is I think overly conservative because `Result` can project out a single
value from a pure or impure node, but the projection itself should be
treated like any other pure operator.)

In any case, if we have a value `v0` and two separate `Result { value:
v0, result: N, ty }` nodes, each of these will fill in the type `ty` for
the `N`th output of `v0`, and the second will idempotently overwrite the
first; we should loosen the assert so that it allows this case.

Fixes #5099. Fixes #5100.
2022-10-25 05:22:28 +00:00
Afonso Bordado
c8791073d6 cranelift: Remove iconst.i128 (#5075)
* cranelift: Remove iconst.i128

* bugpoint: Report Changed when only one instruction is mutated

* cranelift: Fix egraph bxor rule

* cranelift: Remove some simple_preopt opts for i128
2022-10-24 12:43:28 -07:00
Chris Fallin
86e77953f8 Fix some egraph-related issues. (#5088)
This fixes #5086 by addressing two separate issues:

- The `ValueDataPacked::set_type()` helper had an embarrassing bitfield-manipulation bug that would mangle the rest of a `ValueDef` when setting its type. This is not normally used, only when the egraph elaboration fills in types after-the-fact on a multi-value node.
- The lowering rules for `isplit` on aarch64 and s390x were dispatching on the first output type, rather than the input type. When only the second output is used (as in the example in #5086), the first output type actually remains `INVALID` (and this is fine because it's never used).
2022-10-21 10:24:48 -07:00
Chris Fallin
1aaea279e5 egraph opts: fix uextend-of-i32. (#5061)
This is a simple error in the const-prop rules: uextend was not
masking iconst's u64 immediate when extending from i32 to
i64. Arguably an iconst.i32 should not have nonzero bits in the upper
32 of its immediate, but that's a separate design question. For now,
if our invariant is that the upper bits are ignored, then it is
required to mask the bits when const-evaling a `uextend`.
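
A minimal sketch of the required masking, assuming the immediate is held
as a `u64` whose upper 32 bits may be nonzero (for example a sign-extended
`-1`). The function name is hypothetical.

```rust
/// Const-evaluates `uextend` from i32 to i64 on a 64-bit immediate whose
/// upper 32 bits are to be ignored.
fn fold_uextend_i32_to_i64(imm: u64) -> u64 {
    imm & 0xFFFF_FFFF
}
```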

Fixes #5047.
2022-10-17 12:45:49 -07:00
Chris Fallin
2be12a5167 egraph-based midend: draw the rest of the owl (productionized). (#4953)
* egraph-based midend: draw the rest of the owl.

* Rename `egg` submodule of cranelift-codegen to `egraph`.

* Apply some feedback from @jsharp during code walkthrough.

* Remove recursion from find_best_node by doing a single pass.

Rather than recursively computing the lowest-cost node for a given
eclass and memoizing the answer at each eclass node, we can do a single
forward pass; because every eclass node refers only to earlier nodes,
this is sufficient. The behavior may slightly differ from the earlier
behavior because we cannot short-circuit costs to zero once a node is
elaborated; but in practice this should not matter.
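
The single forward pass can be sketched as below. This is not the
implementation from the PR: `Node`, its cost field and `best_costs` are
hypothetical, and the only property relied on is that every node refers
only to nodes at earlier indices.

```rust
/// Illustrative node kinds: an operator with its own cost and arguments,
/// or a union of two equivalent alternatives.
enum Node {
    Op { cost: u32, args: Vec<usize> },
    Union(usize, usize),
}

/// Computes the lowest cost for every node in one forward pass. Because
/// each node only refers to earlier indices, `best[a]` is always filled in
/// before it is read, so no recursion or memoization is needed.
fn best_costs(nodes: &[Node]) -> Vec<u32> {
    let mut best: Vec<u32> = Vec::with_capacity(nodes.len());
    for node in nodes {
        let cost = match node {
            Node::Op { cost, args } => {
                *cost + args.iter().map(|&a| best[a]).sum::<u32>()
            }
            Node::Union(a, b) => best[*a].min(best[*b]),
        };
        best.push(cost);
    }
    best
}
```

For example, with `x` at index 0, a constant at index 1, `x + 0` at index 2
and a union of indices 2 and 0 at index 3, the union resolves to the cheaper
plain `x`.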

* Make elaboration non-recursive.

Use an explicit stack instead (with `ElabStackEntry` entries,
alongside a result stack).

* Make elaboration traversal of the domtree non-recursive/stack-safe.

* Work analysis logic in Cranelift-side egraph glue into a general analysis framework in cranelift-egraph.

* Apply static recursion limit to rule application.

* Fix aarch64 wrt dynamic-vector support -- broken rebase.

* Topo-sort cranelift-egraph before cranelift-codegen in publish script, like the comment instructs me to!

* Fix multi-result call testcase.

* Include `cranelift-egraph` in `PUBLISHED_CRATES`.

* Fix atomic_rmw: not really a load.

* Remove now-unnecessary PartialOrd/Ord derivations.

* Address some code-review comments.

* Review feedback.

* Review feedback.

* No overlap in mid-end rules, because we are defining a multi-constructor.

* rustfmt

* Review feedback.

* Review feedback.

* Review feedback.

* Review feedback.

* Remove redundant `mut`.

* Add comment noting what rules can do.

* Review feedback.

* Clarify comment wording.

* Update `has_memory_fence_semantics`.

* Apply @jameysharp's improved loop-level computation.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix suggestion commit.

* Fix off-by-one in new loop-nest analysis.

* Review feedback.

* Review feedback.

* Review feedback.

* Use `Default`, not `std::default::Default`, as per @fitzgen

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

* Apply @fitzgen's comment elaboration to a doc-comment.

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

* Add stat for hitting the rewrite-depth limit.

* Some code motion in split prelude to make the diff a little clearer wrt `main`.

* Take @jameysharp's suggested `try_into()` usage for blockparam indices.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Take @jameysharp's suggestion to avoid double-match on load op.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix suggestion (add import).

* Review feedback.

* Fix stack_load handling.

* Remove redundant can_store case.

* Take @jameysharp's suggested improvement to FuncEGraph::build() logic

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Tweaks to FuncEGraph::build() on top of suggestion.

* Take @jameysharp's suggested clarified condition

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Clean up after suggestion (unused variable).

* Fix loop analysis.

* loop level asserts

* Revert constant-space loop analysis -- edge cases were incorrect, so let's go with the simple thing for now.

* Take @jameysharp's suggestion re: result_tys

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix up after suggestion

* Take @jameysharp's suggestion to use fold rather than reduce

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fixup after suggestion

* Take @jameysharp's suggestion to remove elaborate_eclass_use's return value.

* Clarifying comment in terminator insts.

Co-authored-by: Jamey Sharp <jamey@minilop.net>
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
2022-10-11 18:15:53 -07:00