b46ad1b54da9910d2dda232b1e8e4c89257ea2ec
2 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
f980defe17 |
egraph support: rewrite to work in terms of CLIF data structures. (#5382)
* egraph support: rewrite to work in terms of CLIF data structures. This work rewrites the "egraph"-based optimization framework in Cranelift to operate on aegraphs (acyclic egraphs) represented in the CLIF itself rather than as a separate data structure to which and from which we translate the CLIF. The basic idea is to add a new kind of value, a "union", that is like an alias but refers to two other values rather than one. This allows us to represent an eclass of enodes (values) as a tree. The union node allows for a value to have *multiple representations*: either constituent value could be used, and (in well-formed CLIF produced by correct optimization rules) they must be equivalent. Like the old egraph infrastructure, we take advantage of acyclicity and eager rule application to do optimization in a single pass. Like before, we integrate GVN (during the optimization pass) and LICM (during elaboration). Unlike the old egraph infrastructure, everything stays in the DataFlowGraph. "Pure" enodes are represented as instructions that have values attached, but that are not placed into the function layout. When entering "egraph" form, we remove them from the layout while optimizing. When leaving "egraph" form, during elaboration, we can place an instruction back into the layout the first time we elaborate the enode; if we elaborate it more than once, we clone the instruction. The implementation performs two passes overall: - One, a forward pass in RPO (to see defs before uses), that (i) removes "pure" instructions from the layout and (ii) optimizes as it goes. As before, we eagerly optimize, so we form the entire union of optimized forms of a value before we see any uses of that value. This lets us rewrite uses to use the most "up-to-date" form of the value and canonicalize and optimize that form. The eager rewriting and acyclic representation make each other work (we could not eagerly rewrite if there were cycles; and acyclicity does not miss optimization opportunities only because the first time we introduce a value, we immediately produce its "best" form). This design choice is also what allows us to avoid the "parent pointers" and fixpoint loop of traditional egraphs. This forward optimization pass keeps a scoped hashmap to "intern" nodes (thus performing GVN), and also interleaves on a per-instruction level with alias analysis. The interleaving with alias analysis allows alias analysis to see the most optimized form of each address (so it can see equivalences), and allows the next value to see any equivalences (reuses of loads or stored values) that alias analysis uncovers. - Two, a forward pass in domtree preorder, that "elaborates" pure enodes back into the layout, possibly in multiple places if needed. This tracks the loop nest and hoists nodes as needed, performing LICM as it goes. Note that by doing this in forward order, we avoid the "fixpoint" that traditional LICM needs: we hoist a def before its uses, so when we place a node, we place it in the right place the first time rather than moving later. This PR replaces the old (a)egraph implementation. It removes both the cranelift-egraph crate and the logic in cranelift-codegen that uses it. On `spidermonkey.wasm` running a simple recursive Fibonacci microbenchmark, this work shows 5.5% compile-time reduction and 7.7% runtime improvement (speedup). Most of this implementation was done in (very productive) pair programming sessions with Jamey Sharp, thus: Co-authored-by: Jamey Sharp <jsharp@fastly.com> * Review feedback. * Review feedback. * Review feedback. * Bugfix: cprop rule: `(x + k1) - k2` becomes `x - (k2 - k1)`, not `x - (k1 - k2)`. Co-authored-by: Jamey Sharp <jsharp@fastly.com> |
||
|
|
2be12a5167 |
egraph-based midend: draw the rest of the owl (productionized). (#4953)
* egraph-based midend: draw the rest of the owl. * Rename `egg` submodule of cranelift-codegen to `egraph`. * Apply some feedback from @jsharp during code walkthrough. * Remove recursion from find_best_node by doing a single pass. Rather than recursively computing the lowest-cost node for a given eclass and memoizing the answer at each eclass node, we can do a single forward pass; because every eclass node refers only to earlier nodes, this is sufficient. The behavior may slightly differ from the earlier behavior because we cannot short-circuit costs to zero once a node is elaborated; but in practice this should not matter. * Make elaboration non-recursive. Use an explicit stack instead (with `ElabStackEntry` entries, alongside a result stack). * Make elaboration traversal of the domtree non-recursive/stack-safe. * Work analysis logic in Cranelift-side egraph glue into a general analysis framework in cranelift-egraph. * Apply static recursion limit to rule application. * Fix aarch64 wrt dynamic-vector support -- broken rebase. * Topo-sort cranelift-egraph before cranelift-codegen in publish script, like the comment instructs me to! * Fix multi-result call testcase. * Include `cranelift-egraph` in `PUBLISHED_CRATES`. * Fix atomic_rmw: not really a load. * Remove now-unnecessary PartialOrd/Ord derivations. * Address some code-review comments. * Review feedback. * Review feedback. * No overlap in mid-end rules, because we are defining a multi-constructor. * rustfmt * Review feedback. * Review feedback. * Review feedback. * Review feedback. * Remove redundant `mut`. * Add comment noting what rules can do. * Review feedback. * Clarify comment wording. * Update `has_memory_fence_semantics`. * Apply @jameysharp's improved loop-level computation. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix suggestion commit. * Fix off-by-one in new loop-nest analysis. * Review feedback. * Review feedback. * Review feedback. * Use `Default`, not `std::default::Default`, as per @fitzgen Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> * Apply @fitzgen's comment elaboration to a doc-comment. Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> * Add stat for hitting the rewrite-depth limit. * Some code motion in split prelude to make the diff a little clearer wrt `main`. * Take @jameysharp's suggested `try_into()` usage for blockparam indices. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Take @jameysharp's suggestion to avoid double-match on load op. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix suggestion (add import). * Review feedback. * Fix stack_load handling. * Remove redundant can_store case. * Take @jameysharp's suggested improvement to FuncEGraph::build() logic Co-authored-by: Jamey Sharp <jamey@minilop.net> * Tweaks to FuncEGraph::build() on top of suggestion. * Take @jameysharp's suggested clarified condition Co-authored-by: Jamey Sharp <jamey@minilop.net> * Clean up after suggestion (unused variable). * Fix loop analysis. * loop level asserts * Revert constant-space loop analysis -- edge cases were incorrect, so let's go with the simple thing for now. * Take @jameysharp's suggestion re: result_tys Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix up after suggestion * Take @jameysharp's suggestion to use fold rather than reduce Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fixup after suggestion * Take @jameysharp's suggestion to remove elaborate_eclass_use's return value. * Clarifying comment in terminator insts. Co-authored-by: Jamey Sharp <jamey@minilop.net> Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> |