egraph-based midend: draw the rest of the owl (productionized). (#4953)

* egraph-based midend: draw the rest of the owl.

* Rename `egg` submodule of cranelift-codegen to `egraph`.

* Apply some feedback from @jsharp during code walkthrough.

* Remove recursion from find_best_node by doing a single pass.

Rather than recursively computing the lowest-cost node for a given
eclass and memoizing the answer at each eclass node, we can do a single
forward pass; because every eclass node refers only to earlier nodes,
this is sufficient. The behavior may slightly differ from the earlier
behavior because we cannot short-circuit costs to zero once a node is
elaborated; but in practice this should not matter.
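
A minimal sketch of the single-pass idea, assuming a simplified node table in which every node only refers to entries at smaller indices. The `Node` type, its variants, and `best_costs` are hypothetical names for illustration and are not the types used in cranelift-egraph:

```rust
/// Hypothetical, simplified eclass node (not Cranelift's actual data structure).
#[derive(Clone, Debug)]
enum Node {
    /// An operator with its own cost; `args` are indices of earlier entries.
    Op { cost: u32, args: Vec<usize> },
    /// A union of two earlier entries that represent the same value.
    Union(usize, usize),
}

/// Compute the lowest cost of every entry in one forward pass.
/// Assumes every index stored in a node is smaller than the node's own index,
/// so `best[a]` is always filled in before it is read.
fn best_costs(nodes: &[Node]) -> Vec<u32> {
    let mut best: Vec<u32> = Vec::with_capacity(nodes.len());
    for (i, node) in nodes.iter().enumerate() {
        let cost = match node {
            Node::Op { cost, args } => args.iter().fold(*cost, |acc, &a| {
                debug_assert!(a < i, "node refers to a later entry");
                acc.saturating_add(best[a])
            }),
            Node::Union(a, b) => {
                debug_assert!(*a < i && *b < i, "union refers to a later entry");
                best[*a].min(best[*b])
            }
        };
        best.push(cost);
    }
    best
}
```

In this sketch a shared argument is charged once per use, which is the kind of difference from the memoized recursive version that the paragraph above describes as not short-circuiting costs to zero.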

* Make elaboration non-recursive.

Use an explicit stack instead (with `ElabStackEntry` entries,
alongside a result stack).
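
As an illustration only, the explicit-stack shape could look roughly like the following. The `Expr` type and the `Start`/`Finish` variants are invented for this sketch; only the name `ElabStackEntry` and the idea of a separate result stack come from the commit message. The sketch assumes an acyclic expression table and does no memoization:

```rust
/// Hypothetical expression node; `args` are indices into the same slice.
struct Expr {
    op: &'static str,
    args: Vec<usize>,
}

/// Work items for the explicit stack (variants are assumptions of this sketch).
enum ElabStackEntry {
    /// Visit this node: schedule its arguments, then finish it.
    Start(usize),
    /// All arguments are on the result stack; pop them and emit the node.
    Finish(usize),
}

/// Render the expression rooted at `root` without using recursion.
fn elaborate(exprs: &[Expr], root: usize) -> String {
    let mut stack = vec![ElabStackEntry::Start(root)];
    let mut results: Vec<String> = Vec::new();
    while let Some(entry) = stack.pop() {
        match entry {
            ElabStackEntry::Start(i) => {
                stack.push(ElabStackEntry::Finish(i));
                // Push arguments in reverse so the first one is processed first.
                for &a in exprs[i].args.iter().rev() {
                    stack.push(ElabStackEntry::Start(a));
                }
            }
            ElabStackEntry::Finish(i) => {
                // The last `n` results are this node's arguments, in order.
                let n = exprs[i].args.len();
                let args = results.split_off(results.len() - n);
                let text = if args.is_empty() {
                    exprs[i].op.to_string()
                } else {
                    format!("({} {})", exprs[i].op, args.join(" "))
                };
                results.push(text);
            }
        }
    }
    results.pop().expect("root produces exactly one result")
}
```

The depth of the expression now only affects the size of the two heap-allocated vectors, not the native call stack.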

* Make elaboration traversal of the domtree non-recursive/stack-safe.

* Work analysis logic in Cranelift-side egraph glue into a general analysis framework in cranelift-egraph.

* Apply static recursion limit to rule application.
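
A rough sketch of what a static depth limit on rewriting might look like, together with a counter for how often the limit is hit. The constant value, the `Stats` field name, and the toy `Expr` type are assumptions made for this example, not the names used in the commit:

```rust
/// Hypothetical limit; the real value is not stated in the commit message.
const REWRITE_DEPTH_LIMIT: usize = 2;

/// Hypothetical statistics struct with a counter for hitting the limit.
#[derive(Default)]
struct Stats {
    rewrite_depth_limit: u64,
}

/// Toy expression type used only for this sketch.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Var(&'static str),
    Neg(Box<Expr>),
}

/// Small helper to build a negation.
fn neg(e: Expr) -> Expr {
    Expr::Neg(Box::new(e))
}

/// Apply the single rule `neg(neg(x)) -> x`, re-simplifying the result,
/// but stop once the static depth limit is reached.
fn simplify(e: Expr, depth: usize, stats: &mut Stats) -> Expr {
    if depth >= REWRITE_DEPTH_LIMIT {
        // Give up on further rewriting and record that the limit was hit.
        stats.rewrite_depth_limit += 1;
        return e;
    }
    match e {
        Expr::Neg(inner) => match *inner {
            Expr::Neg(x) => simplify(*x, depth + 1, stats),
            other => neg(other),
        },
        other => other,
    }
}
```

Hitting the limit leaves a valid but less simplified expression, so the limit bounds work without affecting correctness in this sketch.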

* Fix aarch64 wrt dynamic-vector support -- broken rebase.

* Topo-sort cranelift-egraph before cranelift-codegen in publish script, like the comment instructs me to!

* Fix multi-result call testcase.

* Include `cranelift-egraph` in `PUBLISHED_CRATES`.

* Fix atomic_rmw: not really a load.

* Remove now-unnecessary PartialOrd/Ord derivations.

* Address some code-review comments.

* Review feedback.

* Review feedback.

* No overlap in mid-end rules, because we are defining a multi-constructor.

* rustfmt

* Review feedback.

* Review feedback.

* Review feedback.

* Review feedback.

* Remove redundant `mut`.

* Add comment noting what rules can do.

* Review feedback.

* Clarify comment wording.

* Update `has_memory_fence_semantics`.

* Apply @jameysharp's improved loop-level computation.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix suggestion commit.

* Fix off-by-one in new loop-nest analysis.

* Review feedback.

* Review feedback.

* Review feedback.

* Use `Default`, not `std::default::Default`, as per @fitzgen

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

* Apply @fitzgen's comment elaboration to a doc-comment.

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

* Add stat for hitting the rewrite-depth limit.

* Some code motion in split prelude to make the diff a little clearer wrt `main`.

* Take @jameysharp's suggested `try_into()` usage for blockparam indices.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Take @jameysharp's suggestion to avoid double-match on load op.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix suggestion (add import).

* Review feedback.

* Fix stack_load handling.

* Remove redundant can_store case.

* Take @jameysharp's suggested improvement to FuncEGraph::build() logic

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Tweaks to FuncEGraph::build() on top of suggestion.

* Take @jameysharp's suggested clarified condition

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Clean up after suggestion (unused variable).

* Fix loop analysis.

* loop level asserts

* Revert constant-space loop analysis -- edge cases were incorrect, so let's go with the simple thing for now.

* Take @jameysharp's suggestion re: result_tys

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix up after suggestion

* Take @jameysharp's suggestion to use fold rather than reduce

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fixup after suggestion

* Take @jameysharp's suggestion to remove elaborate_eclass_use's return value.

* Clarifying comment in terminator insts.

Co-authored-by: Jamey Sharp <jamey@minilop.net>
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
Chris Fallin
2022-10-11 18:15:53 -07:00
committed by GitHub
parent e2f1ced0b6
commit 2be12a5167
59 changed files with 5125 additions and 1580 deletions


@@ -0,0 +1,207 @@
;; Algebraic optimizations.
;; Rules here are allowed to rewrite pure expressions arbitrarily,
;; using the same inputs as the original, or fewer. In other words, we
;; cannot pull a new eclass id out of thin air and refer to it, other
;; than a piece of the input or a new node that we construct; but we
;; can freely rewrite e.g. `x+y-y` to `x`.
;; uextend/sextend of a constant.
(rule (simplify (uextend $I64 (iconst $I32 imm)))
(iconst $I64 imm))
(rule (simplify (sextend $I64 (iconst $I32 (u64_from_imm64 imm))))
(iconst $I64 (imm64 (u64_sextend_u32 imm))))
;; x+0 == 0+x == x.
(rule (simplify (iadd ty
x
(iconst ty (u64_from_imm64 0))))
(subsume x))
(rule (simplify (iadd ty
(iconst ty (u64_from_imm64 0))
x))
(subsume x))
;; x-0 == x.
(rule (simplify (isub ty
x
(iconst ty (u64_from_imm64 0))))
(subsume x))
;; 0-x == (ineg x).
(rule (simplify (isub ty
(iconst ty (u64_from_imm64 0))
x))
(ineg ty x))
;; x*1 == 1*x == x.
(rule (simplify (imul ty
x
(iconst ty (u64_from_imm64 1))))
(subsume x))
(rule (simplify (imul ty
(iconst ty (u64_from_imm64 1))
x))
(subsume x))
;; x*0 == 0*x == 0.
(rule (simplify (imul ty
x
(iconst ty (u64_from_imm64 0))))
(iconst ty (imm64 0)))
(rule (simplify (imul ty
(iconst ty (u64_from_imm64 0))
x))
(iconst ty (imm64 0)))
;; x/1 == x.
(rule (simplify (sdiv ty
x
(iconst ty (u64_from_imm64 1))))
(subsume x))
(rule (simplify (udiv ty
x
(iconst ty (u64_from_imm64 1))))
(subsume x))
;; x>>0 == x<<0 == x rotr 0 == x rotl 0 == x.
(rule (simplify (ishl ty
x
(iconst ty (u64_from_imm64 0))))
(subsume x))
(rule (simplify (ushr ty
x
(iconst ty (u64_from_imm64 0))))
(subsume x))
(rule (simplify (sshr ty
x
(iconst ty (u64_from_imm64 0))))
(subsume x))
(rule (simplify (rotr ty
x
(iconst ty (u64_from_imm64 0))))
(subsume x))
(rule (simplify (rotl ty
x
(iconst ty (u64_from_imm64 0))))
(subsume x))
;; x | 0 == 0 | x == x | x == x.
(rule (simplify (bor ty
x
(iconst ty (u64_from_imm64 0))))
(subsume x))
(rule (simplify (bor ty
(iconst ty (u64_from_imm64 0))
x))
(subsume x))
(rule (simplify (bor ty x x))
(subsume x))
;; x ^ 0 == 0 ^ x == x.
(rule (simplify (bxor ty
x
(iconst ty (u64_from_imm64 0))))
(subsume x))
(rule (simplify (bxor ty
(iconst ty (u64_from_imm64 0))
x))
(subsume x))
;; x ^ x == 0.
(rule (simplify (bxor ty x x))
(subsume (iconst ty (imm64 0))))
;; x ^ not(x) == not(x) ^ x == -1.
(rule (simplify (bxor $I32 x (bnot $I32 x))) (subsume (iconst $I32 (imm64 0xffff_ffff))))
(rule (simplify (bxor $I32 (bnot $I32 x) x)) (subsume (iconst $I32 (imm64 0xffff_ffff))))
(rule (simplify (bxor $I64 x (bnot $I64 x))) (subsume (iconst $I64 (imm64 0xffff_ffff_ffff_ffff))))
(rule (simplify (bxor $I64 (bnot $I64 x) x)) (subsume (iconst $I64 (imm64 0xffff_ffff_ffff_ffff))))
;; x & -1 == -1 & x == x & x == x.
(rule (simplify (band ty x x)) x)
(rule (simplify (band $I32 x (iconst $I32 (u64_from_imm64 0xffff_ffff)))) (subsume x))
(rule (simplify (band $I32 (iconst $I32 (u64_from_imm64 0xffff_ffff)) x)) (subsume x))
(rule (simplify (band $I64 x (iconst $I64 (u64_from_imm64 0xffff_ffff_ffff_ffff)))) (subsume x))
(rule (simplify (band $I64 (iconst $I64 (u64_from_imm64 0xffff_ffff_ffff_ffff)) x)) (subsume x))
;; x & 0 == 0 & x == 0.
(rule (simplify (band ty x (iconst ty (u64_from_imm64 0)))) (iconst ty (imm64 0)))
(rule (simplify (band ty (iconst ty (u64_from_imm64 0)) x)) (iconst ty (imm64 0)))
;; not(not(x)) == x.
(rule (simplify (bnot ty (bnot ty x))) (subsume x))
;; DeMorgan's rule (two versions):
;; bnot(bor(x, y)) == band(bnot(x), bnot(y))
(rule (simplify (bnot ty (bor ty x y)))
(band ty (bnot ty x) (bnot ty y)))
;; bnot(band(x, y)) == bor(bnot(x), bnot(y))
(rule (simplify (bnot ty (band ty x y)))
(bor ty (bnot ty x) (bnot ty y)))
;; x*2 == 2*x == x+x.
(rule (simplify (imul ty x (iconst _ (simm32 2))))
(iadd ty x x))
(rule (simplify (imul ty (iconst _ (simm32 2)) x))
(iadd ty x x))
;; x<<32>>32: uextend/sextend 32->64.
(rule (simplify (ushr $I64 (ishl $I64 (uextend $I64 x @ (eclass_type $I32)) (iconst _ (simm32 32))) (iconst _ (simm32 32))))
(uextend $I64 x))
(rule (simplify (sshr $I64 (ishl $I64 (uextend $I64 x @ (eclass_type $I32)) (iconst _ (simm32 32))) (iconst _ (simm32 32))))
(sextend $I64 x))
;; TODO: strength reduction: mul/div to shifts
;; TODO: div/rem by constants -> magic multiplications
;; Reassociate when it benefits LICM.
(rule (simplify (iadd ty (iadd ty x y) z))
(if-let (at_loop_level lx _) x)
(if-let (at_loop_level ly _) y)
(if-let (at_loop_level lz _) z)
(if (u8_lt lx ly))
(if (u8_lt lz ly))
(iadd ty (iadd ty x z) y))
(rule (simplify (iadd ty (iadd ty x y) z))
(if-let (at_loop_level lx _) x)
(if-let (at_loop_level ly _) y)
(if-let (at_loop_level lz _) z)
(if (u8_lt ly lx))
(if (u8_lt lz lx))
(iadd ty (iadd ty y z) x))
;; Select's selector input doesn't need bint; remove the redundant op.
(rule (simplify (select ty (bint _ b) x y))
(subsume (select ty b x y)))
;; Rematerialize ALU-op-with-imm and iconsts in each block where they're
;; used. This is neutral (add-with-imm) or positive (iconst) for
;; register pressure, and these ops are very cheap.
(rule (simplify x @ (iadd _ (iconst _ _) _))
(remat x))
(rule (simplify x @ (iadd _ _ (iconst _ _)))
(remat x))
(rule (simplify x @ (isub _ (iconst _ _) _))
(remat x))
(rule (simplify x @ (isub _ _ (iconst _ _)))
(remat x))
(rule (simplify x @ (band _ (iconst _ _) _))
(remat x))
(rule (simplify x @ (band _ _ (iconst _ _)))
(remat x))
(rule (simplify x @ (bor _ (iconst _ _) _))
(remat x))
(rule (simplify x @ (bor _ _ (iconst _ _)))
(remat x))
(rule (simplify x @ (bxor _ (iconst _ _) _))
(remat x))
(rule (simplify x @ (bxor _ _ (iconst _ _)))
(remat x))
(rule (simplify x @ (bnot _ _))
(remat x))
(rule (simplify x @ (iconst _ _))
(remat x))
(rule (simplify x @ (f32const _ _))
(remat x))
(rule (simplify x @ (f64const _ _))
(remat x))


@@ -0,0 +1,134 @@
;; Constant propagation.
(rule (simplify
(iadd (fits_in_64 ty)
(iconst ty (u64_from_imm64 k1))
(iconst ty (u64_from_imm64 k2))))
(subsume (iconst ty (imm64 (u64_add k1 k2)))))
(rule (simplify
(isub (fits_in_64 ty)
(iconst ty (u64_from_imm64 k1))
(iconst ty (u64_from_imm64 k2))))
(subsume (iconst ty (imm64 (u64_sub k1 k2)))))
(rule (simplify
(imul (fits_in_64 ty)
(iconst ty (u64_from_imm64 k1))
(iconst ty (u64_from_imm64 k2))))
(subsume (iconst ty (imm64 (u64_mul k1 k2)))))
(rule (simplify
(sdiv (fits_in_64 ty)
(iconst ty (u64_from_imm64 k1))
(iconst ty (u64_from_imm64 k2))))
(if-let d (u64_sdiv k1 k2))
(subsume (iconst ty (imm64 d))))
(rule (simplify
(udiv (fits_in_64 ty)
(iconst ty (u64_from_imm64 k1))
(iconst ty (u64_from_imm64 k2))))
(if-let d (u64_udiv k1 k2))
(subsume (iconst ty (imm64 d))))
(rule (simplify
(bor (fits_in_64 ty)
(iconst ty (u64_from_imm64 k1))
(iconst ty (u64_from_imm64 k2))))
(subsume (iconst ty (imm64 (u64_or k1 k2)))))
(rule (simplify
(band (fits_in_64 ty)
(iconst ty (u64_from_imm64 k1))
(iconst ty (u64_from_imm64 k2))))
(subsume (iconst ty (imm64 (u64_and k1 k2)))))
(rule (simplify
(bxor (fits_in_64 ty)
(iconst ty (u64_from_imm64 k1))
(iconst ty (u64_from_imm64 k2))))
(subsume (iconst ty (imm64 (u64_xor k1 k2)))))
(rule (simplify
(bnot (fits_in_64 ty)
(iconst ty (u64_from_imm64 k))))
(subsume (iconst ty (imm64 (u64_not k)))))
;; Canonicalize via commutativity: push immediates to the right.
;;
;; (op k x) --> (op x k)
(rule (simplify
(iadd ty k @ (iconst ty _) x))
(iadd ty x k))
;; sub is not commutative, but we can flip the args and negate the
;; whole thing.
(rule (simplify
(isub ty k @ (iconst ty _) x))
(ineg ty (isub ty x k)))
(rule (simplify
(imul ty k @ (iconst ty _) x))
(imul ty x k))
(rule (simplify
(bor ty k @ (iconst ty _) x))
(bor ty x k))
(rule (simplify
(band ty k @ (iconst ty _) x))
(band ty x k))
(rule (simplify
(bxor ty k @ (iconst ty _) x))
(bxor ty x k))
;; Canonicalize via associativity: reassociate to a right-heavy tree
;; for constants.
;;
;; (op (op x k) k) --> (op x (op k k))
(rule (simplify
(iadd ty (iadd ty x k1 @ (iconst ty _)) k2 @ (iconst ty _)))
(iadd ty x (iadd ty k1 k2)))
;; sub is not directly associative, but we can flip a sub to an add to
;; make it work:
;; - (sub (sub x k1) k2) -> (sub x (add k1 k2))
;; - (sub (sub k1 x) k2) -> (sub (sub k1 k2) x)
;; - (sub (add x k1) k2) -> (sub x (sub k2 k1))
;; - (add (sub x k1) k2) -> (add x (sub k2 k1))
;; - (add (sub k1 x) k2) -> (sub (add k1 k2) x)
(rule (simplify (isub ty
(isub ty x (iconst ty (u64_from_imm64 k1)))
(iconst ty (u64_from_imm64 k2))))
(isub ty x (iconst ty (imm64 (u64_add k1 k2)))))
(rule (simplify (isub ty
(isub ty (iconst ty (u64_from_imm64 k1)) x)
(iconst ty (u64_from_imm64 k2))))
(isub ty (iconst ty (imm64 (u64_sub k1 k2))) x))
(rule (simplify (isub ty
(iadd ty x (iconst ty (u64_from_imm64 k1)))
(iconst ty (u64_from_imm64 k2))))
(isub ty x (iconst ty (imm64 (u64_sub k2 k1)))))
(rule (simplify (iadd ty
(isub ty x (iconst ty (u64_from_imm64 k1)))
(iconst ty (u64_from_imm64 k2))))
(iadd ty x (iconst ty (imm64 (u64_sub k2 k1)))))
(rule (simplify (iadd ty
(isub ty (iconst ty (u64_from_imm64 k1)) x)
(iconst ty (u64_from_imm64 k2))))
(isub ty (iconst ty (imm64 (u64_add k1 k2))) x))
(rule (simplify
(imul ty (imul ty x k1 @ (iconst ty _)) k2 @ (iconst ty _)))
(imul ty x (imul ty k1 k2)))
(rule (simplify
(bor ty (bor ty x k1 @ (iconst ty _)) k2 @ (iconst ty _)))
(bor ty x (bor ty k1 k2)))
(rule (simplify
(band ty (band ty x k1 @ (iconst ty _)) k2 @ (iconst ty _)))
(band ty x (band ty k1 k2)))
(rule (simplify
(bxor ty (bxor ty x k1 @ (iconst ty _)) k2 @ (iconst ty _)))
(bxor ty x (bxor ty k1 k2)))
;; TODO: fadd, fsub, fmul, fdiv, fneg, fabs
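
The five sub/add reassociation identities listed in the comments of this file can be sanity-checked with wrapping 64-bit arithmetic. This is only a sketch: `identities_hold` is a hypothetical helper, it models `u64` values only, and it does not model the masking that narrower types would need:

```rust
/// Returns true if all five reassociation identities from the comments hold
/// for the given values under wrapping (modulo 2^64) arithmetic.
fn identities_hold(x: u64, k1: u64, k2: u64) -> bool {
    let (add, sub) = (u64::wrapping_add, u64::wrapping_sub);
    // (sub (sub x k1) k2) -> (sub x (add k1 k2))
    sub(sub(x, k1), k2) == sub(x, add(k1, k2))
        // (sub (sub k1 x) k2) -> (sub (sub k1 k2) x)
        && sub(sub(k1, x), k2) == sub(sub(k1, k2), x)
        // (sub (add x k1) k2) -> (sub x (sub k2 k1))
        && sub(add(x, k1), k2) == sub(x, sub(k2, k1))
        // (add (sub x k1) k2) -> (add x (sub k2 k1))
        && add(sub(x, k1), k2) == add(x, sub(k2, k1))
        // (add (sub k1 x) k2) -> (sub (add k1 k2) x)
        && add(sub(k1, x), k2) == sub(add(k1, k2), x)
}
```

Note the operand order in the third identity: the folded constant is `k2 - k1`, not `k1 - k2`.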


@@ -0,0 +1,11 @@
//! Wrapper environment for generated code from optimization rules in ISLE.
// See https://github.com/rust-lang/rust/issues/47995: we cannot use `#![...]` attributes inside of
// the generated ISLE source below because we include!() it. We must include!() it because its path
// depends on an environment variable; and also because of this, we can't do the `#[path = "..."]
// mod generated_code;` trick either.
#![allow(dead_code, unreachable_code, unreachable_patterns)]
#![allow(unused_imports, unused_variables, non_snake_case, unused_mut)]
#![allow(irrefutable_let_patterns, non_camel_case_types)]
include!(concat!(env!("ISLE_DIR"), "/isle_opt.rs"));