egraph support: rewrite to work in terms of CLIF data structures. (#5382)

* egraph support: rewrite to work in terms of CLIF data structures.

This work rewrites the "egraph"-based optimization framework in Cranelift to operate on aegraphs (acyclic egraphs) represented in the CLIF itself rather than as a separate data structure to which and from which we translate the CLIF.

The basic idea is to add a new kind of value, a "union", that is like an alias but refers to two other values rather than one. This allows us to represent an eclass of enodes (values) as a tree. The union node allows for a value to have *multiple representations*: either constituent value could be used, and (in well-formed CLIF produced by correct optimization rules) they must be equivalent.

Like the old egraph infrastructure, we take advantage of acyclicity and eager rule application to do optimization in a single pass. Like before, we integrate GVN (during the optimization pass) and LICM (during elaboration).

Unlike the old egraph infrastructure, everything stays in the DataFlowGraph. "Pure" enodes are represented as instructions that have values attached, but that are not placed into the function layout. When entering "egraph" form, we remove them from the layout while optimizing. When leaving "egraph" form, during elaboration, we can place an instruction back into the layout the first time we elaborate the enode; if we elaborate it more than once, we clone the instruction.

The implementation performs two passes overall:

- One, a forward pass in RPO (to see defs before uses), that (i) removes "pure" instructions from the layout and (ii) optimizes as it goes.

  As before, we eagerly optimize, so we form the entire union of optimized forms of a value before we see any uses of that value. This lets us rewrite uses to use the most "up-to-date" form of the value and canonicalize and optimize that form.

  The eager rewriting and acyclic representation make each other work (we could not eagerly rewrite if there were cycles; and acyclicity does not miss optimization opportunities only because the first time we introduce a value, we immediately produce its "best" form). This design choice is also what allows us to avoid the "parent pointers" and fixpoint loop of traditional egraphs.

  This forward optimization pass keeps a scoped hashmap to "intern" nodes (thus performing GVN), and also interleaves on a per-instruction level with alias analysis. The interleaving with alias analysis allows alias analysis to see the most optimized form of each address (so it can see equivalences), and allows the next value to see any equivalences (reuses of loads or stored values) that alias analysis uncovers.

- Two, a forward pass in domtree preorder, that "elaborates" pure enodes back into the layout, possibly in multiple places if needed. This tracks the loop nest and hoists nodes as needed, performing LICM as it goes. Note that by doing this in forward order, we avoid the "fixpoint" that traditional LICM needs: we hoist a def before its uses, so when we place a node, we place it in the right place the first time rather than moving later.

This PR replaces the old (a)egraph implementation. It removes both the cranelift-egraph crate and the logic in cranelift-codegen that uses it.

On `spidermonkey.wasm` running a simple recursive Fibonacci microbenchmark, this work shows 5.5% compile-time reduction and 7.7% runtime improvement (speedup).

Most of this implementation was done in (very productive) pair programming sessions with Jamey Sharp, thus:

Co-authored-by: Jamey Sharp <jsharp@fastly.com>

* Review feedback.

* Review feedback.

* Review feedback.

* Bugfix: cprop rule: `(x + k1) - k2` becomes `x - (k2 - k1)`, not `x - (k1 - k2)`.

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
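The corrected constant-propagation rule in the last bullet can be sanity-checked numerically. The sketch below is not Cranelift code: `original`, `fixed`, and `buggy` are hypothetical helper names, and wrapping 32-bit arithmetic is an assumption standing in for CLIF integer semantics.

```rust
/// `(x + k1) - k2`, the expression before rewriting.
fn original(x: u32, k1: u32, k2: u32) -> u32 {
    x.wrapping_add(k1).wrapping_sub(k2)
}

/// The corrected rewrite: `x - (k2 - k1)`.
fn fixed(x: u32, k1: u32, k2: u32) -> u32 {
    x.wrapping_sub(k2.wrapping_sub(k1))
}

/// The incorrect rewrite: `x - (k1 - k2)`.
fn buggy(x: u32, k1: u32, k2: u32) -> u32 {
    x.wrapping_sub(k1.wrapping_sub(k2))
}

fn main() {
    // Worked instance: (10 + 3) - 5 = 8.
    assert_eq!(original(10, 3, 5), 8);
    assert_eq!(fixed(10, 3, 5), 8);
    // The swapped subtraction computes 10 - (3 - 5) = 12 instead.
    assert_eq!(buggy(10, 3, 5), 12);

    // The corrected form also agrees on wraparound edge cases.
    let samples = [0u32, 1, 2, 0x8000_0000, u32::MAX];
    for &x in &samples {
        for &k1 in &samples {
            for &k2 in &samples {
                assert_eq!(original(x, k1, k2), fixed(x, k1, k2));
            }
        }
    }
}
```

Because the identity holds modulo 2^32, the corrected rule stays valid even when `k2 - k1` itself wraps.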
2022-12-06 14:58:57 -08:00
parent 08d44e3746
commit f980defe17
42 changed files with 1890 additions and 3884 deletions
--- a/cranelift/codegen/src/ir/dfg.rs
+++ b/cranelift/codegen/src/ir/dfg.rs
@@ -125,23 +125,6 @@ impl DataFlowGraph {
        self.immediates.clear();
    }

-    /// Clear all instructions, but keep blocks and other metadata
-    /// (signatures, constants, immediates). Everything to do with
-    /// `Value`s is cleared, including block params and debug info.
-    ///
-    /// Used during egraph-based optimization to clear out the pre-opt
-    /// body so that we can regenerate it from the egraph.
-    pub(crate) fn clear_insts(&mut self) {
-        self.insts.clear();
-        self.results.clear();
-        self.value_lists.clear();
-        self.values.clear();
-        self.values_labels = None;
-        for block in self.blocks.values_mut() {
-            block.params = ValueList::new();
-        }
-    }
-
    /// Get the total number of instructions created in this function, whether they are currently
    /// inserted in the layout or not.
    ///
@@ -173,6 +156,11 @@ impl DataFlowGraph {
        self.values.len()
    }

+    /// Get an iterator over all values and their definitions.
+    pub fn values_and_defs(&self) -> impl Iterator<Item = (Value, ValueDef)> + '_ {
+        self.values().map(|value| (value, self.value_def(value)))
+    }
+
    /// Starts collection of debug information.
    pub fn collect_debug_info(&mut self) {
        if self.values_labels.is_none() {
@@ -279,12 +267,6 @@ impl DataFlowGraph {
        self.values[v].ty()
    }

-    /// Fill in the type of a value, only if currently invalid (as a placeholder).
-    pub(crate) fn fill_in_value_type(&mut self, v: Value, ty: Type) {
-        debug_assert!(self.values[v].ty().is_invalid() || self.values[v].ty() == ty);
-        self.values[v].set_type(ty);
-    }
-
    /// Get the definition of a value.
    ///
    /// This is either the instruction that defined it or the Block that has the value as an
@@ -298,6 +280,7 @@ impl DataFlowGraph {
                // detect alias loops without overrunning the stack.
                self.value_def(self.resolve_aliases(original))
            }
+            ValueData::Union { x, y, .. } => ValueDef::Union(x, y),
        }
    }

@@ -313,6 +296,7 @@ impl DataFlowGraph {
            Inst { inst, num, .. } => Some(&v) == self.inst_results(inst).get(num as usize),
            Param { block, num, .. } => Some(&v) == self.block_params(block).get(num as usize),
            Alias { .. } => false,
+            Union { .. } => false,
        }
    }

@@ -422,6 +406,8 @@ pub enum ValueDef {
    Result(Inst, usize),
    /// Value is the n'th parameter to a block.
    Param(Block, usize),
+    /// Value is a union of two other values.
+    Union(Value, Value),
 }

 impl ValueDef {
@@ -458,6 +444,7 @@ impl ValueDef {
    pub fn num(self) -> usize {
        match self {
            Self::Result(_, n) | Self::Param(_, n) => n,
+            Self::Union(_, _) => 0,
        }
    }
 }
@@ -476,6 +463,11 @@ enum ValueData {
    /// An alias value can't be linked as an instruction result or block parameter. It is used as a
    /// placeholder when the original instruction or block has been rewritten or modified.
    Alias { ty: Type, original: Value },
+
+    /// Union is a "fork" in representation: the value can be
+    /// represented as either of the values named here. This is used
+    /// for aegraph (acyclic egraph) representation in the DFG.
+    Union { ty: Type, x: Value, y: Value },
 }

 /// Bit-packed version of ValueData, for efficiency.
@@ -483,40 +475,71 @@ enum ValueData {
 /// Layout:
 ///
 /// ```plain
-///        | tag:2 |  type:14        |    num:16       | index:32          |
+///        | tag:2 |  type:14        |    x:24       | y:24          |
+///
+/// Inst       00     ty               inst output     inst index
+/// Param      01     ty               blockparam num  block index
+/// Alias      10     ty               0               value index
+/// Union      11     ty               first value     second value
 /// ```
 #[derive(Clone, Copy, Debug, PartialEq, Hash)]
 #[cfg_attr(feature = "enable-serde", derive(Serialize, Deserialize))]
 struct ValueDataPacked(u64);

+/// Encodes a value in 0..2^32 into 0..2^n, where n is less than 32
+/// (and is given by `bits`), by translating 2^32-1 (0xffffffff)
+/// into 2^n-1 and debug-asserting that other values are below 2^n.
+fn encode_narrow_field(x: u32, bits: u8) -> u32 {
+    if x == 0xffff_ffff {
+        (1 << bits) - 1
+    } else {
+        debug_assert!(x < (1 << bits));
+        x
+    }
+}
+
+/// The inverse of the above `encode_narrow_field`: unpacks 2^n-1 into
+/// 2^32-1.
+fn decode_narrow_field(x: u32, bits: u8) -> u32 {
+    if x == (1 << bits) - 1 {
+        0xffff_ffff
+    } else {
+        x
+    }
+}
+
 impl ValueDataPacked {
-    const INDEX_SHIFT: u64 = 0;
-    const INDEX_BITS: u64 = 32;
-    const NUM_SHIFT: u64 = Self::INDEX_SHIFT + Self::INDEX_BITS;
-    const NUM_BITS: u64 = 16;
-    const TYPE_SHIFT: u64 = Self::NUM_SHIFT + Self::NUM_BITS;
-    const TYPE_BITS: u64 = 14;
-    const TAG_SHIFT: u64 = Self::TYPE_SHIFT + Self::TYPE_BITS;
-    const TAG_BITS: u64 = 2;
+    const Y_SHIFT: u8 = 0;
+    const Y_BITS: u8 = 24;
+    const X_SHIFT: u8 = Self::Y_SHIFT + Self::Y_BITS;
+    const X_BITS: u8 = 24;
+    const TYPE_SHIFT: u8 = Self::X_SHIFT + Self::X_BITS;
+    const TYPE_BITS: u8 = 14;
+    const TAG_SHIFT: u8 = Self::TYPE_SHIFT + Self::TYPE_BITS;
+    const TAG_BITS: u8 = 2;

-    const TAG_INST: u64 = 1;
-    const TAG_PARAM: u64 = 2;
-    const TAG_ALIAS: u64 = 3;
+    const TAG_INST: u64 = 0;
+    const TAG_PARAM: u64 = 1;
+    const TAG_ALIAS: u64 = 2;
+    const TAG_UNION: u64 = 3;

-    fn make(tag: u64, ty: Type, num: u16, index: u32) -> ValueDataPacked {
+    fn make(tag: u64, ty: Type, x: u32, y: u32) -> ValueDataPacked {
        debug_assert!(tag < (1 << Self::TAG_BITS));
        debug_assert!(ty.repr() < (1 << Self::TYPE_BITS));

+        let x = encode_narrow_field(x, Self::X_BITS);
+        let y = encode_narrow_field(y, Self::Y_BITS);
+
        ValueDataPacked(
            (tag << Self::TAG_SHIFT)
                | ((ty.repr() as u64) << Self::TYPE_SHIFT)
-                | ((num as u64) << Self::NUM_SHIFT)
-                | ((index as u64) << Self::INDEX_SHIFT),
+                | ((x as u64) << Self::X_SHIFT)
+                | ((y as u64) << Self::Y_SHIFT),
        )
    }

    #[inline(always)]
-    fn field(self, shift: u64, bits: u64) -> u64 {
+    fn field(self, shift: u8, bits: u8) -> u64 {
        (self.0 >> shift) & ((1 << bits) - 1)
    }

@@ -537,14 +560,17 @@ impl From<ValueData> for ValueDataPacked {
    fn from(data: ValueData) -> Self {
        match data {
            ValueData::Inst { ty, num, inst } => {
-                Self::make(Self::TAG_INST, ty, num, inst.as_bits())
+                Self::make(Self::TAG_INST, ty, num.into(), inst.as_bits())
            }
            ValueData::Param { ty, num, block } => {
-                Self::make(Self::TAG_PARAM, ty, num, block.as_bits())
+                Self::make(Self::TAG_PARAM, ty, num.into(), block.as_bits())
            }
            ValueData::Alias { ty, original } => {
                Self::make(Self::TAG_ALIAS, ty, 0, original.as_bits())
            }
+            ValueData::Union { ty, x, y } => {
+                Self::make(Self::TAG_UNION, ty, x.as_bits(), y.as_bits())
+            }
        }
    }
 }
@@ -552,25 +578,33 @@ impl From<ValueData> for ValueDataPacked {
 impl From<ValueDataPacked> for ValueData {
    fn from(data: ValueDataPacked) -> Self {
        let tag = data.field(ValueDataPacked::TAG_SHIFT, ValueDataPacked::TAG_BITS);
-        let ty = data.field(ValueDataPacked::TYPE_SHIFT, ValueDataPacked::TYPE_BITS) as u16;
-        let num = data.field(ValueDataPacked::NUM_SHIFT, ValueDataPacked::NUM_BITS) as u16;
-        let index = data.field(ValueDataPacked::INDEX_SHIFT, ValueDataPacked::INDEX_BITS) as u32;
+        let ty = u16::try_from(data.field(ValueDataPacked::TYPE_SHIFT, ValueDataPacked::TYPE_BITS))
+            .expect("Mask should ensure result fits in a u16");
+        let x = u32::try_from(data.field(ValueDataPacked::X_SHIFT, ValueDataPacked::X_BITS))
+            .expect("Mask should ensure result fits in a u32");
+        let y = u32::try_from(data.field(ValueDataPacked::Y_SHIFT, ValueDataPacked::Y_BITS))
+            .expect("Mask should ensure result fits in a u32");

        let ty = Type::from_repr(ty);
        match tag {
            ValueDataPacked::TAG_INST => ValueData::Inst {
                ty,
-                num,
-                inst: Inst::from_bits(index),
+                num: u16::try_from(x).expect("Inst result num should fit in u16"),
+                inst: Inst::from_bits(decode_narrow_field(y, ValueDataPacked::Y_BITS)),
            },
            ValueDataPacked::TAG_PARAM => ValueData::Param {
                ty,
-                num,
-                block: Block::from_bits(index),
+                num: u16::try_from(x).expect("Blockparam index should fit in u16"),
+                block: Block::from_bits(decode_narrow_field(y, ValueDataPacked::Y_BITS)),
            },
            ValueDataPacked::TAG_ALIAS => ValueData::Alias {
                ty,
-                original: Value::from_bits(index),
+                original: Value::from_bits(decode_narrow_field(y, ValueDataPacked::Y_BITS)),
+            },
+            ValueDataPacked::TAG_UNION => ValueData::Union {
+                ty,
+                x: Value::from_bits(decode_narrow_field(x, ValueDataPacked::X_BITS)),
+                y: Value::from_bits(decode_narrow_field(y, ValueDataPacked::Y_BITS)),
            },
            _ => panic!("Invalid tag {} in ValueDataPacked 0x{:x}", tag, data.0),
        }
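The packed layout in the hunks above (2-bit tag, 14-bit type, two 24-bit fields) can be exercised in isolation. The following is a standalone sketch, not the code from `dfg.rs`: `pack`, `unpack`, and `field` are hypothetical free functions, the type is a plain `u16`, and both narrow fields are decoded unconditionally for simplicity.

```rust
const Y_SHIFT: u8 = 0;
const Y_BITS: u8 = 24;
const X_SHIFT: u8 = Y_SHIFT + Y_BITS;
const X_BITS: u8 = 24;
const TYPE_SHIFT: u8 = X_SHIFT + X_BITS;
const TYPE_BITS: u8 = 14;
const TAG_SHIFT: u8 = TYPE_SHIFT + TYPE_BITS;
const TAG_BITS: u8 = 2;

/// Map the 32-bit "reserved" sentinel onto the all-ones n-bit pattern.
fn encode_narrow_field(x: u32, bits: u8) -> u32 {
    if x == u32::MAX {
        (1 << bits) - 1
    } else {
        debug_assert!(x < (1 << bits));
        x
    }
}

/// Inverse mapping: the all-ones n-bit pattern becomes the sentinel again.
fn decode_narrow_field(x: u32, bits: u8) -> u32 {
    if x == (1 << bits) - 1 {
        u32::MAX
    } else {
        x
    }
}

/// Extract `bits` bits starting at `shift`.
fn field(word: u64, shift: u8, bits: u8) -> u64 {
    (word >> shift) & ((1u64 << bits) - 1)
}

fn pack(tag: u64, ty: u16, x: u32, y: u32) -> u64 {
    debug_assert!(tag < (1 << TAG_BITS));
    debug_assert!(ty < (1 << TYPE_BITS));
    let x = encode_narrow_field(x, X_BITS);
    let y = encode_narrow_field(y, Y_BITS);
    (tag << TAG_SHIFT)
        | ((ty as u64) << TYPE_SHIFT)
        | ((x as u64) << X_SHIFT)
        | ((y as u64) << Y_SHIFT)
}

fn unpack(word: u64) -> (u64, u16, u32, u32) {
    let tag = field(word, TAG_SHIFT, TAG_BITS);
    let ty = field(word, TYPE_SHIFT, TYPE_BITS) as u16;
    let x = decode_narrow_field(field(word, X_SHIFT, X_BITS) as u32, X_BITS);
    let y = decode_narrow_field(field(word, Y_SHIFT, Y_BITS) as u32, Y_BITS);
    (tag, ty, x, y)
}

fn main() {
    // The four fields fill the 64-bit word exactly.
    assert_eq!(TAG_SHIFT + TAG_BITS, 64);

    // An ordinary pair of indices round-trips unchanged.
    assert_eq!(unpack(pack(3, 0x7a, 5, 9)), (3, 0x7a, 5, 9));

    // The 32-bit sentinel survives the trip through a 24-bit field.
    let word = pack(2, 0x7a, 0, u32::MAX);
    assert_eq!(field(word, Y_SHIFT, Y_BITS), 0xff_ffff);
    assert_eq!(unpack(word), (2, 0x7a, 0, u32::MAX));
}
```

One consequence of this scheme is that the all-ones 24-bit pattern is reserved, so a genuine index of 2^24-1 cannot be distinguished from the sentinel.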
@@ -582,8 +616,11 @@ impl From<ValueDataPacked> for ValueData {
 impl DataFlowGraph {
    /// Create a new instruction.
    ///
-    /// The type of the first result is indicated by `data.ty`. If the instruction produces
-    /// multiple results, also call `make_inst_results` to allocate value table entries.
+    /// The type of the first result is indicated by `data.ty`. If the
+    /// instruction produces multiple results, also call
+    /// `make_inst_results` to allocate value table entries. (It is
+    /// always safe to call `make_inst_results`, regardless of how
+    /// many results the instruction has.)
    pub fn make_inst(&mut self, data: InstructionData) -> Inst {
        let n = self.num_insts() + 1;
        self.results.resize(n);
@@ -608,6 +645,7 @@ impl DataFlowGraph {
        match self.value_def(value) {
            ir::ValueDef::Result(inst, _) => self.display_inst(inst),
            ir::ValueDef::Param(_, _) => panic!("value is not defined by an instruction"),
+            ir::ValueDef::Union(_, _) => panic!("value is a union of two other values"),
        }
    }

@@ -823,6 +861,19 @@ impl DataFlowGraph {
        self.insts[inst].put_value_list(branch_values)
    }

+    /// Clone an instruction, attaching new result `Value`s and
+    /// returning the new instruction.
+    pub fn clone_inst(&mut self, inst: Inst) -> Inst {
+        // First, add a clone of the InstructionData.
+        let inst_data = self[inst].clone();
+        let new_inst = self.make_inst(inst_data);
+        // Get the controlling type variable.
+        let ctrl_typevar = self.ctrl_typevar(inst);
+        // Create new result values.
+        self.make_inst_results(new_inst, ctrl_typevar);
+        new_inst
+    }
+
    /// Get the first result of an instruction.
    ///
    /// This function panics if the instruction doesn't have any result.
@@ -847,6 +898,14 @@ impl DataFlowGraph {
        self.results[inst]
    }

+    /// Create a union of two values.
+    pub fn union(&mut self, x: Value, y: Value) -> Value {
+        // Get the type.
+        let ty = self.value_type(x);
+        debug_assert_eq!(ty, self.value_type(y));
+        self.make_value(ValueData::Union { ty, x, y })
+    }
+
    /// Get the call signature of a direct or indirect call instruction.
    /// Returns `None` if `inst` is not a call instruction.
    pub fn call_signature(&self, inst: Inst) -> Option<SigRef> {
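To illustrate how union values let an eclass be represented as a tree, here is a toy model. It is a sketch only and assumes nothing about the real `DataFlowGraph` beyond what the diff shows: `ToyDfg`, `Node`, and `members` are hypothetical names, and enodes are stood in for by strings.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Value(u32);

#[derive(Clone, Copy, Debug)]
enum ValueDef {
    /// A concrete enode (stand-in for an instruction result).
    Node(&'static str),
    /// A fork in representation: either constituent value may be used.
    Union(Value, Value),
}

#[derive(Default)]
struct ToyDfg {
    defs: Vec<ValueDef>,
}

impl ToyDfg {
    fn make_value(&mut self, def: ValueDef) -> Value {
        let v = Value(self.defs.len() as u32);
        self.defs.push(def);
        v
    }

    fn node(&mut self, text: &'static str) -> Value {
        self.make_value(ValueDef::Node(text))
    }

    /// A union can only name values that already exist, so the
    /// resulting structure is acyclic by construction.
    fn union(&mut self, x: Value, y: Value) -> Value {
        self.make_value(ValueDef::Union(x, y))
    }

    /// Collect every concrete representation reachable from `v`.
    fn members(&self, v: Value, out: &mut Vec<&'static str>) {
        match self.defs[v.0 as usize] {
            ValueDef::Node(text) => out.push(text),
            ValueDef::Union(x, y) => {
                self.members(x, out);
                self.members(y, out);
            }
        }
    }
}

fn main() {
    let mut dfg = ToyDfg::default();
    // Three equivalent ways to compute 2 * v0.
    let a = dfg.node("iadd v0, v0");
    let b = dfg.node("ishl_imm v0, 1");
    let c = dfg.node("imul_imm v0, 2");
    // Each newly discovered form is unioned onto the existing eclass.
    let ab = dfg.union(a, b);
    let abc = dfg.union(ab, c);

    let mut forms = Vec::new();
    dfg.members(abc, &mut forms);
    assert_eq!(forms, ["iadd v0, v0", "ishl_imm v0, 1", "imul_imm v0, 2"]);
}
```

Uses that refer to `abc` see all three forms, while a use that captured `a` earlier would see only the first; this is why the commit message stresses forming the full union before any use of the value is visited.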
--- a/cranelift/codegen/src/ir/layout.rs
+++ b/cranelift/codegen/src/ir/layout.rs
@@ -61,18 +61,6 @@ impl Layout {
        self.last_block = None;
    }

-    /// Clear instructions from every block, but keep the blocks.
-    ///
-    /// Used by the egraph-based optimization to clear out the
-    /// function body but keep the CFG skeleton.
-    pub(crate) fn clear_insts(&mut self) {
-        self.insts.clear();
-        for block in self.blocks.values_mut() {
-            block.first_inst = None.into();
-            block.last_inst = None.into();
-        }
-    }
-
    /// Returns the capacity of the `BlockData` map.
    pub fn block_capacity(&self) -> usize {
        self.blocks.capacity()
--- a/cranelift/codegen/src/ir/mod.rs
+++ b/cranelift/codegen/src/ir/mod.rs
@@ -48,7 +48,7 @@ pub use crate::ir::function::{DisplayFunctionAnnotations, Function};
 pub use crate::ir::globalvalue::GlobalValueData;
 pub use crate::ir::heap::{HeapData, HeapStyle};
 pub use crate::ir::instructions::{
-    InstructionData, InstructionImms, Opcode, ValueList, ValueListPool, VariableArgs,
+    InstructionData, Opcode, ValueList, ValueListPool, VariableArgs,
 };
 pub use crate::ir::jumptable::JumpTableData;
 pub use crate::ir::known_symbol::KnownSymbol;
--- a/cranelift/codegen/src/ir/progpoint.rs
+++ b/cranelift/codegen/src/ir/progpoint.rs
@@ -37,6 +37,7 @@ impl From<ValueDef> for ProgramPoint {
        match def {
            ValueDef::Result(inst, _) => inst.into(),
            ValueDef::Param(block, _) => block.into(),
+            ValueDef::Union(_, _) => panic!("Union does not have a single program point"),
        }
    }
 }
@@ -78,6 +79,7 @@ impl From<ValueDef> for ExpandedProgramPoint {
        match def {
            ValueDef::Result(inst, _) => inst.into(),
            ValueDef::Param(block, _) => block.into(),
+            ValueDef::Union(_, _) => panic!("Union does not have a single program point"),
        }
    }
 }