Cranelift/ISLE: re-apply prio-trie fix, this time with fixed fix. (#4117)

* ISLE compiler: fix priority-trie interval bug. (#4093) This PR fixes a bug in the ISLE compiler related to rule priorities. An important note first: the bug did not affect the correctness of the Cranelift backends, either in theory (because the rules should be correct applied in any order, even contrary to the stated priorities) or in practice (because the generated code actually does not change at all with the DSL compiler fix, only with a separate minimized bug example). The issue was a simple swap of `min` for `max` (see first commit). This is the minimal fix, I think, to get a correct priority-trie with the minimized bug example in this commit. However, while debugging this, I started to convince myself that the complexity of merging multiple priority ranges using the sort of hybrid interval tree / string-matching trie data structure was unneeded. The original design was built with the assumption we might have a bunch of different priority levels, and would need the efficiency of merging where possible. But in practice we haven't used priorities this way: the vast majority of lowering rules exist at the default (priority 0), and just a few overrides are explicitly at prio 1, 2 or (rarely) 3. So, it turns out to be a lot simpler to label trie edges with (prio, symbol) rather than (prio-range, symbol), and delete the whole mess of interval-splitting logic on insertion. It's easier (IMHO) to convince oneself that the resulting insertion algorithm is correct. I was worried that this might impact the size of the generated Rust code or its runtime, but In fact, to my initial surprise (but it makes sense given the above "rarely used" factor), the generated code with this compiler fix is *exactly the same*. I rebuilt with `--features rebuild-isle,all-arch` but... there were no diffs to commit! This is to me the simplest evidence that we didn't really need that complexity. * Fix earlier commit from #4093: properly sort trie. This commit fixes an in-hindsight-obvious bug in #4093: the trie's edges must be sorted recursively, not just at the top level. With this fix, the generated code differs only in one cosmetic way (a let-binding moves) but otherwise is the same. This includes @fitzgen's fix to the CI (from the revert in #4102) that deletes manifests to actually check that the checked-in source is consistent with the checked-in compiler. The force-rebuild step is now in a shell script for convenience: anyone hacking on the ISLE compiler itself can use this script to more easily rebuild everything. * Add note to build.rs to remind to update force-rebuild-isle.sh
2022-05-09 16:36:48 -07:00
parent 8bd507db65
commit 2af8d1e93c
7 changed files with 185 additions and 279 deletions
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -196,10 +196,8 @@ jobs:
      with:
        submodules: true
    - uses: ./.github/actions/install-rust
-    - name: Delete ISLE manifest files to force a re-build
-      run: rm cranelift/codegen/src/isa/aarch64/lower/isle/generated_code.manifest cranelift/codegen/src/isa/s390x/lower/isle/generated_code.manifest cranelift/codegen/src/isa/x64/lower/isle/generated_code.manifest
-    - name: Rebuild ISLE DSL files
-      run: cargo build -p cranelift-codegen --features "rebuild-isle"
+    - name: Force-rebuild ISLE DSL files
+      run: ./scripts/force-rebuild-isle.sh
    - name: Reformat
      run: cargo fmt -p cranelift-codegen
    - name: Check that the ISLE DSL files are up-to-date
--- a/cranelift/codegen/build.rs
+++ b/cranelift/codegen/build.rs
@@ -232,6 +232,10 @@ fn get_isle_compilations(crate_dir: &std::path::Path) -> Result<IsleCompilations
    // There should be one entry for each backend that uses ISLE for lowering,
    // and if/when we replace our peephole optimization passes with ISLE, there
    // should be an entry for each of those as well.
+    //
+    // N.B.: add any new compilation outputs to
+    // `scripts/force-rebuild-isle.sh` if they do not fit the pattern
+    // `cranelift/codegen/src/isa/*/lower/isle/generated_code.rs`!
    Ok(IsleCompilations {
        items: vec![
            // The x86-64 instruction selector.
--- a/cranelift/codegen/src/isa/x64/lower/isle/generated_code.rs
+++ b/cranelift/codegen/src/isa/x64/lower/isle/generated_code.rs
@@ -1377,7 +1377,6 @@ pub fn constructor_amode_add<C: Context>(ctx: &mut C, arg0: &Amode, arg1: Value)
            }
        }
    }
-    let pattern0_0 = arg0;
    let pattern1_0 = arg1;
    if let Some(pattern2_0) = C::def_inst(ctx, pattern1_0) {
        let pattern3_0 = C::inst_data(ctx, pattern2_0);
@@ -1395,6 +1394,7 @@ pub fn constructor_amode_add<C: Context>(ctx: &mut C, arg0: &Amode, arg1: Value)
            }
        }
    }
+    let pattern0_0 = arg0;
    match pattern0_0 {
        &Amode::ImmReg {
            simm32: pattern1_0,
--- a/cranelift/isle/isle/isle_examples/pass/prio_trie_bug.isle
+++ b/cranelift/isle/isle/isle_examples/pass/prio_trie_bug.isle
@@ -0,0 +1,108 @@
+;; Minimized bug reproducer for earlier priority-range-merging trie
+;; implementation in ISLE compiler. This example, when compiled with
+;; old versions of islec, would result in the bottom-most rule (at
+;; priority 0) being applied before the rule involving `iconst` at
+;; priority 1 below.
+
+(type Unit (primitive Unit))
+(type u8 (primitive u8))
+(type u32 (primitive u32))
+(type Reg (primitive Reg))
+(type MemFlags (primitive MemFlags))
+(type MachLabel (primitive MachLabel))
+(type Value (primitive Value))
+
+(decl iadd (Value Value) Value)
+(extern extractor iadd iadd)
+(decl ishl (Value Value) Value)
+(extern extractor ishl ishl)
+(decl uextend (Value) Value)
+(extern extractor uextend uextend)
+(decl sextend (Value) Value)
+(extern extractor sextend sextend)
+(decl iconst (u32) Value)
+(extern extractor iconst iconst)
+
+(decl put_in_reg (Value) Reg)
+(convert Value Reg put_in_reg)
+(extern constructor put_in_reg put_in_reg)
+
+(decl invalid_reg () Reg)
+(extern extractor invalid_reg invalid_reg)
+(decl valid_reg () Reg)
+(extern extractor valid_reg valid_reg)
+
+(decl pure u32_lteq (u32 u32) Unit)
+(extern constructor u32_lteq u32_lteq)
+
+(decl pure s32_add_fallible (u32 u32) u32)
+(extern constructor s32_add_fallible s32_add_fallible)
+
+(decl x64_add (Reg Reg) Reg)
+(extern constructor x64_add x64_add)
+
+;; An `Amode` represents a possible addressing mode that can be used
+;; in instructions. These denote a 64-bit value only.
+(type Amode (enum
+             ;; Immediate sign-extended and a register
+             (ImmReg (simm32 u32)
+                     (base Reg)
+                     (flags MemFlags))
+
+             ;; Sign-extend-32-to-64(simm32) + base + (index << shift)
+             (ImmRegRegShift (simm32 u32)
+                             (base Reg)
+                             (index Reg)
+                             (shift u32)
+                             (flags MemFlags))
+
+             ;; Sign-extend-32-to-64(immediate) + RIP (instruction
+             ;; pointer). The appropriate relocation is emitted so
+             ;; that the resulting immediate makes this Amode refer to
+             ;; the given MachLabel.
+             (RipRelative (target MachLabel))))
+
+;; One step in amode processing: take an existing amode and add
+;; another value to it.
+(decl amode_add (Amode Value) Amode)
+
+;; -- Top-level driver: pull apart the addends.
+;;
+;; Any amode can absorb an `iadd` by absorbing first the LHS of the
+;; add, then the RHS.
+;;
+;; Priority 2 to take this above fallbacks and ensure we traverse the
+;; `iadd` tree fully.
+(rule 2 (amode_add amode (iadd x y))
+      (let ((amode1 Amode (amode_add amode x))
+            (amode2 Amode (amode_add amode1 y)))
+        amode2))
+
+;; -- Case 1 (adding a register to the initial Amode with invalid_reg).
+;;
+;; An Amode.ImmReg with invalid_reg (initial state) can absorb a
+;; register as the base register.
+(rule (amode_add (Amode.ImmReg off (invalid_reg) flags) value)
+      (Amode.ImmReg off value flags))
+
+;; -- Case 4 (absorbing constant offsets).
+;;
+;; An Amode can absorb a constant (i64, or extended i32) as long as
+;; the sum still fits in the signed-32-bit offset.
+;;
+;; Priority 3 in order to take this option above the fallback
+;; (immediate in register). Two rules, for imm+reg and
+;; imm+reg+scale*reg cases.
+(rule 1 (amode_add (Amode.ImmRegRegShift off base index shift flags)
+                   (iconst c))
+      (if-let sum (s32_add_fallible off c))
+      (Amode.ImmRegRegShift sum base index shift flags))
+
+;; -- Case 5 (fallback to add a new value to an imm+reg+scale*reg).
+;;
+;; An Amode.ImmRegRegShift can absorb any other value by creating a
+;; new add instruction and replacing the base with
+;; (base+value).
+(rule (amode_add (Amode.ImmRegRegShift off base index shift flags) value)
+      (let ((sum Reg (x64_add base value)))
+        (Amode.ImmRegRegShift off sum index shift flags)))
--- a/cranelift/isle/isle/src/codegen.rs
+++ b/cranelift/isle/isle/src/codegen.rs
@@ -693,16 +693,11 @@ impl<'a> Codegen<'a> {

            &TrieNode::Decision { ref edges } => {
                let subindent = format!("{}    ", indent);
-                // if this is a decision node, generate each match op
-                // in turn (in priority order). Sort the ops within
-                // each priority, and gather together adjacent
-                // MatchVariant ops with the same input and disjoint
-                // variants in order to create a `match` rather than a
-                // chain of if-lets.
-                let mut edges = edges.clone();
-                edges.sort_by(|e1, e2| {
-                    (-e1.range.min, &e1.symbol).cmp(&(-e2.range.min, &e2.symbol))
-                });
+                // If this is a decision node, generate each match op
+                // in turn (in priority order). Gather together
+                // adjacent MatchVariant ops with the same input and
+                // disjoint variants in order to create a `match`
+                // rather than a chain of if-lets.

                let mut i = 0;
                while i < edges.len() {
@@ -712,8 +707,8 @@ impl<'a> Codegen<'a> {
                    let mut adjacent_variants = BTreeSet::new();
                    let mut adjacent_variant_input = None;
                    log::trace!(
-                        "edge: range = {:?}, symbol = {:?}",
-                        edges[i].range,
+                        "edge: prio = {:?}, symbol = {:?}",
+                        edges[i].prio,
                        edges[i].symbol
                    );
                    while last < edges.len() {
--- a/cranelift/isle/isle/src/trie.rs
+++ b/cranelift/isle/isle/src/trie.rs
@@ -100,67 +100,19 @@ pub fn build_tries(typeenv: &TypeEnv, termenv: &TermEnv) -> BTreeMap<TermId, Tri
 ///
 /// To build this trie, we construct nodes with edges to child nodes;
 /// each edge consists of (i) one input token (a `PatternInst` or
-/// EOM), and (ii) the minimum and maximum priorities of rules along
-/// this edge. In a way this resembles an interval tree, though the
-/// intervals of children need not be disjoint.
+/// EOM), and (ii) the priority of rules along this edge. We do not
+/// merge rules of different priorities, because the logic to do so is
+/// complex and error-prone, necessitating "splits" when we merge
+/// together a set of rules over a priority range but later introduce
+/// a new possible match op in the "middle" of the range. (E.g., match
+/// op A at prio 10, B at prio 5, A at prio 0.) In fact, a previous
+/// version of the ISLE compiler worked this way, but in practice the
+/// complexity was unneeded.
 ///
 /// To add a rule to this trie, we perform the usual trie-insertion
-/// logic, creating edges and subnodes where necessary, and updating
-/// the priority-range of each edge that we traverse to include the
-/// priority of the inserted rule.
-///
-/// However, we need to be a little bit careful, because with only
-/// priority ranges in place and the potential for overlap, we have
-/// something that resembles an NFA. For example, consider the case
-/// where we reach a node in the trie and have two edges with two
-/// match ops, one corresponding to a rule with priority 10, and the
-/// other corresponding to two rules, with priorities 20 and 0. The
-/// final match could lie along *either* path, so we have to traverse
-/// both.
-///
-/// So, to avoid this, we perform a sort of moral equivalent to the
-/// NFA-to-DFA conversion "on the fly" as we insert nodes by
-/// duplicating subtrees. At any node, when inserting with a priority
-/// P and when outgoing edges lie in a range [P_lo, P_hi] such that P
-/// >= P_lo and P <= P_hi, we "priority-split the edges" at priority
-/// P.
-///
-/// To priority-split the edges in a node at priority P:
-///
-/// - For each out-edge with priority [P_lo, P_hi] s.g. P \in [P_lo,
-///   P_hi], and token T:
-///   - Trim the subnode at P, yielding children C_lo and C_hi.
-///   - Both children must be non-empty (have at least one leaf)
-///     because the original node must have had a leaf at P_lo
-///     and a leaf at P_hi.
-///   - Replace the one edge with two edges, one for each child, with
-///     the original match op, and with ranges calculated according to
-///     the trimmed children.
-///
-/// To trim a node into range [P_lo, P_hi]:
-///
-/// - For a decision node:
-///   - If any edges have a range outside the bounds of the trimming
-///     range, trim the bounds of the edge, and trim the subtree under the
-///     edge into the trimmed edge's range. If the subtree is trimmed
-///     to `None`, remove the edge.
-///   - If all edges are removed, the decision node becomes `None`.
-/// - For a leaf node:
-///   - If the priority is outside the range, the node becomes `None`.
-///
-/// As we descend a path to insert a leaf node, we (i) priority-split
-/// if any edges' priority ranges overlap the insertion priority
-/// range, and (ii) expand priority ranges on edges to include the new
-/// leaf node's priority.
-///
-/// As long as we do this, we ensure the two key priority-trie
-/// invariants:
-///
-/// 1. At a given node, no two edges exist with priority ranges R_1,
-///    R_2 such that R_1 ∩ R_2 ≠ ∅, unless R_1 and R_2 are unit ranges
-///    ([x, x]) and are on edges with different match-ops.
-/// 2. Along the path from the root to any leaf node with priority P,
-///    each edge has a priority range R such that P ∈ R.
+/// logic, creating edges and subnodes where necessary. A new edge is
+/// necessary whenever an edge does not exist for the (priority,
+/// symbol) tuple.
 ///
 /// Note that this means that multiple edges with a single match-op
 /// may exist, with different priorities.
@@ -187,78 +139,11 @@ impl TrieSymbol {
 /// A priority.
 pub type Prio = i64;

-/// An inclusive range of priorities.
-#[derive(Clone, Copy, Debug)]
-pub struct PrioRange {
-    /// The minimum of this range.
-    pub min: Prio,
-    /// The maximum of this range.
-    pub max: Prio,
-}
-
-impl PrioRange {
-    fn contains(&self, prio: Prio) -> bool {
-        prio >= self.min && prio <= self.max
-    }
-
-    fn is_unit(&self) -> bool {
-        self.min == self.max
-    }
-
-    fn overlaps(&self, other: PrioRange) -> bool {
-        // This can be derived via DeMorgan: !(self.begin > other.end
-        // OR other.begin > self.end).
-        self.min <= other.max && other.min <= self.max
-    }
-
-    fn intersect(&self, other: PrioRange) -> PrioRange {
-        PrioRange {
-            min: std::cmp::max(self.min, other.min),
-            max: std::cmp::min(self.max, other.max),
-        }
-    }
-
-    fn union(&self, other: PrioRange) -> PrioRange {
-        PrioRange {
-            min: std::cmp::min(self.min, other.min),
-            max: std::cmp::max(self.max, other.max),
-        }
-    }
-
-    fn split_at(&self, prio: Prio) -> (PrioRange, PrioRange) {
-        assert!(self.contains(prio));
-        assert!(!self.is_unit());
-        if prio == self.min {
-            (
-                PrioRange {
-                    min: self.min,
-                    max: self.min,
-                },
-                PrioRange {
-                    min: self.min + 1,
-                    max: self.max,
-                },
-            )
-        } else {
-            (
-                PrioRange {
-                    min: self.min,
-                    max: prio - 1,
-                },
-                PrioRange {
-                    min: prio,
-                    max: self.max,
-                },
-            )
-        }
-    }
-}
-
 /// An edge in our term trie.
 #[derive(Clone, Debug)]
 pub struct TrieEdge {
-    /// The priority range for this edge's sub-trie.
-    pub range: PrioRange,
+    /// The priority for this edge's sub-trie.
+    pub prio: Prio,
    /// The match operation to perform for this edge.
    pub symbol: TrieSymbol,
    /// This edge's sub-trie.
@@ -319,91 +204,19 @@ impl TrieNode {
            _ => panic!("insert on leaf node!"),
        };

-        // Do we need to split?
-        let needs_split = edges
-            .iter()
-            .any(|edge| edge.range.contains(prio) && !edge.range.is_unit());
-
-        // If so, pass over all edges/subnodes and split each.
-        if needs_split {
-            let mut new_edges = vec![];
-            for edge in std::mem::take(edges) {
-                if !edge.range.contains(prio) || edge.range.is_unit() {
-                    new_edges.push(edge);
-                    continue;
-                }
-
-                let (lo_range, hi_range) = edge.range.split_at(prio);
-                let lo = edge.node.trim(lo_range);
-                let hi = edge.node.trim(hi_range);
-                if let Some((node, range)) = lo {
-                    new_edges.push(TrieEdge {
-                        range,
-                        symbol: edge.symbol.clone(),
-                        node,
-                    });
-                }
-                if let Some((node, range)) = hi {
-                    new_edges.push(TrieEdge {
-                        range,
-                        symbol: edge.symbol,
-                        node,
-                    });
-                }
-            }
-            *edges = new_edges;
-        }
-
        // Now find or insert the appropriate edge.
-        let mut edge: Option<usize> = None;
-        let mut last_edge_with_op: Option<usize> = None;
-        let mut last_edge_with_op_prio: Option<Prio> = None;
-        for i in 0..(edges.len() + 1) {
-            if i == edges.len() || prio > edges[i].range.max {
-                // We've passed all edges with overlapping priority
-                // ranges. Maybe the last edge we saw with the op
-                // we're inserting can have its range expanded,
-                // however.
-                if last_edge_with_op.is_some() {
-                    // Move it to the end of the run of equal-unit-range ops.
-                    edges.swap(last_edge_with_op.unwrap(), i - 1);
-                    edge = Some(i - 1);
-                    edges[i - 1].range.max = prio;
-                    break;
-                }
-                edges.insert(
-                    i,
-                    TrieEdge {
-                        range: PrioRange {
-                            min: prio,
-                            max: prio,
-                        },
-                        symbol: op.clone(),
-                        node: TrieNode::Empty,
-                    },
-                );
-                edge = Some(i);
-                break;
-            }
-            if i == edges.len() {
-                break;
-            }
-            if edges[i].symbol == op {
-                last_edge_with_op = Some(i);
-                last_edge_with_op_prio = Some(edges[i].range.max);
-            }
-            if last_edge_with_op_prio.is_some()
-                && last_edge_with_op_prio.unwrap() < edges[i].range.max
-            {
-                last_edge_with_op = None;
-                last_edge_with_op_prio = None;
-            }
-            if edges[i].range.contains(prio) && edges[i].symbol == op {
-                edge = Some(i);
-                break;
-            }
-        }
-        let edge = edge.expect("Must have found an edge at least at last iter");
+        let edge = edges
+            .iter()
+            .position(|edge| edge.symbol == op && edge.prio == prio)
+            .unwrap_or_else(|| {
+                edges.push(TrieEdge {
+                    prio,
+                    symbol: op,
+                    node: TrieNode::Empty,
+                });
+                edges.len() - 1
+            });
+
        let edge = &mut edges[edge];

        if is_last {
@@ -420,57 +233,18 @@ impl TrieNode {
        }
    }

-    fn trim(&self, range: PrioRange) -> Option<(TrieNode, PrioRange)> {
+    /// Sort edges by priority.
+    pub fn sort(&mut self) {
        match self {
-            &TrieNode::Empty => None,
-            &TrieNode::Leaf { prio, ref output } => {
-                if range.contains(prio) {
-                    Some((
-                        TrieNode::Leaf {
-                            prio,
-                            output: output.clone(),
-                        },
-                        PrioRange {
-                            min: prio,
-                            max: prio,
-                        },
-                    ))
-                } else {
-                    None
-                }
-            }
-            &TrieNode::Decision { ref edges } => {
-                let edges = edges
-                    .iter()
-                    .filter_map(|edge| {
-                        if !edge.range.overlaps(range) {
-                            None
-                        } else {
-                            let range = range.intersect(edge.range);
-                            if let Some((node, range)) = edge.node.trim(range) {
-                                Some(TrieEdge {
-                                    range,
-                                    symbol: edge.symbol.clone(),
-                                    node,
-                                })
-                            } else {
-                                None
-                            }
-                        }
-                    })
-                    .collect::<Vec<_>>();
-
-                if edges.is_empty() {
-                    None
-                } else {
-                    let range = edges
-                        .iter()
-                        .map(|edge| edge.range)
-                        .reduce(|a, b| a.union(b))
-                        .expect("reduce on non-empty vec must not return None");
-                    Some((TrieNode::Decision { edges }, range))
+            TrieNode::Decision { edges } => {
+                // Sort by priority, highest integer value first; then
+                // by trie symbol.
+                edges.sort_by_cached_key(|edge| (-edge.prio, edge.symbol.clone()));
+                for child in edges {
+                    child.node.sort();
                }
            }
+            _ => {}
        }
    }

@@ -490,8 +264,8 @@ impl TrieNode {
                    for edge in edges {
                        s.push_str(indent);
                        s.push_str(&format!(
-                            "  edge: range = {:?}, symbol: {:?}\n",
-                            edge.range, edge.symbol
+                            "  edge: prio = {:?}, symbol: {:?}\n",
+                            edge.prio, edge.symbol
                        ));
                        pretty_rec(s, &edge.node, &new_indent);
                    }
@@ -535,6 +309,10 @@ impl TermFunctionBuilder {
            .chain(std::iter::once(TrieSymbol::EndOfMatch));
        self.trie.insert(prio, symbols, expr_seq);
    }
+
+    fn sort_trie(&mut self) {
+        self.trie.sort();
+    }
 }

 #[derive(Debug)]
@@ -574,6 +352,10 @@ impl<'a> TermFunctionsBuilder<'a> {
                .or_insert_with(|| TermFunctionBuilder::new())
                .add_rule(prio, pattern.clone(), expr.clone());
        }
+
+        for builder in self.builders_by_term.values_mut() {
+            builder.sort_trie();
+        }
    }

    fn finalize(self) -> BTreeMap<TermId, TrieNode> {
--- a/scripts/force-rebuild-isle.sh
+++ b/scripts/force-rebuild-isle.sh
@@ -0,0 +1,19 @@
+#!/bin/sh
+#
+# This script rebuilds all ISLE generated source that is checked in, even if
+# the source has not changed relative to the manifests.
+#
+# This is useful when one is developing the ISLE compiler itself; otherwise,
+# changing the compiler does not automatically change the generated code, even
+# if the `rebuild-isle` feature is specified.
+
+set -e
+
+# Remove the manifests (which contain hashes of ISLE source) to force the build
+# script to regenerate all backends.
+rm -f cranelift/codegen/src/isa/*/lower/isle/generated_code.manifest
+
+# `cargo check` will both invoke the build script to rebuild the backends, and
+# check that the output is valid Rust. We specify `all-arch` here to include
+# all backends.
+cargo check -p cranelift-codegen --features rebuild-isle,all-arch