Address review feedback.

2021-11-30 11:40:21 -08:00
parent b6bed81ba2
commit 001d91c73c
1 changed files with 131 additions and 75 deletions
--- a/cranelift/docs/isle.md
+++ b/cranelift/docs/isle.md
@@ -1,6 +1,6 @@
 # ISLE: Instruction Selection Lowering Expressions
-This document will describe ISLE (Instruction Set Lowering
+This document will describe ISLE (Instruction Selection Lowering
 Expressions), a DSL (domain-specific language) that we have developed
 in order to help us express certain parts of the Cranelift compiler
 backend more naturally. ISLE was first [described in RFC
@@ -24,7 +24,7 @@ instructions. For example:
  immediate.
 One could write something like the following in ISLE (simplified from
-the real code [here](../codegen/src/isa/x64/lower.isle):
+the real code [here](../codegen/src/isa/x64/lower.isle)):
 ```lisp
 ;; Add two registers.
@@ -71,7 +71,7 @@ This document is organized into the following sections:
  terms) and how rules are written.
 * ISLE with Rust: covers how ISLE provides an "FFI" (foreign function
-  interface) of sorts to allow interactino with Rust code, and
+  interface) of sorts to allow interaction with Rust code, and
  describes the scheme by which ISLE execution is mapped onto Rust
  (data structures and control flow).[^1]
@@ -85,13 +85,23 @@ This document is organized into the following sections:
    chance to introduce that backronym when we wrote the initial
    implementation.
-## Term-Rewriting Systems
+## Background: Term-Rewriting Systems
 *Note: this section provides general background on term-rewriting
 systems that is useful to better understand the context for ISLE and
 how to develop systems using it. Readers already familiar with
 term-rewriting systems, or wishing to skip to details on ISLE's
 version of term rewriting, can skip to the [next
 section](#core-isle-a-term-rewriting-system).*
 A [term-rewriting
-system](https://en.wikipedia.org/wiki/Rewriting#Term_rewriting_systems), or TRS,
+system](https://en.wikipedia.org/wiki/Rewriting#Term_rewriting_systems),
-is a system that works by representing data as *terms* and then
+or TRS, is a system that works by representing data as *terms* and
-applying *rules* to "rewrite" the terms until no more rules are
+then applying *rules* to "rewrite" the terms. This rewrite process
-applicable, at which point the resulting term is the system's output.
+continues until some application-specific end-condition is met, for
 example until no more rules are applicable or until the term reaches a
 "lowered" state by some definition, at which point the resulting term
 is the system's output.
 Term-rewriting systems are a general kind of computing system, at the
 same level as (e.g.) Turing machines or other abstract computing
@@ -119,7 +129,11 @@ Term rewriting as a process also naturally handles issues of
 *priority*, i.e. applying a more specific rule before a less specific
 one. This is because the abstraction allows for multiple rules to be
 "applicable", and so there is a natural place to reason about priority
-when we choose which rule to apply.
+when we choose which rule to apply. This permits a nice separation of
 concerns: we can specify which rewrites are *possible* to apply
 separately from which are *desirable* to apply, and adjust or tune the
 latter (the "strategy") at will without breaking the system's
 correctness.
 Additionally, term rewriting allows for a sort of *modularity* that is
 not present in hand-written pattern-matching code: the specific rules
@@ -130,7 +144,9 @@ the related rules at once. Said another way: hand-written code tends
 to accumulate a lot of nested conditionals and switch/match
 statements, i.e., resembles a very large decision tree, while
 term-rewriting code tends to resemble a flat list of simple patterns
-that, when composed and combined, become that more complex tree.
+that, when composed and combined, become that more complex tree. This
 allows the programmer to more easily maintain and update the set of
 lowering rules, considering each in isolation.
 ### Data: Nested Trees of Constructors
@@ -168,8 +184,7 @@ function call to Lisp-trained eyes) or a *primitive*. In the above,
 the `(a ...)`, `(b ...)`, `(d)`, and `(e ...)` terms/subterms are
 constructor invocations. A constructor takes some number of arguments
 (its *arity*), each of which is itself a term. Primitives can be
-things like integer, string, symbolic, or boolean constants, or
+things like integer, string, or boolean constants, or variable names.
 variable names.
 Some term-rewriting systems have other syntax conventions: for
 example, systems based on
@@ -213,7 +228,7 @@ actually perform the rewrites. The "program" itself, in a
 term-rewriting DSL, consists simply of an unordered list of
 rules. Each rule may or may not apply; if it applies, then it can be
 used to edit the term. Execution consists of repeated application of
-rules until no more rules can be applied.
+rules until some criteria are met.
 A rule consists of two parts: the left-hand side (LHS), or *pattern*,
 and right-hand side (RHS), or *expression*. The left-hand and
@@ -244,7 +259,7 @@ subterms:
 * `_` is a wildcard and matches anything, without capturing it.
 * Primitive constant values, such as `42` or `$Symbol`, match only if
-  the term is exactly equal to this constnat.
+  the term is exactly equal to this constant.
 These pattern-matching operators can be combined, so we could write,
 for example, `(A (B x _) z)`. This pattern would match the term `(A (B
@@ -287,7 +302,19 @@ somehow specify which rule will be applied in such a situation based
 on precedence, or specificity, or some other tie-breaker. A common
 heuristic is "more specific rule wins". We will see how ISLE resolves
 this question below by using both this heuristic and an explicit
-priority mechanism.
+priority mechanism.[^4]
 [^4]: Some term-rewriting systems actually elaborate the entire space
      of possibilities, following *all* possible rule application
      sequences / rewrite paths. For example, the *equality
      saturation* technique
      ([paper](https://cseweb.ucsd.edu/~lerner/papers/popl09.pdf),
      [example implementation
      Egg](https://blog.sigplan.org/2021/04/06/equality-saturation-with-egg/))
      builds a data structure that represents all equivalent terms
      under a set of rewrite rules, from which a heuristic
      (cost/goodness function) can be used to extract one answer when
      needed.
 #### Right-hand Sides: Rewrite Expressions
@@ -335,13 +362,13 @@ How is this useful? First, rewriting one term to another (here, `C` at
 the top level) that in turn appears in the left-hand side of other
 rules allows us to *factor* a "program" of term-rewriting rules in the
 same way that imperative programs are factored into separate
-functions.[^4] The usual advantages of a well-factored program, where
+functions.[^5] The usual advantages of a well-factored program, where
 each problem is solved with a small step that "reduces to a previously
 solved problem", apply here.
 Second, repeating the rewrite step is actually what grants
 term-rewriting its Turing-completeness: it allows for arbitrary
-control flow.[^5] This might be useful in cases where, for example, a
+control flow.[^6] This might be useful in cases where, for example, a
 term-rewriting program needs "loop" over a list of elements in the
 input: it can recurse and use intermediate terms to store state.
@@ -353,13 +380,13 @@ then defining additional rules to rewrite further -- when it helps to
 factor common behavior out of multiple rules, or aids in conceptual
 clarity.
-[^4]: In fact, ISLE actually compiles rules for different top-level
+[^5]: In fact, ISLE actually compiles rules for different top-level
      pattern terms (`(A ...)` and `(C ...)` in the example) into
      separate Rust functions, so factoring rules to use intermediate
      terms can provide code-size and compile-time benefits for the
      ISLE-generated Rust code as well.
-[^5]: The [lambda calculus' reduction
+[^6]: The [lambda calculus' reduction
      rules](https://en.wikipedia.org/wiki/Lambda_calculus#Reduction)
      are a good example of this.
@@ -397,7 +424,7 @@ application of a term-rewriting system should not depend for
 correctness on the order or choice of rule application: when multiple
 rules are applicable, then any sequence of rewrites that ends in a
 terminating state (a state with no further-applicable rules) should be
-considered a "correct" answer.[^6] Here, this is true: if, for
+considered a "correct" answer.[^7] Here, this is true: if, for
 example, we choose the register-plus-register form (the first rule)
 for an `iadd` operation, but the second argument is actually an
 `iconst`, then that is still valid, and the `iconst` will separately
@@ -407,7 +434,7 @@ rule (or second rule, if the constant is zero). Hence, rule ordering
 and prioritization is nevertheless important for the quality of the
 instruction selector.
-[^6]: Note that this suggests an interesting testing strategy: we
+[^7]: Note that this suggests an interesting testing strategy: we
      could choose arbitrary (random) orders of lowering rules to
      apply, or even deliberately worst-case orders according to some
      heuristic. If we can differentially test programs compiled with
@@ -465,13 +492,17 @@ operators:
 * Wildcards (`_`).
 * Integer constants (decimal/hex, positive/negative: `1`, `-1`,
-  `0x80`, `-0x80`).
+  `0x80`, `-0x80`) and boolean constants (`#t`, `#f`).
-* Symbolic constants (`$Symbol`).
+* constants imported from the embedding, of arbitrary type
  (`$MyConst`).
 * Variable captures (bare identifiers like `x`; an identifier consists
  of alphanumeric characters and underscores, and does not start with
  a digit).
 * Variable captures with sub-patterns: `x @ PAT`, which captures the
-  subterm in `x` as above but also matches `PAT` against the subterm.
+  subterm in `x` as above but also matches `PAT` against the
  subterm. For example, `x @ (A y z)` matches an `A` term and captures
  its arguments as `y` and `z`, but also captures the whole term as
  `x`.
 * "Equal-variable" constraints (`=x`): a subterm must match an
  already-captured value.
 * conjunctions of subpatterns: `(and PAT1 PAT2 ...)` matches all of
@@ -479,7 +510,8 @@ operators:
  then this matcher fails.
 * Term deconstruction: `(term PAT1 PAT2 ...)`, where `term` is a
  defined term (type variant or constructor) and the subpatterns are
-  applied to each argument value in turn.
+  applied to each argument value in turn. Note that `term` cannot be a
  wildcard; it must be a specific, concrete term.
 The expression (right-hand side) is made up of the following
 expression operators:
@@ -500,7 +532,7 @@ When multiple rules are applicable to rewrite a particular term, ISLE
 will choose the "more specific" rule according to a particular
 heuristic: in the lowered sequence of matching steps, when one
 left-hand side completes a match while another with the same prefix
-continues with further steps, the latter (more specific) is run first.
+continues with further steps, the latter (more specific) is chosen.
 The more-specific-first heuristic is usually good enough, but when an
 undesirable choice occurs, explicit priorities can be specified.
@@ -510,6 +542,18 @@ rule with a higher priority will always match before a rule with a
 lower priority. The default priority for all rules if not otherwise
 specified is `0`.
 Note that the system allows multiple applicable rules to exist with
 the same priority: that is, while the priority system allows for
 manual tie-breaking, this tie-breaking is not required.
 Finally, one important note: the priority system is considered part of
 the core language semantics and execution of rules with different
 priorities is well-defined, so can be relied upon when specifying
 correct rules. However, the tie-breaking heuristic is *not* part of
 the specified language semantics, and so the user should never write
 rules whose correctness depends on one rule overriding another
 according to the heuristic.
 ### Typed Terms
 ISLE allows the programmer to define types, and requires every term to
@@ -699,14 +743,15 @@ selectors is that rewrites are always driven by term reduction from
 such a toplevel term, rather than a series of equivalences directly
 from IR instruction to machine instruction forms.
-In other words, a conventional isel pattern engine might let one
+In other words, a conventional instruction selection pattern engine
-specify `(Inst.A ...) -> (Inst.X ...)`. Here the instruction/opcode
+might let one specify `(Inst.A ...) -> (Inst.X ...)`. In this
-type on the LHS and RHS must be the same single instruction type
+conventional design, the instruction/opcode type on the LHS and RHS
-(otherwise rewrites could not be chained), and rewrite relation (which
+must be the same single instruction type (otherwise rewrites could not
-we wrote as `->`) is in essence a single privileged relation. One can
+be chained), and rewrite relation (which we wrote as `->`) is in
-see ISLE as a generalization: we can define many different types, and
+essence a single privileged relation. One can see ISLE as a
-many different toplevel terms from which we can start the
+generalization: we can define many different types, and many different
-reduction. In principle, one could have:
+toplevel terms from which we can start the reduction. In principle,
 one could have:
 ```lisp
@@ -723,7 +768,7 @@ reduction. In principle, one could have:
 and then both translations are available. We are "rewriting" from `IR`
 to `Machine1` and from `IR` to `Machine2`, even if rewrites always
-preserve the same type; we get aorund the rule by using a constructor.
+preserve the same type; we get around the rule by using a constructor.
 ### Constructors and Extractors
@@ -810,7 +855,9 @@ write
 ```
 which will, for example, expand a pattern `(A (subterm ...) _)` into
-`(and (extractArg1 (subterm ...)) (extractArg2 _))`.
+`(and (extractArg1 (subterm ...)) (extractArg2 _))`: in other words,
 the arguments to `A` are substituted into the extractor body and then
 this body is inlined.
 #### Summary: Terms, Constructors, and Extractors
@@ -825,14 +872,12 @@ A term can have:
 1. A single internal extractor body, via a toplevel `(extractor ...)`
   form, OR
-2. A single external extractor binding, via an `(extern extractor
+2. A single external extractor binding (see next section); AND
   ...)` form, defined below; AND
 3. One or more `(rule (Term ...) ...)` toplevel forms, which together
   make up an internal constructor definition, OR
-4. A single external constructor binding, via an `(extern constructor
+4. A single external constructor binding (see next section).
   ...)` form, defined below.
 ## ISLE to Rust
@@ -849,19 +894,24 @@ its language semantics to Rust semantics. This means that the
 execution of ISLE rewriting has a well-defined implementation in
 Rust. The basic principles are:
-1. Every constructor with rules in ISLE becomes a Rust function. The
+1. Every term with rules in ISLE becomes a single Rust function. The
-   constructor arguments are the Rust function arguments. The
+   arguments are the Rust function arguments. The term's "return
-   constructor's "return value" is the Rust function's return value
+   value" is the Rust function's return value (wrapped in an `Option`
-   (usually wrapped in an `Option` because pattern coverage can be
+   because pattern coverage can be incomplete).
   incomplete).
 2. One rewrite step is one Rust function call.
 3. Rewriting is thus eager, and reified through ordinary Rust control
-   flow. The code that embeds the ISLE generated code will kick off
+   flow. When we construct a term that appears on the left-hand side
-   execution by calling a top-level "driver" constructor; this will
+   of rules, we do so by calling a function (the "constructor body");
-   invoke constructors in its rewrite expression, making other Rust
+   and this function *is* the rewrite logic, so the term is rewritten
-   function calls, until eventually a value is returned.
+   as soon as it exists. The code that embeds the ISLE generated code
   will kick off execution by calling a top-level "driver"
   constructor. The body of the constructor will eventually choose one
   of several rules to apply, and execute code to build the right-hand
   side expression; this can invoke further constructors for its
   subparts, kicking off more rewrites, until eventually a value is
   returned.
 4. This design means that "intermediate terms" -- constructed terms
   that are then further rewritten -- are never actually built as
@@ -872,13 +922,15 @@ Rust. The basic principles are:
   overhead and/or the effectiveness of the Rust inliner).
 5. Backtracking -- attempting to match rules, and backing up to follow
-   a different path when a match fails -- is entirely internal to the
+   a different path when a match fails -- exists, but is entirely
-   generated Rust function for a constructor. Once we are rewriting a
+   internal to the generated Rust function for rewriting one
-   constructor and its subterms, we have committed to that
+   term. Once we are rewriting a term, we have committed to that term
-   constructor. Internally to the rewrite-evaluation of one
+   existing as a rewrite step; we cannot backtrack further. However,
-   constructor, it evaluates left-hand side patterns, finding one that
+   backtracking can occur within the delimited scope of this one
-   matches, and then commits to that and starts to invoke constructors
+   term's rewrite; we have a phase during which we evaluate left-hand
-   to build the rewritten right-hand side.
+   sides, trying to find a matching rule, and once we find one, we
   commit and start to invoke constructors to build the right-hand
   side.
   Said another way, the principle is that left-hand sides are
   fallible, and have no side-effects as they execute; right-hand
@@ -1083,8 +1135,8 @@ agnostic to the embedding application. This means that ISLE knows
 nothing about, e.g., Cranelift or compiler concepts in
 general. Rather, the generated code provides function entry points
 with well-defined signatures based on the terms, and imports the
-extern constructors and extractors via a trait that the embedder must
+extern constructors and extractors via a context trait that the
-implement.
+embedder must implement.
 When a term `T` is defined like
@@ -1167,6 +1219,14 @@ what converts the unordered-list-of-rule representation into something
 that corresponds to the final Rust code's control flow and order of
 matching operations.
 We describe this data structure below with the intent to provide an
 understanding of how the DSL compiler weaves rules together into Rust
 control flow. While this understanding shouldn't be strictly necessary
 to use the DSL, it may be helpful. (The ultimate answer to "how does
 the generated code work" is, of course, found by reading the generated
 code; some care has been taken to ensure it is reasonably legible for
 human consumption!)
 ### Decision Trie
 The heart of the ISLE transformation lies in how the compiler converts
@@ -1211,27 +1271,23 @@ known to be *mutually exclusive*. The canonical example of this is
 when an enum-typed value is destructured into different variants by
 various edges; we can use a Rust `match` statement in the generated
 source and have `O(1)` (or close to it) cost for the dispatch at this
-level.
+level.[^8]
-Building the trie is a somewhat subtle procedure because we must be
+Building the trie is a somewhat subtle procedure; see [this block
 mindful of *priorities*. In order to ensure that rule priorities are
 obeyed, we actually label each edge with a *priority range* and a
 match op. Edges must then be considered in priority order. When
 building the trie, we uphold an important *non-overlapping-priority*
 invariant: there must not be two edges with overlapping priority
 ranges and different match ops, or else it would be unclear which to
 run next. Priority ranges are inclusive in a discrete (integer) space,
 so "overlap" is also subtle: we can allow edges with "unit" priority
 ranges (just one priority), and these do not overlap. So, e.g., we can
 allow an edge with priority `[0, 10]` (higher is more prioritized) and
 op `A`, followed by an edge with priority `[0, 0]` and op `B`,
 followed by an edge with priority `[-1, -10]`, `A`, though they must
 occur in that order.
 See [this block
 comment](https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/isle/isle/src/trie.rs#L15-L166)
 for more information regarding the trie construction algorithm.
 [^8]: The worst-case complexity for a single term rewriting operation
      is still the cost of evaluating each rule's left-hand side
      sequentially, because in general there is no guarantee of
      overlap between the patterns. Ordering of the edges out of a
      decision node also affects complexity: if mutually-exclusive
      match operations are not adjacent, then they cannot be merged
      into a single `match` with `O(1)` dispatch. In general this
      ordering problem is quite difficult. We could do better with
      stronger heuristics; this is an open area for improvement in the
      DSL compiler!
 ## Reference: ISLE Language Grammar
 Baseline: allow arbitrary whitespace, and Lisp-style comments (`;` to