diff --git a/cranelift/docs/isle.md b/cranelift/docs/isle.md index 565ccc4e3c..d338c4c52f 100644 --- a/cranelift/docs/isle.md +++ b/cranelift/docs/isle.md @@ -1,6 +1,6 @@ # ISLE: Instruction Selection Lowering Expressions -This document will describe ISLE (Instruction Set Lowering +This document will describe ISLE (Instruction Selection Lowering Expressions), a DSL (domain-specific language) that we have developed in order to help us express certain parts of the Cranelift compiler backend more naturally. ISLE was first [described in RFC @@ -24,7 +24,7 @@ instructions. For example: immediate. One could write something like the following in ISLE (simplified from -the real code [here](../codegen/src/isa/x64/lower.isle): +the real code [here](../codegen/src/isa/x64/lower.isle)): ```lisp ;; Add two registers. @@ -71,7 +71,7 @@ This document is organized into the following sections: terms) and how rules are written. * ISLE with Rust: covers how ISLE provides an "FFI" (foreign function - interface) of sorts to allow interactino with Rust code, and + interface) of sorts to allow interaction with Rust code, and describes the scheme by which ISLE execution is mapped onto Rust (data structures and control flow).[^1] @@ -85,13 +85,23 @@ This document is organized into the following sections: chance to introduce that backronym when we wrote the initial implementation. -## Term-Rewriting Systems +## Background: Term-Rewriting Systems + +*Note: this section provides general background on term-rewriting +systems that is useful to better understand the context for ISLE and +how to develop systems using it. Readers already familiar with +term-rewriting systems, or wishing to skip to details on ISLE's +version of term rewriting, can skip to the [next +section](#core-isle-a-term-rewriting-system).* A [term-rewriting -system](https://en.wikipedia.org/wiki/Rewriting#Term_rewriting_systems), or TRS, -is a system that works by representing data as *terms* and then -applying *rules* to "rewrite" the terms until no more rules are -applicable, at which point the resulting term is the system's output. +system](https://en.wikipedia.org/wiki/Rewriting#Term_rewriting_systems), +or TRS, is a system that works by representing data as *terms* and +then applying *rules* to "rewrite" the terms. This rewrite process +continues until some application-specific end-condition is met, for +example until no more rules are applicable or until the term reaches a +"lowered" state by some definition, at which point the resulting term +is the system's output. Term-rewriting systems are a general kind of computing system, at the same level as (e.g.) Turing machines or other abstract computing @@ -119,7 +129,11 @@ Term rewriting as a process also naturally handles issues of *priority*, i.e. applying a more specific rule before a less specific one. This is because the abstraction allows for multiple rules to be "applicable", and so there is a natural place to reason about priority -when we choose which rule to apply. +when we choose which rule to apply. This permits a nice separation of +concerns: we can specify which rewrites are *possible* to apply +separately from which are *desirable* to apply, and adjust or tune the +latter (the "strategy") at will without breaking the system's +correctness. Additionally, term rewriting allows for a sort of *modularity* that is not present in hand-written pattern-matching code: the specific rules @@ -130,7 +144,9 @@ the related rules at once. Said another way: hand-written code tends to accumulate a lot of nested conditionals and switch/match statements, i.e., resembles a very large decision tree, while term-rewriting code tends to resemble a flat list of simple patterns -that, when composed and combined, become that more complex tree. +that, when composed and combined, become that more complex tree. This +allows the programmer to more easily maintain and update the set of +lowering rules, considering each in isolation. ### Data: Nested Trees of Constructors @@ -168,8 +184,7 @@ function call to Lisp-trained eyes) or a *primitive*. In the above, the `(a ...)`, `(b ...)`, `(d)`, and `(e ...)` terms/subterms are constructor invocations. A constructor takes some number of arguments (its *arity*), each of which is itself a term. Primitives can be -things like integer, string, symbolic, or boolean constants, or -variable names. +things like integer, string, or boolean constants, or variable names. Some term-rewriting systems have other syntax conventions: for example, systems based on @@ -213,7 +228,7 @@ actually perform the rewrites. The "program" itself, in a term-rewriting DSL, consists simply of an unordered list of rules. Each rule may or may not apply; if it applies, then it can be used to edit the term. Execution consists of repeated application of -rules until no more rules can be applied. +rules until some criteria are met. A rule consists of two parts: the left-hand side (LHS), or *pattern*, and right-hand side (RHS), or *expression*. The left-hand and @@ -244,7 +259,7 @@ subterms: * `_` is a wildcard and matches anything, without capturing it. * Primitive constant values, such as `42` or `$Symbol`, match only if - the term is exactly equal to this constnat. + the term is exactly equal to this constant. These pattern-matching operators can be combined, so we could write, for example, `(A (B x _) z)`. This pattern would match the term `(A (B @@ -287,7 +302,19 @@ somehow specify which rule will be applied in such a situation based on precedence, or specificity, or some other tie-breaker. A common heuristic is "more specific rule wins". We will see how ISLE resolves this question below by using both this heuristic and an explicit -priority mechanism. +priority mechanism.[^4] + +[^4]: Some term-rewriting systems actually elaborate the entire space + of possibilities, following *all* possible rule application + sequences / rewrite paths. For example, the *equality + saturation* technique + ([paper](https://cseweb.ucsd.edu/~lerner/papers/popl09.pdf), + [example implementation + Egg](https://blog.sigplan.org/2021/04/06/equality-saturation-with-egg/)) + builds a data structure that represents all equivalent terms + under a set of rewrite rules, from which a heuristic + (cost/goodness function) can be used to extract one answer when + needed. #### Right-hand Sides: Rewrite Expressions @@ -335,13 +362,13 @@ How is this useful? First, rewriting one term to another (here, `C` at the top level) that in turn appears in the left-hand side of other rules allows us to *factor* a "program" of term-rewriting rules in the same way that imperative programs are factored into separate -functions.[^4] The usual advantages of a well-factored program, where +functions.[^5] The usual advantages of a well-factored program, where each problem is solved with a small step that "reduces to a previously solved problem", apply here. Second, repeating the rewrite step is actually what grants term-rewriting its Turing-completeness: it allows for arbitrary -control flow.[^5] This might be useful in cases where, for example, a +control flow.[^6] This might be useful in cases where, for example, a term-rewriting program needs "loop" over a list of elements in the input: it can recurse and use intermediate terms to store state. @@ -353,13 +380,13 @@ then defining additional rules to rewrite further -- when it helps to factor common behavior out of multiple rules, or aids in conceptual clarity. -[^4]: In fact, ISLE actually compiles rules for different top-level +[^5]: In fact, ISLE actually compiles rules for different top-level pattern terms (`(A ...)` and `(C ...)` in the example) into separate Rust functions, so factoring rules to use intermediate terms can provide code-size and compile-time benefits for the ISLE-generated Rust code as well. -[^5]: The [lambda calculus' reduction +[^6]: The [lambda calculus' reduction rules](https://en.wikipedia.org/wiki/Lambda_calculus#Reduction) are a good example of this. @@ -397,7 +424,7 @@ application of a term-rewriting system should not depend for correctness on the order or choice of rule application: when multiple rules are applicable, then any sequence of rewrites that ends in a terminating state (a state with no further-applicable rules) should be -considered a "correct" answer.[^6] Here, this is true: if, for +considered a "correct" answer.[^7] Here, this is true: if, for example, we choose the register-plus-register form (the first rule) for an `iadd` operation, but the second argument is actually an `iconst`, then that is still valid, and the `iconst` will separately @@ -407,7 +434,7 @@ rule (or second rule, if the constant is zero). Hence, rule ordering and prioritization is nevertheless important for the quality of the instruction selector. -[^6]: Note that this suggests an interesting testing strategy: we +[^7]: Note that this suggests an interesting testing strategy: we could choose arbitrary (random) orders of lowering rules to apply, or even deliberately worst-case orders according to some heuristic. If we can differentially test programs compiled with @@ -465,13 +492,17 @@ operators: * Wildcards (`_`). * Integer constants (decimal/hex, positive/negative: `1`, `-1`, - `0x80`, `-0x80`). -* Symbolic constants (`$Symbol`). + `0x80`, `-0x80`) and boolean constants (`#t`, `#f`). +* constants imported from the embedding, of arbitrary type + (`$MyConst`). * Variable captures (bare identifiers like `x`; an identifier consists of alphanumeric characters and underscores, and does not start with a digit). * Variable captures with sub-patterns: `x @ PAT`, which captures the - subterm in `x` as above but also matches `PAT` against the subterm. + subterm in `x` as above but also matches `PAT` against the + subterm. For example, `x @ (A y z)` matches an `A` term and captures + its arguments as `y` and `z`, but also captures the whole term as + `x`. * "Equal-variable" constraints (`=x`): a subterm must match an already-captured value. * conjunctions of subpatterns: `(and PAT1 PAT2 ...)` matches all of @@ -479,7 +510,8 @@ operators: then this matcher fails. * Term deconstruction: `(term PAT1 PAT2 ...)`, where `term` is a defined term (type variant or constructor) and the subpatterns are - applied to each argument value in turn. + applied to each argument value in turn. Note that `term` cannot be a + wildcard; it must be a specific, concrete term. The expression (right-hand side) is made up of the following expression operators: @@ -500,7 +532,7 @@ When multiple rules are applicable to rewrite a particular term, ISLE will choose the "more specific" rule according to a particular heuristic: in the lowered sequence of matching steps, when one left-hand side completes a match while another with the same prefix -continues with further steps, the latter (more specific) is run first. +continues with further steps, the latter (more specific) is chosen. The more-specific-first heuristic is usually good enough, but when an undesirable choice occurs, explicit priorities can be specified. @@ -510,6 +542,18 @@ rule with a higher priority will always match before a rule with a lower priority. The default priority for all rules if not otherwise specified is `0`. +Note that the system allows multiple applicable rules to exist with +the same priority: that is, while the priority system allows for +manual tie-breaking, this tie-breaking is not required. + +Finally, one important note: the priority system is considered part of +the core language semantics and execution of rules with different +priorities is well-defined, so can be relied upon when specifying +correct rules. However, the tie-breaking heuristic is *not* part of +the specified language semantics, and so the user should never write +rules whose correctness depends on one rule overriding another +according to the heuristic. + ### Typed Terms ISLE allows the programmer to define types, and requires every term to @@ -699,14 +743,15 @@ selectors is that rewrites are always driven by term reduction from such a toplevel term, rather than a series of equivalences directly from IR instruction to machine instruction forms. -In other words, a conventional isel pattern engine might let one -specify `(Inst.A ...) -> (Inst.X ...)`. Here the instruction/opcode -type on the LHS and RHS must be the same single instruction type -(otherwise rewrites could not be chained), and rewrite relation (which -we wrote as `->`) is in essence a single privileged relation. One can -see ISLE as a generalization: we can define many different types, and -many different toplevel terms from which we can start the -reduction. In principle, one could have: +In other words, a conventional instruction selection pattern engine +might let one specify `(Inst.A ...) -> (Inst.X ...)`. In this +conventional design, the instruction/opcode type on the LHS and RHS +must be the same single instruction type (otherwise rewrites could not +be chained), and rewrite relation (which we wrote as `->`) is in +essence a single privileged relation. One can see ISLE as a +generalization: we can define many different types, and many different +toplevel terms from which we can start the reduction. In principle, +one could have: ```lisp @@ -723,7 +768,7 @@ reduction. In principle, one could have: and then both translations are available. We are "rewriting" from `IR` to `Machine1` and from `IR` to `Machine2`, even if rewrites always -preserve the same type; we get aorund the rule by using a constructor. +preserve the same type; we get around the rule by using a constructor. ### Constructors and Extractors @@ -810,7 +855,9 @@ write ``` which will, for example, expand a pattern `(A (subterm ...) _)` into -`(and (extractArg1 (subterm ...)) (extractArg2 _))`. +`(and (extractArg1 (subterm ...)) (extractArg2 _))`: in other words, +the arguments to `A` are substituted into the extractor body and then +this body is inlined. #### Summary: Terms, Constructors, and Extractors @@ -825,14 +872,12 @@ A term can have: 1. A single internal extractor body, via a toplevel `(extractor ...)` form, OR -2. A single external extractor binding, via an `(extern extractor - ...)` form, defined below; AND +2. A single external extractor binding (see next section); AND 3. One or more `(rule (Term ...) ...)` toplevel forms, which together make up an internal constructor definition, OR -4. A single external constructor binding, via an `(extern constructor - ...)` form, defined below. +4. A single external constructor binding (see next section). ## ISLE to Rust @@ -849,19 +894,24 @@ its language semantics to Rust semantics. This means that the execution of ISLE rewriting has a well-defined implementation in Rust. The basic principles are: -1. Every constructor with rules in ISLE becomes a Rust function. The - constructor arguments are the Rust function arguments. The - constructor's "return value" is the Rust function's return value - (usually wrapped in an `Option` because pattern coverage can be - incomplete). +1. Every term with rules in ISLE becomes a single Rust function. The + arguments are the Rust function arguments. The term's "return + value" is the Rust function's return value (wrapped in an `Option` + because pattern coverage can be incomplete). 2. One rewrite step is one Rust function call. 3. Rewriting is thus eager, and reified through ordinary Rust control - flow. The code that embeds the ISLE generated code will kick off - execution by calling a top-level "driver" constructor; this will - invoke constructors in its rewrite expression, making other Rust - function calls, until eventually a value is returned. + flow. When we construct a term that appears on the left-hand side + of rules, we do so by calling a function (the "constructor body"); + and this function *is* the rewrite logic, so the term is rewritten + as soon as it exists. The code that embeds the ISLE generated code + will kick off execution by calling a top-level "driver" + constructor. The body of the constructor will eventually choose one + of several rules to apply, and execute code to build the right-hand + side expression; this can invoke further constructors for its + subparts, kicking off more rewrites, until eventually a value is + returned. 4. This design means that "intermediate terms" -- constructed terms that are then further rewritten -- are never actually built as @@ -872,13 +922,15 @@ Rust. The basic principles are: overhead and/or the effectiveness of the Rust inliner). 5. Backtracking -- attempting to match rules, and backing up to follow - a different path when a match fails -- is entirely internal to the - generated Rust function for a constructor. Once we are rewriting a - constructor and its subterms, we have committed to that - constructor. Internally to the rewrite-evaluation of one - constructor, it evaluates left-hand side patterns, finding one that - matches, and then commits to that and starts to invoke constructors - to build the rewritten right-hand side. + a different path when a match fails -- exists, but is entirely + internal to the generated Rust function for rewriting one + term. Once we are rewriting a term, we have committed to that term + existing as a rewrite step; we cannot backtrack further. However, + backtracking can occur within the delimited scope of this one + term's rewrite; we have a phase during which we evaluate left-hand + sides, trying to find a matching rule, and once we find one, we + commit and start to invoke constructors to build the right-hand + side. Said another way, the principle is that left-hand sides are fallible, and have no side-effects as they execute; right-hand @@ -1083,8 +1135,8 @@ agnostic to the embedding application. This means that ISLE knows nothing about, e.g., Cranelift or compiler concepts in general. Rather, the generated code provides function entry points with well-defined signatures based on the terms, and imports the -extern constructors and extractors via a trait that the embedder must -implement. +extern constructors and extractors via a context trait that the +embedder must implement. When a term `T` is defined like @@ -1167,6 +1219,14 @@ what converts the unordered-list-of-rule representation into something that corresponds to the final Rust code's control flow and order of matching operations. +We describe this data structure below with the intent to provide an +understanding of how the DSL compiler weaves rules together into Rust +control flow. While this understanding shouldn't be strictly necessary +to use the DSL, it may be helpful. (The ultimate answer to "how does +the generated code work" is, of course, found by reading the generated +code; some care has been taken to ensure it is reasonably legible for +human consumption!) + ### Decision Trie The heart of the ISLE transformation lies in how the compiler converts @@ -1211,27 +1271,23 @@ known to be *mutually exclusive*. The canonical example of this is when an enum-typed value is destructured into different variants by various edges; we can use a Rust `match` statement in the generated source and have `O(1)` (or close to it) cost for the dispatch at this -level. +level.[^8] -Building the trie is a somewhat subtle procedure because we must be -mindful of *priorities*. In order to ensure that rule priorities are -obeyed, we actually label each edge with a *priority range* and a -match op. Edges must then be considered in priority order. When -building the trie, we uphold an important *non-overlapping-priority* -invariant: there must not be two edges with overlapping priority -ranges and different match ops, or else it would be unclear which to -run next. Priority ranges are inclusive in a discrete (integer) space, -so "overlap" is also subtle: we can allow edges with "unit" priority -ranges (just one priority), and these do not overlap. So, e.g., we can -allow an edge with priority `[0, 10]` (higher is more prioritized) and -op `A`, followed by an edge with priority `[0, 0]` and op `B`, -followed by an edge with priority `[-1, -10]`, `A`, though they must -occur in that order. - -See [this block +Building the trie is a somewhat subtle procedure; see [this block comment](https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/isle/isle/src/trie.rs#L15-L166) for more information regarding the trie construction algorithm. +[^8]: The worst-case complexity for a single term rewriting operation + is still the cost of evaluating each rule's left-hand side + sequentially, because in general there is no guarantee of + overlap between the patterns. Ordering of the edges out of a + decision node also affects complexity: if mutually-exclusive + match operations are not adjacent, then they cannot be merged + into a single `match` with `O(1)` dispatch. In general this + ordering problem is quite difficult. We could do better with + stronger heuristics; this is an open area for improvement in the + DSL compiler! + ## Reference: ISLE Language Grammar Baseline: allow arbitrary whitespace, and Lisp-style comments (`;` to