* This PR is against a branch called `main` * Internally all docs/CI/etc is updated * The default branch of the repo is now `main` * All active PRs have been updated to retarget `main` Closes #1914
424 lines
14 KiB
Markdown
424 lines
14 KiB
Markdown
<div align="center">
|
|
<h1><code>peepmatic</code></h1>
|
|
|
|
<p>
|
|
<b>
|
|
<code>peepmatic</code> is a DSL and compiler for peephole optimizers for
|
|
<a href="https://github.com/bytecodealliance/wasmtime/tree/main/cranelift#readme">Cranelift</a>.
|
|
</b>
|
|
</p>
|
|
|
|
<img src="https://github.com/fitzgen/peepmatic/workflows/Rust/badge.svg"/>
|
|
|
|
</div>
|
|
|
|
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
|
|
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
|
|
|
|
|
|
- [About](#about)
|
|
- [Example](#example)
|
|
- [A DSL for Optimizations](#a-dsl-for-optimizations)
|
|
- [Variables](#variables)
|
|
- [Constants](#constants)
|
|
- [Nested Patterns](#nested-patterns)
|
|
- [Preconditions and Unquoting](#preconditions-and-unquoting)
|
|
- [Bit Widths](#bit-widths)
|
|
- [Implementation](#implementation)
|
|
- [Parsing](#parsing)
|
|
- [Type Checking](#type-checking)
|
|
- [Linearization](#linearization)
|
|
- [Automatization](#automatization)
|
|
- [References](#references)
|
|
- [Acknowledgments](#acknowledgments)
|
|
|
|
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
|
|
|
|
## About
|
|
|
|
Peepmatic is a DSL for peephole optimizations and compiler for generating
|
|
peephole optimizers from them. The user writes a set of optimizations in the
|
|
DSL, and then `peepmatic` compiles the set of optimizations into an efficient
|
|
peephole optimizer:
|
|
|
|
```
|
|
DSL ----peepmatic----> Peephole Optimizer
|
|
```
|
|
|
|
The generated peephole optimizer has all of its optimizations' left-hand sides
|
|
collapsed into a compact automata that makes matching candidate instruction
|
|
sequences fast.
|
|
|
|
The DSL's optimizations may be written by hand or discovered mechanically with a
|
|
superoptimizer like [Souper][]. Eventually, `peepmatic` should have a verifier
|
|
that ensures that the DSL's optimizations are sound, similar to what [Alive][]
|
|
does for LLVM optimizations.
|
|
|
|
Currently, `peepmatic` is targeting peephole optimizers that operate on
|
|
Cranelift's clif intermediate representation. The intended next target is
|
|
Cranelift's new backend's "vcode" intermediate representation. Supporting
|
|
non-Cranelift targets is not a goal.
|
|
|
|
[Cranelift]: https://github.com/bytecodealliance/wasmtime/tree/main/cranelift#readme
|
|
[Souper]: https://github.com/google/souper
|
|
[Alive]: https://github.com/AliveToolkit/alive2
|
|
|
|
## Example
|
|
|
|
This snippet of our DSL describes optimizations for removing redundant
|
|
bitwise-or instructions that are no-ops:
|
|
|
|
```lisp
|
|
(=> (bor $x (bor $x $y))
|
|
(bor $x $y))
|
|
|
|
(=> (bor $y (bor $x $y))
|
|
(bor $x $y))
|
|
|
|
(=> (bor (bor $x $y) $x)
|
|
(bor $x $y))
|
|
|
|
(=> (bor (bor $x $y) $y)
|
|
(bor $x $y))
|
|
```
|
|
|
|
When compiled into a peephole optimizer automaton, they look like this:
|
|
|
|

|
|
|
|
## A DSL for Optimizations
|
|
|
|
A single peephole optimization has two parts:
|
|
|
|
1. A **left-hand side** that describes candidate instruction sequences that the
|
|
optimization applies to.
|
|
2. A **right-hand side** that contains the new instruction sequence that
|
|
replaces old instruction sequences that the left-hand side matched.
|
|
|
|
A left-hand side may bind sub-expressions to variables and the right-hand side
|
|
may contain those bound variables to reuse the sub-expressions. The operations
|
|
inside the left-hand and right-hand sides are a subset of clif operations.
|
|
|
|
Let's take a look at an example:
|
|
|
|
```lisp
|
|
(=> (imul $x 2)
|
|
(ishl $x 1))
|
|
```
|
|
|
|
As you can see, the DSL uses S-expressions. (S-expressions are easy to parse and
|
|
we also have a bunch of nice parsing infrastructure for S-expressions already
|
|
for our [`wat`][wat] and [`wast`][wast] crates.)
|
|
|
|
[wat]: https://crates.io/crates/wat
|
|
[wast]: https://crates.io/crates/wast
|
|
|
|
The left-hand side of this optimization is `(imul $x 2)`. It matches integer
|
|
multiplication operations where a value is multiplied by the constant two. The
|
|
value multiplied by two is bound to the variable `$x`.
|
|
|
|
The right-hand side of this optimization is `(ishl $x 1)`. It reuses the `$x`
|
|
variable that was bound in the left-hand side.
|
|
|
|
This optimization replaces expressions of the form `x * 2` with `x << 1`. This
|
|
is sound because multiplication by two is the same as shifting left by one for
|
|
binary integers, and it is desirable because a shift-left instruction executes
|
|
in fewer cycles than a multiplication.
|
|
|
|
The general form of an optimization is:
|
|
|
|
```lisp
|
|
(=> <left-hand-side> <right-hand-side>)
|
|
```
|
|
|
|
### Variables
|
|
|
|
Variables begin with a dollar sign and are followed by lowercase letters,
|
|
numbers, hyphens, and underscores: `$x`, `$y`, `$my-var`, `$operand2`.
|
|
|
|
Left-hand side patterns may contain variables that match any kind of
|
|
sub-expression and give it a name so that it may be reused in the right-hand
|
|
side.
|
|
|
|
```lisp
|
|
;; Replace `x + 0` with simply `x`.
|
|
(=> (iadd $x 0)
|
|
$x)
|
|
```
|
|
|
|
Within a pattern, every occurrence of a variable with the same name must match
|
|
the same value. That is `(iadd $x $x)` matches `(iadd 1 1)` but does not match
|
|
`(iadd 1 2)`. This lets us write optimizations such as this:
|
|
|
|
```lisp
|
|
;; Xor'ing a value with itself is always zero.
|
|
(=> (bxor $x $x)
|
|
(iconst 0))
|
|
```
|
|
|
|
### Constants
|
|
|
|
We've already seen specific integer literals and wildcard variables in patterns,
|
|
but we can also match any constant. These are written similar to variables, but
|
|
use uppercase letters rather than lowercase: `$C`, `$MY-CONST`, and `$OPERAND2`.
|
|
|
|
For example, we can use constant patterns to combine an `iconst` and `iadd` into
|
|
a single `iadd_imm` instruction:
|
|
|
|
```lisp
|
|
(=> (iadd (iconst $C) $x)
|
|
(iadd_imm $C $x))
|
|
```
|
|
|
|
### Nested Patterns
|
|
|
|
Patterns can also match nested operations with their own nesting:
|
|
|
|
```lisp
|
|
(=> (bor $x (bor $x $y))
|
|
(bor $x $y))
|
|
```
|
|
|
|
### Preconditions and Unquoting
|
|
|
|
Let's reconsider our first example optimization:
|
|
|
|
```lisp
|
|
(=> (imul $x 2)
|
|
(ishl $x 1))
|
|
```
|
|
|
|
This optimization is a little too specific. Here is another version of this
|
|
optimization that we'd like to support:
|
|
|
|
```lisp
|
|
(=> (imul $x 4)
|
|
(ishl $x 2))
|
|
```
|
|
|
|
We don't want to have to write out all instances of this general class of
|
|
optimizations! That would be a lot of repetition and could also bloat the size
|
|
of our generated peephole optimizer's matching automata.
|
|
|
|
Instead, we can generalize this optimization by matching any multiplication by a
|
|
power of two constant `C` and replacing it with a shift left of `log2(C)`.
|
|
|
|
First, rather than match `2` directly, we want to match any constant variable `C`:
|
|
|
|
```lisp
|
|
(imul $x $C)
|
|
```
|
|
|
|
Note that variables begin with lowercase letters, while constants begin with
|
|
uppercase letters. Both the constant pattern `$C` and variable pattern `$x` will
|
|
match `5`, but only the variable pattern `$x` will match a whole sub-expression
|
|
like `(iadd 1 2)`. The constant pattern `$C` only matches constant values.
|
|
|
|
Next, we augment our left-hand side's pattern with a **precondition** that the
|
|
constant `$C` must be a power of two. Preconditions are introduced by wrapping
|
|
a pattern in the `when` form:
|
|
|
|
```lisp
|
|
;; Our new left-hand side, augmenting a pattern with a precondition!
|
|
(when
|
|
;; The pattern matching multiplication by a constant value.
|
|
(imul $x $C)
|
|
|
|
;; The precondition that $C must be a power of two.
|
|
(is-power-of-two $C))
|
|
```
|
|
|
|
In the right-hand side, we use **unquoting** to perform compile-time evaluation
|
|
of `log2($C)`. Unquoting is done with the `$(...)` form:
|
|
|
|
```lisp
|
|
;; Our new right-hand side, using unqouting to do compile-time evaluation of
|
|
;; constants that were matched and bound in the left-hand side!
|
|
(ishl $x $(log2 $C))
|
|
```
|
|
|
|
Finally, here is the general optimization putting our new left-hand and
|
|
right-hand sides together:
|
|
|
|
```lisp
|
|
(=> (when (imul $x $C)
|
|
(is-power-of-two $C))
|
|
(ishl $x $(log2 $C)))
|
|
```
|
|
|
|
### Bit Widths
|
|
|
|
Similar to how Cranelift's instructions are bit-width polymorphic, `peepmatic`
|
|
optimizations are also bit-width polymorphic. Unless otherwise specified, a
|
|
pattern will match expressions manipulating `i32`s just the same as expressions
|
|
manipulating `i64`s, etc... An optimization that doesn't constrain its pattern's
|
|
bit widths must be valid for all bit widths:
|
|
|
|
* 1
|
|
* 8
|
|
* 16
|
|
* 32
|
|
* 64
|
|
* 128
|
|
|
|
To constrain an optimization to only match `i32`s, for example, you can use the
|
|
`bit-width` precondition:
|
|
|
|
```lisp
|
|
(=> (when (iadd $x $y)
|
|
(bit-width $x 32)
|
|
(bit-width $y 32))
|
|
...)
|
|
```
|
|
|
|
Alternatively, you can ascribe a type to an operation by putting the type inside
|
|
curly brackets after the operator, like this:
|
|
|
|
```lisp
|
|
(=> (when (sextend{i64} (ireduce{i32} $x))
|
|
(bit-width $x 64))
|
|
(sshr (ishl $x 32) 32))
|
|
```
|
|
|
|
## Implementation
|
|
|
|
Peepmatic has roughly four phases:
|
|
|
|
1. Parsing
|
|
2. Type Checking
|
|
3. Linearization
|
|
4. Automatization
|
|
|
|
(I say "roughly" because there are a couple micro-passes that happen after
|
|
linearization and before automatization. But those are the four main phases.)
|
|
|
|
### Parsing
|
|
|
|
Parsing transforms the DSL source text into an abstract syntax tree (AST).
|
|
|
|
We use [the `wast` crate][wast]. It gives us nicely formatted errors with source
|
|
context, as well as some other generally nice-to-have parsing infrastructure.
|
|
|
|
Relevant source files:
|
|
|
|
* `src/parser.rs`
|
|
* `src/ast.rs`
|
|
|
|
[wast]: https://crates.io/crates/wast
|
|
|
|
### Type Checking
|
|
|
|
Type checking operates on the AST. It checks that types and bit widths in the
|
|
optimizations are all valid. For example, it ensures that the type and bit width
|
|
of an optimization's left-hand side is the same as its right-hand side, because
|
|
it doesn't make sense to replace an integer expression with a boolean
|
|
expression.
|
|
|
|
After type checking is complete, certain AST nodes are assigned a type and bit
|
|
width, that are later used in linearization and when matching and applying
|
|
optimizations.
|
|
|
|
We walk the AST and gather type constraints. Every constraint is associated with
|
|
a span in the source file. We hand these constraints off to Z3. In the case that
|
|
there are type errors (i.e. Z3 returns `unsat`), we get the constraints that are
|
|
in conflict with each other via `z3::Solver::get_unsat_core` and report the type
|
|
errors to the user, with the source context, thanks to the constraints'
|
|
associated spans.
|
|
|
|
Using Z3 not only makes implementing type checking easier than it otherwise
|
|
would be, but makes it that much easier to extend type checking with searching
|
|
for counterexample inputs in the future. That is, inputs for which the RHS is
|
|
not equivalent to the LHS, implying that the optimization is unsound.
|
|
|
|
Relevant source files:
|
|
|
|
* `src/verify.rs`
|
|
|
|
### Linearization
|
|
|
|
Linearization takes the AST of optimizations and converts each optimization into
|
|
a linear form. The goal is to make automaton construction easier in the
|
|
automatization step, as well as simplifying the language to make matching and
|
|
applying optimizations easier.
|
|
|
|
Each optimization's left-hand side is converted into a sequence of
|
|
|
|
* match operation,
|
|
* path to the instruction/value/immediate to which the operation is applied, and
|
|
* expected result of the operation.
|
|
|
|
All match operations must have the expected result for the optimization to be
|
|
applicable to an instruction sequence.
|
|
|
|
Each optimization's right-hand side is converted into a sequence of build
|
|
actions. These are commands that describe how to construct the right-hand side,
|
|
given that the left-hand side has been matched.
|
|
|
|
Relevant source files:
|
|
|
|
* `src/linearize.rs`
|
|
* `src/linear_passes.rs`
|
|
* `crates/runtime/src/linear.rs`
|
|
|
|
### Automatization
|
|
|
|
Automatization takes a set of linear optimizations and combines them into a
|
|
transducer automaton. This automaton is the final, compiled peephole
|
|
optimizations. The goal is to de-duplicate as much as we can from all the linear
|
|
optimizations, producing as compact and cache-friendly a representation as we
|
|
can.
|
|
|
|
Plain automata can tell you whether it matches an input string. It can be
|
|
thought of as a compact representation of a set of strings. A transducer is a
|
|
type of automaton that doesn't just match input strings, but can map them to
|
|
output values. It can be thought of as a compact representation of a dictionary
|
|
or map. By using transducers, we de-duplicate not only the prefix and suffix of
|
|
the match operations, but also the right-hand side build actions.
|
|
|
|
Each state in the emitted transducers is associated with a match operation and
|
|
path. The transitions out of that state are over the result of the match
|
|
operation. Each transition optionally accumulates some RHS build actions. By the
|
|
time we reach a final state, the RHS build actions are complete and can be
|
|
interpreted to apply the matched optimization.
|
|
|
|
The relevant source files for constructing the transducer automaton are:
|
|
|
|
* `src/automatize.rs`
|
|
* `crates/automata/src/lib.rs`
|
|
|
|
The relevant source files for the runtime that interprets the transducers and
|
|
applies optimizations are:
|
|
|
|
* `crates/runtime/src/optimizations.rs`
|
|
* `crates/runtime/src/optimizer.rs`
|
|
|
|
## References
|
|
|
|
I found these resources helpful when designing `peepmatic`:
|
|
|
|
* [Extending tree pattern matching for application to peephole
|
|
optimizations](https://pure.tue.nl/ws/portalfiles/portal/125543109/Thesis_JanekvOirschot.pdf)
|
|
by van Oirschot
|
|
|
|
* [Interpreted Pattern Match Execution for
|
|
MLIR](https://drive.google.com/drive/folders/1hb_sXbdMbIz95X-aaa6Vf5wSYRwsJuve)
|
|
by Jeff Niu
|
|
|
|
* [Direct Construction of Minimal Acyclic Subsequential
|
|
Transducers](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.3698&rep=rep1&type=pdf)
|
|
by Mihov and Maurel
|
|
|
|
* [Index 1,600,000,000 Keys with Automata and
|
|
Rust](https://blog.burntsushi.net/transducers/) and [the `fst`
|
|
crate](https://crates.io/crates/fst) by Andrew Gallant
|
|
|
|
## Acknowledgments
|
|
|
|
Thanks to [Jubi Taneja], [Dan Gohman], [John Regehr], and [Nuno Lopes] for their
|
|
input in design discussions and for sharing helpful resources!
|
|
|
|
[Jubi Taneja]: https://www.cs.utah.edu/~jubi/
|
|
[Dan Gohman]: https://github.com/sunfishcode
|
|
[John Regehr]: https://www.cs.utah.edu/~regehr/
|
|
[Nuno Lopes]: http://web.ist.utl.pt/nuno.lopes/
|