diff --git a/cranelift/docs/isle-integration.md b/cranelift/docs/isle-integration.md new file mode 100644 index 0000000000..1f7c7154f8 --- /dev/null +++ b/cranelift/docs/isle-integration.md @@ -0,0 +1,234 @@ +# How ISLE is Integrated with Cranelift + +This document contains an overview of and FAQ about how ISLE fits into +Cranelift. + +## What is ISLE? + +ISLE is a domain-specific language for authoring instruction selection and +rewrite rules. ISLE source text is [compiled down into Rust +code](https://github.com/bytecodealliance/wasmtime/tree/main/cranelift/isle#implementation). + +TODO: link to @cfallin's language reference/tutorial + +## How does ISLE integrate with the build system? + +The build integration is inside of `cranelift/codegen/build.rs`. + +For regular builds, we check a manifest that records the file hashes of the ISLE +source files that went into building a given ISLE-generated Rust file. If the +hashes of these files on disk don't match the hashes in the manifest, then the +ISLE-generated Rust file is stale and needs to be rebuilt. In this case, the +`build.rs` will report a build error. This way, downstream crates that use +Cranelift don't need to build ISLE, and get fewer transitive dependencies and +faster build times. + +To intentionally rebuild ISLE-generated Rust files, use the `rebuild-isle` Cargo +feature with `cranelift-codegen`: + +```shell +$ cargo check -p cranelift-codegen --features rebuild-isle +``` + +When this feature is active, we rerun the ISLE compiler on the ISLE sources to +create the new versions of the ISLE-generated Rust files and update the manifest +files. + +Additionally, the `cranelift-codegen-meta` crate will automatically generate +ISLE `extern` declarations and helpers for working with CLIF. The code that does +this is defined inside `cranelift/codegen/meta/src/gen_inst.rs` and it creates +the `cranelift/codegen/src/clif.isle` file. + +## Where are the relevant files? + +* `cranelift/isle`: The ISLE compiler's source code. + +* `cranelift/codegen/src/prelude.isle`: Common definitions and declarations for + ISLE. This gets included in every ISLE compilation. + +* `cranelift/codegen/src/clif.isle`: Auto-generated declarations and helpers for + working with CLIF inside ISLE. Generated by `cranelift/codegen/build.rs` when + the `rebuild-isle` feature is enabled. This gets included in every ISLE + compilation. + +* `cranelift/codegen/src/machinst/isle.rs`: Common Rust code for gluing + ISLE-generated code into a target architecture's backend. Contains + implementations of ISA-agnostic `extern` helpers declared in ISLE. + +* `cranelift/codegen/src/isa//inst.isle`: ISA-specific ISLE + helpers. Contains things like constructors for each instruction in the ISA, or + helpers to get a specific register. Helps bridge the gap between the raw, + non-SSA ISA and the pure, SSA view that the lowering rules have. + +* `cranelift/codegen/src/isa//lower.isle`: Instruction selection lowering + rules for an ISA. These should be pure, SSA rewrite rules, that lend + themselves to eventual verification. + +* `cranelift/codegen/src/isa//lower/isle.rs`: The Rust glue code for + integrating this ISA's ISLE-generate Rust code into the rest of the backend + for this ISA. Contains implementations of ISA-specific `extern` helpers + declared in ISLE. + +* `cranelift/codegen/src/isa//lower/isle/generated_code.rs`: The + ISLE-generated Rust code to perform instruction and CLIF-to-`MachInst` + lowering for each target architecture. + +## Gluing ISLE's generated code into Cranelift + +Each ISA-specific, ISLE-generated file is generic over a `Context` trait that +has a trait method for each `extern` helper defined in ISLE. There is one +concrete implementation of each of these traits, defined in +`cranelift/codegen/src/isa//lower/isle.rs`. In general, the way that +ISLE-generated code is glued into the rest of the system is with these trait +implementations. + +There may also be a `lower` function defined in `isle.rs` that encapsulates +creating the ISLE `Context` and calling into the generated code. + +## Lowering rules are always pure, use SSA + +The lowering rules themselves, defined in +`cranelift/codegen/src/isa//lower.isle`, must always be a pure mapping +from a CLIF instruction to the target ISA's `MachInst`. + +Examples of things that the lowering rules themselves shouldn't deal with or +talk about: + +* Registers that are modified (both read and written to, violating SSA) +* Implicit uses of registers +* Maintaining use counts for each CLIF value or virtual register + +Instead, these things should be handled by some combination of +`cranelift/codegen/src/isa//inst.isle` and general Rust code (either in +`cranelift/codegen/src/isa//lower/isle.rs` or elsewhere). + +When an instruction modifies a register, both reading from it and writing to it, +we should build an SSA view of that instruction that gets legalized via "move +mitosis" by splitting a move out from the register. + +For example, on x86 the `add` instruction reads and writes its first operand: + + add a, b == a = a + b + +So we present an SSA facade where `add` operates on three registers, instead of +two, and defines one of them, while reading the other two and leaving them +unmodified: + + add a, b, c == a = b + c + +Then, as an implementation detail of the facade, we emit moves as necessary: + + add a, b, c ==> mov a, b; add b, c + +We call the process of emitting these moves "move mitosis". For ISAs with +ubiquitous use of modified registers and instructions in two-operand form, like +x86, we implement move mitosis with methods on the ISA's `MachInst`. For other +ISAs that are RISCier and where modified registers are pretty rare, such as +aarch64, we implement the handful of move mitosis special cases at the +`inst.isle` layer. Either way, the important thing is that the lowering rules +remain pure. + +Finally, note that these moves are generally cleaned up by the register +allocator's move coalescing, and move mitosis will eventually go away completely +once we switch over to `regalloc2`, which takes instructions in SSA form +directly as input. + +Instructions that implicitly operate on specific registers, or which require +that certain operands be in certain registers, are handled similarly: the +lowering rules use a pure paradigm that ignores these constraints and has +instructions that explicitly take implicit operands, and we ensure the +constraints are fulfilled a layer below the lowering rules (in `inst.isle` or in +Rust glue code). + +## When are lowering rules allowed to have side effects? + +Extractors (the matchers that appear on the left-hand sides of `rule`s) should +**never** have side effects. When evaluating a rule's extractors, we haven't yet +committed to evaluating that rule's right-hand side. If the extractors performed +side effects, we could get deeply confusing action-at-a-distance bugs where +rules we never fully match pull the rug out from under our feet. + +Anytime you are tempted to perform side effects in an extractor, you should +instead just package up the things you would need in order to perform that side +effect, and then have a separate constructor that takes that package and +performs the side effect it describes. The constructor can only be called inside +a rule's right-hand side, which is only evaluated after we've committed to this +rule, which avoids the action-at-a-distance bugs described earlier. + +For example, loads have a side effect in CLIF: they might trap. Therefore, even +if a loaded value is never used, we will emit code that implements that +load. But if we are compiling for x86 we can sink loads into other the operand +for another operation depending on how the loaded value is used. If we sink that +load into, say, an `add` then we need to tell the lowering context *not* to +lower the CLIF `load` instruction anymore, because its effectively already +lowered as part of lowering the `add` that uses the loaded value. Marking an +instruction as "already lowered" is a side effect, and we might be tempted to +perform that side effect in the extractor that matches sinkable loads. But we +can't do that because although the load itself might be sinkable, there might be +a reason why we ultimately don't perform this load-sinking rule, and if that +happens we still need to lower the CLIF load. + +Therefore, we make the `sinkable_load` extractor create a `SinkableLoad` type +that packages up everything we need to know about the load and how to tell the +lowering context that we've sunk it and the lowering context doesn't need to +lower it anymore, but *it doesn't actually tell that to the lowering context +yet*. + +```lisp +;; inst.isle + +;; A load that can be sunk into another operation. +(type SinkableLoad extern (enum)) + +;; Extract a `SinkableLoad` from a value if the value is defined by a compatible +;; load. +(decl sinkable_load (SinkableLoad) Value) +(extern extractor sinkable_load sinkable_load) +``` + +Then, we pair that with a `sink_load` constructor that takes the `SinkableLoad`, +performs the associated side effect of telling the lowering context not to lower +the load anymore, and returns the x86 operand with the load sunken into it. + +```lisp +;; inst.isle + +;; Sink a `SinkableLoad` into a `RegMemImm.Mem`. +;; +;; This is a side-effectful operation that notifies the context that the +;; instruction that produced the `SinkableImm` has been sunk into another +;; instruction, and no longer needs to be lowered. +(decl sink_load (SinkableLoad) RegMemImm) +(extern constructor sink_load sink_load) +``` + +Finally, we can use `sinkable_load` and `sink_load` inside lowering rules that +create instructions where an operand is loaded directly from memory: + +``` +;; lower.isle + +(rule (lower (has_type (fits_in_64 ty) + (iadd x (sinkable_load y)))) + (value_reg (add ty + (put_in_reg x) + (sink_load y)))) +``` + +See the `sinkable_load`, `SinkableLoad`, and `sink_load` declarations inside +`cranelift/codegen/src/isa/x64/inst.isle` as well as their external +implementations inside `cranelift/codegen/src/isa/x64/lower/isle.rs` for +details. + +See also the "ISLE code should leverage types" section below. + +## ISLE code should leverage types + +ISLE is a typed language, and we should leverage that to prevent whole classes +of bugs where possible. Use newtypes liberally. + +For example, use the `with_flags` family of helpers to pair flags-producing +instructions with flags-consuming instructions, ensuring that no errant +instructions are ever inserted between our flags-using instructions, clobbering +their flags. See `with_flags`, `ProducesFlags`, and `ConsumesFlags` inside +`cranelift/codegen/src/prelude.isle` for details.