Currently, the `build.rs` script that generates Rust source from the
ISLE DSL will only do this generation if the `rebuild-isle` Cargo
feature is specified. By default, it is not. This is based on the
principle that we (the build script) do not modify the source tree as
managed by git; git-managed files are strictly a human-managed and
human-edited resource. By adding the opt-in Cargo feature, a developer
is requesting the build script to perform an explicit action. (In my
understanding at least, this principle comes from the general philosophy
of hermetic builds: the output should be a pure function of the input,
and part of this is that the input is read-only. If we modify the source
tree, then all bets are off.)
Unfortunately, requiring the opt-in feature also creates a footgun that
is easy to hit: if a developer modifies the ISLE DSL source, but forgets
to specify the Cargo feature, then the compiler will silently be built
successfully with stale source, and will silently exclude any changes
that were made.
The generated source is checked into git for a good reason: we want DSL
compiler to not affect build times for the overwhelmingly common case
that Cranelift is used as a dependency but the backends are not being
actively developed. (This overhead comes mainly from building `islec`
itself.)
So, what to do? This PR implements a middle ground first described in
[this conversation](https://github.com/bytecodealliance/wasmtime/pull/3506#discussion_r743113351), in which we:
- Generate a hash (SHA-512) of the ISLE DSL source and produce a
"manifest" of ISLE inputs alongside the generated source; and
- Always read the ISLE DSL source, and see if the manifest is still
valid, on builds that do not have the opt-in "rebuild" feature.
This allows us to know whether the ISLE compiler output would have been
the same (modulo changes to the DSL compiler itself, which are
out-of-scope here), without actually building the ISLE compiler and
running it.
If the compiler-backend developer modifies an ISLE source file and then
tries to build `cranelift-codegen` without adding the `rebuild-isle`
Cargo feature, they get the following output:
```text
Error: the ISLE source files that resulted in the generated Rust source
* src/isa/x64/lower/isle/generated_code.rs
have changed but the generated source was not rebuilt! These ISLE source
files are:
* src/clif.isle
* src/prelude.isle
* src/isa/x64/inst.isle
* src/isa/x64/lower.isle
Please add `--features rebuild-isle` to your `cargo build` command
if you wish to rebuild the generated source, then include these changes
in any git commits you make that include the changes to the ISLE.
For example:
$ cargo build -p cranelift-codegen --features rebuild-isle
(This build script cannot do this for you by default because we cannot
modify checked-into-git source without your explicit opt-in.)
```
which will tell them exactly what they need to do to fix the problem!
Note that I am leaving the "Rebuild ISLE" CI job alone for now, because
otherwise, we are trusting whomever submits a PR to generate the correct
generated source. In other words, the manifest is a communication from
the checked-in tree to the developer, but we still need to verify that
the checked-in generated Rust source and the manifest are correct with
respect to the checked-in ISLE source.
This moves the `f32const` and `f64const` instructions from `lower.rs` to
ISLE. I was originally going to add something else but due to the
`isle.rs`'s manual use of `constructor_imm(..)` it necessitated filling
out the `imm` cases for f32/f64 constants, so I figured I'd go ahead and
move these instructions as well.
The special case for 0 constants which use `xorp{s,d}` is preserved from
`lower.rs` today, but a special case isn't added for the all-ones
constant. The comment says to use `cmpeqp{s,d}` but as discovered on
other recent PRs that's not quite sufficient because comparing a
register against itself which happens to be NaN wouldn't work, so
something else will be required (perhaps `pcmpeq` or similar? I figured
I'd defer to later)
This was my first attempt at transitioning code to ISLE to originally
fix#3327 but that fix has since landed on `main`, so this is instead
now just porting a few operations to ISLE.
Closes#3336
This commit disables the `MachBuffer::check_label_branch_invariants`
debug check on the fuzzers due to it causing timeouts with the test case
from #3441. Fuzzing leads to a 20-30x slowdown of executed code and
locally the fuzz time it takes to instantiate #3441 drops from 3 minutes
to 6 seconds disabling this function. Note that this should still be
executed during our testing on CI since it's still enabled for debug
assertions.
There were a few previous code paths that attempted to handle this, but this new
check handles it for all callers.
Rematerializing constants, rather than assigning and reusing a register, allows
for lower register pressure.
This opcode was removed as part of the old-backend cleanup in #3446.
While this opcode will definitely go away eventually, it is
unfortunately still used today in Lucet (as we just discovered while
working to upgrade Lucet's pinned Cranelift version). Lucet is
deprecated and slated to eventually be completely sunset in favor of
Wasmtime; but until that happens, we need to keep this opcode.
This should be faster by avoiding matching on InstructionData to get the
Opcode, then on Opcode and then finally on the InstructionData again to
get the arguments
This eagerly evaluates the `format!` and produces a `String` with a heap
allocation, regardless whether `foo` is `Some`/`Ok` or `None`/`Err`. Using
`foo.unwrap_or_else(|| panic!(...))` makes it so that the error message
formatting is only evaluated if `foo` is `None`/`Err`.
Fixes#3468.
If a program has many instances of the pattern "goto next; next:" in a
row (i.e., no-op branches to the fallthrough address), the branch
simplification in `MachBuffer` would remove them all, as expected.
However, in order to work correctly, the algorithm needs to track all
labels that alias the current buffer tail, so that they can be adjusted
later if another branch chomp occurs.
When many thousands of this branch-to-next pattern occur, many thousands
of labels will reference the current buffer tail, and this list of
thousands of labels will be shuffled between the branch metadata struct
and the "labels at tail" struct as branches are appended and then
chomped immediately.
It's possible that with smarter data structure design, we could somehow
share the list of labels -- e.g., a single array of all labels, in order
they are bound, with ranges of indices in this array used to represent
lists of labels (actually, that seems like a better design in general);
but let's leave that to future optimization work.
For now, we can avoid the quadratic behavior by just "giving up" if the
list is too long; it's always valid to not optimize a branch. It is very
unlikely that the "normal" case will have more than 100 "goto next"
branches in a row, so this should not have any perf impact; if it does,
we will leave 1 out of every 100 such branches un-optimized in a long
sequence of thousands.
This takes total compilation time down on my machine from ~300ms to
~72ms for the `foo.wasm` case in #3441. For reference, the old backend
(now removed), built from arbitrarily-chosen-1-year-old commit
`c7fcc344`, takes 158ms, so we're ~twice as fast, which is what I would
expect.
This commit fixes an issue in Cranelift where legalization of
`heap_addr` instructions (used by wasm to represent heap accesses) could
be off-by-two where loads that should be valid were actually treated as
invalid. The bug here happened in an optimization where tests against
odd constants were being altered to tests against even constants by
subtracting one from the limit instead of adding one to the limit. The
comment around this area has been updated in accordance with a little
more math-stuff as well to help future readers.
On the build side, this commit introduces two things:
1. The automatic generation of various ISLE definitions for working with
CLIF. Specifically, it generates extern type definitions for clif opcodes and
the clif instruction data `enum`, as well as extractors for matching each clif
instructions. This happens inside the `cranelift-codegen-meta` crate.
2. The compilation of ISLE DSL sources to Rust code, that can be included in the
main `cranelift-codegen` compilation.
Next, this commit introduces the integration glue code required to get
ISLE-generated Rust code hooked up in clif-to-x64 lowering. When lowering a clif
instruction, we first try to use the ISLE code path. If it succeeds, then we are
done lowering this instruction. If it fails, then we proceed along the existing
hand-written code path for lowering.
Finally, this commit ports many lowering rules over from hand-written,
open-coded Rust to ISLE.
In the process of supporting ISLE, this commit also makes the x64 `Inst` capable
of expressing SSA by supporting 3-operand forms for all of the existing
instructions that only have a 2-operand form encoding:
dst = src1 op src2
Rather than only the typical x86-64 2-operand form:
dst = dst op src
This allows `MachInst` to be in SSA form, since `dst` and `src1` are
disentangled.
("3-operand" and "2-operand" are a little bit of a misnomer since not all
operations are binary operations, but we do the same thing for, e.g., unary
operations by disentangling the sole operand from the result.)
There are two motivations for this change:
1. To allow ISLE lowering code to have value-equivalence semantics. We want ISLE
lowering to translate a CLIF expression that evaluates to some value into a
`MachInst` expression that evaluates to the same value. We want both the
lowering itself and the resulting `MachInst` to be pure and referentially
transparent. This is both a nice paradigm for compiler writers that are
authoring and maintaining lowering rules and is a prerequisite to any sort of
formal verification of our lowering rules in the future.
2. Better align `MachInst` with `regalloc2`'s API, which requires that the input
be in SSA form.
* Add some debug logging for timing in module compiles
This is sometimes helpful when debugging slow compiles from fuzz bugs or
similar.
* Fix total duration calculation to not double-count
Some platforms such as AArch64 Linux support different memory page
sizes, so we need to be conservative when choosing the code section
alignment (which is equal to the page size) by using the maximum.
Copyright (c) 2021, Arm Limited.