Fixes#3468.
If a program has many instances of the pattern "goto next; next:" in a
row (i.e., no-op branches to the fallthrough address), the branch
simplification in `MachBuffer` would remove them all, as expected.
However, in order to work correctly, the algorithm needs to track all
labels that alias the current buffer tail, so that they can be adjusted
later if another branch chomp occurs.
When many thousands of this branch-to-next pattern occur, many thousands
of labels will reference the current buffer tail, and this list of
thousands of labels will be shuffled between the branch metadata struct
and the "labels at tail" struct as branches are appended and then
chomped immediately.
It's possible that with smarter data structure design, we could somehow
share the list of labels -- e.g., a single array of all labels, in order
they are bound, with ranges of indices in this array used to represent
lists of labels (actually, that seems like a better design in general);
but let's leave that to future optimization work.
For now, we can avoid the quadratic behavior by just "giving up" if the
list is too long; it's always valid to not optimize a branch. It is very
unlikely that the "normal" case will have more than 100 "goto next"
branches in a row, so this should not have any perf impact; if it does,
we will leave 1 out of every 100 such branches un-optimized in a long
sequence of thousands.
This takes total compilation time down on my machine from ~300ms to
~72ms for the `foo.wasm` case in #3441. For reference, the old backend
(now removed), built from arbitrarily-chosen-1-year-old commit
`c7fcc344`, takes 158ms, so we're ~twice as fast, which is what I would
expect.
It appears that some allocation heuristics have changed slightly since
0.0.31, so some of the golden-output filetests are updated as well.
Ideally we would rely more on runtests rather than golden-compilation
tests; but for now this is sufficient. (I'm not sure exactly what in
regalloc.rs changed to alter these heuristics; it's actually been almost
a year since the 0.0.31 release with several refactorings and tweaks
merged since then.)
Fixes#3441.
When there is a linking error caused by an undefined instance, list all
the instances exports in the error message. This will clarify errors for
undefined two-level imports that get desugared to one-level instance
imports under the module-linking proposal.
This commit fixes an issue in Cranelift where legalization of
`heap_addr` instructions (used by wasm to represent heap accesses) could
be off-by-two where loads that should be valid were actually treated as
invalid. The bug here happened in an optimization where tests against
odd constants were being altered to tests against even constants by
subtracting one from the limit instead of adding one to the limit. The
comment around this area has been updated in accordance with a little
more math-stuff as well to help future readers.
Before this commit we actually have two builders checking for security
advisories on CI, one is `cargo audit` and one is `cargo deny`. The
`cargo deny` builder is slightly different in that it checks a few other
things about our dependency tree such as licenses, duplicates, etc. This
commit removes the advisory check from `cargo deny` on CI and then moves
the `cargo audit` check to a separate workflow.
The `cargo audit` check will now run nightly and will open an issue on
the Wasmtime repository when an advisory is found. This should help make
it such that our CI is never broken by the publication of an advisory
but we're still promptly notified whenever an advisory is made. I've
updated the release process notes to indicate that the open issues
should be double-checked to ensure that there are no open advisories
that we need to take care of.
* Update the spec reference testsuite submodule
This commit brings in recent updates to the spec test suite. Most of the
changes here were already fixed in `wasmparser` with some tweaks to
esoteric modules, but Wasmtime also gets a bug fix where where import
matching for the size of tables/memories is based on the current runtime
size of the table/memory rather than the original type of the
table/memory. This means that during type matching the actual value is
consulted for its size rather than using the minimum size listed in its
type.
* Fix now-missing directories in build script
On the build side, this commit introduces two things:
1. The automatic generation of various ISLE definitions for working with
CLIF. Specifically, it generates extern type definitions for clif opcodes and
the clif instruction data `enum`, as well as extractors for matching each clif
instructions. This happens inside the `cranelift-codegen-meta` crate.
2. The compilation of ISLE DSL sources to Rust code, that can be included in the
main `cranelift-codegen` compilation.
Next, this commit introduces the integration glue code required to get
ISLE-generated Rust code hooked up in clif-to-x64 lowering. When lowering a clif
instruction, we first try to use the ISLE code path. If it succeeds, then we are
done lowering this instruction. If it fails, then we proceed along the existing
hand-written code path for lowering.
Finally, this commit ports many lowering rules over from hand-written,
open-coded Rust to ISLE.
In the process of supporting ISLE, this commit also makes the x64 `Inst` capable
of expressing SSA by supporting 3-operand forms for all of the existing
instructions that only have a 2-operand form encoding:
dst = src1 op src2
Rather than only the typical x86-64 2-operand form:
dst = dst op src
This allows `MachInst` to be in SSA form, since `dst` and `src1` are
disentangled.
("3-operand" and "2-operand" are a little bit of a misnomer since not all
operations are binary operations, but we do the same thing for, e.g., unary
operations by disentangling the sole operand from the result.)
There are two motivations for this change:
1. To allow ISLE lowering code to have value-equivalence semantics. We want ISLE
lowering to translate a CLIF expression that evaluates to some value into a
`MachInst` expression that evaluates to the same value. We want both the
lowering itself and the resulting `MachInst` to be pure and referentially
transparent. This is both a nice paradigm for compiler writers that are
authoring and maintaining lowering rules and is a prerequisite to any sort of
formal verification of our lowering rules in the future.
2. Better align `MachInst` with `regalloc2`'s API, which requires that the input
be in SSA form.