* Initial skeleton of some component model processing
This commit is the first of what will likely be many to implement the
component model proposal in Wasmtime. This will be structured as a
series of incremental commits, most of which haven't been written yet.
My hope is to make this incremental and over time to make this easier to
review and easier to test each step in isolation.
Here much of the skeleton of how components are going to work in
Wasmtime is sketched out. This is not a complete implementation of the
component model so it's not all that useful yet, but some things you can
do are:
* Process the type section into a representation amenable for working
with in Wasmtime.
* Process the module section and register core wasm modules.
* Process the instance section for core wasm modules.
* Process core wasm module imports.
* Process core wasm instance aliasing.
* Ability to compile a component with core wasm embedded.
* Ability to instantiate a component with no imports.
* Ability to get functions from this component.
This is already starting to diverge from the previous module linking
representation where a `Component` will try to avoid unnecessary
metadata about the component and instead internally only have the bare
minimum necessary to instantiate the module. My hope is we can avoid
constructing most of the index spaces during instantiation only for it
to all ge thrown away. Additionally I'm predicting that we'll need to
see through processing where possible to know how to generate adapters
and where they are fused.
At this time you can't actually call a component's functions, and that's
the next PR that I would like to make.
* Add tests for the component model support
This commit uses the recently updated wasm-tools crates to add tests for
the component model added in the previous commit. This involved updating
the `wasmtime-wast` crate for component-model changes. Currently the
component support there is quite primitive, but enough to at least
instantiate components and verify the internals of Wasmtime are all
working correctly. Additionally some simple tests for the embedding API
have also been added.
* Improve the `wasmtime` crate's README
This commit is me finally getting back to #2688 and improving the README
of the `wasmtime` crate. Currently we have a [pretty drab README][drab]
that doesn't really convey what we want about Wasmtime.
While I was doing this I opted to update the feature list of Wasmtime as
well in the main README (which is mirrored into the crate readme),
namely adding a bullet point for "secure" which I felt was missing
relative to how we think about Wasmtime.
Naturally there's a lot of ways to paint this shed, so feedback is of
course welcome on this! (I'm not the best writer myself)
[drab]: https://crates.io/crates/wasmtime/0.37.0
* Expand the "Fast" bullet a bit more
* Reference the book from the wasmtime crate
* Update more security docs
Also merge the sandboxing security page with the main security page to
avoid the empty security page.
This PR adds a basic *alias analysis*, and optimizations that use it.
This is a "mid-end optimization": it operates on CLIF, the
machine-independent IR, before lowering occurs.
The alias analysis (or maybe more properly, a sort of memory-value
analysis) determines when it can prove a particular memory
location is equal to a given SSA value, and when it can, it replaces any
loads of that location.
This subsumes two common optimizations:
* Redundant load elimination: when the same memory address is loaded two
times, and it can be proven that no intervening operations will write
to that memory, then the second load is *redundant* and its result
must be the same as the first. We can use the first load's result and
remove the second load.
* Store-to-load forwarding: when a load can be proven to access exactly
the memory written by a preceding store, we can replace the load's
result with the store's data operand, and remove the load.
Both of these optimizations rely on a "last store" analysis that is a
sort of coloring mechanism, split across disjoint categories of abstract
state. The basic idea is that every memory-accessing operation is put
into one of N disjoint categories; it is disallowed for memory to ever
be accessed by an op in one category and later accessed by an op in
another category. (The frontend must ensure this.)
Then, given this, we scan the code and determine, for each
memory-accessing op, when a single prior instruction is a store to the
same category. This "colors" the instruction: it is, in a sense, a
static name for that version of memory.
This analysis provides an important invariant: if two operations access
memory with the same last-store, then *no other store can alias* in the
time between that last store and these operations. This must-not-alias
property, together with a check that the accessed address is *exactly
the same* (same SSA value and offset), and other attributes of the
access (type, extension mode) are the same, let us prove that the
results are the same.
Given last-store info, we scan the instructions and build a table from
"memory location" key (last store, address, offset, type, extension) to
known SSA value stored in that location. A store inserts a new mapping.
A load may also insert a new mapping, if we didn't already have one.
Then when a load occurs and an entry already exists for its "location",
we can reuse the value. This will be either RLE or St-to-Ld depending on
where the value came from.
Note that this *does* work across basic blocks: the last-store analysis
is a full iterative dataflow pass, and we are careful to check dominance
of a previously-defined value before aliasing to it at a potentially
redundant load. So we will do the right thing if we only have a
"partially redundant" load (loaded already but only in one predecessor
block), but we will also correctly reuse a value if there is a store or
load above a loop and a redundant load of that value within the loop, as
long as no potentially-aliasing stores happen within the loop.
* Refactor binary-compatible-builds for releases
I was poking around this yesterday and noticed a few things that could
be improved for our release builds:
* The centos container for the x86_64 builds contained a bunch of extra
tooling we no longer need such as python3 and a C++ compiler. Along
with custom toolchain things this could all get removed since the C we
include now is quite simple.
* The aarch64 and s390x cross-compiled builds had relatively high glibc
version requirements compared to the x86_64 build. This was because we
don't use a container to build the cross-compiled binaries. I added
containers here along the lines of the x86_64 build to use an older
glibc to build the release binary to lower our version requirement.
This lower the aarch64 version requirement from glibc 2.28 to 2.17.
Additionally the s390x requirement dropped from 2.28 to 2.16.
* To make the containers a bit easier to read/write I added
`Dockerfile`s for them in a new `ci/docker` directory instead of
hardcoding install commands in JS.
This isn't intended to be a really big change or anything for anyone,
but it's intended to keep our Linux-based builds consistent at least as
best we can.
* Remove temporary change
* Improve documentation around `ResourceLimiter`
This commit takes a pass through the `Store::limiter` method and related
types/traits to improve the documentation with an example and soup up
any recent developments in the documentation.
Closes#4138
* Fix a broken doc link
This is required now that the simd specification has been merged into
the upstream specification, so to run the spec tests this must always be
enabled instead of being left to the whims of the fuzzer about whether
to enable it or not.
* Update the wasm-tools family of crates
This commit updates these crates as used by Wasmtime for the recently
published versions to pull in changes necessary to support the component
model. I've split this out from #4005 to make it clear what's impacted
here and #4005 can simply rebase on top of this to pick up the necessary
changes.
* More test fixes
A new version of rustc was released this morning and we have a few small
breakages on our CI which need fixing:
* A new warning was coming out of the c-api crate about an unneeded
`unsafe` block.
* The panic message of a task in `cranelift-object` needed updating
since the standard library changed how it formats strings with the nul
byte.
I ran across a case in Wasmtime today where a poor error message came
out of the CLI. For example before this commit you would get:
$ cargo run wast --wasm-features component-model foo.wast
error: Invalid value "wast" for '<MODULE>': module name cannot be the same as a subcommand
and now after this commit you get:
$ cargo run wast --wasm-features component-model foo.wast
error: Invalid value "component-model" for '--wasm-features <FEATURE,FEATURE,...>': unsupported WebAssembly feature 'component-model'
I believe this was an accidental regression from #4082 since Wasmtime
0.36.0 produces the error message as expected.
I opted to invert the conditional logic for falling back to the `run`
subcommand. Instead of having a small set of error kinds that print the
first-level error a small set of error kinds are now used to fall back
to the `run` subcommand by default. My hope is that as `ErrorKind` is
extended over time with various sorts of errors of parsing argumenst
this'll be more robust because most of the time we want the CLI
invocation to print out the normal CLI error, it's only in a select few
cases that using `run` is likely to succeed.
* Upgrade to regalloc2 0.1.3.
This pulls in bytecodealliance/regalloc2#49, which slightly improves
codegen in some cases where a safepoint (for reference-typed values)
occurs in the same liverange as a register-constrained use. For
example, in bytecodealliance/wasmtime#3785, an extra move instruction
appeared and a callee-save register was used (necessitating a more
expensive prologue) because of suboptimal splitting heuristics, which
this PR fixes. The updated RA2 heuristics appear to have no measured
downsides in existing benchmarks and improve the manually-observed
codegen issue.
* Update filetests where regalloc2 improvement altered behavior with reftypes.
As per discussion today, when we have a holiday (affecting any regular
attendee), we will push the meeting by a week. This does mean we
sometimes have meetings in contiguous weeks, but given the number of
topics we usually have to discuss, erring on the side of more discussion
time (rather than just canceling) is probably not a bad thing.
For the rest of this calendar year, given an otherwise regular
biweekly-on-Mondays cadence, the holiday conflicts I am aware of are: US
Memorial Day (falls on Mon May 28, pushed meeting to Mon Jun 6); US
Labor Day (falls on Mon Sept 5, pushed to Mon Sept 12). If there are any
other holidays in the below dates, I'm happy to update further!
* Run a callback when the interruption epoch is reached
Adds Store::epoch_deadline_callback. This accepts a callback which, when
invoked, can mutate the store's contents. The callback can either return
an error (in which case we trap) or return a delta which we'll use to
set the new epoch deadline.
* Add a basic test for epoch interruption callback
* Some small nits
- Remove use of &mut in the pattern match
- Return both yields and state from run_and_count_yields_or_trap in
test code and assert on them separately.
- Add a test for trapping on a state failure.
In #4143 we made ISLE compilation part of the normal build flow again,
to avoid the issues with the checked-in source. To make this acceptably
fast, we cut down dependencies of the ISLE compiler, so the "fancy"
error printing is now optional. When not included, it just prints error
messages to stderr in a list. However, this did not include file
locations. It might be nice to have this without enabling the "fancy
printing" and waiting for that to build.
Fortunately most of the plumbing for this was already present (we had it
at one point before switching to miette). This PR adds back locations to
the basic error output. It now looks like:
```
Error building ISLE files: ISLE errors:
src/isa/aarch64/inst.isle:1:1: parse error: Unexpected token Symbol("asdf")
```
* Don't attempt to track the generated clif.isle in cargo
This causes the build script to rerun every time for me.
* Put build script debug messages on stderr instead of stdout
This keeps stdout reserved for cargo build script directives
Current codegen had a number of logic errors confusing
NAND with AND WITH COMPLEMENT, and NOR with OR WITH COMPLEMENT.
Add support for the missing z15 instructions and fix logic.
The HashMap implementation is significantly simpler than the BTreeMap
implementation. Because of this switching reduces compilation time of
cranelift-isle by about 10%.
# Before
$ hyperfine --prepare "cargo clean" "cargo build"
Benchmark 1: cargo build
Time (mean ± σ): 5.221 s ± 0.094 s [User: 10.659 s, System: 0.734 s]
Range (min … max): 5.151 s … 5.420 s 10 runs
# After
$ hyperfine --prepare "cargo clean" "cargo build"
Benchmark 1: cargo build
Time (mean ± σ): 4.746 s ± 0.150 s [User: 9.109 s, System: 0.721 s]
Range (min … max): 4.630 s … 5.144 s 10 runs
This PR fixes#4066: it modifies the Cranelift `build.rs` workflow to
invoke the ISLE DSL compiler on every compilation, rather than only
when the user specifies a special "rebuild ISLE" feature.
The main benefit of this change is that it vastly simplifies the mental
model required of developers, and removes a bunch of failure modes
we have tried to work around in other ways. There is now just one
"source of truth", the ISLE source itself, in the repository, and so there
is no need to understand a special "rebuild" step and how to handle
merge errors. There is no special process needed to develop the compiler
when modifying the DSL. And there is no "noise" in the git history produced
by constantly-regenerated files.
The two main downsides we discussed in #4066 are:
- Compile time could increase, by adding more to the "meta" step before the main build;
- It becomes less obvious where the source definitions are (everything becomes
more "magic"), which makes exploration and debugging harder.
This PR addresses each of these concerns:
1. To maintain reasonable compile time, it includes work to cut down the
dependencies of the `cranelift-isle` crate to *nothing* (only the Rust stdlib),
in the default build. It does this by putting the error-reporting bits
(`miette` crate) under an optional feature, and the logging (`log` crate) under
a feature-controlled macro, and manually writing an `Error` impl rather than
using `thiserror`. This completely avoids proc macros and the `syn` build slowness.
The user can still get nice errors out of `miette`: this is enabled by specifying
a Cargo feature `--features isle-errors`.
2. To allow the user to optionally inspect the generated source, which nominally
lives in a hard-to-find path inside `target/` now, this PR adds a feature `isle-in-source-tree`
that, as implied by the name, moves the target for ISLE generated source into
the source tree, at `cranelift/codegen/isle_generated_source/`. It seems reasonable
to do this when an explicit feature (opt-in) is specified because this is how ISLE regeneration
currently works as well. To prevent surprises, if the feature is *not* specified, the
build fails if this directory exists.
In #4104 we discussed whether it makes sense for the division and
remainder ops to support vector types. We concluded that because most
hardware doesn't support it directly, it probably is not ideal to force
all backends to polyfill it. In the future we can always reverse this
decision, perhaps with a platform-independent legalization.
This PR restricts the allowed types on the CLIF ops to integer types
only.
This test was added between the last CI run on #4088 and its merge to
main, and the changes in #4088 (use of constants directly in instruction
via load from constant pool, rather than from a register initialized by
a separate instruction) cause it to fail now.
This PR alters the test to be invariant to regalloc and argument
decisions during lowering, as the test is really checking (per the
comment) that we get two cmoves without an intervening move. As such, it
just matches the instruction opcodes, irrespective of the arguments.
* Allow emitting u64 constants into constant pool.
* Use constant pool for constants on x64 that do not fit in a simm32 and are needed as a RegMem or RegMemImm.
* Fix rip-relative addressing bug in pinsrd emission.
* Narrow `allow(dead_code)` declarations
Having module wide `allow(dead_code)` may hide some code that's really
dead. In this commit I just narrowed the declarations to the specific
enum variants that were not used (as it seems reasonable to keep them
and their handling in all the matches, for future use). And the compiler
found more dead code that I think we can remove safely in the short
term.
With this, the only files annotated with a module-wide
`allow(dead_code)` are isle-generated files.
* resurrect some functions as test helpers
* ISLE compiler: fix priority-trie interval bug. (#4093)
This PR fixes a bug in the ISLE compiler related to rule priorities.
An important note first: the bug did not affect the correctness of the
Cranelift backends, either in theory (because the rules should be
correct applied in any order, even contrary to the stated priorities)
or in practice (because the generated code actually does not change at
all with the DSL compiler fix, only with a separate minimized bug
example).
The issue was a simple swap of `min` for `max` (see first
commit). This is the minimal fix, I think, to get a correct
priority-trie with the minimized bug example in this commit.
However, while debugging this, I started to convince myself that the
complexity of merging multiple priority ranges using the sort of
hybrid interval tree / string-matching trie data structure was
unneeded. The original design was built with the assumption we might
have a bunch of different priority levels, and would need the
efficiency of merging where possible. But in practice we haven't used
priorities this way: the vast majority of lowering rules exist at the
default (priority 0), and just a few overrides are explicitly at prio
1, 2 or (rarely) 3.
So, it turns out to be a lot simpler to label trie edges with (prio,
symbol) rather than (prio-range, symbol), and delete the whole mess of
interval-splitting logic on insertion. It's easier (IMHO) to convince
oneself that the resulting insertion algorithm is correct.
I was worried that this might impact the size of the generated Rust
code or its runtime, but In fact, to my initial surprise (but it makes
sense given the above "rarely used" factor), the generated code with
this compiler fix is *exactly the same*. I rebuilt with `--features
rebuild-isle,all-arch` but... there were no diffs to commit! This is
to me the simplest evidence that we didn't really need that
complexity.
* Fix earlier commit from #4093: properly sort trie.
This commit fixes an in-hindsight-obvious bug in #4093: the trie's edges
must be sorted recursively, not just at the top level.
With this fix, the generated code differs only in one cosmetic way (a
let-binding moves) but otherwise is the same.
This includes @fitzgen's fix to the CI (from the revert in #4102) that
deletes manifests to actually check that the checked-in source is
consistent with the checked-in compiler. The force-rebuild step is now
in a shell script for convenience: anyone hacking on the ISLE compiler
itself can use this script to more easily rebuild everything.
* Add note to build.rs to remind to update force-rebuild-isle.sh
This commit fixes an issue introduced in #4046 where the checks for
ensuring that the memory initialization image for a module was
constrained in its size failed to trigger and a very small module could
produce an arbitrarily large memory image.
The bug in question was that if a module only had empty data segments at
arbitrarily small and large addresses then the loop which checks whether
or not the image is allowed was skipped entirely since it was seen that
the memory had no data size. The fix here is to skip segments that are
empty to ensure that if the validation loop is skipped then no data
segments will be processed to create the image (and the module won't end
up having an image in the end).
* Revert "ISLE compiler: fix priority-trie interval bug. (#4093)"
This reverts commit 2122337112.
* ci: Delete ISLE manifest files to force an ISLE rebuild
This makes it so that we actually rebuild ISLE rather than just check that the
manifests are up-to-date.
The automated release process failed last night because the `./publish
bump` process failed to automatically increment the version number. This
is fixed in #4097 but it also seems prudent to run this continually on
CI as well to ensure that this doesn't regress again.
We won't be promising that this crate has API-compatible patch releases
and this also fixes the execution of the release process since the bump
of version numbers failed due to this being in this list.
* Remove the `Paged` memory initialization variant
This commit simplifies the `MemoryInitialization` enum by removing the
`Paged` variant. The `Paged` variant was originally added for uffd, but
that support has now been removed in #4040. This is no longer necessary
but is still used as an intermediate step of becoming a `Static` variant
of initialized memory (which copy-on-write uses). As a result this
commit largely modifies the static initialization of memory steps and
folds the two methods together.
* Apply suggestions from code review
Co-authored-by: Peter Huene <peter@huene.dev>
Co-authored-by: Peter Huene <peter@huene.dev>
As discussed previously, we need a way to be able to configure Wasmtime when running it in the Sightglass benchmark infrastructure. The easiest way to do this seemed to be to pass a string from Sightglass to the `bench-api` library and parse this in the same way that Wasmtime parses its CLI flags. The structure that contains these flags is `CommonOptions`, so it has been moved to its own crate to be depended on by both `wasmtime-cli` and `wasmtime-bench-api`. Also, this change adds an externally-visible function for parsing a string into `CommonOptions`, which is used for configuring an engine.
In #3744, we identified that extra `mov` instructions were inserted in
between the `cmov` instructions that CLIF's `select` lowers to. The
switch to regalloc2 resolved this and this test checks that no
intervening `mov`s are inserted. Closes#3744.
The pretty-printing had swapped dst and src2; this was introduced when
we moved to RA2 (sorry about that! IMHO we should do something to
automate the mapping between regalloc arg collection and pretty
printing/emission).
`src2` comes at the end because it has a variable number of register
mentions; this is in line with how many of the other inst formats work.
Actual emitted code was never incorrect, just the pretty-printing.
Updated test golden outputs look correct to me now, including the one
that we saw was incorrect in #3945.
This PR fixes a bug in the ISLE compiler related to rule priorities.
An important note first: the bug did not affect the correctness of the
Cranelift backends, either in theory (because the rules should be
correct applied in any order, even contrary to the stated priorities)
or in practice (because the generated code actually does not change at
all with the DSL compiler fix, only with a separate minimized bug
example).
The issue was a simple swap of `min` for `max` (see first
commit). This is the minimal fix, I think, to get a correct
priority-trie with the minimized bug example in this commit.
However, while debugging this, I started to convince myself that the
complexity of merging multiple priority ranges using the sort of
hybrid interval tree / string-matching trie data structure was
unneeded. The original design was built with the assumption we might
have a bunch of different priority levels, and would need the
efficiency of merging where possible. But in practice we haven't used
priorities this way: the vast majority of lowering rules exist at the
default (priority 0), and just a few overrides are explicitly at prio
1, 2 or (rarely) 3.
So, it turns out to be a lot simpler to label trie edges with (prio,
symbol) rather than (prio-range, symbol), and delete the whole mess of
interval-splitting logic on insertion. It's easier (IMHO) to convince
oneself that the resulting insertion algorithm is correct.
I was worried that this might impact the size of the generated Rust
code or its runtime, but In fact, to my initial surprise (but it makes
sense given the above "rarely used" factor), the generated code with
this compiler fix is *exactly the same*. I rebuilt with `--features
rebuild-isle,all-arch` but... there were no diffs to commit! This is
to me the simplest evidence that we didn't really need that
complexity.
This PR refactors the x64 backend address-mode lowering to use an
incremental-build approach, where it considers each node in a tree of
`iadd`s that feed into a load/store address and, at each step, builds
the best possible `Amode`. It will combine an arbitrary number of
constant offsets (an extension beyond the current rules), and can
capture a left-shifted (scaled) index in any position of the tree
(another extension).
This doesn't have any measurable performance improvement on our Wasm
benchmarks in Sightglass, unfortunately, because the IR lowered from
wasm32 will do address computation in 32 bits and then `uextend` it to
add to the 64-bit heap base. We can't quite lift the 32-bit adds to 64
bits because this loses the wraparound semantics.
(We could label adds as "expected not to overflow", and allow *those* to
be lifted to 64 bit operations; wasm32 heap address computation should
fit this. This is `add nuw` (no unsigned wrap) in LLVM IR terms. That's
likely my next step.)
Nevertheless, (i) this generalizes the cases we can handle, which should
be a good thing, all other things being equal (and in this case, no
compile time impact was measured); and (ii) might benefit non-Wasm
frontends.
Currently, we have partial Spectre mitigation: we protect heap accesses
with dynamic bounds checks. Specifically, we guard against errant
accesses on the misspeculated path beyond the bounds-check conditional
branch by adding a conditional move that is also dependent on the
bounds-check condition. This data dependency on the condition is not
speculated and thus will always pick the "safe" value (in the heap case,
a NULL address) on the misspeculated path, until the pipeline flushes
and recovers onto the correct path.
This PR uses the same technique both for table accesses -- used to
implement Wasm tables -- and for jumptables, used to implement Wasm
`br_table` instructions.
In the case of Wasm tables, the cmove picks the table base address on
the misspeculated path. This is equivalent to reading the first table
entry. This prevents loads of arbitrary data addresses on the
misspeculated path.
In the case of `br_table`, the cmove picks index 0 on the misspeculated
path. This is safer than allowing a branch to an address loaded from an
index under misspeculation (i.e., it preserves control-flow integrity
even under misspeculation).
The table mitigation is controlled by a Cranelift setting, on by
default. The br_table mitigation is always on, because it is part of the
single lowering pseudoinstruction. In both cases, the impact should be
minimal: a single extra cmove in a (relatively) rarely-used operation.
The table mitigation is architecture-independent (happens during
legalization); the br_table mitigation has been implemented for both x64
and aarch64. (I don't know enough about s390x to implement this
confidently there, but would happily review a PR to do the same on that
platform.)
This PR removes "argument polarity": the feature of ISLE extractors that lets them take
inputs aside from the value to be matched.
Cases that need this expressivity have been subsumed by #4072 with if-let clauses;
we can now finally remove this misfeature of the language, which has caused significant
confusion and has always felt like a bit of a hack.
This PR (i) removes the feature from the ISLE compiler; (ii) removes it from the reference
documentation; and (iii) refactors away all uses of the feature in our three existing
backends written in ISLE.
In #4072 I mistakenly put the subsection about if-let clauses in the
language doc just below the next section header, so it's in the wrong
section ("mapping to Rust"). This moves it back upward to where it
should be. Sorry about that!