GitHub recently made its merge queue feature available for use in public
repositories owned by organizations meaning that the Wasmtime repository
is a candidate for using this. GitHub's Merge Queue feature is a system
that's similar to Rust's bors integration where PRs are tested before
merging and only passing PRs are merged. This implements the "not rocket
science" rule where the `main` branch of Wasmtime, for example, is
always tested and passes CI. This is in contrast to our current
implementation of CI where PRs are merged when they pass their own CI,
but the code that was tested is not guaranteed to be the state of `main`
when the PR is merged, meaning that we're at risk now of a failing
`main` branch despite all merged PRs being green. While this has
happened with Wasmtime this is not a common occurrence, however.
The main motivation, instead, to use GitHub's Merge Queue feature is
that it will enable Wasmtime to greatly reduce the amount of CI running
on PRs themselves. Currently the full test suite runs on every push to
every PR, meaning that our workers on GitHub Actions are frequently
clogged throughout weekdays and PRs can take quite some time to come
back with a successful run. Through the use of a Merge Queue, however,
we're able to configure only a small handful of checks to run on PRs
while deferring the main body of checks to happening on the
merge-via-the-queue itself. This is hoped to free up capacity on CI and
overall improve CI times for Wasmtime and Cranelift developers.
The implementation of all of this required quite a lot of plumbing and
retooling of our CI. I've been testing this in an [external
repository][testrepo] and I think everything is working now. A list of
changes made in this PR are:
* The `build.yml` workflow is merged back into the `main.yml` workflow
as the original reason to split it out is not longer applicable (it'll
run on all merges). This was also done to fit in the dependency graph
of jobs of one workflow.
* Publication of the `gh-pages` branch, the `dev` tag artifacts, and
release artifacts have been moved to a separate
`publish-artifacts.yml` workflow. This workflow runs on all pushes to
`main` and all tags. This workflow no longer actually preforms any
builds, however, and relies on a merge queue or similar being used for
branches/tags where artifacts are downloaded from the workflow run to
be uploaded. For pushes to `main` this works because a merge queue is
run meaning that by the time the push happens all artifacts are ready.
For release branches this is handled by..
* The `push-tag.yml` workflow is subsumed by the `main.yml` workflow. CI
for a tag being pushed will upload artifacts to a release in GitHub,
meaning that all builds must finish first for the commit. The
`main.yml` workflow at the end now scans commits for the preexisting
magical marker and pushes a tag if necessary.
* CI is currently a flat list of "run all these jobs" and this is now
rearchitected to a "fan out" approach where some jobs run to determine
the next jobs to run which then get "joined" into a finish step. The
purpose for this is somewhat nuanced and this has implications for CI
runtime as well. The Merge Queue feature requires branches to be
protected with "these checks must pass" and then the same checks are
gates both to enter the merge queue as well as pass the merge queue.
The saving grace, however, is that a "skipped" check counts as
passing, meaning checks can be skipped on PRs but run to completion on
the merge queue. A problem with this though is the build matrix used
for tests where PRs want to only run one element of the build matrix
ideally but there's no means on GitHub Actions right now for the
skipped entries to show up as skipped easily (or not that I know of).
This means that the "join" step serves the purpose of being the single
gate for both PR and merge queue CI and there's just more inputs to it
for merge queue CI. The major consequence of this decision is that
GitHub's actions scheduling doesn't work out well here. Jobs are
scheduled in a FIFO order meaning that the job for "ok complete the CI
run" is queued up after everything else has completed, possibly
after lots of other CI requests in the middle for other PRs. The hope
here is that by using a merge queue we can keep CI relatively under
control and this won't affect merge times too much.
* All jobs in the `main.yml` workflow will not automatically cancel the
entire run if they fail. Previously this fail-fast behavior was only
part of the matrix runs (and just for that matrix), but this is
required to make the merge queue expedient. The gate of the merge
queue is the final "join" step which is only executed once all
dependencies have finished. This means, for example, that if rustfmt
fails quickly then the tests which take longer might run for quite
awhile before the join step reports failure, meaning that the PR sits
in the queue for longer than needed being tested when we know it's
already going to fail. By having all jobs cancel the run this means
that failures immediately bail out and mark the whole job as
cancelled.
* A new "determine" CI job was added to determine what CI actually needs
to run. This is a "choke point" which is scheduled at the start of CI
that quickly figures out what else needs to be run. This notably
indicates whether large swaths of ci (the `run-full` flag) like the
build matrix are executed. Additionally this dynamically calculates a
matrix of tests to run based on a new `./ci/build-test-matrix.js`
script. Various inputs are considered for this such as:
1. All pushes, meaning merge queue branches or release-branch merges,
will run full CI.
2. PRs to release branches will run full CI.
3. PRs to `main`, the most common, determine what to run based on
what's modified and what's in the commit message.
Some examples for (3) above are if modifications are made to
`cranelift/codegen/src/isa/*` then that corresponding builder is
executed on CI. If the `crates/c-api` directory is modified then the
CMake-based tests are run on PRs but are otherwise skipped.
Annotations in commit messages such as `prtest:*` can be used to
explicitly request testing.
Before this PR merges to `main` would perform two full runs of CI: one
on the PR itself and one on the merge to `main`. Note that the one as a
merge to `main` was quite frequently cancelled due to a merge happening
later. Additionally before this PR there was always the risk of a bad
merge where what was merged ended up creating a `main` that failed CI to
to a non-code-related merge conflict.
After this PR merges to `main` will perform one full run of CI, the one
as part of the merge queue. PRs themselves will perform one test job
most of the time otherwise. The `main` branch is additionally always
guaranteed to pass tests via the merge queue feature.
For release branches, before this PR merges would perform two full
builds - one for the PR and one for the merge. A third build was then
required for the release tag itself. This is now cut down to two full
builds, one for the PR and one for the merge. The reason for this is
that the merge queue feature currently can't be used for our
wildcard-based `release-*` branch protections. It is now possible,
however, to turn on required CI checks for the `release-*` branch PRs so
we can at least have a "hit the button and forget" strategy for merging
PRs now.
Note that this change to CI is not without its risks. The Merge Queue
feature is still in beta and is quite new for GitHub. One bug that
Trevor and I uncovered is that if a PR is being tested in the merge
queue and a contributor pushes to their PR then the PR isn't removed
from the merge queue but is instead merged when CI is successful, losing
the changes that the contributor pushed (what's merged is what was
tested). We suspect that GitHub will fix this, however.
Additionally though there's the risk that this may increase merge time
for PRs to Wasmtime in practice. The Merge Queue feature has the ability
to "batch" PRs together for a merge but this is only done if concurrent
builds are allowed. This means that if 5 PRs are batched together then 5
separate merges would be created for the stack of 5 PRs. If the CI for
all 5 merged together passes then everything is merged, otherwise a PR
is kicked out. We can't easily do this, however, since a major purpose
for the merge queue for us would be to cut down on usage of CI builders
meaning the max concurrency would be set to 1 meaning that only one PR
at a time will be merged. This means PRs may sit in the queue for awhile
since previously many `main`-based builds are cancelled due to
subsequent merges of other PRs, but now they must all run to 100%
completion.
[testrepo]: https://github.com/bytecodealliance/wasmtime-merge-queue-testing
Rework br_table to use BlockCall, allowing us to avoid adding new nodes during ssa construction to hold block arguments. Additionally, many places where we previously matched on InstructionData to extract branch destinations can be replaced with a use of branch_destination or branch_destination_mut.
This change adds the `wasmtime-wasi-threads` crate as a default crate
for the CLI application. This is no change for embedders of Wasmtime:
they would still have to include `wasmtime-wasi-threads` manually.
Enabling the crate by default in the CLI application has several
benefits, e.g., that it is simpler to experiment with and that it will
be part of more test runs (and thus bugs can be discovered more
quickly). Users will still have to add
`--wasi-modules=experimental-wasi-threads` to enable wasi-threads on the
command line.
As a follow-up to #5780, disassemble the regions identified by bb_starts, falling back on disassembling the whole buffer. This ensures that instructions like br_table that introduce a lot of constants don't throw off capstone for the remainder of the function.
---------
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* cranelift: Add `adrp` encoding to AArch64 backend
* cranelift: Support GOT Symbol References in AArch64
* cranelift: Add MachO GOT relocations
* cranelift: Do not mark the GOT PageOffset12 MachO relocation as relative
This updates the signatures of the `xmm_rm_r` helper function and then
updates existing users and migrates other users to the helper now that
the type information is no longer required.
* Update world-selection in `bindgen!` macro
Inspired by bytecodealliance/wit-bindgen#494 specifying a world or
document to bindgen is now optional as it's inferred if there's only one
`default world` in a package's documents.
* Add cargo-vet entry
This commit fixes a panic related to type imports where an import of a
type didn't correctly declare the new type index on the Wasmtime side of
things. Additionally this plumbs more support throughout Wasmtime to
support type imports, namely that they do not need to be supplied
through a `Linker`. This additionally implements a feature where empty
instances, even transitively, do not need to be supplied by a Wasmtime
embedder. This means that instances which only have types, for example,
do not need to be supplied into a `Linker` since no runtime information
for them is required anyway.
Closes#5775
This works around a `rustc` bug where compiling with LTO
will sometimes strip out some of the trampoline entrypoint
symbols resulting in a linking failure.
* Add (bnot (bxor x y)) lowerings for s390x/aarch64
I originally thought that s390x's original lowering in #5709, but as was
rightfully pointed out `(bnot (bxor x y))` is equivalent to
`(bxor x (bnot y))` so the special lowering for one should apply as a
special lowering for the other. For the s390x and aarch64 backend that
have already have a fused lowering of the bxor/bnot add a lowering
additionally for the bnot/bxor combination.
* Add bnot(bxor(..)) tests for s390x 128-bit sizes
* Update wasm-tools crates
Pulls in a new component binary format which should hopefully be the
last update for awhile.
* Update cargo vet configuration
* Add support for WASI sockets to C API
Add support for WASI sockets in the C API by adding a new API to handle
preopening sockets for clients. This uses HashMap instead of Vec for
preopened sockets to identify if caller has called in more than once
with the same FD number. If so, then we return false so caller is given
hint that they are attempting to overwrite an already existing socket
FD.
* Apply suggestions from code review
Co-authored-by: Peter Huene <peter@huene.dev>
* s/stdlistener/listener/
---------
Co-authored-by: Peter Huene <peter@huene.dev>
As jump tables are used by at most one br_table instruction, inline their definition in those instructions instead of requiring them to be declared as function-level metadata.
This patch adds complete support for the `sub` and `add` WebAssembly instructions
for x64, and complete support for the `add` WebAssembly instruction for aarch64.
This patch also refactors how the binary operations get constructed within the
`VisitOperator` trait implementation. The refactor adds methods in the
`CodeGenContext` to abstract all the common steps to emit binary operations,
making this process less repetitive and less brittle (e.g. omitting to push the resulting value
to the stack, or omitting to free registers after used).
This patch also improves test coverage and refactors the filetests directory to make it
easier to add tests for other instructions.
* wasi-threads: fix import name
As @TerrorJack pointed out in #5484, that PR implements an older
name--`thread_spawn`. This change uses the now-official name from the
specification--`thread-spawn`.
* fix: update name in test
Move the storage for jump tables off of FunctionStencil and onto DataFlowGraph. This change is in service of #5731, making it easier to access the jump table data in the context of helpers like inst_values.
* Generalize `n ^ !n` optimization to more types
* Generalize `x & -1` optimization to more types
Also mark the `x & x` rewrite to `subsume`.
* Cranelift: Optimize x|!x and x&!x to constants
These cases are much like the existing x^!x rules.
Similar to when we exposed the DataFlowGraph::insts field through a restrictive newtype, expose DataFlowGraph::blocks through an interface that allows a restrictive set of operations. This field being public now allows us to avoid a rematch in ssa construction, and simplifies the implementation of adding a block argument to a block referenced by a br_table instruction.
We don't need to spend time going through the GVN map to dedup a
newly-constructed `iconst 0` when we already matched that value on the
left-hand side of these rules.
Also, mark these rules as subsuming any others since we can't do better
than reducing an expression to a constant.
At some point what is now `funcref` was called `anyfunc` and the spec changed,
but we didn't update our internal names. This does that.
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
This commit includes a set of changes that add initial support for `wasi-threads` to Wasmtime:
* feat: remove mutability from the WasiCtx Table
This patch adds interior mutability to the WasiCtx Table and the Table elements.
Major pain points:
* `File` only needs `RwLock<cap_std::fs::File>` to implement
`File::set_fdflags()` on Windows, because of [1]
* Because `File` needs a `RwLock` and `RwLock*Guard` cannot
be hold across an `.await`, The `async` from
`async fn num_ready_bytes(&self)` had to be removed
* Because `File` needs a `RwLock` and `RwLock*Guard` cannot
be dereferenced in `pollable`, the signature of
`fn pollable(&self) -> Option<rustix::fd::BorrowedFd>`
changed to `fn pollable(&self) -> Option<Arc<dyn AsFd + '_>>`
[1] da238e324e/src/fs/fd_flags.rs (L210-L217)
* wasi-threads: add an initial implementation
This change is a first step toward implementing `wasi-threads` in
Wasmtime. We may find that it has some missing pieces, but the core
functionality is there: when `wasi::thread_spawn` is called by a running
WebAssembly module, a function named `wasi_thread_start` is found in the
module's exports and called in a new instance. The shared memory of the
original instance is reused in the new instance.
This new WASI proposal is in its early stages and details are still
being hashed out in the [spec] and [wasi-libc] repositories. Due to its
experimental state, the `wasi-threads` functionality is hidden behind
both a compile-time and runtime flag: one must build with `--features
wasi-threads` but also run the Wasmtime CLI with `--wasm-features
threads` and `--wasi-modules experimental-wasi-threads`. One can
experiment with `wasi-threads` by running:
```console
$ cargo run --features wasi-threads -- \
--wasm-features threads --wasi-modules experimental-wasi-threads \
<a threads-enabled module>
```
Threads-enabled Wasm modules are not yet easy to build. Hopefully this
is resolved soon, but in the meantime see the use of
`THREAD_MODEL=posix` in the [wasi-libc] repository for some clues on
what is necessary. Wiggle complicates things by requiring the Wasm
memory to be exported with a certain name and `wasi-threads` also
expects that memory to be imported; this build-time obstacle can be
overcome with the `--import-memory --export-memory` flags only available
in the latest Clang tree. Due to all of this, the included tests are
written directly in WAT--run these with:
```console
$ cargo test --features wasi-threads -p wasmtime-cli -- cli_tests
```
[spec]: https://github.com/WebAssembly/wasi-threads
[wasi-libc]: https://github.com/WebAssembly/wasi-libc
This change does not protect the WASI implementations themselves from
concurrent access. This is already complete in previous commits or left
for future commits in certain cases (e.g., wasi-nn).
* wasi-threads: factor out process exit logic
As is being discussed [elsewhere], either calling `proc_exit` or
trapping in any thread should halt execution of all threads. The
Wasmtime CLI already has logic for adapting a WebAssembly error code to
a code expected in each OS. This change factors out this logic to a new
function, `maybe_exit_on_error`, for use within the `wasi-threads`
implementation.
This will work reasonably well for CLI users of Wasmtime +
`wasi-threads`, but embedders will want something better in the future:
when a `wasi-threads` threads fails, they may not want their application
to exit. Handling this is tricky, because it will require cancelling the
threads spawned by the `wasi-threads` implementation, something that is
not trivial to do in Rust. With this change, we defer that work until
later in order to provide a working implementation of `wasi-threads` for
experimentation.
[elsewhere]: https://github.com/WebAssembly/wasi-threads/pull/17
* review: work around `fd_fdstat_set_flags`
In order to make progress with wasi-threads, this change temporarily
works around limitations induced by `wasi-common`'s
`fd_fdstat_set_flags` to allow `&mut self` use in the implementation.
Eventual resolution is tracked in
https://github.com/bytecodealliance/wasmtime/issues/5643. This change
makes several related helper functions (e.g., `set_fdflags`) take `&mut
self` as well.
* test: use `wait`/`notify` to improve `threads.wat` test
Previously, the test simply executed in a loop for some hardcoded number
of iterations. This changes uses `wait` and `notify` and atomic
operations to keep track of when the spawned threads are done and join
on the main thread appropriately.
* various fixes and tweaks due to the PR review
---------
Signed-off-by: Harald Hoyer <harald@profian.com>
Co-authored-by: Harald Hoyer <harald@profian.com>
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
I was playing around with souper recently on some wasms I had lying
around and these are some optimization opportunities that popped out
which seemed easy-enough to add to the egraph-based optimizations.
When investigating #5716, I found that rematerialization of a `call`, in
addition to blowing up for other reasons, caused aliasing of the varargs
list (the `EntityList` in the `ListPool`), such that editing the args of
the second copy of the call instruction inadvertently updated the first
as well.
This PR modifies `DataFlowGraph::clone_inst` so that it always clones
the varargs list if present. This shouldn't have any functional impact
on Cranelift today, because we don't rematerialize any instructions with
varargs; but it's important to get it right to avoid a bug later!
We don't have overlap in behavior for branch instructions anymore, so we can remove analyze_branch and instead match on the InstructionData directly.
Co-authored-by: Jamey Sharp <jamey@minilop.net>