As per Carlo Kok on Zulip #cranelift, this breaks builds with stable
Rust pre-1.43, as `core::u8::MAX` was only stabilized then. We'd like to
support older versions if we can easily do so.
This PR also adds `cranelift-tools` to the crates checked on CI with
Rust 1.41.0, which pulls in all backends (including `aarch64`).
* Use AsRef<Path> instead of AsRef<OsStr> in yanix functions.
`AsRef<Path>` makes these more consistent with `std` interfaces, making
them easier to use outside of wasi-common.
Also, refactor the conversion to `CString` into a helper function.
* Reduce clutter from fully-qualifying names.
* rustfmt
Previously, we simply compared the input bool to 0, which forced the
value into a register (usually via a cmp and cset), zero-extended it,
etc. This patch performs the same pattern-matching that branches do to
directly perform the cmp and use its flag results with the csel.
On the `bz2` benchmark, the runtime is affected as follows (measuring
with `perf stat`, using wasmtime with its cache enabled, and taking the
second run after the first compiles and populates the cache):
pre:
1117.232000 task-clock (msec) # 1.000 CPUs utilized
133 context-switches # 0.119 K/sec
1 cpu-migrations # 0.001 K/sec
5,041 page-faults # 0.005 M/sec
3,511,615,100 cycles # 3.143 GHz
4,272,427,772 instructions # 1.22 insn per cycle
<not supported> branches
27,980,906 branch-misses
1.117299838 seconds time elapsed
post:
1003.738075 task-clock (msec) # 1.000 CPUs utilized
121 context-switches # 0.121 K/sec
0 cpu-migrations # 0.000 K/sec
5,052 page-faults # 0.005 M/sec
3,224,875,393 cycles # 3.213 GHz
4,000,838,686 instructions # 1.24 insn per cycle
<not supported> branches
27,928,232 branch-misses
1.003440004 seconds time elapsed
In other words, with this change, on `bz2`, we see a 6.3% reduction in
executed instructions.
This lets us avoid the cost of `cranelift_codegen::ir::Opcode` to
`peepmatic_runtime::Operator` conversion overhead, and paves the way for
allowing Peepmatic to support non-clif optimizations (e.g. vcode optimizations).
Rather than defining our own `peepmatic::Operator` type like we used to, now the
whole `peepmatic` crate is effectively generic over a `TOperator` type
parameter. For the Cranelift integration, we use `cranelift_codegen::ir::Opcode`
as the concrete type for our `TOperator` type parameter. For testing, we also
define a `TestOperator` type, so that we can test Peepmatic code without
building all of Cranelift, and we can keep them somewhat isolated from each
other.
The methods that `peepmatic::Operator` had are now translated into trait bounds
on the `TOperator` type. These traits need to be shared between all of
`peepmatic`, `peepmatic-runtime`, and `cranelift-codegen`'s Peepmatic
integration. Therefore, these new traits live in a new crate:
`peepmatic-traits`. This crate acts as a header file of sorts for shared
trait/type/macro definitions.
Additionally, the `peepmatic-runtime` crate no longer depends on the
`peepmatic-macro` procedural macro crate, which should lead to faster build
times for Cranelift when it is using pre-built peephole optimizers.
We had previously fixed a bug in which constant shift amounts should be
masked to modulo the number of bits in the operand; however, we did not
fix the analogous case for shifts incorporated into the second register
argument of ALU instructions that support integrated shifts. This
failure to mask resulted in illegal instructions being generated, e.g.
in https://bugzilla.mozilla.org/show_bug.cgi?id=1653502. This PR fixes
the issue by masking the amount, as the shift semantics require.
Instead, when the `rebuild-peephole-optimizers` feature is enabled, rebuild them
the first time they are used. This allows peepmatic to run when Cranelift's
`Opcode` is defined and available, which paves the way forward for:
* merging `peepmatic_runtime::operator::Operator` and Cranelift's `Opcode` (we
are wasting a bunch of cycles converting between the two of them), and
* supporting vcode optimizations in `peepmatic`.
This commit updates our CI to verify that all crates are publish-able at
all times on every commit. During the 0.19.0 release we found another
case where the crates as they live in this repository weren't
publish-able, so the hope is that this no longer comes up again!
The script added in this commit also takes the time/liberty to remove
the existing bump/publish scripts and instead replace them with one Rust
script originally sourced from wasm-bindgen. The intention of this
script is that it has three modes:
* `./publish bump` - bumps version numbers which are sent as a PR to get
reviewed (probably with a changelog as well)
* `./publish verify` - run on CI on every commit, builds every crate we
publish as if it's being published to crates.io, notably without raw
access to other crates in the repository.
* `./publish publish` - publishes all crates to crates.io, passing the
`--no-verify` flag to make this a much speedier process than it is
today.
This commit fixes `Clone for wasm_val_t` to avoid attempting to chase a
null pointer. It also fixes the implementation for `FuncRef` values by
cloning their internal `wasm_ref_t` as well.
Currently `Func::new` will panic if one of the arguments of the function
is a reference type and the `Store` doesn't have reference types
enabled. This happens because cranelift isn't configure to enable stack
maps but the register allocators expects them to exist when reference
types are seen.
The fix here is to always enable reference types in cranelift for our
trampoline generation and `Func::new`. This should hopefully ensure that
trampolines are generated correctly and they'll just not be able to get
hooked up to an `Instance` because validation will prevent reference
types from being used elsewhere.
This is tricky: the control flow implicitly implied by the operand makes
it so that the output register may be undefined, if we mark it only as a
"def". Make it a "mod" instead, which matches our usage in the codebase,
and will make it crash if the output operand isn't unconditionally
defined before the instruction.