This PR follows up on #5382 and #5391, which rebuilt the egraph-based optimization framework to be more performant, by enabling it by default.
Based on performance results in #5382 (my measurements on SpiderMonkey and bjorn3's independent confirmation with cg_clif), it seems that this is reasonable to enable. Now that we have been fuzzing compiler configurations with egraph opts (#5388) for 6 weeks, having fixed a few fuzzbugs that came up (#5409, #5420, #5438) and subsequently received no further reports from OSS-Fuzz, I believe it is stable enough to rely on.
This PR enables `use_egraphs`, and also normalizes its meaning: previously it forced optimization (it basically meant "turn on the egraph optimization machinery"), now it runs egraph opts if the opt level indicates (it means "use egraphs to optimize if we are going to optimize"). The conditionals in the top-level pass driver are a little subtle, but will get simpler once we can remove the non-egraph path (which we plan to do eventually!).
Fixes#5181.
* Support mergeable-but-side-effectful (idempotent) operations in general in the egraph's GVN.
This mirrors the similar change made in #5534.
* Add tests for egraph case.
A fuzz bug came in last night from #5567 where spectest fuzzing will
first generate a config, possibly with SSE features for SIMD disabled,
only to have SIMD later enabled by `set_spectest_compliant`. This commit
fixes the issue by changing to `is_spectest_compliant` as a query and
throwing out the fuzz case if it isn't. This means that the spectest
fuzzer will throw out more inputs but means we can continue to generate
interesting configs and such for other inputs.
* Adding in the foundations for Winch `filetests`
This commit adds two new crates into the Winch workspace:
`filetests` and `test-macros`. The intent is to mimic the
structure of Cranelift `filetests`, but in a simpler way.
* Updates to documentation
This commits adds a high level document to outline how to test Winch
through the `winch-tools` utility. It also updates some inline
documentation which gets propagated to the CLI.
* Updating test-macro to use a glob instead of only a flat directory
This commit fixes more cases from #5565 where `export` items introducing
indices wasn't handled by accident. Additionally this fixes support for
aliasing types from instances which largely wasn't working before. Most
of the fixes here are about correctly maintaining Wasmtime's view of the
type index spaces.
* Update WIT tooling used by Wasmtime
This commit updates the WIT tooling, namely the wasm-tools family of
crates, with recent updates. Notably:
* bytecodealliance/wasm-tools#867
* bytecodealliance/wasm-tools#871
This updates index spaces in components and additionally bumps the
minimum required version of the component binary format to be consumed
by Wasmtime (because of the index space changes). Additionally WIT
tooling now fully supports `use`.
Note that WIT tooling doesn't, at this time, fully support packages and
depending on remotely defined WIT packages. Currently WIT still needs to
be vendored in the project. It's hoped that future work with `cargo
component` and possible integration here could make the story about
depending on remotely-defined WIT more ergonomic and streamlined.
* Fix `bindgen!` codegen tests
* Add a test for `use` paths an implement support
* Update to crates.io versions of wasm-tools
* Uncomment codegen tests
* Resolve libcall relocations for older CPUs
Long ago Wasmtime used to have logic for resolving relocations
post-compilation for libcalls which I ended up removing during
refactorings last year. As #5563 points out, however, it's possible to
get Wasmtime to panic by disabling SSE features which forces Cranelift
to use libcalls for some floating-point operations instead. Note that
this also requires disabling SIMD because SIMD support has a baseline of
SSE 4.2.
This commit pulls back the old implementations of various libcalls and
reimplements logic necessary to have them work on CPUs without SSE 4.2
Closes#5563
* Fix log message in `wast` support
* Fix offset listed in relocations
Be sure to factor in the offset of the function itself
* Review comments
Add a new type BlockCall that represents the pair of a block name with arguments to be passed to it. (The mnemonic here is that it looks a bit like a function call.) Rework the implementation of jump, brz, and brnz to use BlockCall instead of storing the block arguments as varargs in the instruction's ValueList.
To ensure that we're processing block arguments from BlockCall values in instructions, three new functions have been introduced on DataFlowGraph that both sets of arguments:
inst_values - returns an iterator that traverses values in the instruction and block arguments
map_inst_values - applies a function to each value in the instruction and block arguments
overwrite_inst_values - overwrite all values in an instruction and block arguments with values from the iterator
Co-authored-by: Jamey Sharp <jamey@minilop.net>
This is a cleanup to help prepare for #5464.
Most of the diff is inlining the closure for `mark_all_uses_as_multiple`
which was only called once. That avoids some borrow-checker challenges.
The key change is that the former `push_args_on_stack` closure no longer
actually pushes the iterator on the stack, but just returns it. That
way this closure doesn't need the name of the stack's type. It also
allows it to be reused in the debug_assert.
This commit fixes an issue where when backtraces were disabled but a
host function returned an error it would trigger a debug assertion
within Wasmtime. The fix here is to update the condition of the debug
assertion and add a test doing this behavior to ensure it works in the
future.
I've also further taken the liberty in this commit to remove the
deprecation notice for `Config::wasm_backtrace`. We don't really have a
strong reason for removing this functionality at this time and users
have multiple times now reported issues with performance that seem
worthwhile to keep the option. The latest issue, #5577, has a use case
where it appears the quadratic behavior is back in a way that Wasmtime
won't be able to detect. Namely with lots of wasm interleaved with host
on the stack if the original error isn't threaded through the entire
time then each host error will trigger a new backtrace since it doesn't
see a prior backtrace in the error being returned.
While this could otherwise be fixed with only capturing one contiguous
backtrace perhaps this seems reasonable enough to leave the
`wasm_backtrace` config option for now.
Closes#5577
cranelift/codegen/src/isa/*/lower/isle/generated_code.rs are
hand-written modules that `include!` the actual generated code.
Tagging these hand-written files as `linguist-generated` in
.gitattributes tells GitHub not to show them in diffs by default. But
they're actually important to review, so they should be included in
diffs.
* wiggle: Inline some trivial functions
This commit marks a number of functions in wiggle as `#[inline]` as
they're otherwise trivial, mostly returning constants. This comes out of
some work I looked at recently with Andrew where some of these functions
showed up in profiles when they shouldn't.
* wiggle: Optimize the `GuestMemory` for shared memory
This commit implements a minor optimization to the `GuestMemory`
implementation for Wasmtime to skip most methods if a shared memory is
in play. Shared memories never get borrowed and this can be used to
internally skip some borrow-checker methods.
* wiggle: Optimize `GuestPtr::to_vec`
This commit replaces the safe implementation of `GuestPtr::to_vec` with
an unsafe implementation. The purpose of this is to speed up the
function when used with shared memory which otherwise performs a bunch
of atomic reads for types like `u8` which does validation-per-element
and isn't vectorizable. On a benchmark I was helping Andrew with this
sped up the host code enough to the point that guest code dwarfed the
execution time.
* Fix build
Following up on #5535, treat positive and negative zero as inequal in
wasmtime::component::Val::Float{32,64}'s `PartialEq` logic. IEEE 754
equality considers these values equal, but they are semantically
distinct values, and testing and fuzzing should be aware of the
difference.
Update the documentation for `Caller::get_export` to clarify that it's
not expected to be removed in the future. Components do offer an
alternative to `Caller::get_export`, so add a brief note mentioning
that.
Also, as of #4431 `get_export` now works for all exports, not just
memories and functions.
* wasi: avoid buffer underflow with shared memory
This change fixes an issue identified when using wasi-threads to perform
file reads. In order to maintain Rust safety guarantees in the presence
of WebAssembly shared memory, which can be modified concurrently by any
of the running threads, the WASI implementations of `fd_read` and
`fd_pread` were given special code paths when shared memory is detected:
in these cases, the data is first read into a host-limited buffer and
then subsequently copied into linear memory. The problem was that the
rather-complex logic for doing this "buffer then copy" idea for multiple
IO vectors could fail due to buffer underflow. If, e.g., a read was
limited by the host to 64K (or even if the read returned less than the
total buffer size) the `UnsafeGuestSlice::copy_from_slice` logic would
fail, complaining that the sizes of both buffers were unequal.
This change both simplifies and fixes the logic:
- only the first IO vector is filled; this could represent a performance
penalty for threaded programs, but the "buffer then copy" idea already
imposes a non-trivial overhead. This simplifies the logic, allowing us
to...
- resize the shared memory buffer to the exact number of bytes read
* review: early return when no IO vectors passed to shared memory
* fix: add empty RoFlags on early exit
We accidentally broke the build for Android when introducing the jit-icache-coherence
crate. To avoid this happening again, add a check job just to ensure that it can build.
See #5323 and #5331 for context.
...not just the ones at the outer scope of a rule.
Thanks to @avanhatt for pointing out that #5221 didn't capture as much
information as I intended it to.
* Wasmtime: Add `Config::disable_cache`
* bench-api: Always disable the cache
* bench-api: Always get a `Config` from CLI flags
This commit fixes an issue that I ran into just now where benchmarking
one `*.so` with `--engine-flags` was giving wildly unexpected results
comparing to something without `--engine-flags`. The root cause here
appears to that when specifying `--engine-flags` the CLI parsing code is
used to create a `Config` and when omitted a `Config::new` instance is
created. The main difference between these is that for the CLI caching
is enabled by default and for `Config::new` it is not. Coupled with the
fact that caching doesn't really work for the `main` branch this ended
up giving wild results.
The fix here is to first always use the CLI parsing code to create a
`Config` to ensure that a config is consistently created. Next the
`--disable-cache` flag is unconditionally passed to the CLI parsing to
ensure that compilation actually happens.
Once applied this enables comparing an engine without flags and an
engine with flags which provides consistent results.
* Fix compile error
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
* add clif-util compile option to output object file
* switch from a box to a borrow
* update objectmodule tests to use borrowed isa
* put targetisa into an arc
In #5510 we changed the value types of these variants from u{32,64} to
f{32,64}. One side effect of this change was that two NaN values would
no longer compare equal. While this is behavior complies with IEEE-754
floating point operations, it broke equality assumptions in fuzzing.
This commit changes equality for Val to make NaNs compare equal. Since
the component model requires NaN canonicalization, all NaN bit
representations compare equal, which is different from the original
behavior.
This also gives Vals the semantics of Eq again, so that trait impl has
been reintroduced to related types as well.
Fuzz additional targets in the cranelift-icache target. The list of targets fuzzed is controlled by the targets enabled in fuzz/Cargo.toml.
This PR also reworks how instruction disabling is done in function generator, moving the deny-list to a function to make the decision at runtime instead of compile time.