cranelift/codegen/src/isa/*/lower/isle/generated_code.rs are
hand-written modules that `include!` the actual generated code.
Tagging these hand-written files as `linguist-generated` in
.gitattributes tells GitHub not to show them in diffs by default. But
they're actually important to review, so they should be included in
diffs.
* wiggle: Inline some trivial functions
This commit marks a number of functions in wiggle as `#[inline]` as
they're otherwise trivial, mostly returning constants. This comes out of
some work I looked at recently with Andrew where some of these functions
showed up in profiles when they shouldn't.
* wiggle: Optimize the `GuestMemory` for shared memory
This commit implements a minor optimization to the `GuestMemory`
implementation for Wasmtime to skip most methods if a shared memory is
in play. Shared memories never get borrowed and this can be used to
internally skip some borrow-checker methods.
* wiggle: Optimize `GuestPtr::to_vec`
This commit replaces the safe implementation of `GuestPtr::to_vec` with
an unsafe implementation. The purpose of this is to speed up the
function when used with shared memory which otherwise performs a bunch
of atomic reads for types like `u8` which does validation-per-element
and isn't vectorizable. On a benchmark I was helping Andrew with this
sped up the host code enough to the point that guest code dwarfed the
execution time.
* Fix build
Following up on #5535, treat positive and negative zero as inequal in
wasmtime::component::Val::Float{32,64}'s `PartialEq` logic. IEEE 754
equality considers these values equal, but they are semantically
distinct values, and testing and fuzzing should be aware of the
difference.
Update the documentation for `Caller::get_export` to clarify that it's
not expected to be removed in the future. Components do offer an
alternative to `Caller::get_export`, so add a brief note mentioning
that.
Also, as of #4431 `get_export` now works for all exports, not just
memories and functions.
* wasi: avoid buffer underflow with shared memory
This change fixes an issue identified when using wasi-threads to perform
file reads. In order to maintain Rust safety guarantees in the presence
of WebAssembly shared memory, which can be modified concurrently by any
of the running threads, the WASI implementations of `fd_read` and
`fd_pread` were given special code paths when shared memory is detected:
in these cases, the data is first read into a host-limited buffer and
then subsequently copied into linear memory. The problem was that the
rather-complex logic for doing this "buffer then copy" idea for multiple
IO vectors could fail due to buffer underflow. If, e.g., a read was
limited by the host to 64K (or even if the read returned less than the
total buffer size) the `UnsafeGuestSlice::copy_from_slice` logic would
fail, complaining that the sizes of both buffers were unequal.
This change both simplifies and fixes the logic:
- only the first IO vector is filled; this could represent a performance
penalty for threaded programs, but the "buffer then copy" idea already
imposes a non-trivial overhead. This simplifies the logic, allowing us
to...
- resize the shared memory buffer to the exact number of bytes read
* review: early return when no IO vectors passed to shared memory
* fix: add empty RoFlags on early exit
We accidentally broke the build for Android when introducing the jit-icache-coherence
crate. To avoid this happening again, add a check job just to ensure that it can build.
See #5323 and #5331 for context.
...not just the ones at the outer scope of a rule.
Thanks to @avanhatt for pointing out that #5221 didn't capture as much
information as I intended it to.
* Wasmtime: Add `Config::disable_cache`
* bench-api: Always disable the cache
* bench-api: Always get a `Config` from CLI flags
This commit fixes an issue that I ran into just now where benchmarking
one `*.so` with `--engine-flags` was giving wildly unexpected results
comparing to something without `--engine-flags`. The root cause here
appears to that when specifying `--engine-flags` the CLI parsing code is
used to create a `Config` and when omitted a `Config::new` instance is
created. The main difference between these is that for the CLI caching
is enabled by default and for `Config::new` it is not. Coupled with the
fact that caching doesn't really work for the `main` branch this ended
up giving wild results.
The fix here is to first always use the CLI parsing code to create a
`Config` to ensure that a config is consistently created. Next the
`--disable-cache` flag is unconditionally passed to the CLI parsing to
ensure that compilation actually happens.
Once applied this enables comparing an engine without flags and an
engine with flags which provides consistent results.
* Fix compile error
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
* add clif-util compile option to output object file
* switch from a box to a borrow
* update objectmodule tests to use borrowed isa
* put targetisa into an arc
In #5510 we changed the value types of these variants from u{32,64} to
f{32,64}. One side effect of this change was that two NaN values would
no longer compare equal. While this is behavior complies with IEEE-754
floating point operations, it broke equality assumptions in fuzzing.
This commit changes equality for Val to make NaNs compare equal. Since
the component model requires NaN canonicalization, all NaN bit
representations compare equal, which is different from the original
behavior.
This also gives Vals the semantics of Eq again, so that trait impl has
been reintroduced to related types as well.
Fuzz additional targets in the cranelift-icache target. The list of targets fuzzed is controlled by the targets enabled in fuzz/Cargo.toml.
This PR also reworks how instruction disabling is done in function generator, moving the deny-list to a function to make the decision at runtime instead of compile time.
* Switch duplicate loads w/ dynamic memories test to `min_size = 0`
This test was accidentally hitting a special case for bounds checks for when we
know that `offset + access_size < min_size` and can skip some steps. This
commit changes the `min_size` of the memory to zero so that we are forced to do
fully general bounds checks.
* Cranelift: Mark `uadd_overflow_trap` as okay for GVN
Although this improves our test sequence for duplicate loads with dynamic
memories, it unfortunately doesn't have any effect on sightglass benchmarks:
```
instantiation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm
No difference in performance.
[34448 35607.23 37158] gvn_uadd_overflow_trap.so
[34566 35734.05 36585] main.so
instantiation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm
No difference in performance.
[44101 60449.62 92712] gvn_uadd_overflow_trap.so
[44011 60436.37 92690] main.so
instantiation :: instructions-retired :: benchmarks/bz2/benchmark.wasm
No difference in performance.
[35595 36675.72 38153] gvn_uadd_overflow_trap.so
[35440 36670.42 37993] main.so
compilation :: instructions-retired :: benchmarks/bz2/benchmark.wasm
No difference in performance.
[17370195 17405125.62 17471222] gvn_uadd_overflow_trap.so
[17369324 17404859.43 17470725] main.so
execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm
No difference in performance.
[7055720520 7055886880.32 7056265930] gvn_uadd_overflow_trap.so
[7055719554 7055843809.33 7056193289] main.so
compilation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm
No difference in performance.
[683589861 683767276.00 684098366] gvn_uadd_overflow_trap.so
[683590024 683767998.02 684097885] main.so
execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm
No difference in performance.
[46436883 46437135.10 46437823] gvn_uadd_overflow_trap.so
[46436883 46437087.67 46437785] main.so
compilation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm
No difference in performance.
[126522461 126565812.58 126647044] gvn_uadd_overflow_trap.so
[126522176 126565757.75 126647522] main.so
execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm
No difference in performance.
[653010531 653010533.03 653010544] gvn_uadd_overflow_trap.so
[653010531 653010533.18 653010537] main.so
```
* cranelift-codegen-meta: Rename `side_effects_okay_for_gvn` to `side_effects_idempotent`
* cranelift-filetests: Ensure there is a trailing newline for blessed Wasm tests
This unlocks certain bounds checking optimizations in some
configurations. Wasn't able to measure any delta in sightglass, but still worth
doing anyways.
Remove the lower_br_fcmp function from the riscv64 backend. This PR only affects the emit implementation for FloatRound, replacing the uses of lower_br_fcmp with direct uses of FpuRRR and CondBr.
Any changes in behavior here should be already covered by the runtests for ceil, floor, trunc, and nearest.
Remove the MInst::TrapFf instruction in the riscv64 backend. It was only used in two places in the emit case for FloatRound, and was easily replaced with a combination of FpuRRR and TrapIf.
* Small update for filename in `isle_integration.md`
The name "clif.isle" is stale (since #4953), now two files "clif_lower.isle" and "clif_opt.isle" are generated. Not sure if that PR necessitates other changes this this doc.
* Update cranelift/docs/isle-integration.md
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
* Cranelift: Make spectre guards GVN-able
While these instructions have a side effect that is otherwise invisible to the
optimizer, the side effect in question is idempotent, so it can be de-duplicated
by GVN.
* Cranelift: Run redundant load replacement and GVN twice
This allows us to actually replace redundant Wasm loads with dynamic memories.
While this improves our hand-crafted test sequences, it doesn't seem to have any
improvement on sightglass benchmarks run with dynamic memories, however it also
isn't a hit to compilation times, so seems generally good to land anyways:
```
$ cargo run --release -- benchmark -e ~/scratch/once.so -e ~/scratch/twice.so -m insts-retired --processes 20 --iterations-per-process 3 --engine-flags="--static-memory-maximum-size 0" -- benchmarks/default.suite
compilation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm
No difference in performance.
[683595240 683768610.53 684097577] once.so
[683597068 700115966.83 1664907164] twice.so
instantiation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm
No difference in performance.
[44107 60411.07 92785] once.so
[44138 59552.32 92097] twice.so
compilation :: instructions-retired :: benchmarks/bz2/benchmark.wasm
No difference in performance.
[17369916 17404839.78 17471458] once.so
[17369935 17625713.87 30700150] twice.so
compilation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm
No difference in performance.
[126523640 126566170.80 126648265] once.so
[126523076 127174580.30 163145149] twice.so
instantiation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm
No difference in performance.
[34569 35686.25 36513] once.so
[34651 35749.97 36953] twice.so
instantiation :: instructions-retired :: benchmarks/bz2/benchmark.wasm
No difference in performance.
[35146 36639.10 37707] once.so
[34472 36580.82 38431] twice.so
execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm
No difference in performance.
[7055720115 7055841324.82 7056180024] once.so
[7055717681 7055877095.85 7056225217] twice.so
execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm
No difference in performance.
[46436881 46437081.28 46437691] once.so
[46436883 46437127.68 46437766] twice.so
execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm
No difference in performance.
[653010530 653010533.27 653010539] once.so
[653010531 653010532.95 653010538] twice.so
```
Rework the compilation of fcmp in the riscv64 backend to be in ISLE, removing the need for the dedicated Fcmp instruction. This change is motivated by #5500, which showed that the riscv64 backend was generating branch instructions in the middle of a basic block.
We can't remove lower_br_fcmp quite yet as it's used in a few places in the emit module, but it's now no longer reachable from the ISLE lowerings.
Fixes#5500
* wiggle: copy guest strings from shared memory
Along the same lines as #5471, this change adds a new smart pointer,
`GuestStrCow`, to copy the string bytes over from Wasm memory to the
host when the string is found in shared memory. This is necessary to
maintain Rust guarantees: with shared memory, the bytes backing a
`GuestStr` could be altered by another thread and this would invalidate
the assumption that we can dereference at any point to `&str`.
`GuestStrCow` is essentially a wrapper around `GuestStr` when the memory
is not shared but copies the memory region into a `String` when the
memory is shared.
This change updates the uses of Wiggle strings in both wasi-common and
wasi-crypto.
* review: perform UTF-8 check on `GuestStr` construction
* cranelift: Add `iabs.i128` runtest
* riscv64: Fix incorrect extension in iabs
When lowering iabs, we were accidentally comparing the unextended value
this caused the instruction to misbehave with certain top bits.
This commit also adds a zbb lowering that does not use jumps.
We can sometimes statically determine that a given load will unconditionally
trap. When this happens, we emit an unconditional trap, and we need to stop
adding new instructions to the block. This commit introduces a `Reachability<T>`
type that is like `Option<T>` but specifically for communicating reachability
and is marked `must_use` to receivers to handle transitions from reachable to
unreachable states.
Additionally, adds handling of reachable -> unreachable state transitions to
some SIMD op translations that weren't checking for it.
Fixes#5455Fixes#5456
The definitions of `wasmtime::component::Val::Float{32,64}` mirrored
`wasmtime::Val::F{32,64}` by using integers as their wrapped types,
storing the bit representation of their floating point values.
This was necessary for the core Wasm `f32`/`f64` types because Rust
floats don't have guaranteed NaN bit representations.
The component model `float32`/`float64` types require NaN
canonicalization, so we can use normal Rust `f{32,64}` instead.
Closes#5480
When lowering `select+icmp` we have an optimization that allows us to
avoid materializing the icmp result.
We were accidentally not masking the high bits for i8 and i16 in this case.
Issue #5498 reported this as an illegal instruction but what was happening
there was that the invalid select caused a division by zero.
This change upgrades `UnsafeGuestSlice` in Wiggle to expose more
functionality to be able to use `std::ptr::copy` for writing bytes into
Wasm shared memory. Additionally, it adds a new `GuestCow` type for
delineating between Wasm memory regions that can be borrowed (non-shared
memory) or must be copied (shared memory) in order to maintain Rust
guarantees.
With these in place, it is now possible to implement the `preview1`
"read" functions for shared memory. Previously, these would panic if
attempting to copy to a shared memory. This change removes the panic and
introduces some (rather complex) logic for handling both the shared and
non-shared cases:
- if reading into a Wasm non-shared memory, Wiggle guarantees that no
other guest pointers will touch the memory region and, in the absence
of concurrency, a WASI function can write directly to this memory
- if reading into a Wasm shared memory, the memory region can be
concurrently modified. At @alexcrichton's request re: Rust safety,
this change copies all of the bytes into an intermediate buffer before
using `std::ptr::copy` to move them into Wasm memory.
This change only applies to the `preview0` and `preview1`
implementations of `wasi-common`. Fixing up other WASI implementations
(esp. wasi-crypto) is left for later.
This adds a new error type `UnknownImportError` which will be returned
(wrapped in an `anyhow::Error`) by `Linker::instantiate{,_async,_pre}`
if a module has an unresolvable import.
This error type is also used by `Linker::define_unknown_imports_as_traps`;
any resulting traps will also downcast to `UnknownImportError`.
Closes#5416
We had a off-by-one bounds check error when checking if we should
jump to the default block in a br-table. Instead of always jumping
to the default block when we have a jump table with 0 targets we
would try to compute an offset past the end of the table.
This sometimes would not crash, but it would crash if the there was
no block after the br_table, thus adding a cold block would cause a
segfault.
The actual fix is quite simple, do not count the default block
as a jump table entry when computing the limits.
This commit also does a bunch of cleanup and adding some comments
to the br_table emission code.
We accidentally broke the build for FreeBSD when introducing the jit-icache-coherence
crate. To avoid this happening again, add a check job just to ensure that it can build.
See #5323 and #5331 for context.