This fixes two problems: minimum symbol alignment for the LARL
instruction, and alignment requirements for LRL/LGRL etc.
The first problem is that the LARL instruction used to load a
symbol address (PC relative) requires that the target symbol
is at least 2-byte aligned. This is always guaranteed for code
symbols (all instructions must be 2-aligned anyway), but not
necessarily for data symbols.
Other s390x compilers fix this problem by ensuring that all
global symbols are always emitted with a minimum 2-byte
alignment. This patch introduces an equivalent mechanism
for cranelift:
- Add a symbol_alignment routine to TargetIsa, similar to the
existing code_section_alignment routine.
- Respect symbol_alignment as minimum alignment for all symbols
emitted in the object backend (code and data).
The second problem is that PC-relative instructions that
directly *access* data (like LRL/LGRL, STRL/STGRL etc.)
not only have the 2-byte requirement like LARL, but actually
require that their memory operand is *naturally* aligned
(i.e. alignment is at least the size of the access).
This property (natural alignment for memory accesses) is
supposed to be provided by the "aligned" flag in MemFlags;
however, this is not implemented correctly at the moment.
To fix this, this patch:
- Only emits PC-relative memory access instructions if the
"aligned" flag is set in the associated MemFlags.
- Fixes a bug in emit_small_memory_copy and emit_small_memset
which currently set the aligned flag unconditionally, ignoring
the actual alignment info passed by their caller.
Tested with wasmtime and cg_clif.
* Add a test for iadd_pairwise with swiden input
* Implement iadd_pairwise for swiden_{low,high} input
* Add a test case for iadd_pairwise with uwiden input
* Implement iadd_pairwise with uwiden
* Optimize flat type representation calculations
Previously calculating the flat type representation would be done
recursively for an entire type tree every time it was visited.
Additionally the flat type representation was entirely built only to be
thrown away if it was too large at the end. This chiefly presented a
source of recursion based on the type structure in the component model
which fuzzing does not like as it reports stack overflows.
This commit overhauls the representation of flat types in Wasmtime by
caching the representation for each type in the compile-time
`ComponentTypesBuilder` structure. This avoids recalculating each time
the flat representation is queried and additionally allows opportunity
to have more short-circuiting to avoid building overly-large vectors.
* Remove duplicate flat count calculation in wasmtime
Roughly share the infrastructure in the `wasmtime-environ` crate, namely
the non-recursive and memoizing nature of the calculation.
* Fix component fuzz build
* Fix example compile
A `GtU` condition needed to actually be `GeU`, as the comment right
above it stated but apparently I forgot to translate the comment to
actual code. This fixes a fuzz bug that arose from oss-fuzz over the
weekend.
* Cranelift: Use bump allocation in `remove_constant_phis` pass
This makes compilation 2-6% faster for Sightglass's bz2 benchmark:
```
compilation :: cycles :: benchmarks/bz2/benchmark.wasm
Δ = 7290648.36 ± 4245152.07 (confidence = 99%)
bump.so is 1.02x to 1.06x faster than main.so!
[166388177 183238542.98 214732518] bump.so
[172836648 190529191.34 217514271] main.so
compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm
No difference in performance.
[182220055 225793551.12 277857575] bump.so
[193212613 227784078.61 277175335] main.so
compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
No difference in performance.
[3848442474 4295214144.37 4665127241] bump.so
[3969505457 4262415290.10 4563869974] main.so
```
* Add audit for `bumpalo`
* Add an audit of `arrayvec` version 0.7.2
* Remove unnecessary `collect` into `Vec`
I wasn't able to measure any perf difference here, but its nice to do anyways.
* Use a `SecondaryMap` for keeping track of summaries
* Rename `MmapVec::drain` to `split_off`
As suggested on #4609
* Fix tests
* Make MmapVec::split_off work like Vec::split_off
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
* Cranelift: Remove the `ABICaller` trait
It has only one implementation: the `ABICallerImpl` struct. We can just use that
directly rather than having extra, unnecessary layers of generics and abstractions.
* Cranelift: Rename `ABICallerImpl` to `Caller`
* Cranelift: Remove `ABICallee` trait
It has only one implementation: the `ABICalleeImpl` struct. By using that
directly we can avoid unnecessary layers of generics and abstractions as well as
a couple `Box`es that were previously putting the single implementation into a
`Box<dyn>`.
* Cranelift: Rename `ABICalleeImpl` to `AbiCallee`
* Fix comments as per review
* Rename `AbiCallee` to `Callee`
* Remove recursion building types in `component_api` fuzzer
Sure enough the fuzzers found an input that blows the stack, so the
type-building here was rewritten to use a heap-based stack instead of a
stack-based-stack.
* Review comments
* Add source tarballs to our releases
This commit adds a small script to create a source tarball as part of
the release process. This goes further than requested by #3808 by
vendoring all Rust dependencies as well to be more in line with
"download the source once then build somewhere without a network".
Vendoring the Rust dependencies makes the tarball pretty beefy (67M
compressed, 500M uncompressed). Unfortunately most of this size comes
from vendored crates such as v8, pqcrypto-kyber, winapi, capstone-sys,
plotters, and web-sys. Only `winapi` in this list is actually needed for
`wasmtime`-the-binary and only on Windows as well but for now this is
the state of things related to `cargo vendor`. If this becomes an issue
we could specifically remove the bulky contents of crates in the
`vendor` directory such as `v8` since it's only used for fuzzing.
Closes#3808
* Review feedback
* Review comments
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime.
After the suggestion of Chris, `Function` has been split into mostly two parts:
- on the one hand, `FunctionStencil` contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`.
- on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on.
Most changes are here to accomodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache:
- most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location, it's automatically detected instead the first time a non-default `SourceLoc` is set.
- user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`.
- some refactorings have been made for function names:
- `ExternalName` was used as the type for a `Function`'s name; while it thus allowed `ExternalName::Libcall` in this place, this would have been quite confusing to use it there. Instead, a new enum `UserFuncName` is introduced for this name, that's either a user-defined function name (the above `UserExternalName`) or a test case name.
- The future of `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with.
The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.
A basic fuzz target has been introduced that tries to do the bare minimum:
- check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function.
- check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache.
- This last check is less efficient and less likely to happen, so probably should be rethought a bit.
Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.
Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement.
Fixes#4155.
* cranelift: Remove shifts-small-types runtests
These were moved to the main shifts file in #4519 but this file was accidentaly left in tree.
It also fixes the missing sshr_i8_i8 testcase
* cranelift: Add shifts to fuzzer
* cranelift: Add extends to fuzzer
The trait had only one implementation: the `Lower` struct. It is easier to just
use that directly, and not introduce unnecessary layers of generics and
abstractions.
Once upon a time, there was hope that we would have other implementations of the
`LowerCtx` trait, that did things like lower CLIF to SMTLIB for
verification. However, this is not practical these days given the way that the
trait has evolved over time, and our verification efforts are focused on ISLE
now anyways, and we're actually making some progress on that front (much more
than anyone ever did on a second `LowerCtx` trait implementation!)
In #4375 we introduced a code pattern that appears as a warning when
building the `cranelift-interpreter` crate:
```
warning: cannot borrow `*state` as mutable because it is also borrowed as immutable
--> cranelift/interpreter/src/step.rs:412:13
|
47 | let arg = |index: usize| -> Result<V, StepError> {
| -------------------------------------- immutable borrow occurs here
48 | let value_ref = inst_context.args()[index];
49 | state
| ----- first borrow occurs due to use of `*state` in closure
...
412 | state.set_pinned_reg(arg(0)?);
| ^^^^^^^^^^^^^^^^^^^^^---^^^^^
| | |
| | immutable borrow later used here
| mutable borrow occurs here
|
= note: `#[warn(mutable_borrow_reservation_conflict)]` on by default
= warning: this borrowing pattern was not meant to be accepted, and may become a hard error in the future
= note: for more information, see issue #59159 <https://github.com/rust-lang/rust/issues/59159>
```
This change fixes the warning.
* Add a test for the existing behavior of fcvt_from_unit
* Migrate the I8, I16, I32 cases of fcvt_from_uint
* Implement the I64 case of fcvt_from_uint
* Add a test for the existing behavior of fcvt_from_uint.f64x2
* Migrate fcvt_from_uint.f64x2 to ISLE
* Lower the last case of `fcvt_from_uint`
* Add a test for `fcvt_from_uint`
* Finish lowering fcmp_from_uint
* Format
This implements the s390x back-end portion of the solution for
https://github.com/bytecodealliance/wasmtime/issues/4566
We now support both big- and little-endian vector lane order
in code generation. The order used for a function is determined
by the function's ABI: if it uses a Wasmtime ABI, it will use
little-endian lane order, and big-endian lane order otherwise.
(This ensures that all raw_bitcast instructions generated by
both wasmtime and other cranelift frontends can always be
implemented as a no-op.)
Lane order affects the implementation of a number of operations:
- Vector immediates
- Vector memory load / store (in big- and little-endian variants)
- Operations explicitly using lane numbers
(insertlane, extractlane, shuffle, swizzle)
- Operations implicitly using lane numbers
(iadd_pairwise, narrow/widen, promote/demote, fcvt_low, vhigh_bits)
In addition, when calling a function using a different lane order,
we need to lane-swap all vector values passed or returned in registers.
A small number of changes to common code were also needed:
- Ensure we always select a Wasmtime calling convention on s390x
in crates/cranelift (func_signature).
- Fix vector immediates for filetests/runtests. In PR #4427,
I attempted to fix this by byte-swapping the V128 value, but
with the new scheme, we'd instead need to perform a per-lane
byte swap. Since we do not know the actual type in write_to_slice
and read_from_slice, this isn't easily possible.
Revert this part of PR #4427 again, and instead just mark the
memory buffer as little-endian when emitting the trampoline;
the back-end will then emit correct code to load the constant.
- Change a runtest in simd-bitselect-to-vselect.clif to no longer
make little-endian lane order assumptions.
- Remove runtests in simd-swizzle.clif that make little-endian
lane order assumptions by relying on implicit type conversion
when using a non-i16x8 swizzle result type (this feature should
probably be removed anyway).
Tested with both wasmtime and cg_clif.
This commit fixes a build warning on Rust 1.63 when the `memory-init-cow`
feature is disabled in the `wasmtime-runtime` crate. Some "tricks" were
used prior to have the `MemoryImage` type be an empty `enum {}` but that
wreaks havoc with warnings so this commit instead just makes it a unit
struct and makes all methods panic (as they shouldn't be hit anyway).
During differential execution against V8, Wasm values need to be
converted back and forth from JS values. This change documents the
location in the specification where this is defined.
* Limit the type hierarchies in component fuzzing
For now `wasmparser` has a hard limit on the size of tuples and such at
1000 recursive types within the tuple itself. Respect this limit by
limiting the width of recursive types generated for the `component_api`
fuzzer. This commit unifies this new requirement with the preexisting
`TupleArray` and `NonEmptyArray` types into one `VecInRange<T, L, H>`
which allow expressing all of these various requirements in one type.
* Fix a compile error on `main`
* Review comments
We are using our own test harness for filetests and embedding it in
libtest isn't useful. It only hides test output until the end and
results in unnecessary noise.
* Stop returning `NOTCAPABLE` errors from WASI calls.
`ENOTCAPABLE` was an error code that is used as part of the rights
system, from CloudABI. There is a set of flags associated with each file
descriptor listing which operations can be performed with the file
descriptor, and if an attempt is made to perform an operation with a
file descriptor that isn't permitted by its rights flags, it fails with
`ENOTCAPABLE`.
WASI is removing the rights system. For example, WebAssembly/wasi-libc#294
removed support for translating `ENOTCAPABLE` into POSIX error codes, on
the assumption that engines should stop using it.
So as another step to migrating away from the rights system, remove uses
of the `ENOTCAPABLE` error.
* Update crates/wasi-common/src/file.rs
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Update crates/wasi-common/src/dir.rs
Co-authored-by: Jamey Sharp <jamey@minilop.net>
Co-authored-by: Jamey Sharp <jamey@minilop.net>
This enables the object backend for s390x, in particular the
processing of all required relocations.
This uncovered a bug: we need to use PLT relocations for the
target of calls, which we currently do not. Fixed by adding
a new S390xPLTRel32Dbl reloc type and using it where needed.
Figuring out which boolean settings go into each preset is not easy by
inspecting the DSL source (e.g. meta/src/isa/x86.rs). This patch extends
the comments in the Rust that's generated by that DSL to list the names
of the settings together with the name of the preset.
* Fix sret for AArch64
AArch64 requires the struct return address argument to be stored in the x8
register. This register is never used for regular arguments.
* Add extra sret tests for x86_64
Implement the tls_value for s390 in the ELF general-dynamic mode.
Notable differences to the x86_64 implementation are:
- We use a __tls_get_offset libcall instead of __tls_get_addr.
- The current thread pointer (stored in a pair of access registers)
needs to be added to the result of __tls_get_offset.
- __tls_get_offset has a variant ABI that requires the address of
the GOT (global offset table) is passed in %r12.
This means we need a new libcall entries for __tls_get_offset.
In addition, we also need a way to access _GLOBAL_OFFSET_TABLE_.
The latter is a "magic" symbol with a well-known name defined
by the ABI and recognized by the linker. This patch introduces
a new ExternalName::KnownSymbol variant to support such names
(originally due to @afonso360).
We also need to emit a relocation on a symbol placed in a
constant pool, as well as an extra relocation on the call
to __tls_get_offset required for TLS linker optimization.
Needed by the cg_clif frontend.
* [fuzz] Fix signature of `i64.extend32_s` single-instruction test
This single-instruction test incorrectly attempted to convert an `i32`
to an `i64`; the correct signature is `i64 -> i64`. See the [WebAssembly
specification](https://webassembly.github.io/spec/core/bikeshed/#a7-index-of-instructions).
* [fuzz] Fix typo in single-instruction function generator
Previously, the `unary!` macro created functions that used two operands
instead of the expected one.
* Update 0.40.0 release notes
Not a ton happened in terms of user-facing improvements here so I
outlined some internal changes as well. The cumulative effect of
improving compile times is Sightglass showing 30-40% improvements for
major benchmarks. Additionally I wrote down a note indicating that this
is likely the last `0.*` release and the next release of Wasmtime on
September 20 is planned to be 1.0.
* Remove perf-related relnotes
* Call out s390x simd at the top-level
* cranelift: Upgrade libm to 0.2.4
This resolves an issue with incorrect fmaf on the x86_64-pc-windows-gnu target under some inputs.
See: #4517
* supply-chain: Vet `libm` 0.2.4
This commit fixes#4600 in a somewhat roundabout fashion. Currently the
`main` branch of Wasmtime exhibits unusual behavior:
* If `./ci/run-tests.sh` is run then the `cache_accounts_for_opt_level`
test does not fail.
* If `cargo test -p wasmtime --lib` is run, however, then the test
fails.
This test is indeed being run as part of `./ci/run-tests.sh` and it's
also passing in CI. The exact failure is that part of the debuginfo
support we have takes an existing ELF image, copies it, and then appends
some information to inform profilers/gdb about the image. This code is
all quite old at this point and not 100% optimal, but that's at least
where we're at.
The problem is that the appended `ProgramHeader64` is not aligned
correctly during `cargo test -p wasmtime --lib`, which is the panic that
happens causing the test to fail. The reason, however, that this test
passes with `./ci/run-tests.sh` is that the alignment of
`ProgramHeader64` is 1 instead of 8. The reason for that is that the
`object` crate has an `unaligned` feature which forcibly unaligns all
primitives to 1 byte instead of their natural alignment. During `cargo
test -p wasmtime --lib` this feature is not enabled but during
`./ci/run-tests.sh` this feature is enabled. The feature is currently
enabled through inclusion of the `backtrace` crate which only happens
for some tests in some crates.
The alignment issue explains why the test fails on a single crate test
but fails on the whole workspace tests. The next issue I investigated
was if this test ever passed. It turns out that on v0.39.0 this test
passed, and the regression to main was introduced during #4571. That
PR, however, has nothing to do with any of this! The reason that this
showed up as causing a "regression" however is because it changed
cranelift settings which changed the size of serialized metadata at the
end of a Wasmtime cache object.
Wasmtime compiled artifacts are ELF images with Wasmtime-specific
metadata appended after them. This appended metadata was making its way
all the way through to the gdbjit image itself which mean that while the
end of the ELF file itself was properly aligned the space after the
Wasmtime metadata was not aligned. This metadata changes in size over
time as Cranelift settings change which explains why #4571 was the
"source" of the regression.
The fix in this commit is to discard the extra Wasmtime metadata when
creating an `MmapVec` representing the underlying ELF image. This is
already supported with `MmapVec::drain` so it was relatively easy to
insert that. This means that the gdbjit image starts with just the ELF
file itself which is always aligned at the end, which gets the test
passing with/without the `unaligned` feature in the `object` crate.
In #4671, the meta-differential fuzz target was finding errors when
running certain Wasm modules (specifically `shr_s` in that case).
@conrad-watt diagnosed the issue as a missing reversal in the operands
passed to the spec interpreter. This change fixes#4671 and adds an
additional unit test to keep it fixed.