The previous address calculation code had a bug where we tried to
add offsets into a temporary register before defining it, causing
the regalloc to complain.
* Add support for processor features (including auto-detection).
* Move base architecture set requirement back to z14.
* Add z15 feature sets and re-enable z15-specific code generation
when required features are available.
This adds full back-end support for the Fence, AtomicLoad
and AtomicStore operations, and partial support for the
AtomicCas and AtomicRmw operations.
The missing pieces include sub-word operations, operations
on little-endian memory requiring byte-swapping, and some
of the subtypes of AtomicRmw -- everything that cannot be
implemented without a compare-and-swap loop. This will be
done in a follow-up patch.
This patch already suffices to make the test suite green
again after a recent change that now requires atomic
operations when accessing the heap.
We have 3 different aproaches depending on the type of comparision requested:
* For eq/ne we compare the high bits and low bits and check
if they are equal
* For overflow checks, we perform a i128 add and check the
resulting overflow flag
* For the remaining comparisions (gt/lt/sgt/etc...)
We compare both the low bits and high bits, and if the high bits are
equal we return the result of the unsigned comparision on the low bits
As with other i128 ops, we are still missing immlogic support.
Currently we just basically use a two instruction version of the same i64 ops.
IMMLogic doesn't really support multiple register inputs, so its left as a TODO for future optimizations.
Enabling runtests for the s390x backend exposed a pre-existing endian bug with handling bool test case return values.
These are written as integers of the same width by the trampoline, but are always read out as the Rust "bool" type. This happens to work on little-endian systems, but fails for any boolean type larger than 1 byte on big-endian systems.
See: https://github.com/bytecodealliance/wasmtime/pull/2964#issuecomment-855879866
With this change we now reuse tests across multiple arches.
Duplicate tests were merged into the same file where possible.
Some legacy x86 tests were left in separate files due to incompatibilities with the rest of the test suite.
When AVX512VL and AVX512F are available, use a single instruction
(`VCVTUDQ2PS`) instead of a length 9-instruction sequence. This
optimization is a port from the legacy x86 backend.
Previously, the x64 backend's ABI code would generate a sign-extending
load when loading a less-than-64-bit integer from a spillslot. This is
incorrect: e.g., for i32s > 0x80000000, this would result in all high
bits set.
This interacts poorly with another optimization. Normally, the invariant
is that the high bits of a register holding a value of a certain type,
beyond that type's bits, are undefined. However, as an optimization, we
recognize and use the fact that on x86-64, 32-bit instructions zero the
upper 32 bits. This allows us to elide a 32-to-64-bit zero-extend op
(turning it into just a move, which can then sometimes disappear
entirely due to register coalescing).
If a spill and reload happen between the production of a 32-bit value
from an instruction known to zero the upper bits and its use, then we
will rely on zero upper bits that might actually be set by a
sign-extend. This will result in incorrect execution.
As a fix, we stick to a simple invariant: we always spill and reload a
full 64 bits when handling integer registers on x64. This ensures that
no bits are mangled.
This change implements `vselect` using SSE4.1's `BLENDVPS`, `BLENDVPD`,
and `PBLENDVB`. `vselect` is a lane-selecting instruction that is used
by
[simple_preopt.rs](fa1faf5d22/cranelift/codegen/src/simple_preopt.rs (L947-L999))
to lower `bitselect` to a single x86 instruction when the condition mask
is known to be boolean (all 1s or 0s, e.g., from a conversion). This is
better than `bitselect` in general, which lowers to 4-5 instructions.
The old backend had the `vselect` lowering; this simply introduces it to
the new backend.
* Upgrade to the latest versions of gimli, addr2line, object
And adapt to API changes. New gimli supports wasm dwarf, resulting in
some simplifications in the debug crate.
* upgrade gimli usage in linux-specific profiling too
* Add "continue" statement after interpreting a wasm local dwarf opcode
When dealing with params that need to be split, we follow the
arch64 ABI and split the value in two, and make sure that start that
argument in an even numbered xN register.
The apple ABI does not require this, so on those platforms, we start
params anywhere.
This adds support for the IBM z/Architecture (s390x-ibm-linux).
The status of the s390x backend in its current form is:
- Wasmtime is fully functional and passes all tests on s390x.
- All back-end features supported, with the exception of SIMD.
- There is still a lot of potential for performance improvements.
- Currently the only supported processor type is z15.
This removes an existing dependency on the byteorder crate in favor of
using std equivalents directly.
While not an issue for wasmtime per se, cranelift is now part of the
critical path of building and testing Rust, and minimizing dependencies,
even small ones, can help reduce the time and bandwidth required.
PR 2840 changed the store_spillslot routine to always store
integer registers in full word size to a spill slot. However,
the load_spillslot routine was not updated, which may causes
the contents to be reloaded in a different type. On big-endian
systems this will fetch wrong data.
Fixed by using the same type override in load_spillslot.
SIMD & FP registers are now saved and restored in pairs, similarly
to general-purpose registers. Also, only the bottom 64 bits of the
registers are saved and restored (in case of non-Baldrdash ABIs),
which is the requirement from the Procedure Call Standard for the
Arm 64-bit Architecture.
As for the callee-saved general-purpose registers, if a procedure
needs to save and restore an odd number of them, it no longer uses
store and load pair instructions for the last register.
Copyright (c) 2021, Arm Limited.
* Fully support multiple returns in Wasmtime
For quite some time now Wasmtime has "supported" multiple return values,
but only in the mose bare bones ways. Up until recently you couldn't get
a typed version of functions with multiple return values, and never have
you been able to use `Func::wrap` with functions that return multiple
values. Even recently where `Func::typed` can call functions that return
multiple values it uses a double-indirection by calling a trampoline
which calls the real function.
The underlying reason for this lack of support is that cranelift's ABI
for returning multiple values is not possible to write in Rust. For
example if a wasm function returns two `i32` values there is no Rust (or
C!) function you can write to correspond to that. This commit, however
fixes that.
This commit adds two new ABIs to Cranelift: `WasmtimeSystemV` and
`WasmtimeFastcall`. The intention is that these Wasmtime-specific ABIs
match their corresponding ABI (e.g. `SystemV` or `WindowsFastcall`) for
everything *except* how multiple values are returned. For multiple
return values we simply define our own version of the ABI which Wasmtime
implements, which is that for N return values the first is returned as
if the function only returned that and the latter N-1 return values are
returned via an out-ptr that's the last parameter to the function.
These custom ABIs provides the ability for Wasmtime to bind these in
Rust meaning that `Func::wrap` can now wrap functions that return
multiple values and `Func::typed` no longer uses trampolines when
calling functions that return multiple values. Although there's lots of
internal changes there's no actual changes in the API surface area of
Wasmtime, just a few more impls of more public traits which means that
more types are supported in more places!
Another change made with this PR is a consolidation of how the ABI of
each function in a wasm module is selected. The native `SystemV` ABI,
for example, is more efficient at returning multiple values than the
wasmtime version of the ABI (since more things are in more registers).
To continue to take advantage of this Wasmtime will now classify some
functions in a wasm module with the "fast" ABI. Only functions that are
not reachable externally from the module are classified with the fast
ABI (e.g. those not exported, used in tables, or used with `ref.func`).
This should enable purely internal functions of modules to have a faster
calling convention than those which might be exposed to Wasmtime itself.
Closes#1178
* Tweak some names and add docs
* "fix" lightbeam compile
* Fix TODO with dummy environ
* Unwind info is a property of the target, not the ABI
* Remove lightbeam unused imports
* Attempt to fix arm64
* Document new ABIs aren't stable
* Fix filetests to use the right target
* Don't always do 64-bit stores with cranelift
This was overwriting upper bits when 32-bit registers were being stored
into return values, so fix the code inline to do a sized store instead
of one-size-fits-all store.
* At least get tests passing on the old backend
* Fix a typo
* Add some filetests with mixed abi calls
* Get `multi` example working
* Fix doctests on old x86 backend
* Add a mixture of wasmtime/system_v tests
This PR switches the default backend on x86, for both the
`cranelift-codegen` crate and for Wasmtime, to the new
(`MachInst`-style, `VCode`-based) backend that has been under
development and testing for some time now.
The old backend is still available by default in builds with the
`old-x86-backend` feature, or by requesting `BackendVariant::Legacy`
from the appropriate APIs.
As part of that switch, it adds some more runtime-configurable plumbing
to the testing infrastructure so that tests can be run using the
appropriate backend. `clif-util test` is now capable of parsing a
backend selector option from filetests and instantiating the correct
backend.
CI has been updated so that the old x86 backend continues to run its
tests, just as we used to run the new x64 backend separately.
At some point, we will remove the old x86 backend entirely, once we are
satisfied that the new backend has not caused any unforeseen issues and
we do not need to revert.