* Add support for processor features (including auto-detection).
* Move base architecture set requirement back to z14.
* Add z15 feature sets and re-enable z15-specific code generation
when required features are available.
When shuffling values from two different registers, the x64 lowering for
`i8x16.shuffle` must first shuffle each register separately and then OR
the results with SSE instructions. With `VPERMI2B`, available in
AVX512VL + AVX512VBMI, this can be done in a single instruction after
the shuffle mask has been moved into the destination register. This
change uses `VPERMI2B` for that case when the CPU supports it.
When AVX512VL or AVX512BITALG are available, Wasm SIMD's `popcnt`
instruction can be lowered to a single x64 instruction, `VPOPCNTB`,
instead of 8+ instructions.
This adds support for the IBM z/Architecture (s390x-ibm-linux).
The status of the s390x backend in its current form is:
- Wasmtime is fully functional and passes all tests on s390x.
- All back-end features supported, with the exception of SIMD.
- There is still a lot of potential for performance improvements.
- Currently the only supported processor type is z15.
This commit adds the `wasmtime settings` command to print out available
Cranelift settings for a target (defaults to the host).
The compile command has been updated to remove the Cranelift ISA options in
favor of encouraging users to use `wasmtime settings` to discover what settings
are available. This will reduce the maintenance cost for syncing the compile
command with Cranelift ISA flags.
This commit adds a `compile` command to the Wasmtime CLI.
The command can be used to Ahead-Of-Time (AOT) compile WebAssembly modules.
With the `all-arch` feature enabled, AOT compilation can be performed for
non-native architectures (i.e. cross-compilation).
The `Module::compile` method has been added to perform AOT compilation.
A few of the CLI flags relating to "on by default" Wasm features have been
changed to be "--disable-XYZ" flags.
A simple example of using the `wasmtime compile` command:
```text
$ wasmtime compile input.wasm
$ wasmtime input.cwasm
```
This commit enables Cranelift's AArch64 backend to generate code
for instruction set extensions (previously only the base Armv8-A
architecture was supported); also, it makes it possible to detect
the extensions supported by the host when JIT compiling. The new
functionality is applied to the IR instruction `AtomicCas`.
Copyright (c) 2021, Arm Limited.
This was added as an incremental step to improve AArch64 code quality in
PR #2278. At the time, we did not have a way to pattern-match the load +
splat opcode sequence that the relevant Wasm opcodes lowered to.
However, now with PR #2366, we can merge effectful instructions such as
loads into other ops, and so we can do this pattern matching directly.
The pattern-matching update will come in a subsequent commit.
It corresponds to WebAssembly's `load*_splat` operations, which
were previously represented as a combination of `Load` and `Splat`
instructions. However, there are architectures such as Armv8-A
that have a single machine instruction equivalent to the Wasm
operations. In order to generate it, it is necessary to merge the
`Load` and the `Splat` in the backend, which is not possible
because the load may have side effects. The new IR instruction
works around this limitation.
The AArch64 backend leverages the new instruction to improve code
generation.
Copyright (c) 2020, Arm Limited.
Before this patch, running the x64 new backend would require both
compiling with --features experimental_x64 and running with
`use_new_backend`.
This patches changes this behavior so that the runtime flag is not
needed anymore: using the feature flag will enforce usage of the new
backend everywhere, making using and testing it much simpler:
cargo run --features experimental_x64 ;; other CLI options/flags
This also gives a hint at what the meta language generation would look
like after switching to the new backend.
Compiling only with the x64 codegen flag gives a nice compile time speedup.
This PR adds a conditional move following a heap bounds check through
which the address to be accessed flows. This conditional move ensures
that even if the branch is mispredicted (access is actually out of
bounds, but speculation goes down in-bounds path), the acually accessed
address is zero (a NULL pointer) rather than the out-of-bounds address.
The mitigation is controlled by a flag that is off by default, but can
be set by the embedding. Note that in order to turn it on by default,
we would need to add conditional-move support to the current x86
backend; this does not appear to be present. Once the deprecated
backend is removed in favor of the new backend, IMHO we should turn
this flag on by default.
Note that the mitigation is unneccessary when we use the "huge heap"
technique on 64-bit systems, in which we allocate a range of virtual
address space such that no 32-bit offset can reach other data. Hence,
this only affects small-heap configurations.
These instructions have fast, inline JIT paths for the common cases, and only
call out to host VM functions for the slow paths. This required some changes to
`cranelift-wasm`'s `FuncEnvironment`: instead of taking a `FuncCursor` to insert
an instruction sequence within the current basic block,
`FuncEnvironment::translate_table_{get,set}` now take a `&mut FunctionBuilder`
so that they can create whole new basic blocks. This is necessary for
implementing GC read/write barriers that involve branching (e.g. checking for
null, or whether a store buffer is at capacity).
Furthermore, it required that the `load`, `load_complex`, and `store`
instructions handle loading and storing through an `r{32,64}` rather than just
`i{32,64}` addresses. This involved making `r{32,64}` types acceptable
instantiations of the `iAddr` type variable, plus a few new instruction
encodings.
Part of #929
This converts an `i32x4` into an `f32x4` with some rounding either by using an AVX512VL/F instruction--VCVTUDQ2PS--or a long sequence of SSE4.1 compatible instructions.
The `convert_i64x2_imul` custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizes `imul.i64x2` to an `x86_pmullq` in this case; if not, it uses a lengthy SSE2-compatible instruction sequence.
Without this special instruction, legalizing to the AVX512 instruction AND the SSE instruction sequence is impossible. This extra instruction would be rendered unnecessary by the x64 backend.
The InsertLane format has an ordering (`value().imm().value()`) and immediate name (`"lane"`) that make it awkward to use for other instructions. This changes the ordering (`value().value().imm()`) and uses the default name (`"imm"`) throughout the codebase.
* Encode vselect using BLEND instructions on x86
* Legalize vselect to bitselect
* Optimize bitselect to vselect for some operands
* Add run tests for bitselect-vselect optimization
* Address review feedback