Commit Graph

449 Commits

Author SHA1 Message Date
Johnnie Birch
d8e813204e Fold fcvt_low_from_uinit into previously existing clif instructions 2021-07-09 10:39:05 -07:00
Johnnie Birch
6dd2df4fb3 Update comment on fcvt_low_from_sint instruction 2021-07-09 10:39:05 -07:00
Johnnie Birch
2d676d838f Implements f64x2.convert_low_i32x4_u for x64 2021-07-09 10:39:05 -07:00
Anton Kirilov
330f02aa09 Enable the simd_i32x4_trunc_sat_f64x2 test for AArch64
Also, reorganize the AArch64-specific VCode instructions for unary
narrowing and widening vector operations, so that they are more
straightforward to use.

Copyright (c) 2021, Arm Limited.
2021-06-30 12:17:53 +01:00
Anton Kirilov
98f1ac789e Enable the simd_i16x8_q15mulr_sat_s test on AArch64
Copyright (c) 2021, Arm Limited.
2021-06-28 12:24:31 +01:00
bjorn3
1415dd824a Remove all dsl legalizations for arm32, arm64 and s390x 2021-06-20 18:43:45 +02:00
Ulrich Weigand
def54fb1fa s390x: Add z14 support
* Add support for processor features (including auto-detection).

* Move base architecture set requirement back to z14.

* Add z15 feature sets and re-enable z15-specific code generation
  when required features are available.
2021-06-17 10:23:15 +02:00
Alex Crichton
e8b8947956 Bump to 0.28.0 (#2972) 2021-06-09 14:00:13 -05:00
Chris Fallin
ffb92d9109 Merge pull request #2966 from akirilov-arm/simd_int_to_int_extend
Enable the simd_int_to_int_extend test for AArch64
2021-06-06 23:34:52 -07:00
Johnnie Birch
1770880e19 x64: add support for packed promote and demote (#2783)
* Add support for x64 packed promote low

* Add support for x64 packed floating point demote

* Update vector promote low and demote by adding constraints

Also does some renaming and minor refactoring
2021-06-04 15:59:20 -07:00
Anton Kirilov
5e8a8fe5a0 Enable the simd_int_to_int_extend test for AArch64
Copyright (c) 2021, Arm Limited.
2021-06-04 16:10:02 +01:00
Andrew Brown
2a9f458ea3 x64: lower i8x16.shuffle to VPERMI2B when possible
When shuffling values from two different registers, the x64 lowering for
`i8x16.shuffle` must first shuffle each register separately and then OR
the results with SSE instructions. With `VPERMI2B`, available in
AVX512VL + AVX512VBMI, this can be done in a single instruction after
the shuffle mask has been moved into the destination register. This
change uses `VPERMI2B` for that case when the CPU supports it.
2021-06-01 11:40:53 -07:00
Andrew Brown
459fce3467 x64: lower i8x16.popcnt to VPOPCNTB when possible
When AVX512VL or AVX512BITALG are available, Wasm SIMD's `popcnt`
instruction can be lowered to a single x64 instruction, `VPOPCNTB`,
instead of 8+ instructions.
2021-05-25 12:16:25 -07:00
Chris Fallin
800cf25bb5 Make the CFG metadata computation conditional on a flag. 2021-05-24 13:01:15 -07:00
Johnnie Birch
9a5c9607e1 Vpopcnt for x64 2021-05-21 19:23:26 -07:00
Chris Fallin
88455007b2 Bump Wasmtime to v0.27.0 and Cranelift to v0.74.0. 2021-05-20 14:06:41 -07:00
Andrew Brown
1fe7676831 cranelift: only allow vector types with saturating arithmetic
This fixes #2883 by restricting which types are available to the `uadd_sat`, `sadd_sat`, `usub_sat`, and `ssub_sat` IR operations.
2021-05-17 06:54:45 -07:00
Ulrich Weigand
89b5fc776d Support IBM z/Architecture
This adds support for the IBM z/Architecture (s390x-ibm-linux).

The status of the s390x backend in its current form is:
- Wasmtime is fully functional and passes all tests on s390x.
- All back-end features supported, with the exception of SIMD.
- There is still a lot of potential for performance improvements.
- Currently the only supported processor type is z15.
2021-05-10 16:01:16 +02:00
Chris Fallin
6bec13da04 Bump versions: Wasmtime to 0.26.0, Cranelift to 0.73.0. 2021-04-05 10:48:42 -07:00
Peter Huene
0ddfe97a09 Change how flags are stored in serialized modules.
This commit changes how both the shared flags and ISA flags are stored in the
serialized module to detect incompatibilities when a serialized module is
instantiated.

It improves the error reporting when a compiled module has mismatched shared
flags.
2021-04-01 21:39:57 -07:00
Peter Huene
abf3bf29f9 Add a wasmtime settings command to print Cranelift settings.
This commit adds the `wasmtime settings` command to print out available
Cranelift settings for a target (defaults to the host).

The compile command has been updated to remove the Cranelift ISA options in
favor of encouraging users to use `wasmtime settings` to discover what settings
are available.  This will reduce the maintenance cost for syncing the compile
command with Cranelift ISA flags.
2021-04-01 19:38:19 -07:00
Peter Huene
29d366db7b Add a compile command to Wasmtime.
This commit adds a `compile` command to the Wasmtime CLI.

The command can be used to Ahead-Of-Time (AOT) compile WebAssembly modules.

With the `all-arch` feature enabled, AOT compilation can be performed for
non-native architectures (i.e. cross-compilation).

The `Module::compile` method has been added to perform AOT compilation.

A few of the CLI flags relating to "on by default" Wasm features have been
changed to be "--disable-XYZ" flags.

A simple example of using the `wasmtime compile` command:

```text
$ wasmtime compile input.wasm
$ wasmtime input.cwasm
```
2021-04-01 19:38:18 -07:00
Johnnie Birch
31d3db1ec2 Implements convert low signed integer to float for x64 simd 2021-03-26 12:13:29 -07:00
Alex Crichton
30d9164b6e Fix a number of warnings cropping up on nightly Rust (#2767)
Various small issues here and there, nothing major
2021-03-25 13:19:37 -05:00
Alex Crichton
3f694ae319 Use stable Rust on CI to test the x64 backend (#2766)
* Use stable Rust on CI to test the x64 backend

This commit leverages the newly-released 1.51.0 compiler to test the
new backend on Windows and Linux with a stable compiler instead of a
nightly compiler. This isolates the nightly build to just the nightly
documentation generation and fuzzing, both of which rely on nightly for
the best results right now.

* Use updated stable in book build job

* Run rustfmt for new stable

* Silence new warnings for wasi-nn

* Allow some dead code in the x64 backend

Looks like new rustc is better about emitting some dead-code warnings

* Update rust in peepmatic job

* Fix a test in the pooling allocator

* Remove `package.metdata.docs.rs` temporarily

Needs resolution of https://github.com/rust-lang/cargo/pull/9300 first

* Fix a warning in a wasi-nn example
2021-03-25 13:18:59 -05:00
Benjamin Bouvier
6e6713ae0b cranelift: add support for the Mac aarch64 calling convention
This bumps target-lexicon and adds support for the AppleAarch64 calling
convention. Specifically for WebAssembly support, we only have to worry
about the new stack slots convention. Stack slots don't need to be at
least 8-bytes, they can be as small as the data type's size. For
instance, if we need stack slots for (i32, i32), they can be located at
offsets (+0, +4). Note that they still need to be properly aligned on
the data type they're containing, though, so if we need stack slots for
(i32, i64), we can't start the i64 slot at the +4 offset (it must start
at the +8 offset).

Added one test that was failing on the Mac M1, as well as other tests
stressing different yet similar situations.
2021-03-22 10:06:13 +01:00
Nick Fitzgerald
d081ef9c2e Bump Wasmtime to 0.25.0; Cranelift to 0.72.0 2021-03-16 11:02:56 -07:00
Anton Kirilov
07c27039b1 Cranelift AArch64: Add initial support for the Armv8.1 atomics
This commit enables Cranelift's AArch64 backend to generate code
for instruction set extensions (previously only the base Armv8-A
architecture was supported); also, it makes it possible to detect
the extensions supported by the host when JIT compiling. The new
functionality is applied to the IR instruction `AtomicCas`.

Copyright (c) 2021, Arm Limited.
2021-03-13 02:31:51 +00:00
Chris Fallin
2d5db92a9e Rework/simplify unwind infrastructure and implement Windows unwind.
Our previous implementation of unwind infrastructure was somewhat
complex and brittle: it parsed generated instructions in order to
reverse-engineer unwind info from prologues. It also relied on some
fragile linkage to communicate instruction-layout information that VCode
was not designed to provide.

A much simpler, more reliable, and easier-to-reason-about approach is to
embed unwind directives as pseudo-instructions in the prologue as we
generate it. That way, we can say what we mean and just emit it
directly.

The usual reasoning that leads to the reverse-engineering approach is
that metadata is hard to keep in sync across optimization passes; but
here, (i) prologues are generated at the very end of the pipeline, and
(ii) if we ever do a post-prologue-gen optimization, we can treat unwind
directives as black boxes with unknown side-effects, just as we do for
some other pseudo-instructions today.

It turns out that it was easier to just build this for both x64 and
aarch64 (since they share a factored-out ABI implementation), and wire
up the platform-specific unwind-info generation for Windows and SystemV.
Now we have simpler unwind on all platforms and we can delete the old
unwind infra as soon as we remove the old backend.

There were a few consequences to supporting Fastcall unwind in
particular that led to a refactor of the common ABI. Windows only
supports naming clobbered-register save locations within 240 bytes of
the frame-pointer register, whatever one chooses that to be (RSP or
RBP). We had previously saved clobbers below the fixed frame (and below
nominal-SP). The 240-byte range has to include the old RBP too, so we're
forced to place clobbers at the top of the frame, just below saved
RBP/RIP. This is fine; we always keep a frame pointer anyway because we
use it to refer to stack args. It does mean that offsets of fixed-frame
slots (spillslots, stackslots) from RBP are no longer known before we do
regalloc, so if we ever want to index these off of RBP rather than
nominal-SP because we add support for `alloca` (dynamic frame growth),
then we'll need a "nominal-BP" mode that is resolved after regalloc and
clobber-save code is generated. I added a comment to this effect in
`abi_impl.rs`.

The above refactor touched both x64 and aarch64 because of shared code.
This had a further effect in that the old aarch64 prologue generation
subtracted from `sp` once to allocate space, then used stores to `[sp,
offset]` to save clobbers. Unfortunately the offset only has 7-bit
range, so if there are enough clobbered registers (and there can be --
aarch64 has 384 bytes of registers; at least one unit test hits this)
the stores/loads will be out-of-range. I really don't want to synthesize
large-offset sequences here; better to go back to the simpler
pre-index/post-index `stp r1, r2, [sp, #-16]` form that works just like
a "push". It's likely not much worse microarchitecturally (dependence
chain on SP, but oh well) and it actually saves an instruction if
there's no other frame to allocate. As a further advantage, it's much
simpler to understand; simpler is usually better.

This PR adds the new backend on Windows to CI as well.
2021-03-11 20:03:52 -08:00
Chris Fallin
e41d882144 Merge pull request #2678 from cfallin/x64-fastcall
x86-64 Windows fastcall ABI support.
2021-03-05 10:46:47 -08:00
Dan Gohman
8854dec01d Bump version to 0.24.0
I used a specially modified version of the publish script to avoid
bumping the `witx` version.
2021-03-04 18:17:03 -08:00
Chris Fallin
6c94eb82aa x86-64 Windows fastcall ABI support.
This adds support for the "fastcall" ABI, which is the native C/C++ ABI
on Windows platforms on x86-64. It is similar to but not exactly like
System V; primarily, its argument register assignments are different,
and it requires stack shadow space.

Note that this also adjusts the handling of multi-register values in the
shared ABI implementation, and with this change, adjusts handling of
`i128`s on *both* Fastcall/x64 *and* SysV/x64 platforms. This was done
to align with actual behavior by the "rustc ABI" on both platforms, as
mapped out experimentally (Compiler Explorer link in comments). This
behavior is gated under the `enable_llvm_abi_extensions` flag.

Note also that this does *not* add x64 unwind info on Windows. That will
come in a future PR (but is planned!).
2021-03-03 19:53:18 -08:00
Chris Fallin
c07ec4c525 Merge pull request #2653 from bjorn3/more_atomic_ops
More atomic ops
2021-02-18 08:34:58 -08:00
bjorn3
ff22842da5 More atomic ops 2021-02-18 14:16:15 +01:00
bjorn3
2fc964ea35 Add serde serialization support for the full clif ir 2021-02-18 11:27:02 +01:00
Dan Gohman
8d90ea0390 Bump version to 0.23.0
I used a specially modified version of the publish script to avoid
bumping the `witx` version.
2021-02-17 15:35:43 -08:00
Alex Crichton
09b976e1d5 Fix a number of warnings on nightly Rust (#2652)
This fixes some issues that are cropping up where some syntax will get
phased out in 2021
2021-02-11 12:42:45 -06:00
Alex Crichton
503129ad91 Add a method to share Config across machines (#2608)
With `Module::{serialize,deserialize}` it should be possible to share
wasmtime modules across machines or CPUs. Serialization, however, embeds
a hash of all configuration values, including cranelift compilation
settings. By default wasmtime's selection of the native ISA would enable
ISA flags according to CPU features available on the host, but the same
CPU features may not be available across two machines.

This commit adds a `Config::cranelift_clear_cpu_flags` method which
allows clearing the target-specific ISA flags that are automatically
inferred by default for the native CPU. Options can then be
incrementally built back up as-desired with teh `cranelift_other_flag`
method.
2021-01-26 15:59:12 -06:00
Nick Fitzgerald
5ad82de3c5 Bump Wasmtime to 0.22.0; Cranelift to 0.69.0 2021-01-07 14:51:12 -08:00
Yury Delendik
2964023a77 [SIMD][x86_64] Add encoding for PMADDWD (#2530)
* [SIMD][x86_64] Add encoding for PMADDWD

* also for "experimental_x64"
2020-12-24 07:52:50 -06:00
bjorn3
8f7f8ee0b4 Fix iconst.i8 0 miscompilation 2020-12-12 09:44:05 +01:00
Chris Fallin
39b5736727 Remove LoadSplat opcode, in preparation for pattern-matching Load+Splat.
This was added as an incremental step to improve AArch64 code quality in
PR #2278. At the time, we did not have a way to pattern-match the load +
splat opcode sequence that the relevant Wasm opcodes lowered to.
However, now with PR #2366, we can merge effectful instructions such as
loads into other ops, and so we can do this pattern matching directly.
The pattern-matching update will come in a subsequent commit.
2020-11-16 15:31:56 -08:00
Alex Crichton
ab1958434a Bump to 0.21.0 (#2359) 2020-11-05 09:39:53 -06:00
Julian Seward
dd9bfcefaa CL/aarch64: implement the wasm SIMD v128.load{32,64}_zero instructions.
This patch implements, for aarch64, the following wasm SIMD extensions.

  v128.load32_zero and v128.load64_zero instructions
  https://github.com/WebAssembly/simd/pull/237

The changes are straightforward:

* no new CLIF instructions.  They are translated into an existing CLIF scalar
  load followed by a CLIF `scalar_to_vector`.

* the comment/specification for CLIF `scalar_to_vector` has been changed to
  match the actual intended semantics, per consulation with Andrew Brown.

* translation from `scalar_to_vector` to aarch64 `fmov` instruction.  This
  has been generalised slightly so as to allow both 32- and 64-bit transfers.

* special-case zero in `lower_constant_f128` in order to avoid a
  potentially slow call to `Inst::load_fp_constant128`.

* Once "Allow loads to merge into other operations during instruction
  selection in MachInst backends"
  (https://github.com/bytecodealliance/wasmtime/issues/2340) lands,
  we can use that functionality to pattern match the two-CLIF pair and
  emit a single AArch64 instruction.

* A simple filetest has been added.

There is no comprehensive testcase in this commit, because that is a separate
repo.  The implementation has been tested, nevertheless.
2020-11-04 20:00:04 +01:00
Julian Seward
5a5fb11979 CL/aarch64: implement the wasm SIMD i32x4.dot_i16x8_s instruction
This patch implements, for aarch64, the following wasm SIMD extensions

  i32x4.dot_i16x8_s instruction
  https://github.com/WebAssembly/simd/pull/127

It also updates dependencies as follows, in order that the new instruction can
be parsed, decoded, etc:

  wat          to  1.0.27
  wast         to  26.0.1
  wasmparser   to  0.65.0
  wasmprinter  to  0.2.12

The changes are straightforward:

* new CLIF instruction `widening_pairwise_dot_product_s`

* translation from wasm into `widening_pairwise_dot_product_s`

* new AArch64 instructions `smull`, `smull2` (part of the `VecRRR` group)

* translation from `widening_pairwise_dot_product_s` to `smull ; smull2 ; addv`

There is no testcase in this commit, because that is a separate repo.  The
implementation has been tested, nevertheless.
2020-11-03 14:25:04 +01:00
Chris Fallin
c35904a8bf Merge pull request #2278 from akirilov-arm/load_splat
Introduce the Cranelift IR instruction `LoadSplat`
2020-10-28 12:54:03 -07:00
Julian Seward
c15d9bd61b CL/aarch64: implement the wasm SIMD pseudo-max/min and FP-rounding instructions
This patch implements, for aarch64, the following wasm SIMD extensions

  Floating-point rounding instructions
  https://github.com/WebAssembly/simd/pull/232

  Pseudo-Minimum and Pseudo-Maximum instructions
  https://github.com/WebAssembly/simd/pull/122

The changes are straightforward:

* `build.rs`: the relevant tests have been enabled

* `cranelift/codegen/meta/src/shared/instructions.rs`: new CLIF instructions
  `fmin_pseudo` and `fmax_pseudo`.  The wasm rounding instructions do not need
  any new CLIF instructions.

* `cranelift/wasm/src/code_translator.rs`: translation into CLIF; this is
  pretty much the same as any other unary or binary vector instruction (for
  the rounding and the pmin/max respectively)

* `cranelift/codegen/src/isa/aarch64/lower_inst.rs`:
  - `fmin_pseudo` and `fmax_pseudo` are converted into a two instruction
    sequence, `fcmpgt` followed by `bsl`
  - the CLIF rounding instructions are converted to a suitable vector
    `frint{n,z,p,m}` instruction.

* `cranelift/codegen/src/isa/aarch64/inst/mod.rs`: minor extension of `pub
  enum VecMisc2` to handle the rounding operations.  And corresponding `emit`
  cases.
2020-10-26 10:37:07 +01:00
Julian Seward
2702942050 CL/aarch64 back end: implement the wasm SIMD bitmask instructions
The `bitmask.{8x16,16x8,32x4}` instructions do not map neatly to any single
AArch64 SIMD instruction, and instead need a sequence of around ten
instructions.  Because of this, this patch is somewhat longer and more complex
than it would be for (eg) x64.

Main changes are:

* the relevant testsuite test (`simd_boolean.wast`) has been enabled on aarch64.

* at the CLIF level, add a new instruction `vhigh_bits`, into which these wasm
  instructions are to be translated.

* in the wasm->CLIF translation (code_translator.rs), translate into
  `vhigh_bits`.  This is straightforward.

* in the CLIF->AArch64 translation (lower_inst.rs), translate `vhigh_bits`
  into equivalent sequences of AArch64 instructions.  There is a different
  sequence for each of the `{8x16, 16x8, 32x4}` variants.

All other changes are AArch64-specific, and add instruction definitions needed
by the previous step:

* Add two new families of AArch64 instructions: `VecShiftImm` (vector shift by
  immediate) and `VecExtract` (effectively a double-length vector shift)

* To the existing AArch64 family `VecRRR`, add a `zip1` variant.  To the
  `VecLanesOp` family add an `addv` variant.

* Add supporting code for the above changes to AArch64 instructions:
  - getting the register uses (`aarch64_get_regs`)
  - mapping the registers (`aarch64_map_regs`)
  - printing instructions
  - emitting instructions (`impl MachInstEmit for Inst`).  The handling of
    `VecShiftImm` is a bit complex.
  - emission tests for new instructions and variants.
2020-10-23 05:26:25 +02:00
Anton Kirilov
e0b911a4df Introduce the Cranelift IR instruction LoadSplat
It corresponds to WebAssembly's `load*_splat` operations, which
were previously represented as a combination of `Load` and `Splat`
instructions. However, there are architectures such as Armv8-A
that have a single machine instruction equivalent to the Wasm
operations. In order to generate it, it is necessary to merge the
`Load` and the `Splat` in the backend, which is not possible
because the load may have side effects. The new IR instruction
works around this limitation.

The AArch64 backend leverages the new instruction to improve code
generation.

Copyright (c) 2020, Arm Limited.
2020-10-14 13:07:13 +01:00
Chris Fallin
835db11bea Support for SpiderMonkey's "Wasm ABI 2020".
As part of a Wasm JIT update, SpiderMonkey is changing its internal
WebAssembly function ABI. The new ABI's frame format includes "caller
TLS" and "callee TLS" slots. The details of where these come from are
not important; from Cranelift's point of view, the only relevant
requirement is that we have two on-stack args that are always present
(offsetting other on-stack args), and that we define special argument
purposes so that we can supply values for these slots.

Note that this adds a *new* ABI (a variant of the Baldrdash ABI) because
we do not want to tightly couple the landing of this PR to the landing
of the changes in SpiderMonkey; it's better if both the old and new
behavior remain available in Cranelift, so SpiderMonkey can continue to
vendor Cranelift even if it does not land (or backs out) the ABI change.

Furthermore, note that this needs to be a Cranelift-level change (i.e.
cannot be done purely from the translator environment implementation)
because the special TLS arguments must always go on the stack, which
would not otherwise happen with the usual argument-placement logic; and
there is no primitive to push a value directly in CLIF code (the notion
of a stack frame is a lower-level concept).
2020-09-30 14:55:56 -07:00