Commit Graph

1825 Commits

Author SHA1 Message Date
Andrew Brown
8dc4cc9fe3 x64: fix AVX512 flag checks
Previously, the multiple flags for certain AVX512 instructions were
checked using `OR`: e.g., if the CPU has AVX512VL `OR` AVX512DQ,
emit `VPMULLQ`. This is incorrect--the logic should be `AND`. The Intel
Software Developer Manual, vol. 1, sec. 15.4, has more information on
this (notable there is the suggestion to check with `XGETBV` that the OS
is allowing the use of the XMM registers--but that is a separate issue).
This change switches to `AND` logic in the new backend.
2021-06-01 11:41:16 -07:00
Andrew Brown
2a9f458ea3 x64: lower i8x16.shuffle to VPERMI2B when possible
When shuffling values from two different registers, the x64 lowering for
`i8x16.shuffle` must first shuffle each register separately and then OR
the results with SSE instructions. With `VPERMI2B`, available in
AVX512VL + AVX512VBMI, this can be done in a single instruction after
the shuffle mask has been moved into the destination register. This
change uses `VPERMI2B` for that case when the CPU supports it.
2021-06-01 11:40:53 -07:00
Benjamin Bouvier
51edea9e57 cranelift: introduce a new WasmtimeAppleAarch64 calling convention
The previous choice to use the WasmtimeSystemV calling convention for
apple-aarch64 devices was incorrect: padding of arguments was
incorrectly computed. So we have to use some flavor of the apple-aarch64
ABI there.

Since we want to support the wasmtime custom convention for multiple
returns on apple-aarch64 too, a new custom Wasmtime calling convention
was introduced to support this.
2021-06-01 17:29:12 +02:00
Olivier Lemasle
eadafb1554 s390x: Fix import
cranelift-codegen's build failed on s390x, with this error:

```
error[E0432]: unresolved import `crate::isa::unwind::systemv`
 --> cranelift/codegen/src/isa/s390x/mod.rs:6:25
  |
6 | use crate::isa::unwind::systemv::RegisterMappingError;
  |                         ^^^^^^^ could not find `systemv` in `unwind`
```

This import should be used only with `unwind` feature enabled.
2021-06-01 15:16:41 +02:00
Andrew Brown
459fce3467 x64: lower i8x16.popcnt to VPOPCNTB when possible
When AVX512VL or AVX512BITALG are available, Wasm SIMD's `popcnt`
instruction can be lowered to a single x64 instruction, `VPOPCNTB`,
instead of 8+ instructions.
2021-05-25 12:16:25 -07:00
Chris Fallin
f2fe0c669e Merge pull request #2929 from cfallin/bb-offsets
Provide BB layout info externally in terms of code offsets.
2021-05-24 14:27:53 -07:00
Chris Fallin
800cf25bb5 Make the CFG metadata computation conditional on a flag. 2021-05-24 13:01:15 -07:00
Afonso Bordado
4ddbfe50ba aarch64: Implement imul for i128 operands 2021-05-24 18:23:30 +01:00
Chris Fallin
11a2ef01e7 Provide BB layout info externally in terms of code offsets.
This is sometimes useful when performing analyses on the generated
machine code: for example, some kinds of code verifiers will want to do
a control-flow analysis, and it is much easier to do this if one does
not have to recover the CFG from the machine code (doing so requires
heavyweight analysis when indirect branches are involved). If one trusts
the control-flow lowering and only needs to verify other properties of
the code, this can be very useful.
2021-05-24 09:18:06 -07:00
Afonso Bordado
a2e74b2c45 aarch64: Implement isub for i128 operands 2021-05-22 21:51:41 +01:00
Afonso Bordado
d3b525fa29 aarch64: Implement iadd for i128 operands 2021-05-22 21:21:44 +01:00
Johnnie Birch
9a5c9607e1 Vpopcnt for x64 2021-05-21 19:23:26 -07:00
Chris Fallin
65e0e20210 Merge pull request #2892 from afonso360/aarch64-multireg-args
Handle i128 arguments in the aarch64 ABI
2021-05-21 16:57:42 -07:00
Chris Fallin
95559c01aa Merge pull request from GHSA-hpqh-2wqx-7qp5
Fix spillslot reload of narrow values: zero-extend, don't sign-extend. Release v0.74.0 as security-patch release.
2021-05-21 12:01:55 -07:00
Afonso Bordado
fbcfffdeab Handle spilling i128 arguments into the stack in aarch64 2021-05-21 17:05:41 +01:00
Andrew Brown
54b45d28a3 x64: lower fcvt_from_uint to VCVTUDQ2PS when possible
When AVX512VL and AVX512F are available, use a single instruction
(`VCVTUDQ2PS`) instead of a length 9-instruction sequence. This
optimization is a port from the legacy x86 backend.
2021-05-19 12:20:11 -07:00
Chris Fallin
a1c9b06cea Fix spillslot reload of narrow values: zero-extend, don't sign-extend.
Previously, the x64 backend's ABI code would generate a sign-extending
load when loading a less-than-64-bit integer from a spillslot. This is
incorrect: e.g., for i32s > 0x80000000, this would result in all high
bits set.

This interacts poorly with another optimization. Normally, the invariant
is that the high bits of a register holding a value of a certain type,
beyond that type's bits, are undefined. However, as an optimization, we
recognize and use the fact that on x86-64, 32-bit instructions zero the
upper 32 bits. This allows us to elide a 32-to-64-bit zero-extend op
(turning it into just a move, which can then sometimes disappear
entirely due to register coalescing).

If a spill and reload happen between the production of a 32-bit value
from an instruction known to zero the upper bits and its use, then we
will rely on zero upper bits that might actually be set by a
sign-extend. This will result in incorrect execution.

As a fix, we stick to a simple invariant: we always spill and reload a
full 64 bits when handling integer registers on x64. This ensures that
no bits are mangled.
2021-05-19 12:19:19 -07:00
Andrew Brown
7ef3ae2903 x64: implement vselect with variable blend instructions
This change implements `vselect` using SSE4.1's `BLENDVPS`, `BLENDVPD`,
and `PBLENDVB`. `vselect` is a lane-selecting instruction that is used
by
[simple_preopt.rs](fa1faf5d22/cranelift/codegen/src/simple_preopt.rs (L947-L999))
to lower `bitselect` to a single x86 instruction when the condition mask
is known to be boolean (all 1s or 0s, e.g., from a conversion). This is
better than `bitselect` in general, which lowers to 4-5 instructions.
The old backend had the `vselect` lowering; this simply introduces it to
the new backend.
2021-05-17 11:23:33 -07:00
Andrew Brown
bc0df92137 peepmatic: rebuild peephole optimizers after cranelift/meta change 2021-05-17 06:54:45 -07:00
Andrew Brown
84b6f05971 cranelift: remove unreachable scalar lowerings of saturating arithmetic
Since `uadd_sat`, `sadd_sat`, `usub_sat`, and `ssub_sat` are now only
available to vector types, this removes the lowering code for the
scalar versions of these instructions in the arm32 and aarch64 backends.
2021-05-17 06:54:45 -07:00
Andrew Brown
e676589b0c x64: lower i64x2.imul to VPMULLQ when possible
This adds the machinery to encode the VPMULLQ instruction which is
available in AVX512VL and AVX512DQ. When these feature sets are
available, we use this instruction instead of a lengthy 12-instruction
sequence.
2021-05-13 20:14:05 -07:00
Andrew Brown
c982d2be65 x64: move multiplication lowering
Since the lowering of `imul` complicated the other ALU operations it was
matched with and since future commits will alter the multiplication
lowering further, this change moves the `imul` lowering to its own match
block.
2021-05-13 20:14:05 -07:00
Andrew Brown
c89e6b2353 x64: make the x64 module public
In order to benchmark portions of the x64 module, this change makes it a
public module of cranelift-codegen.
2021-05-13 10:46:08 -07:00
Andrew Brown
02796fc670 x64: move encodings to a separate module
In order to benchmark the encoding code with criterion, the functions
and structures must be public. Moving this code to its own module
(instead of keeping as a submodule to `inst`), allows `inst` to remain
private. This avoids having to expose and document (or ignore
documenting) the numerous instruction variants in `inst` while allowing
access to the encoding code. This commit changes no functionality.
2021-05-13 10:46:08 -07:00
Afonso Bordado
ac624da8d9 Handle i128 arguments in the aarch64 ABI
When dealing with params that need to be split, we follow the
arch64 ABI and split the value in two, and make sure that start that
argument in an even numbered xN register.

The apple ABI does not require this, so on those platforms, we start
params anywhere.
2021-05-12 13:06:13 +01:00
Chris Fallin
5fb2c8c235 Merge pull request #2874 from uweigand/s390x-backend
Support IBM z/Architecture
2021-05-10 13:53:23 -07:00
Afonso Bordado
e021995323 Allow i128 amount operands on shift instructions in the x64 backend
Fixes #2727.
2021-05-10 18:32:20 +01:00
Ulrich Weigand
89b5fc776d Support IBM z/Architecture
This adds support for the IBM z/Architecture (s390x-ibm-linux).

The status of the s390x backend in its current form is:
- Wasmtime is fully functional and passes all tests on s390x.
- All back-end features supported, with the exception of SIMD.
- There is still a lot of potential for performance improvements.
- Currently the only supported processor type is z15.
2021-05-10 16:01:16 +02:00
bjorn3
82f3ad4f1a Add comment why thiserror is not used 2021-05-04 13:51:28 +02:00
bjorn3
03fdbadfb4 Remove thiserror dependency from cranelift_codegen 2021-05-04 13:45:20 +02:00
Ulrich Weigand
e1cc1a67d5 Object file support for s390x (#2872)
Add support for s390x binary format object files.  In particular,
add support for s390x ELF relocation types (currently only
S390xPCRel32Dbl).
2021-05-03 11:50:00 -05:00
Anton Kirilov
480670e17f Enable the simd_boolean test for AArch64
Also, enable the simd_i64x2_arith2 test because it doesn't need
any code changes.

Copyright (c) 2021, Arm Limited.
2021-04-27 20:19:51 +01:00
Jubilee Young
a8c956ede1 Factor out byteorder in cranelift
This removes an existing dependency on the byteorder crate in favor of
using std equivalents directly.

While not an issue for wasmtime per se, cranelift is now part of the
critical path of building and testing Rust, and minimizing dependencies,
even small ones, can help reduce the time and bandwidth required.
2021-04-23 12:05:18 -07:00
StackDoubleFlow
9637bc5a09 Fix cranelift Module and ObjectModule docs links (#2852) 2021-04-21 06:29:02 -07:00
Benjamin Bouvier
8ab3511b3b Generate unwind information on Win64 with the old backend
Following the new ABI introduced for efficient support of multiple return values, the old-backend test for generating unwind information was incomplete, resulting in no unwind information being generated and traps not being correctly caught by the runtime.
2021-04-16 18:05:49 +02:00
Benjamin Bouvier
50aa645769 cranelift: use a deferred display wrapper for logging the vcode's IR 2021-04-16 10:27:19 +02:00
Chris Fallin
03077e0de9 Merge pull request #2843 from uweigand/spillslot-fix
cranelift: Fix spillslot regression on big-endian platforms
2021-04-15 13:28:33 -07:00
Ulrich Weigand
10efe8e780 cranelift: Fix spillslot regression on big-endian platforms
PR 2840 changed the store_spillslot routine to always store
integer registers in full word size to a spill slot.  However,
the load_spillslot routine was not updated, which may causes
the contents to be reloaded in a different type.  On big-endian
systems this will fetch wrong data.

Fixed by using the same type override in load_spillslot.
2021-04-15 21:39:14 +02:00
Andrew Brown
0acc1451ea x64: lower iabs.i64x2 using a single AVX512 instruction when possible (#2819)
* x64: add EVEX encoding mechanism

Also, includes an empty stub module for the VEX encoding.

* x64: lower abs.i64x2 to VPABSQ when available

* x64: refactor EVEX encodings to use `EvexInstruction`

This change replaces the `encode_evex` function with a builder-style struct, `EvexInstruction`. This approach clarifies the code, adds documentation, and results in slight speedups when benchmarked.

* x64: rename encoding CodeSink to ByteSink
2021-04-15 11:53:58 -07:00
Chris Fallin
36c667d58d Merge pull request #2837 from uweigand/outgoing-args
Add back support for accumulating outgoing arguments
2021-04-14 12:54:06 -07:00
Chris Fallin
fd4bfbe5a7 Merge pull request #2836 from uweigand/framesizefix
Fix frame size after unwind rework
2021-04-14 12:19:38 -07:00
Chris Fallin
1f21b32e99 Merge pull request #2838 from uweigand/optionalfp
Allow unwind support to work without a frame pointer
2021-04-14 10:58:51 -07:00
Chris Fallin
337cc47d2f Merge pull request #2840 from bnjbvr/fix-2839
cranelift: always spill i32 with i64 stores
2021-04-14 10:11:47 -07:00
Benjamin Bouvier
e7bced9512 cranelift: always spill i32 with i64 stores;
Fixes #2839. See also the issue description and comments in this commits for
details of what the fix is about here.
2021-04-14 18:08:52 +02:00
Ulrich Weigand
5904c09682 Allow unwind support to work without a frame pointer
The patch extends the unwinder to support targets that do not need
to use a dedicated frame pointer register.  Specifically, the
changes include:

- Change the "fp" routine in the RegisterMapper to return an
  *optional* frame pointer regsiter via Option<Register>.

- On targets that choose to not define a FP register via the above
  routine, the UnwindInst::DefineNewFrame operation no longer switches
  the CFA to be defined in terms of the FP.  (The operation still can
  be used to define the location of the clobber area.)

- In addition, on targets that choose not to define a FP register, the
  UnwindInst::PushFrameRegs operation is not supported.

- There is a new operation UnwindInst::StackAlloc that needs to be
  called on targets without FP whenever the stack pointer is updated.
  This caused the CFA offset to be adjusted accordingly.  (On
  targets with FP this operation is a no-op.)
2021-04-14 15:32:31 +02:00
Ulrich Weigand
336c6369b4 Add back support for accumulating outgoing arguments
The unwind rework (commit 2d5db92a) removed support for the
feature to allow a target to allocate the space for outgoing
function arguments right in the prologue (originally added
via commit 80c2d70d).   This patch adds it back.
2021-04-14 13:51:16 +02:00
Ulrich Weigand
e3bb36ba77 Fix frame size after unwind rework
After the unwind rework (commit 2d5db92a) the space used to save
clobbered registers now lies between the nominal SP and the FP.
Therefore, the size of that space should now be included in the
frame size as reported by frame_size(), since this value is used
to compute the nominal_sp_to_fp offset.
2021-04-14 13:46:08 +02:00
Chris Fallin
27b3162f87 Merge pull request #2833 from abrown/2826
x64: fix Inst::store to understand all scalar types
2021-04-13 15:36:41 -07:00
Chris Fallin
8caac9ed79 Merge pull request #2823 from akirilov-arm/callee_saves
Cranelift AArch64: Improve the handling of callee-saved registers
2021-04-13 15:35:46 -07:00
Andrew Brown
6bdef48473 x64: refactor to use Inst::store during lowering
This re-factoring replaces uses of `Inst::mov_r_m` with `Inst::store` to ensure there is only one code location to troubleshoot when generating store instructions for a specific type.
2021-04-13 13:09:07 -07:00