Commit Graph

3154 Commits

Author SHA1 Message Date
Afonso Bordado
c82764605f aarch64: Add i128 load & store tests and refactor address calculation
The previous address calculation code had a bug where we tried to
add offsets into a temporary register before defining it, causing
the regalloc to complain.
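Roughly, the ordering rule at play (sketched with a made-up mini
instruction type, not the real backend API): the temporary has to be
defined before any instruction reads it to accumulate offsets.

```rust
// Hypothetical mini-ISA for illustration only.
enum Inst {
    MovImm { dst: u32, imm: i64 },           // defines `dst`
    AddImm { dst: u32, src: u32, imm: i64 }, // reads `src`, defines `dst`
}

fn lower_address(base: i64, offset: i64) -> Vec<Inst> {
    const TMP: u32 = 0; // virtual register number
    vec![
        // Define the temporary first...
        Inst::MovImm { dst: TMP, imm: base },
        // ...and only then add offsets into it; reading TMP before the
        // defining instruction is what made the regalloc complain.
        Inst::AddImm { dst: TMP, src: TMP, imm: offset },
    ]
}
```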
2021-06-17 15:50:08 +01:00
Ulrich Weigand
def54fb1fa s390x: Add z14 support
* Add support for processor features (including auto-detection).

* Move base architecture set requirement back to z14.

* Add z15 feature sets and re-enable z15-specific code generation
  when required features are available.
2021-06-17 10:23:15 +02:00
Afonso Bordado
9fc89d2316 aarch64: Add bitrev,clz,cls,ctz for i128 values 2021-06-16 10:44:10 +01:00
Afonso Bordado
09fec151eb aarch64: Add popcnt for i128 values 2021-06-16 10:44:10 +01:00
Alex Crichton
5140fd251a Update wasm-tools crates (#2989)
* Update wasm-tools crates

This brings in recent updates, notably including more improvements to
wasm-smith which will hopefully help exercise non-trapping wasm more.

* Fix some wat
2021-06-15 22:56:10 -05:00
Ulrich Weigand
46b73431ca s390x: Add support for atomic operations (part 1)
This adds full back-end support for the Fence, AtomicLoad
and AtomicStore operations, and partial support for the
AtomicCas and AtomicRmw operations.

The missing pieces include sub-word operations, operations
on little-endian memory requiring byte-swapping, and some
of the subtypes of AtomicRmw -- everything that cannot be
implemented without a compare-and-swap loop.  This will be
done in a follow-up patch.

This patch already suffices to make the test suite green
again after a recent change that now requires atomic
operations when accessing the heap.
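A scalar Rust sketch of that compare-and-swap-loop technique (illustrative
only, with a hypothetical helper name; not the backend code itself): a
sub-word RMW is emulated by retrying a CAS on the containing aligned word.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Emulate an 8-bit atomic RMW at bit offset `shift` of a 32-bit word.
fn atomic_rmw_u8(word: &AtomicU32, shift: u32, f: impl Fn(u8) -> u8) -> u8 {
    let mut old = word.load(Ordering::Relaxed);
    loop {
        let byte = ((old >> shift) & 0xff) as u8;
        let new = (old & !(0xffu32 << shift)) | ((f(byte) as u32) << shift);
        // Retry if another thread changed the word in the meantime.
        match word.compare_exchange_weak(old, new, Ordering::SeqCst, Ordering::Relaxed) {
            Ok(_) => return byte,
            Err(current) => old = current,
        }
    }
}
```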
2021-06-15 17:12:11 +02:00
Afonso Bordado
1c05e06bd5 aarch64: Implement I128 Loads and Stores 2021-06-14 21:56:53 +01:00
bjorn3
e9f3ac2068 Update object 2021-06-12 12:02:09 +02:00
bjorn3
7d23a9daf7 Lower Arm64Call relocations in cranelift-object 2021-06-10 12:54:25 +02:00
Chris Fallin
3d56728b86 Merge pull request #2975 from afonso360/aarch64-icmp
aarch64: Implement lowering i128 icmp instructions
2021-06-09 15:38:41 -07:00
Afonso Bordado
2643d2654c aarch64: Implement lowering i128 icmp instructions
We have 3 different approaches depending on the type of comparison requested:
* For eq/ne we compare the high bits and low bits and check
  if they are equal
* For overflow checks, we perform an i128 add and check the
  resulting overflow flag
* For the remaining comparisons (gt/lt/sgt/etc...) we compare both the
  low bits and the high bits, and if the high bits are equal we return
  the result of the unsigned comparison on the low bits (sketched below)

As with other i128 ops, we are still missing immlogic support.
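A scalar Rust sketch of that last approach (hypothetical helper, not the
actual lowering):

```rust
// Unsigned i128 "greater than" built from (lo, hi) 64-bit halves.
fn i128_ugt(lhs: (u64, u64), rhs: (u64, u64)) -> bool {
    let ((lhs_lo, lhs_hi), (rhs_lo, rhs_hi)) = (lhs, rhs);
    if lhs_hi == rhs_hi {
        // High bits equal: the unsigned comparison of the low bits decides.
        lhs_lo > rhs_lo
    } else {
        // Otherwise the high bits decide; the signed variants would
        // compare these halves with a signed predicate instead.
        lhs_hi > rhs_hi
    }
}
```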
2021-06-09 23:02:55 +01:00
Afonso Bordado
4d085d8fbf aarch64: Add sbcs instruction encodings 2021-06-09 22:56:39 +01:00
Afonso Bordado
61f07d79a7 aarch64: Add adcs instruction encodings 2021-06-09 22:56:39 +01:00
Afonso Bordado
b1475f32a6 aarch64: Add ishl,ushr,sshr for i128 values 2021-06-09 22:48:14 +01:00
Afonso Bordado
2c4d1c0003 aarch64: Add ands instruction encoding 2021-06-09 22:38:01 +01:00
Afonso Bordado
c38a5e8b62 aarch64: Add basic i128 bit ops to the AArch64 backend
Currently we basically use a two-instruction version of the same i64 ops.
IMMLogic doesn't really support multiple register inputs, so it's left as a TODO for future optimizations.
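As a scalar sketch of that strategy (a model of the semantics, not the
emitted code), a 128-bit bitwise op is the corresponding 64-bit op applied
to the low and high halves independently:

```rust
// i128 `band` modeled as one 64-bit AND per (lo, hi) half.
fn band_i128(lhs: (u64, u64), rhs: (u64, u64)) -> (u64, u64) {
    (lhs.0 & rhs.0, lhs.1 & rhs.1)
}
```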
2021-06-09 22:37:55 +01:00
Chris Fallin
59ebe4fa57 Merge pull request #2964 from afonso360/merge-runtests
cranelift: Merge all run tests into runtests dir
2021-06-09 12:26:45 -07:00
Alex Crichton
e8b8947956 Bump to 0.28.0 (#2972) 2021-06-09 14:00:13 -05:00
bjorn3
e6f399419c Atomic hotswapping in JIT mode (#2786)
* Introduce new_got_entry and new_plt_entry functions

* Return NonNull<*const u8> from get_got_address

* Make GOT entry writes atomic

* Defer GOT updates until relocations and protection

Co-authored-by: Alan Egerton <eggyal@gmail.com>
2021-06-09 09:51:11 -07:00
Afonso Bordado
7147e95add cranelift: Fix endianness bug in filetests runner
Enabling runtests for the s390x backend exposed a pre-existing endianness bug in the handling of bool test case return values.

These are written as integers of the same width by the trampoline, but are always read out as the Rust "bool" type. This happens to work on little-endian systems, but fails for any boolean type larger than 1 byte on big-endian systems.

See: https://github.com/bytecodealliance/wasmtime/pull/2964#issuecomment-855879866
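A sketch of the idea behind the fix (plain Rust; `u32` stands in for a
4-byte boolean type and the helper name is made up): read the result back
as an integer of the declared width and compare it against zero, instead
of reinterpreting the first byte of memory as `bool`.

```rust
// Decode a 4-byte boolean return value written by the trampoline.
fn read_b32(raw: &[u8; 4], big_endian: bool) -> bool {
    let bits = if big_endian {
        u32::from_be_bytes(*raw)
    } else {
        u32::from_le_bytes(*raw)
    };
    // Nonzero means true regardless of byte order.
    bits != 0
}
```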
2021-06-07 14:53:18 +01:00
Afonso Bordado
214755c6a0 cranelift: Merge all run tests into runtests dir
With this change we now reuse tests across multiple arches.

Duplicate tests were merged into the same file where possible.
Some legacy x86 tests were left in separate files due to incompatibilities with the rest of the test suite.
2021-06-07 14:44:11 +01:00
Chris Fallin
ffb92d9109 Merge pull request #2966 from akirilov-arm/simd_int_to_int_extend
Enable the simd_int_to_int_extend test for AArch64
2021-06-06 23:34:52 -07:00
Johnnie Birch
1770880e19 x64: add support for packed promote and demote (#2783)
* Add support for x64 packed promote low

* Add support for x64 packed floating point demote

* Update vector promote low and demote by adding constraints

Also does some renaming and minor refactoring
2021-06-04 15:59:20 -07:00
Anton Kirilov
5e8a8fe5a0 Enable the simd_int_to_int_extend test for AArch64
Copyright (c) 2021, Arm Limited.
2021-06-04 16:10:02 +01:00
Alex Crichton
0c90d789e8 Update Cargo.lock, remove deny.toml warning (#2965)
This runs through and updates all our dependencies, additionally
removing the need to ignore a rustsec warning.

Closes #2882
2021-06-03 17:11:07 -05:00
Andrew Brown
e25bf362ab Switch to using TargetFrontendConfig in the frontend memory tests
Commit 7d36fd9a1e avoided these
x86-specific tests altogether. This change avoids any dependency on x86
entirely by specifying a frontend configuration (SystemV + U64); this is
enough information for the `FunctionBuilder` to correctly generate the
syscalls.
2021-06-03 11:01:13 -07:00
Alex Crichton
7a1b7cdf92 Implement RFC 11: Redesigning Wasmtime's APIs (#2897)
Implement Wasmtime's new API as designed by RFC 11. This is quite a large commit which has had lots of discussion externally, so for more information it's best to read the RFC thread and the PR thread.
2021-06-03 09:10:53 -05:00
Chris Fallin
a5a28b1c5b Merge pull request #2958 from abrown/fix-avx512-flags
x64: fix AVX512 flag checks
2021-06-02 18:03:51 -07:00
Olivier Lemasle
7d36fd9a1e Restrict running tests dependent of x86_64
These 5 tests fail with the error "This test requires x86_64 support." when
executed on another architecture.
2021-06-02 10:35:35 -07:00
Andrew Brown
8dc4cc9fe3 x64: fix AVX512 flag checks
Previously, the multiple flags for certain AVX512 instructions were
checked using `OR`: e.g., if the CPU has AVX512VL `OR` AVX512DQ,
emit `VPMULLQ`. This is incorrect--the logic should be `AND`. The Intel
Software Developer Manual, vol. 1, sec. 15.4, has more information on
this (notably there is the suggestion to check with `XGETBV` that the OS
is allowing the use of the XMM registers--but that is a separate issue).
This change switches to `AND` logic in the new backend.
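A minimal sketch of the corrected predicate (the flag names here are
illustrative, not the exact cranelift-codegen setting names):

```rust
// VPMULLQ needs both extensions, so the checks combine with `&&`, not `||`.
fn can_use_vpmullq(has_avx512vl: bool, has_avx512dq: bool) -> bool {
    has_avx512vl && has_avx512dq
}
```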
2021-06-01 11:41:16 -07:00
Andrew Brown
2a9f458ea3 x64: lower i8x16.shuffle to VPERMI2B when possible
When shuffling values from two different registers, the x64 lowering for
`i8x16.shuffle` must first shuffle each register separately and then OR
the results with SSE instructions. With `VPERMI2B`, available in
AVX512VL + AVX512VBMI, this can be done in a single instruction after
the shuffle mask has been moved into the destination register. This
change uses `VPERMI2B` for that case when the CPU supports it.
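A scalar model of what `VPERMI2B` computes for 128-bit operands (a sketch
of the semantics, not the lowering):

```rust
// Each mask byte indexes into the 32 bytes of the two concatenated sources.
fn permi2b(mask: [u8; 16], a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (i, &m) in mask.iter().enumerate() {
        let idx = (m & 0x1f) as usize; // low 5 bits select one of 32 bytes
        out[i] = if idx < 16 { a[idx] } else { b[idx - 16] };
    }
    out
}
```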
2021-06-01 11:40:53 -07:00
Benjamin Bouvier
51edea9e57 cranelift: introduce a new WasmtimeAppleAarch64 calling convention
The previous choice to use the WasmtimeSystemV calling convention for
apple-aarch64 devices was incorrect: padding of arguments was
incorrectly computed. So we have to use some flavor of the apple-aarch64
ABI there.

Since we want to support the wasmtime custom convention for multiple
returns on apple-aarch64 too, a new custom Wasmtime calling convention
was introduced to support this.
2021-06-01 17:29:12 +02:00
Olivier Lemasle
eadafb1554 s390x: Fix import
cranelift-codegen's build failed on s390x, with this error:

```
error[E0432]: unresolved import `crate::isa::unwind::systemv`
 --> cranelift/codegen/src/isa/s390x/mod.rs:6:25
  |
6 | use crate::isa::unwind::systemv::RegisterMappingError;
  |                         ^^^^^^^ could not find `systemv` in `unwind`
```

This import should be used only with `unwind` feature enabled.
2021-06-01 15:16:41 +02:00
Andrew Brown
40d546c420 Upgrade capstone to v0.8 (#2948) 2021-05-28 09:51:14 -05:00
Olivier Lemasle
33c791e1f5 Add license files
This commit adds LICENSE files to all **published** crates which do
not already have one (most of the crates do).

Providing the license files is a requirement of the Apache 2.0 License.
2021-05-27 11:56:58 -07:00
Amanieu d'Antras
76664fc73e Optimize codegen for SecondaryMap indexing (#2940)
Moves the slow path which resizes the vector out-of-line. The actual
indexing is also done in the out-of-line path which avoids the need for
a second bounds check in the fast path after a potential resize.
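A sketch of the pattern (simplified; not the actual `SecondaryMap` code):
the common in-bounds access stays inline, while the resize-and-index path
moves into a `#[cold]` function.

```rust
struct Map<V: Clone> {
    elems: Vec<V>,
    default: V,
}

impl<V: Clone> Map<V> {
    #[inline]
    fn get_mut(&mut self, idx: usize) -> &mut V {
        if idx < self.elems.len() {
            &mut self.elems[idx] // fast path: single bounds check
        } else {
            self.resize_and_get(idx)
        }
    }

    // Out-of-line slow path: grow the vector, then do the indexing here
    // so the fast path needs no second check after a potential resize.
    #[cold]
    #[inline(never)]
    fn resize_and_get(&mut self, idx: usize) -> &mut V {
        self.elems.resize(idx + 1, self.default.clone());
        &mut self.elems[idx]
    }
}
```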
2021-05-27 11:09:15 -05:00
Andrew Brown
459fce3467 x64: lower i8x16.popcnt to VPOPCNTB when possible
When AVX512VL and AVX512BITALG are available, Wasm SIMD's `popcnt`
instruction can be lowered to a single x64 instruction, `VPOPCNTB`,
instead of 8+ instructions.
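A scalar model of `VPOPCNTB`'s per-byte semantics (a sketch, not the
lowering):

```rust
// Population count of each byte lane.
fn popcnt_bytes(lanes: [u8; 16]) -> [u8; 16] {
    lanes.map(|b| b.count_ones() as u8)
}
```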
2021-05-25 12:16:25 -07:00
Chris Fallin
f2fe0c669e Merge pull request #2929 from cfallin/bb-offsets
Provide BB layout info externally in terms of code offsets.
2021-05-24 14:27:53 -07:00
Chris Fallin
800cf25bb5 Make the CFG metadata computation conditional on a flag. 2021-05-24 13:01:15 -07:00
Afonso Bordado
4ddbfe50ba aarch64: Implement imul for i128 operands 2021-05-24 18:23:30 +01:00
Chris Fallin
11a2ef01e7 Provide BB layout info externally in terms of code offsets.
This is sometimes useful when performing analyses on the generated
machine code: for example, some kinds of code verifiers will want to do
a control-flow analysis, and it is much easier to do this if one does
not have to recover the CFG from the machine code (doing so requires
heavyweight analysis when indirect branches are involved). If one trusts
the control-flow lowering and only needs to verify other properties of
the code, this can be very useful.
2021-05-24 09:18:06 -07:00
Afonso Bordado
a2e74b2c45 aarch64: Implement isub for i128 operands 2021-05-22 21:51:41 +01:00
Afonso Bordado
d3b525fa29 aarch64: Implement iadd for i128 operands 2021-05-22 21:21:44 +01:00
Johnnie Birch
9a5c9607e1 Vpopcnt for x64 2021-05-21 19:23:26 -07:00
Chris Fallin
65e0e20210 Merge pull request #2892 from afonso360/aarch64-multireg-args
Handle i128 arguments in the aarch64 ABI
2021-05-21 16:57:42 -07:00
Chris Fallin
95559c01aa Merge pull request from GHSA-hpqh-2wqx-7qp5
Fix spillslot reload of narrow values: zero-extend, don't sign-extend. Release v0.74.0 as security-patch release.
2021-05-21 12:01:55 -07:00
Afonso Bordado
fbcfffdeab Handle spilling i128 arguments onto the stack in aarch64 2021-05-21 17:05:41 +01:00
Chris Fallin
88455007b2 Bump Wasmtime to v0.27.0 and Cranelift to v0.74.0. 2021-05-20 14:06:41 -07:00
Andrew Brown
54b45d28a3 x64: lower fcvt_from_uint to VCVTUDQ2PS when possible
When AVX512VL and AVX512F are available, use a single instruction
(`VCVTUDQ2PS`) instead of a lengthy 9-instruction sequence. This
optimization is a port from the legacy x86 backend.
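A scalar model of the conversion's semantics (a sketch; `VCVTUDQ2PS`
performs this across all lanes at once):

```rust
// Each unsigned doubleword lane becomes a single-precision float.
fn fcvt_from_uint(lanes: [u32; 4]) -> [f32; 4] {
    lanes.map(|x| x as f32)
}
```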
2021-05-19 12:20:11 -07:00
Chris Fallin
a1c9b06cea Fix spillslot reload of narrow values: zero-extend, don't sign-extend.
Previously, the x64 backend's ABI code would generate a sign-extending
load when loading a less-than-64-bit integer from a spillslot. This is
incorrect: e.g., for i32 values with the sign bit set (>= 0x80000000 as
unsigned), this would result in all high bits set.

This interacts poorly with another optimization. Normally, the invariant
is that the high bits of a register holding a value of a certain type,
beyond that type's bits, are undefined. However, as an optimization, we
recognize and use the fact that on x86-64, 32-bit instructions zero the
upper 32 bits. This allows us to elide a 32-to-64-bit zero-extend op
(turning it into just a move, which can then sometimes disappear
entirely due to register coalescing).

If a spill and reload happen between the production of a 32-bit value
from an instruction known to zero the upper bits and its use, then we
will rely on zero upper bits that might actually be set by a
sign-extend. This will result in incorrect execution.

As a fix, we stick to a simple invariant: we always spill and reload a
full 64 bits when handling integer registers on x64. This ensures that
no bits are mangled.
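A self-contained illustration of the hazard (plain Rust mirroring the
register behavior described above):

```rust
fn main() {
    let spilled: u32 = 0x8000_0001;

    // What a sign-extending 32-bit reload leaves in a 64-bit register:
    let sign_extended = spilled as i32 as i64 as u64;
    // What a zero-extending (or full 64-bit) reload leaves:
    let zero_extended = spilled as u64;

    assert_eq!(sign_extended, 0xffff_ffff_8000_0001);
    assert_eq!(zero_extended, 0x0000_0000_8000_0001);
}
```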
2021-05-19 12:19:19 -07:00