Commit Graph

958 Commits

Author SHA1 Message Date
Afonso Bordado
7a5948f729 aarch64: Implement lowering i128 select 2021-06-24 16:19:25 +01:00
Afonso Bordado
b8ad99e435 aarch64: Implement TLS ELF GD Relocations
Implement the `TlsValue` opcode in the aarch64 backend for ELF_GD.

This is a little unusual, as the default TLS mechanism for aarch64 in other compilers is TLS descriptors.
However, we currently only recognize elf_gd, so let's start with that as a TLS implementation.
2021-06-24 12:21:44 +01:00
Chris Fallin
4a6594c514 Merge pull request #3011 from cfallin/bint-x64
Fix `bint` on x64, and make `bextend` consistent with bool representation.
2021-06-22 11:26:20 -07:00
Chris Fallin
efe3930215 Fix bint on x64, and make bextend consistent with bool representation.
There has been occasional confusion with the representation that we use
for bool-typed values in registers, at least when these are wider than
one bit. Does a `b8` store `true` as 1, or as all-ones (`0xff`)?

We've settled on the latter because of some use-cases where the wide
bool becomes a mask -- see #2058 for more on this.

This is fine, and transparent, to most operations within CLIF, because
the bool-typed value still has only two semantically-visible states,
namely `true` and `false`.

However, we have to be careful with bool-to-int conversions. `bint` on
aarch64 correctly masked the all-ones value down to 0 or 1, as required
by the instruction specification, but on x64 it did not. This PR fixes
that bug and makes x64 consistent with aarch64.

While staring at this code I realized that `bextend` was also not
consistent with the all-ones invariant: it should do a sign-extend, not
a zero-extend as it previously did. This is also rectified and tested.
(aarch64 already implemented this case correctly as well.)

Fixes #3003.
2021-06-22 10:56:56 -07:00
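The bool-representation invariant described above (wide bools hold all-ones for `true`, `bint` masks down to 0/1, and `bextend` sign-extends) can be modeled in a few lines of plain Rust. This is an editorial sketch of the semantics, not Cranelift code:

```rust
// Model of the wide-bool invariant: `true` in a b8 is all-ones, `false` is zero.
const B8_TRUE: u8 = 0xff;
const B8_FALSE: u8 = 0x00;

/// `bint`: bool-to-int must produce exactly 0 or 1, so the all-ones
/// representation has to be masked down to the low bit.
fn bint_b8_to_i32(b: u8) -> i32 {
    (b & 1) as i32
}

/// `bextend`: widening a bool must preserve the all-ones pattern, which is
/// a sign-extension of the narrow value, not a zero-extension.
fn bextend_b8_to_b32(b: u8) -> u32 {
    b as i8 as i32 as u32 // 0xff -> 0xffff_ffff, 0x00 -> 0x0000_0000
}

fn main() {
    assert_eq!(bint_b8_to_i32(B8_TRUE), 1);
    assert_eq!(bint_b8_to_i32(B8_FALSE), 0);
    assert_eq!(bextend_b8_to_b32(B8_TRUE), 0xffff_ffff);
    assert_eq!(bextend_b8_to_b32(B8_FALSE), 0);
}
```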
Chris Fallin
fa1a04d002 Merge pull request #3005 from afonso360/aarch64-i128-extend
aarch64: Implement uextend/sextend for i128 values
2021-06-22 10:24:30 -07:00
Afonso Bordado
f25f5b2732 aarch64: Implement lowering uextend/sextend for i128 values 2021-06-22 12:24:07 +01:00
Chris Fallin
18cd2f681c Merge pull request #3002 from afonso360/aarch64-i128-br
aarch64: Implement brz, brnz, br_icmp for i128 values
2021-06-21 10:52:50 -07:00
Chris Fallin
444d9f9726 Merge pull request #3008 from afonso360/aarch64-i128-ireduce
aarch64: Implement ireduce for i128 values
2021-06-21 09:54:43 -07:00
Afonso Bordado
151ad2f338 aarch64: Implement ireduce for i128 values 2021-06-20 19:04:45 +01:00
Afonso Bordado
f7f52445c8 aarch64: Implement lowering rotl/rotr for i128 values 2021-06-20 15:53:56 +01:00
Afonso Bordado
da4daa6f32 cranelift: Expand i128 extend testcases 2021-06-20 09:26:09 +01:00
Afonso Bordado
45faace329 aarch64: Implement i128 br_icmp
The previous commit deduplicated the icmp implementation, so we reuse that
here, modifying it only where we don't need to set the results.
2021-06-19 22:01:33 +01:00
Afonso Bordado
a26be628bc aarch64: Implement lowering brz,brnz for i128 values 2021-06-18 00:21:54 +01:00
Chris Fallin
5ddf562309 Merge pull request #2991 from uweigand/s390x-z14
s390x: Add z14 support
2021-06-17 08:24:23 -07:00
Chris Fallin
de1edd4976 Merge pull request #2985 from afonso360/aarch64-i128-load-store
aarch64: Implement I128 Loads and Stores
2021-06-17 08:23:15 -07:00
Afonso Bordado
c82764605f aarch64: Add i128 load & store tests and refactor address calculation
The previous address calculation code had a bug where we tried to
add offsets into a temporary register before defining it, causing
the regalloc to complain.
2021-06-17 15:50:08 +01:00
Ulrich Weigand
def54fb1fa s390x: Add z14 support
* Add support for processor features (including auto-detection).

* Move base architecture set requirement back to z14.

* Add z15 feature sets and re-enable z15-specific code generation
  when required features are available.
2021-06-17 10:23:15 +02:00
Afonso Bordado
9fc89d2316 aarch64: Add bitrev,clz,cls,ctz for i128 values 2021-06-16 10:44:10 +01:00
Afonso Bordado
09fec151eb aarch64: Add popcnt for i128 values 2021-06-16 10:44:10 +01:00
Ulrich Weigand
46b73431ca s390x: Add support for atomic operations (part 1)
This adds full back-end support for the Fence, AtomicLoad
and AtomicStore operations, and partial support for the
AtomicCas and AtomicRmw operations.

The missing pieces include sub-word operations, operations
on little-endian memory requiring byte-swapping, and some
of the subtypes of AtomicRmw -- everything that cannot be
implemented without a compare-and-swap loop.  This will be
done in a follow-up patch.

This patch already suffices to make the test suite green
again after a recent change that now requires atomic
operations when accessing the heap.
2021-06-15 17:12:11 +02:00
Afonso Bordado
1c05e06bd5 aarch64: Implement I128 Loads and Stores 2021-06-14 21:56:53 +01:00
Chris Fallin
3d56728b86 Merge pull request #2975 from afonso360/aarch64-icmp
aarch64: Implement lowering i128 icmp instructions
2021-06-09 15:38:41 -07:00
Afonso Bordado
2643d2654c aarch64: Implement lowering i128 icmp instructions
We have three different approaches depending on the kind of comparison requested:
* For eq/ne we compare the high bits and the low bits and check
  that both are equal
* For overflow checks, we perform an i128 add and check the
  resulting overflow flag
* For the remaining comparisons (gt/lt/sgt/etc...) we compare
  both the low bits and the high bits, and if the high bits are
  equal we return the result of the unsigned comparison on the low bits

As with other i128 ops, we are still missing immlogic support.
2021-06-09 23:02:55 +01:00
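As an editorial illustration of the third strategy above (not the backend lowering itself), here is the high-half/low-half shape of an i128 comparison when the value is held as two 64-bit registers:

```rust
// Split an i128 into (low, high) 64-bit halves, as the backend keeps it
// in a register pair.
fn split(v: u128) -> (u64, u64) {
    (v as u64, (v >> 64) as u64)
}

/// Unsigned less-than: the high halves decide unless they are equal, in
/// which case the unsigned comparison of the low halves decides.
fn ult128(a: u128, b: u128) -> bool {
    let (a_lo, a_hi) = split(a);
    let (b_lo, b_hi) = split(b);
    if a_hi == b_hi { a_lo < b_lo } else { a_hi < b_hi }
}

/// Signed less-than has the same shape, but the high halves compare signed
/// (the low halves always compare unsigned).
fn slt128(a: i128, b: i128) -> bool {
    let (a_lo, a_hi) = split(a as u128);
    let (b_lo, b_hi) = split(b as u128);
    if a_hi == b_hi { a_lo < b_lo } else { (a_hi as i64) < (b_hi as i64) }
}

fn main() {
    assert!(ult128(1, u128::MAX));
    assert!(slt128(-1, 0));
    assert!(!ult128(5, 5));
}
```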
Afonso Bordado
b1475f32a6 aarch64: Add ishl,ushr,sshr for i128 values 2021-06-09 22:48:14 +01:00
Afonso Bordado
c38a5e8b62 aarch64: Add basic i128 bit ops to the AArch64 backend
Currently we basically use a two-instruction version of the same i64 ops.
IMMLogic doesn't really support multiple register inputs, so it's left as a TODO for future optimizations.
2021-06-09 22:37:55 +01:00
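A quick sketch in plain Rust (not the backend code) of the "two copies of the i64 op" shape this describes, with the i128 held as (low, high) 64-bit halves:

```rust
// A 128-bit bitwise AND is just the 64-bit AND applied to the low halves
// and to the high halves independently; OR and XOR work the same way.
fn band128(a: (u64, u64), b: (u64, u64)) -> (u64, u64) {
    (a.0 & b.0, a.1 & b.1)
}

fn main() {
    let a = (0xffff_0000_u64, 0x0000_ffff_u64); // (low, high)
    let b = (0x0f0f_0f0f_u64, 0x0f0f_0f0f_u64);
    assert_eq!(band128(a, b), (0x0f0f_0000, 0x0000_0f0f));
}
```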
Chris Fallin
59ebe4fa57 Merge pull request #2964 from afonso360/merge-runtests
cranelift: Merge all run tests into runtests dir
2021-06-09 12:26:45 -07:00
Alex Crichton
e8b8947956 Bump to 0.28.0 (#2972) 2021-06-09 14:00:13 -05:00
Afonso Bordado
7147e95add cranelift: Fix endianness bug in filetests runner
Enabling runtests for the s390x backend exposed a pre-existing endianness bug in the handling of bool test case return values.

These are written as integers of the same width by the trampoline, but are always read out as the Rust "bool" type. This happens to work on little-endian systems, but fails for any boolean type larger than 1 byte on big-endian systems.

See: https://github.com/bytecodealliance/wasmtime/pull/2964#issuecomment-855879866
2021-06-07 14:53:18 +01:00
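To make the failure mode concrete, here is a small endian-dependent sketch in plain Rust. It assumes the trampoline writes the boolean as the plain integer 1; the exact value written is an assumption, the byte-order point is not:

```rust
// Reading a multi-byte boolean slot through a 1-byte type sees the byte at
// the lowest address: the least-significant byte on little-endian, the
// most-significant byte on big-endian.
fn first_byte_of_u32(v: u32) -> u8 {
    v.to_ne_bytes()[0]
}

fn main() {
    let wide_true: u32 = 1; // assumed representation written by the trampoline
    if cfg!(target_endian = "little") {
        assert_eq!(first_byte_of_u32(wide_true), 1); // reads as "true"
    } else {
        assert_eq!(first_byte_of_u32(wide_true), 0); // reads as "false" on big-endian
    }
}
```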
Afonso Bordado
214755c6a0 cranelift: Merge all run tests into runtests dir
With this change we now reuse tests across multiple arches.

Duplicate tests were merged into the same file where possible.
Some legacy x86 tests were left in separate files due to incompatibilities with the rest of the test suite.
2021-06-07 14:44:11 +01:00
Afonso Bordado
4ddbfe50ba aarch64: Implement imul for i128 operands 2021-05-24 18:23:30 +01:00
Afonso Bordado
a2e74b2c45 aarch64: Implement isub for i128 operands 2021-05-22 21:51:41 +01:00
Afonso Bordado
d3b525fa29 aarch64: Implement iadd for i128 operands 2021-05-22 21:21:44 +01:00
Chris Fallin
65e0e20210 Merge pull request #2892 from afonso360/aarch64-multireg-args
Handle i128 arguments in the aarch64 ABI
2021-05-21 16:57:42 -07:00
Chris Fallin
95559c01aa Merge pull request from GHSA-hpqh-2wqx-7qp5
Fix spillslot reload of narrow values: zero-extend, don't sign-extend. Release v0.74.0 as security-patch release.
2021-05-21 12:01:55 -07:00
Afonso Bordado
fbcfffdeab Handle spilling i128 arguments onto the stack on aarch64 2021-05-21 17:05:41 +01:00
Chris Fallin
88455007b2 Bump Wasmtime to v0.27.0 and Cranelift to v0.74.0. 2021-05-20 14:06:41 -07:00
Andrew Brown
54b45d28a3 x64: lower fcvt_from_uint to VCVTUDQ2PS when possible
When AVX512VL and AVX512F are available, use a single instruction
(`VCVTUDQ2PS`) instead of a 9-instruction sequence. This
optimization is a port from the legacy x86 backend.
2021-05-19 12:20:11 -07:00
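For reference, the operation being optimized here is just an unsigned-int-to-float conversion per lane; an editorial scalar model in Rust (the AVX512 instruction does this for a whole vector at once):

```rust
// Per-lane semantics of `fcvt_from_uint` for 32-bit lanes: unsigned integer
// to IEEE single-precision float.
fn fcvt_from_uint_lane(v: u32) -> f32 {
    v as f32
}

fn main() {
    // Values above i32::MAX are exactly where a naive signed conversion
    // would go wrong, which is why the non-AVX512 fallback is so long.
    assert_eq!(fcvt_from_uint_lane(0), 0.0);
    assert_eq!(fcvt_from_uint_lane(4_000_000_000), 4_000_000_000.0_f32);
}
```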
Chris Fallin
a1c9b06cea Fix spillslot reload of narrow values: zero-extend, don't sign-extend.
Previously, the x64 backend's ABI code would generate a sign-extending
load when loading a less-than-64-bit integer from a spillslot. This is
incorrect: e.g., for i32s > 0x80000000, this would result in all high
bits set.

This interacts poorly with another optimization. Normally, the invariant
is that the high bits of a register holding a value of a certain type,
beyond that type's bits, are undefined. However, as an optimization, we
recognize and use the fact that on x86-64, 32-bit instructions zero the
upper 32 bits. This allows us to elide a 32-to-64-bit zero-extend op
(turning it into just a move, which can then sometimes disappear
entirely due to register coalescing).

If a spill and reload happen between the production of a 32-bit value
from an instruction known to zero the upper bits and its use, then we
will rely on zero upper bits that might actually be set by a
sign-extend. This will result in incorrect execution.

As a fix, we stick to a simple invariant: we always spill and reload a
full 64 bits when handling integer registers on x64. This ensures that
no bits are mangled.
2021-05-19 12:19:19 -07:00
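An editorial sketch of the failure mode in plain Rust (not the backend code): sign-extending a spilled 32-bit value whose top bit is set fills the upper 32 bits that later code assumed were zero.

```rust
// What the buggy reload did, in effect: a sign-extending 32-bit load.
fn reload_sign_extended(spilled: u32) -> u64 {
    spilled as i32 as i64 as u64
}

// What the zero-upper-bits invariant requires: a zero-extending reload
// (or, as the fix does, spilling and reloading the full 64 bits).
fn reload_zero_extended(spilled: u32) -> u64 {
    spilled as u64
}

fn main() {
    let v: u32 = 0x8000_0001; // any 32-bit value with the top bit set
    assert_eq!(reload_zero_extended(v), 0x0000_0000_8000_0001);
    assert_eq!(reload_sign_extended(v), 0xffff_ffff_8000_0001); // upper bits mangled
}
```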
Andrew Brown
7ef3ae2903 x64: implement vselect with variable blend instructions
This change implements `vselect` using SSE4.1's `BLENDVPS`, `BLENDVPD`,
and `PBLENDVB`. `vselect` is a lane-selecting instruction that is used
by
[simple_preopt.rs](fa1faf5d22/cranelift/codegen/src/simple_preopt.rs (L947-L999))
to lower `bitselect` to a single x86 instruction when the condition mask
is known to be boolean (all 1s or 0s, e.g., from a conversion). This is
better than `bitselect` in general, which lowers to 4-5 instructions.
The old backend had the `vselect` lowering; this simply introduces it to
the new backend.
2021-05-17 11:23:33 -07:00
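A scalar model of why the boolean-mask condition matters (editorial sketch, not the lowering): when every mask lane is all-ones or all-zeros, the general bitwise select collapses into picking whole lanes, which is what a blend instruction does.

```rust
// General `bitselect`: mixes bits from x and y according to the mask c.
fn bitselect(c: u32, x: u32, y: u32) -> u32 {
    (c & x) | (!c & y)
}

// Per-lane `vselect`: picks x or y wholesale, one lane at a time, which a
// single blend instruction can do for the whole vector.
fn vselect_lane(c: u32, x: u32, y: u32) -> u32 {
    if c != 0 { x } else { y }
}

fn main() {
    let (x, y) = (0xdead_beef_u32, 0x1234_5678_u32);
    // With an all-ones / all-zeros mask the two agree on every lane:
    assert_eq!(bitselect(u32::MAX, x, y), vselect_lane(u32::MAX, x, y));
    assert_eq!(bitselect(0, x, y), vselect_lane(0, x, y));
}
```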
Andrew Brown
1fe7676831 cranelift: only allow vector types with saturating arithmetic
This fixes #2883 by restricting which types are available to the `uadd_sat`, `sadd_sat`, `usub_sat`, and `ssub_sat` IR operations.
2021-05-17 06:54:45 -07:00
Andrew Brown
5929a5e6ee x64: improve arithmetic filetests 2021-05-13 20:14:05 -07:00
Benjamin Bouvier
d7053ea9c7 Upgrade to the latest versions of gimli, addr2line, object (#2901)
* Upgrade to the latest versions of gimli, addr2line, object

And adapt to API changes. New gimli supports wasm dwarf, resulting in
some simplifications in the debug crate.

* upgrade gimli usage in linux-specific profiling too

* Add "continue" statement after interpreting a wasm local dwarf opcode
2021-05-12 10:53:17 -05:00
Afonso Bordado
ac624da8d9 Handle i128 arguments in the aarch64 ABI
When dealing with params that need to be split, we follow the
aarch64 ABI and split the value in two, making sure that the
argument starts in an even-numbered xN register.

The Apple ABI does not require this, so on those platforms we start
params anywhere.
2021-05-12 13:06:13 +01:00
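A hypothetical helper (not the actual ABI code, and the function name is invented) sketching the register-assignment rule described above:

```rust
// Given the index of the next free general-purpose argument register,
// return the register an i128 argument should start in.
fn next_i128_reg(next_free_xreg: usize, is_apple_abi: bool) -> usize {
    if is_apple_abi {
        next_free_xreg // Apple variant: no even-alignment requirement
    } else {
        (next_free_xreg + 1) & !1 // round up to the next even-numbered register
    }
}

fn main() {
    // If x0..x2 are already taken, an i128 goes to (x4, x5) under the
    // standard rule but (x3, x4) under the Apple rule.
    assert_eq!(next_i128_reg(3, false), 4);
    assert_eq!(next_i128_reg(3, true), 3);
}
```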
Chris Fallin
5fb2c8c235 Merge pull request #2874 from uweigand/s390x-backend
Support IBM z/Architecture
2021-05-10 13:53:23 -07:00
Afonso Bordado
e021995323 Allow i128 amount operands on shift instructions in the x64 backend
Fixes #2727.
2021-05-10 18:32:20 +01:00
Ulrich Weigand
89b5fc776d Support IBM z/Architecture
This adds support for the IBM z/Architecture (s390x-ibm-linux).

The status of the s390x backend in its current form is:
- Wasmtime is fully functional and passes all tests on s390x.
- All back-end features supported, with the exception of SIMD.
- There is still a lot of potential for performance improvements.
- Currently the only supported processor type is z15.
2021-05-10 16:01:16 +02:00
Jubilee Young
a8c956ede1 Factor out byteorder in cranelift
This removes an existing dependency on the byteorder crate in favor of
using std equivalents directly.

While not an issue for wasmtime per se, cranelift is now part of the
critical path of building and testing Rust, and minimizing dependencies,
even small ones, can help reduce the time and bandwidth required.
2021-04-23 12:05:18 -07:00
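The flavor of replacement this describes, sketched with hypothetical helpers (the actual call sites in the commit are not shown here): byteorder's read/write helpers map onto the standard library's `{from,to}_{le,be}_bytes` methods on the integer types.

```rust
// byteorder equivalent:  LittleEndian::read_u32(bytes)
fn read_u32_le(bytes: &[u8]) -> u32 {
    let mut le = [0u8; 4];
    le.copy_from_slice(&bytes[..4]);
    u32::from_le_bytes(le)
}

// byteorder equivalent:  buf.write_u32::<LittleEndian>(v)
fn write_u32_le(buf: &mut Vec<u8>, v: u32) {
    buf.extend_from_slice(&v.to_le_bytes());
}

fn main() {
    let mut buf = Vec::new();
    write_u32_le(&mut buf, 0xdead_beef);
    assert_eq!(read_u32_le(&buf), 0xdead_beef);
}
```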
Ulrich Weigand
10efe8e780 cranelift: Fix spillslot regression on big-endian platforms
PR 2840 changed the store_spillslot routine to always store
integer registers in full word size to a spill slot.  However,
the load_spillslot routine was not updated, which may cause
the contents to be reloaded with a different type.  On big-endian
systems this will fetch the wrong data.

Fixed by using the same type override in load_spillslot.
2021-04-15 21:39:14 +02:00
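An editorial memory-layout sketch of the regression (not the backend code): store a value as a full 64-bit word, reload only 32 bits at the same offset, and the answer depends on endianness.

```rust
// Spill as a full 64-bit word, then reload the first 4 bytes of the slot
// as a 32-bit value, mimicking a type mismatch between store and load.
fn store_as_u64_load_as_u32(v: u32) -> u32 {
    let slot = (v as u64).to_ne_bytes();
    let mut low = [0u8; 4];
    low.copy_from_slice(&slot[..4]);
    u32::from_ne_bytes(low)
}

fn main() {
    let v: u32 = 0x1234_5678;
    if cfg!(target_endian = "little") {
        assert_eq!(store_as_u64_load_as_u32(v), v); // low-order bytes come first
    } else {
        assert_eq!(store_as_u64_load_as_u32(v), 0); // first 4 bytes are the high half
    }
}
```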
Chris Fallin
337cc47d2f Merge pull request #2840 from bnjbvr/fix-2839
cranelift: always spill i32 with i64 stores
2021-04-14 10:11:47 -07:00
Benjamin Bouvier
e7bced9512 cranelift: always spill i32 with i64 stores;
Fixes #2839. See also the issue description and the comments in this
commit for details of the fix.
2021-04-14 18:08:52 +02:00