Commit Graph

59 Commits

Author SHA1 Message Date
Chris Fallin
39b5736727 Remove LoadSplat opcode, in preparation for pattern-matching Load+Splat.
This was added as an incremental step to improve AArch64 code quality in
PR #2278. At the time, we did not have a way to pattern-match the load +
splat opcode sequence that the relevant Wasm opcodes lowered to.
However, now with PR #2366, we can merge effectful instructions such as
loads into other ops, and so we can do this pattern matching directly.
The pattern-matching update will come in a subsequent commit.
2020-11-16 15:31:56 -08:00
Anton Kirilov
e0b911a4df Introduce the Cranelift IR instruction LoadSplat
It corresponds to WebAssembly's `load*_splat` operations, which
were previously represented as a combination of `Load` and `Splat`
instructions. However, there are architectures such as Armv8-A
that have a single machine instruction equivalent to the Wasm
operations. In order to generate it, it is necessary to merge the
`Load` and the `Splat` in the backend, which is not possible
because the load may have side effects. The new IR instruction
works around this limitation.

The AArch64 backend leverages the new instruction to improve code
generation.

Copyright (c) 2020, Arm Limited.
2020-10-14 13:07:13 +01:00
Andrew Brown
f0b083c6ad Legalize [u|s]widen_high for x86
Use `x86_palignr` and `[u|s]widen_low` for legalizing this instruction.
2020-07-15 11:32:08 -07:00
Andrew Brown
c5a69cee9f Add x86 legalization for fcvt_to_uint_sat.i32x4
This converts an `f32x4` into an `i32x4` (unsigned) with rounding by using a long sequence of SSE4.1 compatible instructions.
2020-07-08 10:20:01 -07:00
Andrew Brown
65e6de2344 Replace x86_packss with snarrow
Since the Wasm specification contains narrowing instructions (see https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#integer-to-integer-narrowing) that lower to PACKSS*, the x86-specific instruction is not necessary in the CLIF IR.
2020-07-02 09:35:45 -07:00
Andrew Brown
c9d573d841 Provide spec-compliant legalization for SIMD floating point min/max 2020-06-25 14:48:16 -07:00
Andrew Brown
3675f95bb2 Legalize fcvt_to_sint_sat.i32x4 on x86
Use a lengthy sequence involving CVTTPS2DQ to quiet NaNs and saturate overflow.
2020-06-18 11:39:38 -07:00
Andrew Brown
01d34e71b9 Add x86 legalization for fcvt_from_uint.f32x4
This converts an `i32x4` into an `f32x4` with some rounding either by using an AVX512VL/F instruction--VCVTUDQ2PS--or a long sequence of SSE4.1 compatible instructions.
2020-06-12 15:06:22 -07:00
Andrew Brown
1ea09088be Add x86 legalization for imul.i64x2 for non-AVX CPUs
The `convert_i64x2_imul` custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizes `imul.i64x2` to an `x86_pmullq` in this case; if not, it uses a lengthy SSE2-compatible instruction sequence.
2020-06-03 16:27:57 -07:00
Andrew Brown
b3a6985cd5 Re-organize transform groups for x86 legalization 2020-06-03 16:27:57 -07:00
Andrew Brown
9ba9fd0f64 Add x86-specific instruction for i64x2 multiplication
Without this special instruction, legalizing to the AVX512 instruction AND the SSE instruction sequence is impossible. This extra instruction would be rendered unnecessary by the x64 backend.
2020-06-03 16:27:57 -07:00
Andrew Brown
7d6e94b952 Replace InsertLane format with TernaryImm8
The InsertLane format has an ordering (`value().imm().value()`) and immediate name (`"lane"`) that make it awkward to use for other instructions. This changes the ordering (`value().value().imm()`) and uses the default name (`"imm"`) throughout the codebase.
2020-05-29 19:56:27 -07:00
teapotd
e430984ac4 Improve bitselect codegen with knowledge of operand origin (#1783)
* Encode vselect using BLEND instructions on x86

* Legalize vselect to bitselect

* Optimize bitselect to vselect for some operands

* Add run tests for bitselect-vselect optimization

* Address review feedback
2020-05-29 19:53:11 -07:00
whitequark
4ec16fa057 Legalize 64 bit shifts on x86_32 using PSLLQ/PSRLQ.
Co-authored-by: iximeow <git@iximeow.net>
2020-05-09 03:28:19 -07:00
Andrew Brown
cd49ed9582 Add x86 legalization for sshr.i64x2 2020-05-05 12:01:46 -07:00
Andrew Brown
d24f23285b Legalize i8x16.sshr using pack/unpack instructions
Due to arithmetic shift behavior, this legalization cannot easily use the masks for i8x16.ushr or i8x16.ishl
2020-04-23 10:55:54 -07:00
Andrew Brown
5f0286696c Add x86 implentation of 8x16 ishl
This involves some large mask tables that may hurt code size but reduce the number of instructions. See https://github.com/WebAssembly/simd/issues/117 for a more in-depth discussion on this.
2020-04-23 10:55:54 -07:00
Andrew Brown
3f47291f2e Add x86 implentation of 8x16 ushr
This involves some large mask tables that may hurt code size but reduce the number of instructions. See https://github.com/WebAssembly/simd/issues/117 for a more in-depth discussion on this.
2020-04-17 11:59:47 -07:00
Andrew Brown
fa7481a681 Add x86 implementation of SIMD swizzle instruction 2020-03-06 15:49:53 -08:00
Andrew Brown
442edf5c84 Refactor SIMD legalizations to separate define* function
See https://github.com/bytecodealliance/wasmtime/issues/1168
2020-03-06 14:57:11 -08:00
bjorn3
0a1bb3ba6c Add TLS support for ELF and MachO (#1174)
* Add TLS support
* Add binemit and legalize tests
* Spill all caller-saved registers when necessary
2020-02-25 17:50:04 -08:00
Andrew Brown
3ae1af1ad2 Add new Cranelift instructions for integer min/max
This includes legalizations to the previously-existing x86 SIMD integer min/max.
2020-02-21 09:33:43 -08:00
Peter Delevoryas
18b40d1101 Add ineg legalization for scalar integer types (#1385) 2020-02-14 13:16:02 -08:00
Andrew Brown
6fe86bcb61 Fix SIMD float comparison encoding (#1285)
The Intel manual uses `CMPNLT` and `CMPNLE` to denote not-less-than and not-less-than-or-equals. These were translated previously to `FloatCC::GreaterThan` and `FloatCC::GreaterThanOrEqual` but should be correctly translated to `FloatCC::UnorderedOrGreaterThanOrEqual` and `FloatCC::UnorderedOrGreaterThan`. This change adds the necessary legalizations to make use of these new encodings.
2020-01-08 09:28:05 -08:00
Philip Craig
86b66e8ede Fix build failure in cranelift-codegen (#1294)
error[E0425]: cannot find value `ones` in this scope
   --> cranelift-codegen/meta/src/isa/x86/legalize.rs:564:33
    |
564 |                 def!(c = vconst(ones)),
    |                                 ^^^^ not found in this scope
2019-12-16 19:38:09 -08:00
Andrew Brown
4433ad2858 Fix legalization of icmp ugt (#1278)
Previously, the same pattern (pmax + pcmpeq) as `uge` was used but this logic was incorrect for operands with equal values.
2019-12-16 14:14:51 -07:00
Andrew Brown
6181f20326 Fix legalization of SIMD fneg (#1286)
Previously `fsub` was used but this fails when negating -0.0 and +0.0 in the SIMD spec tests; using more instructions, this change uses shifts to create a constant for flipping the most significant bit of each lane with `bxor`.
2019-12-16 10:32:08 -08:00
Andrew Brown
91d29c09d0 Add x86 SIMD floating-point absolute value 2019-11-15 13:45:25 -08:00
Andrew Brown
6519a43b08 Add x86 SIMD floating-point negation 2019-11-15 13:45:25 -08:00
Andrew Brown
04db2a9f39 Bind constant vectors to vconst; fixes #1052 (#1217) 2019-11-12 15:57:59 -08:00
Benjamin Bouvier
9080a02e10 Replace CraneStation by bytecodealliance everywhere; (#1221) 2019-11-12 10:09:31 -08:00
Andrew Brown
af4637aff6 Add x86 SIMD legalizations for icmp less-than 2019-11-05 16:42:34 -08:00
Andrew Brown
feffed85d2 Add x86 SIMD legalizations for integer greater-than
This includes `icmp ugt`, `icmp sge`, and `icmp uge` for vectors with lanes of I8, I16, and I32.
2019-11-05 16:42:34 -08:00
Andrew Brown
e3a20d67b2 Add x86 SIMD legalization of icmp ne 2019-11-05 16:42:34 -08:00
Peter Huene
9f506692c2 Fix clippy warnings.
This commit fixes the current set of (stable) clippy warnings in the repo.
2019-10-24 17:20:12 -07:00
Andrew Brown
879ccf871a Add x86 SIMD vall_true
In order to implement SIMD's all_true (https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#all-lanes-true), we must legalize some instruction (I chose `vall_true`) to a comparison against 0 and a similar reduction as vany_true using `PTEST` and `SETNZ`. Since `icmp` only allows integers but `vall_true` could allow more vector types, `raw_bitcast` is used to convert the lane types into integers, e.g. b32x4 to i32x4. To do so without runtime type-checking, the `raw_bitcast` instruction (which emits no instruction) can now bitcast from any vector type to the same type, e.g. i32x4 to i32x4.
2019-10-22 11:01:05 -07:00
Andrew Brown
186effc420 Add x86 SIMD vany_true and x86_ptest
In order to implement SIMD's any_true (https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#any-lane-true), we must legalize some instruction (I chose `vany_true`) to a sequence of `PTEST` and `SETNZ`. To emit `PTEST` I added the new CLIF instruction `x86_ptest` and used CLIF's `trueif ne` for `SETNZ`.
2019-10-22 11:01:05 -07:00
Andrew Brown
b927c55511 Add SIMD bitselect instruction and x86 legalization
This new instructions matches the `bitselect` behavior described in the WASM SIMD spec (https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#bitwise-select)
2019-10-17 15:49:29 -07:00
Andrew Brown
f1904bffea Add x86 SIMD sshr and ushr
Only the shifts with applicable SSE2 instructions are implemented here: PSRL* (for ushr) only has 16-64 bit instructions and PSRA* (for sshr) only has 16-32 bit instructions.
2019-10-15 15:51:50 -07:00
Andrew Brown
6460fe705f Add x86 SIMD ishl
Only the shifts with applicable SSE2 instructions (i.e. 16-64 bit width) are implemented here.
2019-10-15 15:51:50 -07:00
Benjamin Bouvier
350b3b2406 [meta] Avoid unwrapping instructions several times during legalization;
This avoids doing multiple unpacking of the InstructionData for a single
legalization, improving readability and reducing size of the generated
code. For instance, icmp had to unpack the format once per IntCC
condition code.
2019-10-15 11:37:48 +02:00
Andrew Brown
1f728c1797 Add x86 legalization for SIMD bnot 2019-10-11 11:05:24 -07:00
Andrew Brown
6d690e5275 Allow binding immediates to instructions (#1012)
This change should make the code more clear (and less code) when adding encodings for instructions with specific immediates; e.g., a constant with a 0 immediate could be encoded as an XOR with something like `const.bind(...)` without explicitly creating the necessary predicates. It has several parts:
* Introduce Bindable trait to instructions
* Convert all instruction bindings to use Bindable::bind()
* Add ability to bind immediates to BoundInstruction
This is an attempt to reduce some of the issues in #955.
2019-10-10 08:54:46 -07:00
Andrew Brown
ba393afd4d Add x86 legalization for SIMD ineg 2019-09-30 13:54:30 -07:00
Andrew Brown
a3db30d97e Add x86 encoding for SIMD icmp eq
Also adds a predicate for matching the `eq` IntCC code (TODO this should be replaced by something more general)
2019-09-24 09:33:07 -07:00
Andrew Brown
af1499ce99 Add x86 implementation of shuffle 2019-09-19 10:53:40 -07:00
Ujjwal Sharma
3418fb6e18 [codegen] reintroduce support for carry and borrow instructions in RI… (#1005)
Reintroduce support for iadd carry variants and isub borrow variants for
RISC ISAs which had been removed in
https://github.com/CraneStation/cranelift/pull/961 and
https://github.com/CraneStation/cranelift/pull/962 because of the lack
of a proper flags register in RISC architectures.
2019-09-13 17:27:49 +02:00
Andrew Brown
295b2ef614 Avoid extra register movement when lowering an x86 insertlane to a float vector 2019-09-10 10:45:12 -07:00
Andrew Brown
00bedca274 Avoid extra register movement when lowering the x86 extractlane of a float vector
This commit is based on the assumption that floats are already stored in XMM registers in x86. When extracting a lane, cranelift was moving the float to a regular register and back to an XMM register; this change avoids this by shuffling the float value to the lowest bits of the XMM register. It also assumes that the upper bits can be left as is (instead of zeroing them out).
2019-09-10 10:45:12 -07:00
Andrew Brown
ebc783e49b Use raw_bitcast when legalizing splat
raw_bitcast matches the intent of this legalization more clearly (to simply change the CLIF type without changing any bits) and the additional null encodings added are necessary for later instructions
2019-09-10 10:45:12 -07:00