Commit Graph

627 Commits

Author SHA1 Message Date
Andrew Brown
737cf1d605 Implement iabs for x86 SIMD
This only covers the types necessary for implementing the Wasm SIMD spec--`i8x16`, `i16x8`, `i32x4`.
2020-06-30 14:00:17 -07:00
Andrew Brown
26bdf9c333 Add iabs instructions.rs
This is necessary for implementing SIMD [abs](https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#lane-wise-integer-absolute-value).
2020-06-30 14:00:17 -07:00
Nick Fitzgerald
8c5f59c0cf wasmtime: Implement table.get and table.set
These instructions have fast, inline JIT paths for the common cases, and only
call out to host VM functions for the slow paths. This required some changes to
`cranelift-wasm`'s `FuncEnvironment`: instead of taking a `FuncCursor` to insert
an instruction sequence within the current basic block,
`FuncEnvironment::translate_table_{get,set}` now take a `&mut FunctionBuilder`
so that they can create whole new basic blocks. This is necessary for
implementing GC read/write barriers that involve branching (e.g. checking for
null, or whether a store buffer is at capacity).

Furthermore, it required that the `load`, `load_complex`, and `store`
instructions handle loading and storing through an `r{32,64}` rather than just
`i{32,64}` addresses. This involved making `r{32,64}` types acceptable
instantiations of the `iAddr` type variable, plus a few new instruction
encodings.

Part of #929
2020-06-30 12:00:57 -07:00
Andrew Brown
c9d573d841 Provide spec-compliant legalization for SIMD floating point min/max 2020-06-25 14:48:16 -07:00
Benjamin Bouvier
c9a3f05afd machinst x64: implement calls and int cmp/store/loads;
This makes it possible to run a simple recursive fibonacci function in
wasmtime.
2020-06-25 16:20:33 +02:00
Andrew Brown
3675f95bb2 Legalize fcvt_to_sint_sat.i32x4 on x86
Use a lengthy sequence involving CVTTPS2DQ to quiet NaNs and saturate overflow.
2020-06-18 11:39:38 -07:00
Andrew Brown
3740772176 Add encoding for x86 CVTTPS2DQ
This reuses the `x86_cvtt2si` instruction since the packed and scalar versions seem to group together well.
2020-06-18 11:39:38 -07:00
Benjamin Bouvier
dad56a2488 cranelift: add a new resumable_trapnz instruction;
This is useful to have to allow resumable_trap to happen in loop
headers, for instance. This is the correct way to implement interrupt
checks in Spidermonkey, which are effectively resumable traps. Previous
implementation was using traps, which is wrong, since traps semantically
can't be resumed after.
2020-06-15 12:04:28 +02:00
Andrew Brown
01d34e71b9 Add x86 legalization for fcvt_from_uint.f32x4
This converts an `i32x4` into an `f32x4` with some rounding either by using an AVX512VL/F instruction--VCVTUDQ2PS--or a long sequence of SSE4.1 compatible instructions.
2020-06-12 15:06:22 -07:00
Andrew Brown
23ed48f269 Add AVX512F flag 2020-06-12 15:06:22 -07:00
Andrew Brown
772ce73f7f Add x86_pblendw instruction
This instruction is necessary for lowering `fcvt_from_uint`.
2020-06-12 15:06:22 -07:00
Andrew Brown
546fc9ddf1 Add x86_vcvtudq2ps instruction
This instruction converts i32x4 to f32x4 in several AVX512 feature sets.
2020-06-12 15:06:22 -07:00
Dan Gohman
caa87048ab Wasmtime 0.18.0 and Cranelift 0.65.0. 2020-06-11 17:49:56 -07:00
whitequark
3796164642 x86_32: legalize br{z,nz}.i64. 2020-06-08 12:52:13 -07:00
Andrew Brown
5db384cd76 Rename opcode: PMULLQ to VPMULLQ 2020-06-03 16:27:57 -07:00
Andrew Brown
1ea09088be Add x86 legalization for imul.i64x2 for non-AVX CPUs
The `convert_i64x2_imul` custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizes `imul.i64x2` to an `x86_pmullq` in this case; if not, it uses a lengthy SSE2-compatible instruction sequence.
2020-06-03 16:27:57 -07:00
Andrew Brown
b3a6985cd5 Re-organize transform groups for x86 legalization 2020-06-03 16:27:57 -07:00
Andrew Brown
5a32500518 Remove non-existent x86 encoding for sshr_imm.i64x2
This instruction does not exist in the SSE2 feature set; it can be added later with an VEX/EVEX encoding.
2020-06-03 16:27:57 -07:00
Andrew Brown
df171f01b5 Add x86_pmuludq
This instruction multiplies the lower 32 bits of two 64x2 unsigned integers into an i64x2; this is necessary for lowering Wasm's i64x2.mul.
2020-06-03 16:27:57 -07:00
Andrew Brown
9ba9fd0f64 Add x86-specific instruction for i64x2 multiplication
Without this special instruction, legalizing to the AVX512 instruction AND the SSE instruction sequence is impossible. This extra instruction would be rendered unnecessary by the x64 backend.
2020-06-03 16:27:57 -07:00
Dan Gohman
a76639c6fb Wasmtime 0.17.0 and Cranelift 0.64.0. (#1805) 2020-06-02 18:51:59 -07:00
Andrew Brown
0dd77d36f8 Rename BinaryImm format to BinaryImm64 2020-05-29 19:56:27 -07:00
Andrew Brown
a27a079d65 Replace ExtractLane format with BinaryImm8
Like https://github.com/bytecodealliance/wasmtime/pull/1762, this change the name of the `ExtractLane` format to the more-general `BinaryImm8` and renames its immediate argument from `lane` to `imm`.
2020-05-29 19:56:27 -07:00
Andrew Brown
7d6e94b952 Replace InsertLane format with TernaryImm8
The InsertLane format has an ordering (`value().imm().value()`) and immediate name (`"lane"`) that make it awkward to use for other instructions. This changes the ordering (`value().value().imm()`) and uses the default name (`"imm"`) throughout the codebase.
2020-05-29 19:56:27 -07:00
teapotd
e430984ac4 Improve bitselect codegen with knowledge of operand origin (#1783)
* Encode vselect using BLEND instructions on x86

* Legalize vselect to bitselect

* Optimize bitselect to vselect for some operands

* Add run tests for bitselect-vselect optimization

* Address review feedback
2020-05-29 19:53:11 -07:00
whitequark
a180b5b393 x86_32: fix stack_addr encoding.
Consider this testcase:

    target i686
    function u0:0() -> i32 system_v {
        ss0 = explicit_slot 0
    block0:
        v2 = stack_addr.i32 ss0
        return v2
    }

Before this commit, in 32-bit mode the x86 backend would generate
incorrect code for stack addresses:

     0:   55                      push    ebp
     1:   89 e5                   mov     ebp, esp
     3:   83 ec 08                sub     esp, 8
     6:   8d 44 24 00             lea     eax, [esp]
     a:   00 00                   add     byte ptr [eax], al
     c:   00 83 c4 08 5d c3       add     byte ptr [ebx - 0x3ca2f73c], al

This happened because the ModRM byte indicated a disp8 encoding, but
the instruction actually used a disp32 encoding. After this commit,
correct code is generated:

     0:   55                      push    ebp
     1:   89 e5                   mov     ebp, esp
     3:   83 ec 08                sub     esp, 8
     6:   8d 84 24 00 00 00 00    lea     eax, [esp]
     d:   83 c4 08                add     esp, 8
    10:   5d                      pop     ebp
    11:   c3                      ret
2020-05-29 09:17:36 -07:00
whitequark
880e692fd4 x86: add encoding for bnot.b1.
Fixes #1743.

Co-authored-by: iximeow <git@iximeow.net>
2020-05-28 08:43:25 -07:00
Andrew Brown
c92917de15 Fix typo in sadd_sat instruction definition 2020-05-26 09:55:26 +02:00
Nick Fitzgerald
01f46d0238 Merge pull request #1692 from fitzgen/update-to-wasmparser-0.55.0
Update to using `wasmparser` 0.55.0
2020-05-14 14:00:24 -07:00
Nick Fitzgerald
1a4f3fb2df Update deps and tests for anyref --> externref
* Update to using `wasmparser` 0.55.0
* Update wasmprinter to 0.2.5
* Update `wat` to 1.0.18, and `wast` to 17.0.0
2020-05-14 12:47:37 -07:00
Nick Fitzgerald
fb7a690efc Merge pull request #1687 from fitzgen/sign-extend-immediates
cranelift: Sign extend `Imm64` immediates
2020-05-14 10:09:53 -07:00
Benjamin Bouvier
5987cf5cda machinst: add a linear-scan checked variant too; 2020-05-13 10:56:32 +02:00
Nick Fitzgerald
9b867b09c7 cranelift: Sign extend Imm64 immediates
When an instruction has an `Imm64` immediate, but operates on values of a
narrower width, we need to sign extend the value.

Fixes #1095
2020-05-12 15:44:48 -07:00
whitequark
4ec16fa057 Legalize 64 bit shifts on x86_32 using PSLLQ/PSRLQ.
Co-authored-by: iximeow <git@iximeow.net>
2020-05-09 03:28:19 -07:00
whitequark
162fcd3d75 Legalize [su]extend.i64 to iconst/sshr_imm + iconcat.
This was already done for [su]extend.i128, and is necessary for
codegen for 32-bit x86.
2020-05-05 16:08:58 -07:00
whitequark
14bdaf3ce3 Legalize ireduce.iN.i2N to isplit. 2020-05-05 14:13:30 -07:00
Andrew Brown
cd49ed9582 Add x86 legalization for sshr.i64x2 2020-05-05 12:01:46 -07:00
Benjamin Bouvier
fa54422854 Add a work-in-progress backend for x86_64 using the new instruction selection;
Most of the work is credited to Julian Seward.

Co-authored-by: Julian Seward <jseward@acm.org>
Co-authored-by: Chris Fallin <cfallin@mozilla.com>
2020-05-05 16:35:41 +02:00
Andrew Brown
a312506262 Add x86 complex encodings for SIMD load-extend instructions 2020-04-30 11:38:01 -07:00
Andrew Brown
2048d3d30c Add x86 encodings for same-size bint conversions up to 64 bits 2020-04-30 11:21:00 -07:00
Benjamin Bouvier
a2b6c19861 Fix arm32 build: ensure that the expand group is always generated; 2020-04-30 13:50:45 +02:00
Dan Gohman
864cf98c8d Update release notes, wasmtime 0.16, cranelift 0.63. 2020-04-29 17:30:25 -07:00
Andrew Brown
d24f23285b Legalize i8x16.sshr using pack/unpack instructions
Due to arithmetic shift behavior, this legalization cannot easily use the masks for i8x16.ushr or i8x16.ishl
2020-04-23 10:55:54 -07:00
Andrew Brown
fb6e8f784d Add x86 pack instructions 2020-04-23 10:55:54 -07:00
Andrew Brown
341dc45cea Add DerivedFunction for splitting lane widths and doubling the number of lanes
Certain operations (e.g. x86_packss) will have operands with types like `NxM` but will return results with types like `(N/2)x(M*2)` (halve the lane width, double the number of lanes; maintain the same number of vector bits). This is equivalent to applying two `DerivedFunction`s to the type: `DerivedFunction::HalfWidth` then `DerivedFunction::DoubleVector`. Since there is no easy way to apply multiple `DerivedFunction`s (e.g. most of the logic is one-level deep, 1d5a678124/cranelift/codegen/meta/src/gen_inst.rs (L618-L621)), I added `DerivedFunction::SplitLanes` to do the necessary type conversion.
2020-04-23 10:55:54 -07:00
Andrew Brown
f5fc09f64a Add x86 unpack instructions 2020-04-23 10:55:54 -07:00
Andrew Brown
5f0286696c Add x86 implentation of 8x16 ishl
This involves some large mask tables that may hurt code size but reduce the number of instructions. See https://github.com/WebAssembly/simd/issues/117 for a more in-depth discussion on this.
2020-04-23 10:55:54 -07:00
Benjamin Bouvier
65ef26b989 Add a setting to choose a register allocator algorithm to use with MachBackend; 2020-04-22 14:47:18 +02:00
Andrew Brown
3f47291f2e Add x86 implentation of 8x16 ushr
This involves some large mask tables that may hurt code size but reduce the number of instructions. See https://github.com/WebAssembly/simd/issues/117 for a more in-depth discussion on this.
2020-04-17 11:59:47 -07:00
Andrew Brown
65856987cd Add const_addr instruction
This new instruction calculates the effective address of a constant in the constant pool using LEA (x86).
2020-04-17 11:59:47 -07:00