Commit Graph

210 Commits

Author SHA1 Message Date
Chris Fallin
71768bb6cf Fix AArch64 ABI to respect half-caller-save, half-callee-save vec regs.
This PR updates the AArch64 ABI implementation so that it (i) properly
respects that v8-v15 inclusive have callee-save lower halves, and
caller-save upper halves, by conservatively approximating (to full
registers) in the appropriate directions when generating prologue
caller-saves and when informing the regalloc of clobbered regs across
callsites.

In order to prevent saving all of these vector registers in the prologue
of every non-leaf function due to the above approximation, this also
makes use of a new regalloc.rs feature to exclude call instructions'
writes from the clobber set returned by register allocation. This is
safe whenever the caller and callee have the same ABI (because anything
the callee could clobber, the caller is allowed to clobber as well
without saving it in the prologue).

Fixes #2254.
2020-10-06 14:44:02 -07:00
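The clobber reasoning in 71768bb6cf can be illustrated with a small model. This is only a sketch under stated assumptions: registers are plain strings, and `prologue_saves` and its parameters are hypothetical names, not regalloc.rs or Cranelift APIs.

```python
# Hypothetical model: on AArch64 only the low halves of v8..v15 are
# callee-saved, so the prologue conservatively saves the full register.
HALF_CALLEE_SAVED = {f"v{i}" for i in range(8, 16)}


def prologue_saves(written_by_body, written_by_calls, same_abi):
    """Return the set of vector registers the prologue must save.

    written_by_body:  registers written by ordinary instructions.
    written_by_calls: registers reported as clobbered by call instructions.
    same_abi:         True when caller and callee share the same ABI.
    """
    clobbered = set(written_by_body)
    if not same_abi:
        # Call writes only count when the callee may clobber registers
        # that this function's own ABI requires it to preserve.
        clobbered |= set(written_by_calls)
    return clobbered & HALF_CALLEE_SAVED
```

With `same_abi=True`, a non-leaf function that never writes v8-v15 itself saves none of them, which is the effect the commit message describes.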
Johnnie Birch
5799fd3cc0 Add file test simd-arithmetic-run to x64 backend
Copies over simd-arithmetic-run from the old backend, adding
several run tests including for min/max. Tests not supported
are commented out.
2020-10-02 16:20:10 -07:00
Chris Fallin
b2f52910fb Merge pull request #2224 from jgouly/sp_adjust
arm64: Use SignedOffset rather than PreIndexed addressing mode for ca…
2020-10-02 09:18:00 -07:00
Andrew Brown
16a2538ecd [machinst x64]: rename Inst::XmmUninitializedValue and document
This approach is not the best but avoids an extra instruction; perhaps at some point, as mentioned in https://github.com/bytecodealliance/wasmtime/pull/2248, we will add the extra instruction or refactor things in such a way that this `Inst` variant is unnecessary.
2020-10-02 08:29:31 -07:00
Andrew Brown
3d9f3bf728 [machinst x64]: port CLIF tests related to comparison and lane operations 2020-10-02 08:29:31 -07:00
Joey Gouly
eec60c9b06 arm64: Use SignedOffset rather than PreIndexed addressing mode for callee-saved registers
This also passes `fixed_frame_storage_size` (previously `total_sp_adjust`)
into `gen_clobber_save` so that it can be combined with other stack
adjustments.

Copyright (c) 2020, Arm Limited.
2020-10-02 16:22:55 +01:00
Anton Kirilov
d18de69e5a AArch64: Add test cases for callee-saved SIMD & FP registers
Copyright (c) 2020, Arm Limited.
2020-09-30 14:19:02 +01:00
Andrew Brown
b43f4a464a refactor: move all 'filetests/vcode' tests to 'filetests/isa' 2020-09-29 09:27:39 -07:00
Andrew Brown
452d854855 [machinst x64]: demonstrate that packed register moves are elided 2020-09-29 08:48:37 -07:00
Andrew Brown
b7217d454f [machinst x64]: add lane-related CLIF filetests 2020-09-29 08:45:12 -07:00
Benjamin Bouvier
e2c286deeb machinst x64: enable clif testing
This adds a new feature experimental_x64 for CLIF tests.

A test is run in the new x64 backend iff:

- either the test doesn't have an x86_64 target requirement, signaling
it must be target agnostic or not run on this target.
- or the test does require the x86_64 target, and the test is marked
with the `experimental_x64` feature.

This required one workaround in the parser. The reason is that the
parser will try to use information not provided by the TargetIsa adapter
for the Mach backends, like register names. In particular, parsing the test
may fail before the test runner realizes that the test must not be run.
In this case, we early return an almost-empty TestFile from the parser,
under the same conditions as above, so that the caller may filter out
the test properly.

This also copies two tests from the test suite using the new backend,
for demonstration purposes.
2020-09-25 11:12:21 +02:00
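The run condition stated in e2c286deeb reads as a simple predicate. The sketch below follows the wording of the commit message literally; the function name and the use of plain string sets for targets and features are assumptions, not the filetest runner's actual data structures.

```python
def runs_on_new_x64_backend(targets, features):
    """Decide whether a CLIF test runs on the new x64 backend.

    targets:  set of target names the test file requires (may be empty).
    features: set of feature names the test file is marked with.
    """
    if "x86_64" not in targets:
        # No x86_64 requirement: first bullet of the commit message.
        return True
    # x86_64 is required: the test must opt in explicitly.
    return "experimental_x64" in features
```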
bjorn3
5c5a30f76c Fix review comments 2020-07-17 12:03:17 +02:00
bjorn3
7b7b1f4997 Rename sarg__ to sarg_t 2020-07-17 12:03:17 +02:00
bjorn3
4431ac1108 Implement SystemV struct argument passing 2020-07-17 12:03:17 +02:00
Andrew Brown
f0b083c6ad Legalize [u|s]widen_high for x86
Use `x86_palignr` and `[u|s]widen_low` for legalizing this instruction.
2020-07-15 11:32:08 -07:00
Andrew Brown
c8ddf8a34c Encode [u|s]widen_low for x86 2020-07-15 11:32:08 -07:00
Andrew Brown
fafef7db77 Add x86_palignr instructions
This instruction is necessary for implementing `[s|u]widen_high`.
2020-07-15 11:32:08 -07:00
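The three commits above fit together roughly as follows: PALIGNR moves the high half of a vector into the low half, after which the existing low-half widening applies. The sketch models an `i8x16` vector as a list of 16 byte values (index 0 is the lowest byte) and shows only the signed case; the function names are illustrative, not Cranelift's.

```python
def palignr(dst, src, imm):
    """Model PALIGNR: concatenate dst:src, shift right by imm bytes,
    and keep the low 16 bytes."""
    concat = src + dst            # src is the low half, dst the high half
    shifted = concat[imm:] + [0] * imm
    return shifted[:16]


def swiden_low_i8x16(v):
    """Sign-extend the low eight i8 lanes to i16 lane values."""
    return [b - 256 if b >= 128 else b for b in v[:8]]


def swiden_high_i8x16(v):
    """Widen the high eight lanes by first rotating them into the low half."""
    return swiden_low_i8x16(palignr(v, v, 8))
```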
Andrew Brown
c5a69cee9f Add x86 legalization for fcvt_to_uint_sat.i32x4
This converts an `f32x4` into an `i32x4` (unsigned) with rounding by using a long sequence of SSE4.1 compatible instructions.
2020-07-08 10:20:01 -07:00
Peter Huene
3a33749404 Remove 'set frame pointer' unwind code from Windows x64 unwind.
This commit removes the "set frame pointer" unwind code and frame
pointer information from Windows x64 unwind information.

In Windows x64 unwind information, a "frame pointer" is actually the
*base address* of the static part of the local frame and would be at some
negative offset to RSP upon establishing the frame pointer.

Currently Cranelift uses a "traditional" notion of a frame pointer, one
that is the highest address in the local frame (i.e. pointing at the
previous frame pointer on the stack).

Windows x64 unwind doesn't describe such frame pointers and only needs
one described if the frame contains a dynamic stack allocation.

Fixes #1967.
2020-07-06 14:22:57 -07:00
Andrew Brown
057c93b64e Add unarrow instruction with x86 implementation
Adds a shared `unarrow` instruction in order to lower the Wasm SIMD specification's unsigned narrowing (see https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#integer-to-integer-narrowing). Additionally, this commit implements the instruction for x86 using PACKUSWB and PACKUSDW for the applicable encodings.
2020-07-02 09:35:45 -07:00
Andrew Brown
65e6de2344 Replace x86_packss with snarrow
Since the Wasm specification contains narrowing instructions (see https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#integer-to-integer-narrowing) that lower to PACKSS*, the x86-specific instruction is not necessary in the CLIF IR.
2020-07-02 09:35:45 -07:00
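The narrowing semantics behind `snarrow` and `unarrow` (057c93b64e and 65e6de2344) can be modeled lane by lane. This is a minimal sketch of the saturating behavior described in the Wasm SIMD proposal, with input lanes given as signed Python integers; `narrow` is a hypothetical helper, not a CLIF instruction.

```python
def narrow(a, b, out_bits, signed):
    """Concatenate the lanes of a and b, saturating each to out_bits.

    signed=True  models snarrow (PACKSS*-style signed saturation).
    signed=False models unarrow (PACKUS*-style unsigned saturation).
    """
    if signed:
        lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    else:
        lo, hi = 0, (1 << out_bits) - 1
    return [min(max(x, lo), hi) for x in a + b]
```

In both cases the inputs are interpreted as signed; only the output range differs.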
Chris Fallin
a351fa52b5 Merge pull request #1930 from cfallin/spectre-heap
Spectre mitigation on heap access overflow checks.
2020-07-01 09:23:04 -07:00
Chris Fallin
e694fb1312 Spectre mitigation on heap access overflow checks.
This PR adds a conditional move following a heap bounds check through
which the address to be accessed flows. This conditional move ensures
that even if the branch is mispredicted (access is actually out of
bounds, but speculation goes down the in-bounds path), the actually accessed
address is zero (a NULL pointer) rather than the out-of-bounds address.

The mitigation is controlled by a flag that is off by default, but can
be set by the embedding. Note that in order to turn it on by default,
we would need to add conditional-move support to the current x86
backend; this does not appear to be present. Once the deprecated
backend is removed in favor of the new backend, IMHO we should turn
this flag on by default.

Note that the mitigation is unnecessary when we use the "huge heap"
technique on 64-bit systems, in which we allocate a range of virtual
address space such that no 32-bit offset can reach other data. Hence,
this only affects small-heap configurations.
2020-07-01 08:36:09 -07:00
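The data flow described in e694fb1312 can be sketched as below. Python cannot express speculation, so this only models the architectural result: the address fed to the load comes from a select on the bounds-check condition rather than from the branch alone. The function and parameter names are assumptions made for illustration.

```python
def guarded_address(base, index, access_size, bound):
    """Return (in_bounds, address) for a heap access.

    The second element models the conditional move: it is the real address
    when the access is in bounds and 0 (a NULL pointer) otherwise, so a
    mispredicted branch would still load from address 0.
    """
    in_bounds = index + access_size <= bound
    address = base + index if in_bounds else 0
    return in_bounds, address
```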
Andrew Brown
737cf1d605 Implement iabs for x86 SIMD
This only covers the types necessary for implementing the Wasm SIMD spec--`i8x16`, `i16x8`, `i32x4`.
2020-06-30 14:00:17 -07:00
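One detail of lane-wise `iabs` worth spelling out is the wraparound at the most negative value, which follows the Wasm SIMD semantics. A minimal single-lane model, with `iabs_lane` as a hypothetical helper name:

```python
def iabs_lane(x, bits):
    """Absolute value of a signed lane, wrapping to the lane width."""
    mask = (1 << bits) - 1
    r = abs(x) & mask
    # Reinterpret the bit pattern as a signed value of the same width.
    return r - (1 << bits) if r >> (bits - 1) else r
```

For an 8-bit lane, `iabs_lane(-128, 8)` stays at -128 because +128 is not representable.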
Andrew Brown
c9d573d841 Provide spec-compliant legalization for SIMD floating point min/max 2020-06-25 14:48:16 -07:00
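The reason c9d573d841 needs a dedicated legalization is that the Wasm rules for NaN and signed zero differ from x86's MINPS/MAXPS, which return the second operand when either input is NaN or both are zero. A scalar sketch of the spec behavior follows; it models the semantics only, not the instruction sequence the commit emits, and the function names are illustrative.

```python
import math


def wasm_fmin(a, b):
    """Wasm-style minimum: NaN propagates and -0.0 orders below +0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == 0.0 and b == 0.0:
        negative = math.copysign(1.0, a) < 0 or math.copysign(1.0, b) < 0
        return -0.0 if negative else 0.0
    return min(a, b)


def wasm_fmax(a, b):
    """Wasm-style maximum: NaN propagates and +0.0 orders above -0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == 0.0 and b == 0.0:
        positive = math.copysign(1.0, a) > 0 or math.copysign(1.0, b) > 0
        return 0.0 if positive else -0.0
    return max(a, b)
```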
Andrew Brown
3675f95bb2 Legalize fcvt_to_sint_sat.i32x4 on x86
Use a lengthy sequence involving CVTTPS2DQ to quiet NaNs and saturate overflow.
2020-06-18 11:39:38 -07:00
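The per-lane result that both saturating conversions (c5a69cee9f for unsigned, 3675f95bb2 for signed) must produce can be written down compactly. The sketch below models the Wasm saturating-truncation semantics for one lane; it says nothing about the SSE sequences themselves, and `fcvt_to_int_sat` is a hypothetical name.

```python
import math


def fcvt_to_int_sat(x, bits=32, signed=True):
    """Convert a float lane to an integer lane with saturation.

    NaN becomes 0, out-of-range values clamp to the nearest bound, and
    everything else truncates toward zero.
    """
    if math.isnan(x):
        return 0
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    if x <= lo:
        return lo   # also covers negative infinity
    if x >= hi:
        return hi   # also covers positive infinity
    return math.trunc(x)
```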
Andrew Brown
01d34e71b9 Add x86 legalization for fcvt_from_uint.f32x4
This converts an `i32x4` into an `f32x4` with some rounding either by using an AVX512VL/F instruction--VCVTUDQ2PS--or a long sequence of SSE4.1 compatible instructions.
2020-06-12 15:06:22 -07:00
Andrew Brown
772ce73f7f Add x86_pblendw instruction
This instruction is necessary for lowering `fcvt_from_uint`.
2020-06-12 15:06:22 -07:00
Andrew Brown
546fc9ddf1 Add x86_vcvtudq2ps instruction
This instruction converts i32x4 to f32x4 in several AVX512 feature sets.
2020-06-12 15:06:22 -07:00
whitequark
3796164642 x86_32: legalize br{z,nz}.i64. 2020-06-08 12:52:13 -07:00
Andrew Brown
1ea09088be Add x86 legalization for imul.i64x2 for non-AVX CPUs
The `convert_i64x2_imul` custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizes `imul.i64x2` to an `x86_pmullq` in this case; if not, it uses a lengthy SSE2-compatible instruction sequence.
2020-06-03 16:27:57 -07:00
Andrew Brown
5a32500518 Remove non-existent x86 encoding for sshr_imm.i64x2
This instruction does not exist in the SSE2 feature set; it can be added later with a VEX/EVEX encoding.
2020-06-03 16:27:57 -07:00
Andrew Brown
df171f01b5 Add x86_pmuludq
This instruction multiplies the lower 32 bits of two 64x2 unsigned integers into an i64x2; this is necessary for lowering Wasm's i64x2.mul.
2020-06-03 16:27:57 -07:00
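A 64-bit lane multiply can be assembled from 32x32-bit unsigned products, which is why `x86_pmuludq` is needed for the non-AVX512 path. The sketch below shows the standard decomposition for a single lane; it is an assumption that the legalization in 1ea09088be follows this shape, and the helper names are illustrative.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def pmuludq_lane(a, b):
    """Multiply the low 32 bits of two 64-bit lanes into a 64-bit product."""
    return (a & MASK32) * (b & MASK32)


def imul_i64_lane(a, b):
    """Low 64 bits of a * b, built from three 32x32 partial products."""
    a_hi, b_hi = a >> 32, b >> 32
    low = pmuludq_lane(a, b)                              # a_lo * b_lo
    cross = pmuludq_lane(a_hi, b) + pmuludq_lane(a, b_hi)  # a_hi*b_lo + a_lo*b_hi
    # The a_hi * b_hi term only affects bits 64 and above, so it is dropped.
    return (low + (cross << 32)) & MASK64
```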
Andrew Brown
9ba9fd0f64 Add x86-specific instruction for i64x2 multiplication
Without this special instruction, legalizing to the AVX512 instruction AND the SSE instruction sequence is impossible. This extra instruction would be rendered unnecessary by the x64 backend.
2020-06-03 16:27:57 -07:00
Nick Fitzgerald
7c68a10ed6 Merge pull request #1670 from teapotd/win64-pass-by-ref
Implement passing arguments by ref for win64 ABI
2020-06-01 11:13:30 -07:00
Andrew Brown
7d6e94b952 Replace InsertLane format with TernaryImm8
The InsertLane format has an ordering (`value().imm().value()`) and immediate name (`"lane"`) that make it awkward to use for other instructions. This changes the ordering (`value().value().imm()`) and uses the default name (`"imm"`) throughout the codebase.
2020-05-29 19:56:27 -07:00
teapotd
e430984ac4 Improve bitselect codegen with knowledge of operand origin (#1783)
* Encode vselect using BLEND instructions on x86

* Legalize vselect to bitselect

* Optimize bitselect to vselect for some operands

* Add run tests for bitselect-vselect optimization

* Address review feedback
2020-05-29 19:53:11 -07:00
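The bitselect-to-vselect optimization in e430984ac4 rests on one observation: when every condition lane is all ones or all zeros, a bitwise select and a lane-wise select give the same result. A small model over 8-bit lanes, with names chosen for illustration:

```python
def bitselect(c, x, y, bits=8):
    """Per-bit select: take bits of x where c is 1, bits of y where c is 0."""
    mask = (1 << bits) - 1
    return [(ci & xi) | (~ci & yi & mask) for ci, xi, yi in zip(c, x, y)]


def vselect(c, x, y):
    """Per-lane select: take the lane of x where c is true, else y."""
    return [xi if ci else yi for ci, xi, yi in zip(c, x, y)]
```

For a condition such as `[0xFF, 0x00]` (for example, the output of a vector comparison) the two agree; for a mixed mask such as `0x0F` only `bitselect` is meaningful.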
teapotd
759cc3e751 Implement passing arguments by ref for win64 ABI 2020-05-29 20:12:41 +02:00
Nick Fitzgerald
94380bf2b7 Merge pull request #1510 from teapotd/abi-i128-fix
Always check if struct-return parameter is needed
2020-05-29 10:02:16 -07:00
whitequark
a180b5b393 x86_32: fix stack_addr encoding.
Consider this testcase:

    target i686
    function u0:0() -> i32 system_v {
        ss0 = explicit_slot 0
    block0:
        v2 = stack_addr.i32 ss0
        return v2
    }

Before this commit, in 32-bit mode the x86 backend would generate
incorrect code for stack addresses:

     0:   55                      push    ebp
     1:   89 e5                   mov     ebp, esp
     3:   83 ec 08                sub     esp, 8
     6:   8d 44 24 00             lea     eax, [esp]
     a:   00 00                   add     byte ptr [eax], al
     c:   00 83 c4 08 5d c3       add     byte ptr [ebx - 0x3ca2f73c], al

This happened because the ModRM byte indicated a disp8 encoding, but
the instruction actually used a disp32 encoding. After this commit,
correct code is generated:

     0:   55                      push    ebp
     1:   89 e5                   mov     ebp, esp
     3:   83 ec 08                sub     esp, 8
     6:   8d 84 24 00 00 00 00    lea     eax, [esp]
     d:   83 c4 08                add     esp, 8
    10:   5d                      pop     ebp
    11:   c3                      ret
2020-05-29 09:17:36 -07:00
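The mismatch described in a180b5b393 can be checked by decoding the ModRM byte of each `lea`. The sketch below handles only the form seen in the listings (opcode `8d`, a SIB byte, and a disp8 or disp32); it is a teaching aid, not a general x86 decoder, and the function names are hypothetical.

```python
def decode_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    return byte >> 6, (byte >> 3) & 7, byte & 7


def lea_sib_length(code):
    """Length in bytes of `lea r32, [SIB + disp]` starting at opcode 0x8d."""
    if code[0] != 0x8D:
        raise ValueError("not a lea opcode")
    mod, _reg, rm = decode_modrm(code[1])
    if rm != 0b100:
        raise ValueError("expected a SIB byte (rm == 100)")
    disp_bytes = {0b01: 1, 0b10: 4}.get(mod)
    if disp_bytes is None:
        raise ValueError("only disp8 and disp32 forms are modeled")
    return 3 + disp_bytes  # opcode + ModRM + SIB + displacement
```

The buggy output used ModRM `0x44` (mod = 01, disp8), so a decoder consumes four bytes and treats the remaining zero displacement bytes as `add` instructions; the fixed output uses `0x84` (mod = 10, disp32) and the full seven bytes.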
whitequark
880e692fd4 x86: add encoding for bnot.b1.
Fixes #1743.

Co-authored-by: iximeow <git@iximeow.net>
2020-05-28 08:43:25 -07:00
teapotd
fbac2e53f9 Make vconst BxN match specification 2020-05-27 09:37:13 -07:00
teapotd
b18846057f Add system_v legalizer tests for i128 args 2020-05-25 20:03:24 +02:00
teapotd
0f55bb4b8d Always check if struct-return parameter is needed 2020-05-25 20:03:24 +02:00
Peter Huene
78c3091e84 Fix FPR saving and shadow space allocation for Windows x64.
This commit fixes both how FPR callee-saved registers are saved and how the
shadow space allocation occurs when laying out the stack for Windows x64
calling convention.

Importantly, this commit removes the compiler limitation of stack size for
Windows x64 that was imposed because FPR saves previously couldn't always be
represented in the unwind information.

The FPR saves are now performed without using stack slots, much like how the
callee-saved GPRs are saved. The total CSR space is given to `layout_stack` so
that it is included in the frame size and to offset the layout of spills and
explicit slots.

The FPR saves are now done via an RSP offset (post adjustment) and they always
follow the GPR saves on the stack. A simpler calculation can now be made to
determine the proper offsets of the FPR saves for representing the unwind
information.

Additionally, the shadow space is no longer treated as an incoming argument,
but an explicit stack slot that gets laid out at the lowest address possible in
the local frame. This prevents `layout_stack` from putting a spill or explicit
slot in this reserved space. In the future, `layout_stack` should take
advantage of the *caller-provided* shadow space for spills, but this commit does
not attempt to address that.

The shadow space is now omitted from the local frame for leaf functions.

Fixes #1728.
Fixes #1587.
Fixes #1475.
2020-05-20 15:37:30 -07:00
Nick Fitzgerald
52c6ece5f3 peepmatic: Make peepmatic optional to enable
Rather than outright replacing parts of our existing peephole optimization
passes, this makes peepmatic an optional cargo feature that can be enabled. This
allows us to take a conservative approach with enabling peepmatic everywhere,
while also allowing us to get it in-tree and make it easier to collaborate on
improving it quickly.
2020-05-14 07:52:23 -07:00
Nick Fitzgerald
090d1c2d32 cranelift: Port most of simple_preopt.rs over to the peepmatic DSL
This ports all of the identity, no-op, simplification, and canonicalization
related optimizations over from being hand-coded to the `peepmatic` DSL. This
does not handle the branch-to-branch optimizations or most of the
divide-by-constant optimizations.
2020-05-14 07:52:23 -07:00
whitequark
4ec16fa057 Legalize 64 bit shifts on x86_32 using PSLLQ/PSRLQ.
Co-authored-by: iximeow <git@iximeow.net>
2020-05-09 03:28:19 -07:00
whitequark
162fcd3d75 Legalize [su]extend.i64 to iconst/sshr_imm + iconcat.
This was already done for [su]extend.i128, and is necessary for
codegen for 32-bit x86.
2020-05-05 16:08:58 -07:00
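The legalization named in 162fcd3d75 builds the 64-bit result from two 32-bit halves: the high half is a constant zero for `uextend` and the sign bits (an arithmetic shift by 31) for `sextend`. A sketch on 32-bit patterns held as non-negative Python integers; the helper names mirror the CLIF opcodes but are plain functions written for this example.

```python
MASK32 = 0xFFFF_FFFF


def iconcat(lo, hi):
    """Join two 32-bit halves into one 64-bit pattern."""
    return (hi << 32) | lo


def sshr_imm32(x, n):
    """Arithmetic right shift of a 32-bit pattern by n bits."""
    signed = x - (1 << 32) if x & 0x8000_0000 else x
    return (signed >> n) & MASK32


def uextend_i64(lo):
    """Zero-extend: the high half is iconst 0."""
    return iconcat(lo, 0)


def sextend_i64(lo):
    """Sign-extend: the high half is the sign bit replicated 32 times."""
    return iconcat(lo, sshr_imm32(lo, 31))
```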
whitequark
14bdaf3ce3 Legalize ireduce.iN.i2N to isplit. 2020-05-05 14:13:30 -07:00