wasmtime

Author	SHA1	Message	Date
Anton Kirilov	207779fe1d	Cranelift AArch64: Improve code generation for vector constants In particular, introduce initial support for the MOVI and MVNI instructions, with 8-bit elements. Also, treat vector constants as 32- or 64-bit floating-point numbers, if their value allows it, by relying on the architectural zero extension. Finally, stop generating literal loads for 32-bit constants. Copyright (c) 2020, Arm Limited.	2020-10-30 13:16:12 +00:00
Chris Fallin	71768bb6cf	Fix AArch64 ABI to respect half-caller-save, half-callee-save vec regs. This PR updates the AArch64 ABI implementation so that it (i) properly respects that v8-v15 inclusive have callee-save lower halves, and caller-save upper halves, by conservatively approximating (to full registers) in the appropriate directions when generating prologue caller-saves and when informing the regalloc of clobbered regs across callsites. In order to prevent saving all of these vector registers in the prologue of every non-leaf function due to the above approximation, this also makes use of a new regalloc.rs feature to exclude call instructions' writes from the clobber set returned by register allocation. This is safe whenever the caller and callee have the same ABI (because anything the callee could clobber, the caller is allowed to clobber as well without saving it in the prologue). Fixes #2254.	2020-10-06 14:44:02 -07:00
Julian Seward	25e31739a6	Implement Wasm Atomics for Cranelift/newBE/aarch64. The implementation is pretty straightforward. Wasm atomic instructions fall into 5 groups * atomic read-modify-write * atomic compare-and-swap * atomic loads * atomic stores * fences and the implementation mirrors that structure, at both the CLIF and AArch64 levels. At the CLIF level, there are five new instructions, one for each group. Some comments about these: * for those that take addresses (all except fences), the address is contained entirely in a single `Value`; there is no offset field as there is with normal loads and stores. Wasm atomics require alignment checks, and removing the offset makes implementation of those checks a bit simpler. * atomic loads and stores get their own instructions, rather than reusing the existing load and store instructions, for two reasons: - per above comment, makes alignment checking simpler - reuse of existing loads and stores would require extension of `MemFlags` to indicate atomicity, which sounds semantically unclean. For example, then any instruction carrying `MemFlags` could be marked as atomic, even in cases where it is meaningless or ambiguous. * I tried to specify, in comments, the behaviour of these instructions as tightly as I could. Unfortunately there is no way (per my limited CLIF knowledge) to enforce the constraint that they may only be used on I8, I16, I32 and I64 types, and in particular not on floating point or vector types. The translation from Wasm to CLIF, in `code_translator.rs` is unremarkable. At the AArch64 level, there are also five new instructions, one for each group. All of them except `::Fence` contain multiple real machine instructions. Atomic r-m-w and atomic c-a-s are emitted as the usual load-linked store-conditional loops, guarded at both ends by memory fences. Atomic loads and stores are emitted as a load preceded by a fence, and a store followed by a fence, respectively. The amount of fencing may be overkill, but it reflects exactly what the SM Wasm baseline compiler for AArch64 does. One reason to implement r-m-w and c-a-s as a single insn which is expanded only at emission time is that we must be very careful what instructions we allow in between the load-linked and store-conditional. In particular, we cannot allow any extra memory transactions in there, since -- particularly on low-end hardware -- that might cause the transaction to fail, hence deadlocking the generated code. That implies that we can't present the LL/SC loop to the register allocator as its constituent instructions, since it might insert spills anywhere. Hence we must present it as a single indivisible unit, as we do here. It also has the benefit of reducing the total amount of work the RA has to do. The only other notable feature of the r-m-w and c-a-s translations into AArch64 code, is that they both need a scratch register internally. Rather than faking one up by claiming, in `get_regs` that it modifies an extra scratch register, and having to have a dummy initialisation of it, these new instructions (`::LLSC` and `::CAS`) simply use fixed registers in the range x24-x28. We rely on the RA's ability to coalesce V<-->R copies to make the cost of the resulting extra copies zero or almost zero. x24-x28 are chosen so as to be call-clobbered, hence their use is less likely to interfere with long live ranges that span calls. One subtlety regarding the use of completely fixed input and output registers is that we must be careful how the surrounding copy from/to of the arg/result registers is done. In particular, it is not safe to simply emit copies in some arbitrary order if one of the arg registers is a real reg. For that reason, the arguments are first moved into virtual regs if they are not already there, using a new method `<LowerCtx for Lower>::ensure_in_vreg`. Again, we rely on coalescing to turn them into no-ops in the common case. There is also a ridealong fix for the AArch64 lowering case for `Opcode::Trapif \| Opcode::Trapff`, which removes a bug in which two trap insns in a row were generated. In the patch as submitted there are 6 "FIXME JRS" comments, which mark things which I believe to be correct, but for which I would appreciate a second opinion. Unless otherwise directed, I will remove them for the final commit but leave the associated code/comments unchanged.	2020-08-04 09:35:50 +02:00
Anton Kirilov	79dfac5514	Refactor the InstSize enum in the AArch64 backend The main issue with the InstSize enum was that it was used both for GPR and SIMD & FP operands, even though machine instructions do not mix them in general (as in a destination register is either a GPR or not). As a result it had methods such as sf_bit() that made sense only for one type of operand. Another issue was that the enum name was not reflecting its purpose accurately - it was meant to represent an instruction operand size, not an instruction size, which is fixed in A64 (always 4 bytes). Now the enum is split into one for GPR operands and another for scalar SIMD & FP operands. Copyright (c) 2020, Arm Limited.	2020-07-14 15:04:35 +01:00
Joey Gouly	0f462330e0	arm64: Implement AllTrue and AnyTrue This enables the simd_boolean WASM SIMD spec test. Copyright (c) 2020, Arm Limited.	2020-06-17 15:40:51 +01:00
Chris Fallin	ed7e410111	Bugfix: scaled addressing mode: round B1 up to one byte. Issue uncovered by Ben Bouvier during regalloc work.	2020-06-11 10:27:32 -07:00
Chris Fallin	fe97659813	Address review comments.	2020-06-03 13:31:34 -07:00
Joey Gouly	02c3f238f8	arm64: Use FPU instrctions for Fcopysign Copyright (c) 2020, Arm Limited.	2020-05-21 18:14:12 +01:00
Chris Fallin	a66724aafd	Rework aarch64 stack frame implementation. This PR changes the aarch64 ABI implementation to use positive offsets from SP, rather than negative offsets from FP, to refer to spill slots and stack-local storage. This allows for better addressing-mode options, and hence slightly better code: e.g., the unsigned scaled 12-bit offset mode can be used to reach anywhere in a 32KB frame without extra address-construction instructions, whereas negative offsets are limited to a signed 9-bit unscaled mode (-256 bytes). To enable this, the PR introduces a notion of "nominal SP offsets" as a virtual addressing mode, lowered during the emission pass. The offsets are relative to "SP after adjusting downward to allocate stack/spill slots", but before pushing clobbers. This allows the addressing-mode expressions to be generated before register allocation (or during it, for spill/reload sequences). To convert these offsets into true offsets from SP, we need to track how much further SP is moved downward, and compensate for this. We do so with "virtual SP offset adjustment" pseudo-instructions: these are seen by the emission pass, and result in no instruction (0 byte output), but update state that is now threaded through each instruction emission in turn. In this way, we can push e.g. stack args for a call and adjust the virtual SP offset, allowing reloads from nominal-SP-relative spillslots while we do the argument setup with "real SP offsets" at the same time.	2020-05-06 09:23:55 -07:00
Chris Fallin	8f462db645	Fix ImmLogic.invert(), and with it, `fcopysign` and `float_misc` test. Previously, `fcopysign` was mysteriously failing to pass the `float_misc` spec test. This was tracked down to bad logical-immediate masks used to separate the sign and not-sign bits. In particular, the masks for the and-not operations were wrong. The `invert()` function on an `ImmLogic` immediate, it turns out, assumed every immediate would be used by a 64-bit instruction; `ImmLogic` immediates are subtly different for 32-bit instructions. This change tracks the instruction size (32 or 64 bits) intended for use with each such immediate, and passes it back into `maybe_from_u64` when computing the inverted immediate. Addresses several of the failures (`float_misc`, `f32_bitwise`) for #1521 (test failures) and presumably helps #1519 (SpiderMonkey integration).	2020-04-23 13:06:13 -07:00
Joey Gouly	3638f8a764	arm64: Add support for CCmp Also add a test for SUBS/ADDS with XZR, as CMP/CMN are aliases. Copyright (c) 2020, Arm Limited.	2020-04-21 12:19:07 +02:00
Chris Fallin	48cf2c2f50	Address review comments: - Undo temporary changes to default features (`all-arch`) and a signal-handler test. - Remove `SIGTRAP` handler: no longer needed now that we've found an "undefined opcode" option on ARM64. - Rename pp.rs to pretty_print.rs in machinst/. - Only use empty stack-probe on non-x86. As per a comment in rust-lang/compiler-builtins [1], LLVM only supports stack probes on x86 and x86-64. Thus, on any other CPU architecture, we cannot refer to `__rust_probestack`, because it does not exist. - Rename arm64 to aarch64. - Use `target` directive in vcode filetests. - Run the flags verifier, but without encinfo, when using new backends. - Clean up warning overrides. - Fix up use of casts: use u32::from(x) and siblings when possible, u32::try_from(x).unwrap() when not, to avoid silent truncation. - Take immutable `Function` borrows as input; we don't actually mutate the input IR. - Lots of other miscellaneous cleanups. [1] `cae3e6ea23/src/probestack.rs (L39)`	2020-04-15 17:21:28 -07:00

12 Commits