wasmtime

Author	SHA1	Message	Date
Julian Seward	c15d9bd61b	CL/aarch64: implement the wasm SIMD pseudo-max/min and FP-rounding instructions This patch implements, for aarch64, the following wasm SIMD extensions Floating-point rounding instructions https://github.com/WebAssembly/simd/pull/232 Pseudo-Minimum and Pseudo-Maximum instructions https://github.com/WebAssembly/simd/pull/122 The changes are straightforward: * `build.rs`: the relevant tests have been enabled * `cranelift/codegen/meta/src/shared/instructions.rs`: new CLIF instructions `fmin_pseudo` and `fmax_pseudo`. The wasm rounding instructions do not need any new CLIF instructions. * `cranelift/wasm/src/code_translator.rs`: translation into CLIF; this is pretty much the same as any other unary or binary vector instruction (for the rounding and the pmin/max respectively) * `cranelift/codegen/src/isa/aarch64/lower_inst.rs`: - `fmin_pseudo` and `fmax_pseudo` are converted into a two instruction sequence, `fcmpgt` followed by `bsl` - the CLIF rounding instructions are converted to a suitable vector `frint{n,z,p,m}` instruction. * `cranelift/codegen/src/isa/aarch64/inst/mod.rs`: minor extension of `pub enum VecMisc2` to handle the rounding operations. And corresponding `emit` cases.	2020-10-26 10:37:07 +01:00
Yury Delendik	de4af90af6	machinst x64: New backend unwind (#2266) Addresses unwind for experimental x64 backend. The preliminary code enables backtrace on SystemV call convention.	2020-10-23 15:19:41 -05:00
Julian Seward	2702942050	CL/aarch64 back end: implement the wasm SIMD `bitmask` instructions The `bitmask.{8x16,16x8,32x4}` instructions do not map neatly to any single AArch64 SIMD instruction, and instead need a sequence of around ten instructions. Because of this, this patch is somewhat longer and more complex than it would be for (eg) x64. Main changes are: * the relevant testsuite test (`simd_boolean.wast`) has been enabled on aarch64. * at the CLIF level, add a new instruction `vhigh_bits`, into which these wasm instructions are to be translated. * in the wasm->CLIF translation (code_translator.rs), translate into `vhigh_bits`. This is straightforward. * in the CLIF->AArch64 translation (lower_inst.rs), translate `vhigh_bits` into equivalent sequences of AArch64 instructions. There is a different sequence for each of the `{8x16, 16x8, 32x4}` variants. All other changes are AArch64-specific, and add instruction definitions needed by the previous step: * Add two new families of AArch64 instructions: `VecShiftImm` (vector shift by immediate) and `VecExtract` (effectively a double-length vector shift) * To the existing AArch64 family `VecRRR`, add a `zip1` variant. To the `VecLanesOp` family add an `addv` variant. * Add supporting code for the above changes to AArch64 instructions: - getting the register uses (`aarch64_get_regs`) - mapping the registers (`aarch64_map_regs`) - printing instructions - emitting instructions (`impl MachInstEmit for Inst`). The handling of `VecShiftImm` is a bit complex. - emission tests for new instructions and variants.	2020-10-23 05:26:25 +02:00
Anton Kirilov	e0b911a4df	Introduce the Cranelift IR instruction `LoadSplat` It corresponds to WebAssembly's `load*_splat` operations, which were previously represented as a combination of `Load` and `Splat` instructions. However, there are architectures such as Armv8-A that have a single machine instruction equivalent to the Wasm operations. In order to generate it, it is necessary to merge the `Load` and the `Splat` in the backend, which is not possible because the load may have side effects. The new IR instruction works around this limitation. The AArch64 backend leverages the new instruction to improve code generation. Copyright (c) 2020, Arm Limited.	2020-10-14 13:07:13 +01:00
Chris Fallin	71768bb6cf	Fix AArch64 ABI to respect half-caller-save, half-callee-save vec regs. This PR updates the AArch64 ABI implementation so that it (i) properly respects that v8-v15 inclusive have callee-save lower halves, and caller-save upper halves, by conservatively approximating (to full registers) in the appropriate directions when generating prologue caller-saves and when informing the regalloc of clobbered regs across callsites. In order to prevent saving all of these vector registers in the prologue of every non-leaf function due to the above approximation, this also makes use of a new regalloc.rs feature to exclude call instructions' writes from the clobber set returned by register allocation. This is safe whenever the caller and callee have the same ABI (because anything the callee could clobber, the caller is allowed to clobber as well without saving it in the prologue). Fixes #2254.	2020-10-06 14:44:02 -07:00
Anton Kirilov	f612e8e7b2	AArch64: Add various missing SIMD bits In addition, improve the code for stack pointer manipulation. Copyright (c) 2020, Arm Limited.	2020-09-09 13:37:50 +01:00
Chris Fallin	3d6c4d312f	Merge pull request #2187 from akirilov-arm/ALUOp3 AArch64: Introduce an enum for ternary integer operations	2020-09-08 12:57:59 -07:00
Anton Kirilov	e92f949663	AArch64: Introduce an enum for ternary integer operations This commit performs a small cleanup in the AArch64 backend - after the MAdd and MSub variants have been extracted, the ALUOp enum is used purely for binary integer operations. Also, Inst::Mov has been renamed to Inst::Mov64 for consistency. Copyright (c) 2020, Arm Limited.	2020-09-08 13:22:22 +01:00
Joey Gouly	650d48cd84	arm64: Don't always materialise a 64-bit constant This improves the mov/movk/movn sequence when the high half of the 64-bit value is all zero. Copyright (c) 2020, Arm Limited.	2020-09-01 13:29:01 +01:00
Julian Seward	620e4b4e82	This patch fills in the missing pieces needed to support wasm atomics on newBE/x64. It does this by providing an implementation of the CLIF instructions `AtomicRmw`, `AtomicCas`, `AtomicLoad`, `AtomicStore` and `Fence`. The translation is straightforward. `AtomicCas` is translated into x64 `cmpxchg`, `AtomicLoad` becomes a normal load because x64-TSO provides adequate sequencing, `AtomicStore` becomes a normal store followed by `mfence`, and `Fence` becomes `mfence`. `AtomicRmw` is the only complex case: it becomes a normal load, followed by a loop which computes an updated value, tries to `cmpxchg` it back to memory, and repeats if necessary. This is a minimum-effort initial implementation. `AtomicRmw` could be implemented more efficiently using LOCK-prefixed integer read-modify-write instructions in the case where the old value in memory is not required. Subsequent work could add that, if required. The x64 emitter has been updated to emit the new instructions, obviously. The `LegacyPrefix` mechanism has been revised to handle multiple prefix bytes, not just one, since it is now sometimes necessary to emit both 0x66 (Operand Size Override) and F0 (Lock). In the aarch64 implementation of atomics, there has been some minor renaming for the sake of clarity, and for consistency with this x64 implementation.	2020-08-24 11:50:06 +02:00
Anton Kirilov	b895ac0e40	AArch64: Implement SIMD conversions Copyright (c) 2020, Arm Limited.	2020-08-21 18:03:50 +01:00
Joey Gouly	a518c10141	arm64: Implement SIMD i64x2 multiply Copyright (c) 2020, Arm Limited.	2020-08-20 13:26:03 +01:00
Chris Fallin	5cf3fba3da	Refactor AArch64 ABI support to extract common bits for shared impl with x64. We have observed that the ABI implementations for AArch64 and x64 are very similar; in fact, x64's implementation started as a modified copy of AArch64's implementation. This is an artifact of both a similar ABI (both machines pass args and return values in registers first, then the stack, and both machines give considerable freedom with stack-frame layout) and a too-low-level ABI abstraction in the existing design. For machines that fit the mainstream or most common ABI-design idioms, we should be able to do much better. This commit factors AArch64 into machine-specific and machine-independent parts, but does not yet modify x64; that will come next. This should be completely neutral with respect to compile time and generated code performance.	2020-08-14 16:27:39 -07:00
Anton Kirilov	1ec6930005	Enable the spec::simd::simd_lane test for AArch64 Copyright (c) 2020, Arm Limited.	2020-08-06 11:14:15 +01:00
Julian Seward	25e31739a6	Implement Wasm Atomics for Cranelift/newBE/aarch64. The implementation is pretty straightforward. Wasm atomic instructions fall into 5 groups * atomic read-modify-write * atomic compare-and-swap * atomic loads * atomic stores * fences and the implementation mirrors that structure, at both the CLIF and AArch64 levels. At the CLIF level, there are five new instructions, one for each group. Some comments about these: * for those that take addresses (all except fences), the address is contained entirely in a single `Value`; there is no offset field as there is with normal loads and stores. Wasm atomics require alignment checks, and removing the offset makes implementation of those checks a bit simpler. * atomic loads and stores get their own instructions, rather than reusing the existing load and store instructions, for two reasons: - per above comment, makes alignment checking simpler - reuse of existing loads and stores would require extension of `MemFlags` to indicate atomicity, which sounds semantically unclean. For example, then any instruction carrying `MemFlags` could be marked as atomic, even in cases where it is meaningless or ambiguous. * I tried to specify, in comments, the behaviour of these instructions as tightly as I could. Unfortunately there is no way (per my limited CLIF knowledge) to enforce the constraint that they may only be used on I8, I16, I32 and I64 types, and in particular not on floating point or vector types. The translation from Wasm to CLIF, in `code_translator.rs` is unremarkable. At the AArch64 level, there are also five new instructions, one for each group. All of them except `::Fence` contain multiple real machine instructions. Atomic r-m-w and atomic c-a-s are emitted as the usual load-linked store-conditional loops, guarded at both ends by memory fences. Atomic loads and stores are emitted as a load preceded by a fence, and a store followed by a fence, respectively. 
The amount of fencing may be overkill, but it reflects exactly what the SM Wasm baseline compiler for AArch64 does. One reason to implement r-m-w and c-a-s as a single insn which is expanded only at emission time is that we must be very careful what instructions we allow in between the load-linked and store-conditional. In particular, we cannot allow any extra memory transactions in there, since -- particularly on low-end hardware -- that might cause the transaction to fail, hence deadlocking the generated code. That implies that we can't present the LL/SC loop to the register allocator as its constituent instructions, since it might insert spills anywhere. Hence we must present it as a single indivisible unit, as we do here. It also has the benefit of reducing the total amount of work the RA has to do. The only other notable feature of the r-m-w and c-a-s translations into AArch64 code, is that they both need a scratch register internally. Rather than faking one up by claiming, in `get_regs` that it modifies an extra scratch register, and having to have a dummy initialisation of it, these new instructions (`::LLSC` and `::CAS`) simply use fixed registers in the range x24-x28. We rely on the RA's ability to coalesce V<-->R copies to make the cost of the resulting extra copies zero or almost zero. x24-x28 are chosen so as to be call-clobbered, hence their use is less likely to interfere with long live ranges that span calls. One subtlety regarding the use of completely fixed input and output registers is that we must be careful how the surrounding copy from/to of the arg/result registers is done. In particular, it is not safe to simply emit copies in some arbitrary order if one of the arg registers is a real reg. For that reason, the arguments are first moved into virtual regs if they are not already there, using a new method `<LowerCtx for Lower>::ensure_in_vreg`. Again, we rely on coalescing to turn them into no-ops in the common case. 
There is also a ridealong fix for the AArch64 lowering case for `Opcode::Trapif | Opcode::Trapff`, which removes a bug in which two trap insns in a row were generated. In the patch as submitted there are 6 "FIXME JRS" comments, which mark things which I believe to be correct, but for which I would appreciate a second opinion. Unless otherwise directed, I will remove them for the final commit but leave the associated code/comments unchanged.	2020-08-04 09:35:50 +02:00
Anton Kirilov	adf25d27c2	AArch64: Implement SIMD floating-point arithmetic Copyright (c) 2020, Arm Limited.	2020-07-28 15:19:47 +01:00
Chris Fallin	44ef8247a9	Merge pull request #2062 from akirilov-arm/extract_lane AArch64: Improve code generation for Extractlane + Sextend / Uextend	2020-07-22 13:35:00 -07:00
Chris Fallin	b8f6d53a6b	Aarch64 codegen: represent bool `true` as -1, not 1. It seems that this is actually the correct behavior for bool types wider than `b1`; some of the vector instruction optimizations depend on bool lanes representing false and true as all-zeroes and all-ones respectively. For `b8`..`b64`, this results in an extra negation after a `cset` when a bool is produced by an `icmp`/`fcmp`, but the most common case (`b1`) is unaffected, because an all-ones one-bit value is just `1`. An example of this assumption can be seen here: `399ee0a54c/cranelift/codegen/src/simple_preopt.rs (L956)` Thanks to Joey Gouly of ARM for noting this issue while implementing SIMD support, and digging into the source (finding the above example) to determine the correct behavior.	2020-07-22 12:30:55 -07:00
Anton Kirilov	420c4f06b8	AArch64: Improve code generation for Extractlane + Sextend / Uextend Copyright (c) 2020, Arm Limited.	2020-07-22 11:47:51 +01:00
Joey Gouly	40473dffed	arm64: Enable arith2 tests Copyright (c) 2020, Arm Limited.	2020-07-17 15:58:16 +01:00
Benjamin Bouvier	ead8a835c4	machinst x64: add more FP support	2020-07-17 15:56:44 +02:00
Anton Kirilov	95b0b05af2	AArch64: Introduce an enum to specify vector instruction operand sizes Copyright (c) 2020, Arm Limited.	2020-07-14 21:37:44 +01:00
Anton Kirilov	400639245c	AArch64: Remove show_freg_sized() It provides the same functionality as show_vreg_scalar(). Copyright (c) 2020, Arm Limited.	2020-07-14 11:27:46 -07:00
Chris Fallin	4ba3ee3368	Merge pull request #2016 from jgouly/saturating-math arm64: Implement saturating SIMD arithmetic	2020-07-14 11:24:10 -07:00
Joey Gouly	aa84a4173c	arm64: Implement saturating SIMD arithmetic Copyright (c) 2020, Arm Limited.	2020-07-14 18:19:11 +01:00
Chris Fallin	26529006e0	Address review comments.	2020-07-14 10:17:29 -07:00
Chris Fallin	08353fcc14	Reftypes part two: add support for stackmaps. This commit adds support for generating stackmaps at safepoints to the new backend framework and to the AArch64 backend in particular. It has been tested to work with SpiderMonkey.	2020-07-14 10:17:27 -07:00
Chris Fallin	b93e8c296d	Initial reftype support in aarch64, modulo safepoints. This commit adds the initial support to allow reftypes to flow through the program when targeting aarch64. It also adds a fix to the `ModuleTranslationState` needed to send R32/R64 types over from the SpiderMonkey embedding. This commit does not include any support for safepoints in aarch64 or the `MachInst` infrastructure; that is in the next commit. This commit also makes a drive-by improvement to `Bint`, avoiding an unneeded zero-extension op when the extended value comes directly from a conditional-set (which produces a full-width 0 or 1).	2020-07-14 10:14:18 -07:00
Anton Kirilov	79dfac5514	Refactor the InstSize enum in the AArch64 backend The main issue with the InstSize enum was that it was used both for GPR and SIMD & FP operands, even though machine instructions do not mix them in general (as in a destination register is either a GPR or not). As a result it had methods such as sf_bit() that made sense only for one type of operand. Another issue was that the enum name was not reflecting its purpose accurately - it was meant to represent an instruction operand size, not an instruction size, which is fixed in A64 (always 4 bytes). Now the enum is split into one for GPR operands and another for scalar SIMD & FP operands. Copyright (c) 2020, Arm Limited.	2020-07-14 15:04:35 +01:00
Chris Fallin	f4f5eed254	Merge pull request #1977 from jgouly/simd_shift arm64: Implement SIMD shift instructions	2020-07-13 09:24:19 -07:00
Joey Gouly	3a67d25ed6	arm64: Implement SIMD shift instructions Copyright (c) 2020, Arm Limited.	2020-07-03 16:47:43 +01:00
Benjamin Bouvier	2115e70acb	machinst x64: implement enough to support branch tables;	2020-07-03 14:33:52 +02:00
Chris Fallin	b700646c93	Merge pull request #1962 from cfallin/aarch64-lowering-condbr AArch64: avoid branches with explicit offsets at lowering stage.	2020-07-02 14:05:40 -07:00
Chris Fallin	b7ecad1d74	AArch64: avoid branches with explicit offsets at lowering stage. In discussions with @bnjbvr, it came up that generating `OneWayCondBr`s with explicit, hardcoded PC-offsets as part of lowered instruction sequences is actually unsafe, because the register allocator might insert a spill or reload into the middle of our sequence. We were careful about this in some cases but somehow missed that it was a general restriction. Conceptually, all inter-instruction references should be via labels at the VCode level; explicit offsets are only ever known at emission time, and resolved by the `MachBuffer`. To allow for conditional trap checks without modifying the CFG (as seen by regalloc) during lowering, this PR instead adds a `TrapIf` pseudo-instruction that conditionally skips a single embedded trap instruction. It lowers to the same `condbr label ; trap ; label: ...` sequence, but without the hardcoded branch-target offset in the lowering code.	2020-07-02 11:02:27 -07:00
Joey Gouly	62e7b7f838	arm64: Implement basic SIMD arithmetic Copyright (c) 2020, Arm Limited.	2020-07-02 13:17:33 +01:00
Benjamin Bouvier	c9a3f05afd	machinst x64: implement calls and int cmp/store/loads; This makes it possible to run a simple recursive fibonacci function in wasmtime.	2020-06-25 16:20:33 +02:00
Anton Kirilov	90bafae1dc	AArch64: Implement SIMD floating-point comparisons Copyright (c) 2020, Arm Limited.	2020-06-18 11:07:52 +01:00
Joey Gouly	0f462330e0	arm64: Implement AllTrue and AnyTrue This enables the simd_boolean WASM SIMD spec test. Copyright (c) 2020, Arm Limited.	2020-06-17 15:40:51 +01:00
Chris Fallin	6286ca7310	AArch64: make use of reg-reg-extend amode. When a load/store instruction needs an address of the form `v0 + uextend(v1)` or `v0 + sextend(v1)` (or the commuted forms thereof), we currently generate a separate zero/sign-extend operation and then use a plain `[rA, rB]` addressing mode. This patch extends `lower_address()` to look at both addends of an address if it has two addends and a zero offset, recognize extension operations, and incorporate them directly into a `[rA, rB, UXTW]` or `[rA, rB, SXTW]` form. This should improve our performance on WebAssembly workloads, at least, because we often see a 64-bit linear memory base indexed by a 32-bit (Wasm) pointer value.	2020-06-12 10:40:54 -07:00
Joey Gouly	544c5dece5	arm64: Implement SIMD bitwise operations Copyright (c) 2020, Arm Limited.	2020-06-11 10:58:23 -07:00
Anton Kirilov	d941034c2e	Enable the wast::Cranelift::spec::simd::simd_load_splat test for AArch64 Copyright (c) 2020, Arm Limited.	2020-06-10 15:01:37 +01:00
Chris Fallin	ac87ed12bd	Merge pull request #1847 from akirilov-arm/simd_load_extend Enable the wast::Cranelift::spec::simd::simd_load_extend test for AArch64	2020-06-09 12:29:06 -07:00
Joey Gouly	df2b031b6a	arm64: Implement Icmp for I16X8 and I32X4 Copyright (c) 2020, Arm Limited.	2020-06-09 11:07:43 -07:00
Anton Kirilov	7ac19af498	Enable the wast::Cranelift::spec::simd::simd_load_extend test for AArch64 Copyright (c) 2020, Arm Limited.	2020-06-09 18:05:38 +01:00
Anton Kirilov	51a551fb39	Implement vector element extensions for AArch64 This commit also includes load and extend operations. Both are prerequisites for enabling further SIMD spec tests. Copyright (c) 2020, Arm Limited.	2020-06-09 12:28:49 +01:00
Chris Fallin	fe97659813	Address review comments.	2020-06-03 13:31:34 -07:00
Chris Fallin	9fec933056	Merge pull request #1801 from jgouly/cmp-rebase arm64: add support for I8X16 ICmp	2020-06-02 09:35:41 -07:00
Joey Gouly	90a421193f	arm64: add support for I8X16 ICmp Copyright (c) 2020, Arm Limited.	2020-06-02 16:58:09 +01:00
Benjamin Bouvier	67c7a3ed19	mach backend: reduce the size of the Inst enum down to 32 bytes;	2020-06-02 16:29:05 +02:00
Benjamin Bouvier	e227608510	mach backend: use vectors instead of sets to remember set of uses/defs for calls; This avoids the set uniqueness (hashing) test, reduces memory churn when re-mapping virtual register onto real registers, and is generally more memory-efficient.	2020-06-02 16:29:05 +02:00
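
Commit 620e4b4e82 in the log above describes `AtomicRmw` on x64 as a normal load followed by a loop that computes an updated value, tries to `cmpxchg` it back to memory, and repeats if necessary. The shape of that loop can be sketched in plain Rust with the standard library's atomics. This is only an illustration of the technique, not the Cranelift lowering: the helper name `rmw_via_cas` and the choice of `SeqCst` ordering are assumptions made for the example.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical helper (not part of Cranelift): an atomic read-modify-write
/// built from a plain load plus a compare-exchange retry loop.
/// Returns the value that was in memory before the update.
fn rmw_via_cas(cell: &AtomicU32, op: impl Fn(u32) -> u32) -> u32 {
    // Initial load of the current value.
    let mut old = cell.load(Ordering::SeqCst);
    loop {
        // Compute the updated value from the value we last observed.
        let new = op(old);
        // Try to publish it; this succeeds only if memory still holds `old`.
        match cell.compare_exchange(old, new, Ordering::SeqCst, Ordering::SeqCst) {
            Ok(previous) => return previous,
            // Another thread changed the value in between: retry with what it wrote.
            Err(actual) => old = actual,
        }
    }
}
```

For example, calling `rmw_via_cas(&cell, |v| v.wrapping_add(2))` on a cell holding 40 returns 40 and leaves 42 in the cell. As the commit message notes, when the old value is not needed a single LOCK-prefixed instruction would be cheaper than this loop.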

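Commit c15d9bd61b in the log above lowers `fmin_pseudo` and `fmax_pseudo` to a two-instruction sequence: a greater-than compare followed by a bitwise select. A scalar model of that idea is sketched below, assuming the Wasm SIMD definitions `pmin(a, b) = b < a ? b : a` and `pmax(a, b) = a < b ? b : a`. The function names are hypothetical and the code models the semantics lane by lane; it is not the backend's implementation.

```rust
/// Bitwise select for one lane: bits of `on_true` where `mask` is set, else bits of `on_false`.
fn select_bits(mask: u32, on_true: f32, on_false: f32) -> f32 {
    f32::from_bits((on_true.to_bits() & mask) | (on_false.to_bits() & !mask))
}

/// Hypothetical model of f32x4 pseudo-minimum: per lane, `if b < a { b } else { a }`.
fn pmin_f32x4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        // Compare step: all-ones mask where a > b (false for NaN operands).
        let mask = if a[i] > b[i] { u32::MAX } else { 0 };
        // Select step: pick b where the mask is set, otherwise keep a.
        out[i] = select_bits(mask, b[i], a[i]);
    }
    out
}

/// Hypothetical model of f32x4 pseudo-maximum: per lane, `if a < b { b } else { a }`.
fn pmax_f32x4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        // Compare step: all-ones mask where b > a (false for NaN operands).
        let mask = if b[i] > a[i] { u32::MAX } else { 0 };
        out[i] = select_bits(mask, b[i], a[i]);
    }
    out
}
```

Because every comparison involving NaN is false, both functions return the first operand's lane whenever either input lane is NaN, and they do not distinguish `0.0` from `-0.0`. That is what separates the pseudo variants from the IEEE-style `min`/`max` operations. The all-ones mask also matches the "true is -1" bool representation adopted in commit b8f6d53a6b.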

124 Commits