Commit Graph

76 Commits

Author SHA1 Message Date
Anton Kirilov
adf25d27c2 AArch64: Implement SIMD floating-point arithmetic
Copyright (c) 2020, Arm Limited.
2020-07-28 15:19:47 +01:00
Benjamin Bouvier
35d9ab19b7 Review fixes; 2020-07-24 19:29:12 +02:00
Chris Fallin
44ef8247a9 Merge pull request #2062 from akirilov-arm/extract_lane
AArch64: Improve code generation for Extractlane + Sextend / Uextend
2020-07-22 13:35:00 -07:00
Chris Fallin
b8f6d53a6b Aarch64 codegen: represent bool true as -1, not 1.
It seems that this is actually the correct behavior for bool types wider
than `b1`; some of the vector instruction optimizations depend on bool
lanes representing false and true as all-zeroes and all-ones
respectively. For `b8`..`b64`, this results in an extra negation after a
`cset` when a bool is produced by an `icmp`/`fcmp`, but the most common
case (`b1`) is unaffected, because an all-ones one-bit value is just
`1`.

An example of this assumption can be seen here:

399ee0a54c/cranelift/codegen/src/simple_preopt.rs (L956)

Thanks to Joey Gouly of ARM for noting this issue while implementing
SIMD support, and digging into the source (finding the above example) to
determine the correct behavior.
2020-07-22 12:30:55 -07:00
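A minimal standalone sketch (not Cranelift code; a single 8-bit lane is assumed for illustration) of why the all-ones representation matters: a bitwise select of the form `(mask & a) | (!mask & b)`, as relied on by the optimization linked above, only behaves like a per-lane select when a true lane is all-ones.

```rust
// Sketch: a bitwise lane select. It is only a true select when "true" is
// all-ones (0xFF for an 8-bit lane) and "false" is all-zeroes.
fn bitselect_lane(mask: u8, if_true: u8, if_false: u8) -> u8 {
    (mask & if_true) | (!mask & if_false)
}

fn main() {
    // All-ones mask: the whole lane comes from `if_true`.
    assert_eq!(bitselect_lane(0xFF, 0xAA, 0x55), 0xAA);
    // A "true" represented as 1 mixes bits from both operands instead.
    assert_eq!(bitselect_lane(0x01, 0xAA, 0x55), 0x54);
}
```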
Anton Kirilov
420c4f06b8 AArch64: Improve code generation for Extractlane + Sextend / Uextend
Copyright (c) 2020, Arm Limited.
2020-07-22 11:47:51 +01:00
Chris Fallin
784e2f1480 Merge pull request #2038 from jgouly/arith2
arm64: Enable arith2 tests
2020-07-20 09:00:10 -07:00
Chris Fallin
9bd9c628aa Aarch64: mask shift-amounts incorporated into reg-reg-shift ALU insts.
We previously fixed a bug in which constant shift amounts were not
masked modulo the number of bits in the operand; however, we did not
fix the analogous case for shifts incorporated into the second register
argument of ALU instructions that support integrated shifts.  This
failure to mask resulted in illegal instructions being generated, e.g.
in https://bugzilla.mozilla.org/show_bug.cgi?id=1653502. This PR fixes
the issue by masking the amount, as the shift semantics require.
2020-07-17 14:55:23 -07:00
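A standalone sketch of the masking this fix requires (illustrative, not the actual lowering code): the incorporated shift amount must be reduced modulo the operand width before being encoded, which for power-of-two widths is a bitwise AND.

```rust
// Shift semantics: the amount is taken modulo the operand width.
fn mask_shift_amount(amount: u64, operand_bits: u64) -> u64 {
    debug_assert!(operand_bits.is_power_of_two());
    amount & (operand_bits - 1)
}

fn main() {
    assert_eq!(mask_shift_amount(68, 64), 4); // 68 % 64
    assert_eq!(mask_shift_amount(33, 32), 1); // 33 % 32
    assert_eq!(mask_shift_amount(7, 64), 7);  // already in range, unchanged
}
```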
Joey Gouly
40473dffed arm64: Enable arith2 tests
Copyright (c) 2020, Arm Limited.
2020-07-17 15:58:16 +01:00
Benjamin Bouvier
ead8a835c4 machinst x64: add more FP support 2020-07-17 15:56:44 +02:00
Anton Kirilov
95b0b05af2 AArch64: Introduce an enum to specify vector instruction operand sizes
Copyright (c) 2020, Arm Limited.
2020-07-14 21:37:44 +01:00
Anton Kirilov
400639245c AArch64: Remove show_freg_sized()
It provides the same functionality as show_vreg_scalar().

Copyright (c) 2020, Arm Limited.
2020-07-14 11:27:46 -07:00
Chris Fallin
4ba3ee3368 Merge pull request #2016 from jgouly/saturating-math
arm64: Implement saturating SIMD arithmetic
2020-07-14 11:24:10 -07:00
Joey Gouly
aa84a4173c arm64: Implement saturating SIMD arithmetic
Copyright (c) 2020, Arm Limited.
2020-07-14 18:19:11 +01:00
Chris Fallin
26529006e0 Address review comments. 2020-07-14 10:17:29 -07:00
Chris Fallin
08353fcc14 Reftypes part two: add support for stackmaps.
This commit adds support for generating stackmaps at safepoints to the
new backend framework and to the AArch64 backend in particular. It has
been tested to work with SpiderMonkey.
2020-07-14 10:17:27 -07:00
Chris Fallin
b93e8c296d Initial reftype support in aarch64, modulo safepoints.
This commit adds the initial support to allow reftypes to flow through
the program when targeting aarch64. It also adds a fix to the
`ModuleTranslationState` needed to send R32/R64 types over from the
SpiderMonkey embedding.

This commit does not include any support for safepoints in aarch64
or the `MachInst` infrastructure; that is in the next commit.

This commit also makes a drive-by improvement to `Bint`, avoiding an
unneeded zero-extension op when the extended value comes directly from a
conditional-set (which produces a full-width 0 or 1).
2020-07-14 10:14:18 -07:00
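A small sketch of the reasoning behind the `Bint` improvement (illustrative only): a conditional-set already materializes the bool as a full-width 0 or 1, so the zero-extension that `Bint` would otherwise emit changes nothing and can be skipped.

```rust
// Model of a conditional-set: the result is 0 or 1 in the full register.
fn cset(cond: bool) -> u64 {
    if cond { 1 } else { 0 }
}

fn main() {
    for cond in [false, true] {
        let v = cset(cond);
        // Zero-extending from the bool's width (modeled here as 8 bits) is a
        // no-op for values produced by cset, so the extension is redundant.
        assert_eq!(v, v & 0xFF);
    }
}
```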
Anton Kirilov
79dfac5514 Refactor the InstSize enum in the AArch64 backend
The main issue with the InstSize enum was that it was used both for
GPR and SIMD & FP operands, even though machine instructions do not
mix them in general (a destination register, for example, is either a
GPR or not). As a result it had methods such as sf_bit() that made
sense only for one type of operand.

Another issue was that the enum name did not reflect its purpose
accurately: it was meant to represent an instruction operand size,
not an instruction size, which is fixed in A64 (always 4 bytes).

Now the enum is split into one for GPR operands and another for
scalar SIMD & FP operands.

Copyright (c) 2020, Arm Limited.
2020-07-14 15:04:35 +01:00
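A hedged sketch of the shape of the split (the names below are illustrative, not necessarily those used in the backend): one enum for GPR operand widths, where only the A64 `sf` bit varies, and a separate enum for scalar SIMD & FP operand sizes, which carries no `sf` bit at all.

```rust
// GPR operands: the only distinction is 32-bit vs 64-bit, selected by the
// `sf` bit of the instruction encoding.
#[allow(dead_code)]
enum GprOperandSize {
    Size32,
    Size64,
}

impl GprOperandSize {
    fn sf_bit(&self) -> u32 {
        match self {
            GprOperandSize::Size32 => 0,
            GprOperandSize::Size64 => 1,
        }
    }
}

// Scalar SIMD & FP operands: a wider range of sizes.
#[allow(dead_code)]
enum ScalarFpSize {
    Size8,
    Size16,
    Size32,
    Size64,
    Size128,
}

fn main() {
    assert_eq!(GprOperandSize::Size64.sf_bit(), 1);
    let _ = ScalarFpSize::Size128; // no sf_bit() method exists here
}
```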
Chris Fallin
4b6ebc0c27 Merge pull request #1999 from bnjbvr/fix-aarch64-ishl-by-zero
machinst aarch64: fix encoding generation of left-shift by 0
2020-07-13 09:25:21 -07:00
Chris Fallin
f4f5eed254 Merge pull request #1977 from jgouly/simd_shift
arm64: Implement SIMD shift instructions
2020-07-13 09:24:19 -07:00
Benjamin Bouvier
f316bef763 machinst aarch64: fix encoding generation of left-shift by 0;
The ARM book says that the immr field should contain (-count % 64); the
existing code was approximating this with (64 - count), which is not
correct for a zero count.
2020-07-09 15:43:42 +02:00
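A standalone sketch of the arithmetic involved (illustrative; field packing omitted): for a 64-bit left shift by `count`, the `immr` field wants `(-count) mod 64`. The old `64 - count` agrees with that for `count` in 1..=63, but produces 64, which does not fit the field, when `count` is 0.

```rust
// Correct computation: (-count) mod 64, which is 0 when count is 0.
fn immr_for_lsl64(count: u64) -> u64 {
    count.wrapping_neg() % 64
}

fn main() {
    assert_eq!(immr_for_lsl64(1), 63);
    assert_eq!(immr_for_lsl64(17), 47);
    assert_eq!(immr_for_lsl64(0), 0); // the old "64 - count" would yield 64 here
}
```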
Joey Gouly
3a67d25ed6 arm64: Implement SIMD shift instructions
Copyright (c) 2020, Arm Limited.
2020-07-03 16:47:43 +01:00
Benjamin Bouvier
9d5be00de4 Address review comments
- put the division in the synthetic instruction as well,
- put the branch table check in the inst's emission code,
- replace OneWayCondJmp by TrapIf vcode instruction,
- add comments describing code generated by the synthetic instructions
2020-07-03 14:33:52 +02:00
Benjamin Bouvier
2115e70acb machinst x64: implement enough to support branch tables; 2020-07-03 14:33:52 +02:00
Chris Fallin
b700646c93 Merge pull request #1962 from cfallin/aarch64-lowering-condbr
AArch64: avoid branches with explicit offsets at lowering stage.
2020-07-02 14:05:40 -07:00
Chris Fallin
b7ecad1d74 AArch64: avoid branches with explicit offsets at lowering stage.
In discussions with @bnjbvr, it came up that generating `OneWayCondBr`s
with explicit, hardcoded PC-offsets as part of lowered instruction
sequences is actually unsafe, because the register allocator *might*
insert a spill or reload into the middle of our sequence. We were
careful about this in some cases but somehow missed that it was a
general restriction. Conceptually, all inter-instruction references
should be via labels at the VCode level; explicit offsets are only ever
known at emission time, and resolved by the `MachBuffer`.

To allow for conditional trap checks without modifying the CFG (as seen
by regalloc) during lowering, this PR instead adds a `TrapIf`
pseudo-instruction that conditionally skips a single embedded trap
instruction. It lowers to the same `condbr label ; trap ; label: ...`
sequence, but without the hardcoded branch-target offset in the lowering
code.
2020-07-02 11:02:27 -07:00
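A hedged sketch of the emission pattern described above (the buffer and encoding helpers are simplified stand-ins, not the real `MachBuffer` API): the branch skips exactly one trap instruction, so the only offset involved is a constant fixed at emission time, and nothing the register allocator inserts during lowering can invalidate it.

```rust
// Emit "branch-if-not-cond over one instruction; trap" for a TrapIf.
fn emit_trap_if(sink: &mut Vec<u32>, inverted_cond: u32, trap_word: u32) {
    // Skip the single 4-byte trap instruction when the inverted condition
    // holds; the offset (2 instructions, i.e. PC+8) is fixed by construction.
    sink.push(encode_cond_branch(inverted_cond, 2));
    sink.push(trap_word);
    // Execution falls through to here when the trap was skipped.
}

// Simplified stand-in for a B.cond encoding (offset in instructions).
fn encode_cond_branch(cond: u32, offset_insts: u32) -> u32 {
    0x5400_0000 | ((offset_insts & 0x7FFFF) << 5) | (cond & 0xF)
}

fn main() {
    const TRAP_WORD: u32 = 0x0000_0000; // placeholder; real encoding omitted
    let mut sink = Vec::new();
    emit_trap_if(&mut sink, 0x1, TRAP_WORD);
    assert_eq!(sink.len(), 2); // one conditional branch, one trap word
}
```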
Joey Gouly
62e7b7f838 arm64: Implement basic SIMD arithmetic
Copyright (c) 2020, Arm Limited.
2020-07-02 13:17:33 +01:00
Benjamin Bouvier
c9a3f05afd machinst x64: implement calls and int cmp/store/loads;
This makes it possible to run a simple recursive fibonacci function in
wasmtime.
2020-06-25 16:20:33 +02:00
Anton Kirilov
90bafae1dc AArch64: Implement SIMD floating-point comparisons
Copyright (c) 2020, Arm Limited.
2020-06-18 11:07:52 +01:00
Joey Gouly
0f462330e0 arm64: Implement AllTrue and AnyTrue
This enables the simd_boolean WASM SIMD spec test.

Copyright (c) 2020, Arm Limited.
2020-06-17 15:40:51 +01:00
Chris Fallin
6286ca7310 AArch64: make use of reg-reg-extend amode.
When a load/store instruction needs an address of the form `v0 +
uextend(v1)` or `v0 + sextend(v1)` (or the commuted forms thereof), we
currently generate a separate zero/sign-extend operation and then use a
plain `[rA, rB]` addressing mode. This patch extends `lower_address()`
to look at both addends of an address if it has two addends and a zero
offset, recognize extension operations, and incorporate them directly
into a `[rA, rB, UXTW]` or `[rA, rB, SXTW]` form. This should improve
our performance on WebAssembly workloads, at least, because we often see
a 64-bit linear memory base indexed by a 32-bit (Wasm) pointer value.
2020-06-12 10:40:54 -07:00
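A hedged sketch of the pattern being recognized (types and helper shapes are illustrative, not the actual `lower_address()` code): when a two-addend address with zero offset has one addend that is a 32-to-64-bit extension, the extension is folded into the addressing mode instead of being emitted as a separate instruction.

```rust
#[allow(dead_code)]
struct Reg(u32);

// Illustrative address-addend and addressing-mode shapes.
enum Addend {
    Plain(Reg),
    Uextend(Reg), // 32-bit value, zero-extended to 64 bits
    Sextend(Reg), // 32-bit value, sign-extended to 64 bits
}

#[allow(dead_code)]
enum AMode {
    RegReg { base: Reg, index: Reg },                    // [rA, rB]
    RegExtended { base: Reg, index: Reg, signed: bool }, // [rA, rB, UXTW/SXTW]
}

// Fold an extension of either addend into the addressing mode; return None
// to let the caller fall back to a plain [rA, rB] mode.
fn fold_extend(a: Addend, b: Addend) -> Option<AMode> {
    match (a, b) {
        (Addend::Plain(base), Addend::Uextend(index))
        | (Addend::Uextend(index), Addend::Plain(base)) => {
            Some(AMode::RegExtended { base, index, signed: false })
        }
        (Addend::Plain(base), Addend::Sextend(index))
        | (Addend::Sextend(index), Addend::Plain(base)) => {
            Some(AMode::RegExtended { base, index, signed: true })
        }
        _ => None,
    }
}

fn main() {
    // Typical Wasm case: 64-bit heap base plus a zero-extended 32-bit pointer.
    let amode = fold_extend(Addend::Plain(Reg(0)), Addend::Uextend(Reg(1)));
    assert!(matches!(amode, Some(AMode::RegExtended { signed: false, .. })));
}
```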
Chris Fallin
b0cccf1d87 Merge pull request #1864 from jgouly/bitwise
arm64: Implement SIMD bitwise operations
2020-06-11 11:38:39 -07:00
Chris Fallin
6ba165be01 Merge pull request #1858 from cfallin/fix-scale-b1
Bugfix: scaled addressing mode: round B1 up to one byte.
2020-06-11 11:16:07 -07:00
Joey Gouly
544c5dece5 arm64: Implement SIMD bitwise operations
Copyright (c) 2020, Arm Limited.
2020-06-11 10:58:23 -07:00
Chris Fallin
ed7e410111 Bugfix: scaled addressing mode: round B1 up to one byte.
Issue uncovered by Ben Bouvier during regalloc work.
2020-06-11 10:27:32 -07:00
Anton Kirilov
d941034c2e Enable the wast::Cranelift::spec::simd::simd_load_splat test for AArch64
Copyright (c) 2020, Arm Limited.
2020-06-10 15:01:37 +01:00
Chris Fallin
ac87ed12bd Merge pull request #1847 from akirilov-arm/simd_load_extend
Enable the wast::Cranelift::spec::simd::simd_load_extend test for AArch64
2020-06-09 12:29:06 -07:00
Joey Gouly
df2b031b6a arm64: Implement Icmp for I16X8 and I32X4
Copyright (c) 2020, Arm Limited.
2020-06-09 11:07:43 -07:00
Anton Kirilov
7ac19af498 Enable the wast::Cranelift::spec::simd::simd_load_extend test for AArch64
Copyright (c) 2020, Arm Limited.
2020-06-09 18:05:38 +01:00
Anton Kirilov
51a551fb39 Implement vector element extensions for AArch64
This commit also includes load and extend operations. Both are
prerequisites for enabling further SIMD spec tests.

Copyright (c) 2020, Arm Limited.
2020-06-09 12:28:49 +01:00
Chris Fallin
fe97659813 Address review comments. 2020-06-03 13:31:34 -07:00
Chris Fallin
9fec933056 Merge pull request #1801 from jgouly/cmp-rebase
arm64: add support for I8X16 ICmp
2020-06-02 09:35:41 -07:00
Joey Gouly
90a421193f arm64: add support for I8X16 ICmp
Copyright (c) 2020, Arm Limited.
2020-06-02 16:58:09 +01:00
Benjamin Bouvier
67c7a3ed19 mach backend: reduce the size of the Inst enum down to 32 bytes; 2020-06-02 16:29:05 +02:00
Benjamin Bouvier
e227608510 mach backend: use vectors instead of sets to remember set of uses/defs for calls;
This avoids the set uniqueness (hashing) test, reduces memory
churn when re-mapping virtual registers onto real registers, and is
generally more memory-efficient.
2020-06-02 16:29:05 +02:00
Benjamin Bouvier
cfa0527794 mach backend: have mem_finalize return a SmallVec;
This avoids a spurious reallocation of the SmallVec containing the
load_constants result to a Vec, which appeared in dhat profiles.
2020-06-02 16:29:05 +02:00
Anton Kirilov
8a928830ac Enable the wast::Cranelift::spec::simd::simd_store test for AArch64
Copyright (c) 2020, Arm Limited.
2020-05-24 22:53:07 +01:00
Chris Fallin
73537e72c0 Merge pull request #1732 from jgouly/copysign-fpu
arm64: Use FPU instructions for Fcopysign
2020-05-22 17:25:33 -07:00
Joey Gouly
02c3f238f8 arm64: Use FPU instructions for Fcopysign
Copyright (c) 2020, Arm Limited.
2020-05-21 18:14:12 +01:00
Chris Fallin
c9e3b71c39 Merge pull request #1729 from cfallin/machinst-branch-opt
Fix MachBuffer branch optimization.
2020-05-20 14:43:57 -07:00
Benjamin Bouvier
1f620e1b46 cranelift: bump regalloc.rs to 0.0.24 and adapt to latest API changes; 2020-05-20 15:37:15 +02:00