Commit Graph

723 Commits

Author SHA1 Message Date
Andrew Brown
01d34e71b9 Add x86 legalization for fcvt_from_uint.f32x4
This converts an `i32x4` (treated as unsigned) into an `f32x4`, rounding where necessary, using either an AVX512VL/F instruction (VCVTUDQ2PS) or a long sequence of SSE4.1-compatible instructions.
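To show why an SSE4.1 fallback can match the AVX512 result, here is a scalar Rust model of one common technique: split each lane into 16-bit halves so that only signed int-to-float conversions (as CVTDQ2PS provides) are needed, and let the final add be the only rounding step. That this mirrors the legalization's exact sequence is an assumption (isolating the low 16 bits is the kind of job `pblendw` with a zero vector can do), and `fcvt_from_uint_model` is a hypothetical name.

```rust
/// Hypothetical scalar model: convert four unsigned 32-bit lanes to f32
/// using only signed i32 -> f32 conversions.
fn fcvt_from_uint_model(lanes: [u32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for (i, &x) in lanes.iter().enumerate() {
        // Low 16 bits: a small non-negative i32, so the conversion is exact.
        let lo = x & 0xFFFF;
        // High part with the low 16 bits cleared.
        let hi = x - lo;
        // Halve the high part so it fits in a positive i32. It has at most
        // 16 significant bits, so this signed conversion is also exact.
        let half = ((hi >> 1) as i32) as f32;
        // Doubling is exact; the final add is the only rounding step.
        out[i] = (half + half) + (lo as i32) as f32;
    }
    out
}
```

Because every intermediate is exact, the single rounding in the final add gives the same correctly rounded result as a direct `u32` to `f32` conversion.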
2020-06-12 15:06:22 -07:00
Andrew Brown
23ed48f269 Add AVX512F flag 2020-06-12 15:06:22 -07:00
Andrew Brown
772ce73f7f Add x86_pblendw instruction
This instruction is necessary for lowering `fcvt_from_uint`.
2020-06-12 15:06:22 -07:00
Andrew Brown
546fc9ddf1 Add x86_vcvtudq2ps instruction
This instruction, available in several AVX512 feature sets, converts i32x4 (treated as unsigned) to f32x4.
2020-06-12 15:06:22 -07:00
Chris Fallin
6286ca7310 AArch64: make use of reg-reg-extend amode.
When a load/store instruction needs an address of the form `v0 +
uextend(v1)` or `v0 + sextend(v1)` (or the commuted forms thereof), we
currently generate a separate zero/sign-extend operation and then use a
plain `[rA, rB]` addressing mode. This patch extends `lower_address()`
to look at both addends of an address if it has two addends and a zero
offset, recognize extension operations, and incorporate them directly
into a `[rA, rB, UXTW]` or `[rA, rB, SXTW]` form. This should improve
our performance on WebAssembly workloads, at least, because we often see
a 64-bit linear memory base indexed by a 32-bit (Wasm) pointer value.
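The matching described above can be sketched as a small pattern match over the two addends, trying both orders to cover the commuted forms. The `Addend`, `Extend`, and `AMode` types and `lower_two_addends` are hypothetical simplifications, not the actual `lower_address()` code; `None` stands for "fall back to another lowering".

```rust
/// Hypothetical, simplified view of one address addend.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Addend {
    Reg(u8),       // a 64-bit value already in a register
    Uextend32(u8), // uextend of a 32-bit value held in a register
    Sextend32(u8), // sextend of a 32-bit value held in a register
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Extend {
    Uxtw,
    Sxtw,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum AMode {
    RegReg(u8, u8),                 // [rA, rB]
    RegRegExtended(u8, u8, Extend), // [rA, rB, UXTW] or [rA, rB, SXTW]
}

/// Folds a zero/sign extension into the addressing mode when the address
/// has exactly two addends and a zero offset.
fn lower_two_addends(a: Addend, b: Addend, offset: i64) -> Option<AMode> {
    if offset != 0 {
        return None;
    }
    let fold = |base: Addend, index: Addend| match (base, index) {
        (Addend::Reg(ra), Addend::Uextend32(rb)) => {
            Some(AMode::RegRegExtended(ra, rb, Extend::Uxtw))
        }
        (Addend::Reg(ra), Addend::Sextend32(rb)) => {
            Some(AMode::RegRegExtended(ra, rb, Extend::Sxtw))
        }
        (Addend::Reg(ra), Addend::Reg(rb)) => Some(AMode::RegReg(ra, rb)),
        _ => None,
    };
    // Try both orders, since `v0 + uextend(v1)` may appear commuted.
    fold(a, b).or_else(|| fold(b, a))
}
```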
2020-06-12 10:40:54 -07:00
Dan Gohman
caa87048ab Wasmtime 0.18.0 and Cranelift 0.65.0. 2020-06-11 17:49:56 -07:00
Chris Fallin
cdbe76a1d4 Remove uses of matches!() macro, incompatible with Firefox build.
When we vendor Cranelift into Firefox, we need to be able to build with
the Firefox CI setup (unless we carry patches on top of upstream).
Unfortunately, the Firefox CI currently appears to build with a slightly
older version of Rust: I can't work out which version exactly, but one
without stable support for `matches!()`.

A recent attempt to version-bump Cranelift failed with build errors at
the two locations in this patch:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305994046&repo=autoland&lineNumber=24829

I also see a bunch of uses of `matches!()` in Peepmatic, but those
crates are not built by Firefox, so we can leave them be for now, I
think.
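The kind of rewrite involved is mechanical: `matches!()` was stabilized in Rust 1.42, and an explicit `match` expresses the same test on older toolchains. `Opcode` and `is_mem_op` are hypothetical names, not the two locations changed in this commit.

```rust
#[derive(Clone, Copy)]
enum Opcode {
    Load,
    Store,
    Iadd,
}

// On Rust 1.42+ this could be written as:
//     matches!(op, Opcode::Load | Opcode::Store)
// The explicit `match` below also compiles on older toolchains.
fn is_mem_op(op: Opcode) -> bool {
    match op {
        Opcode::Load | Opcode::Store => true,
        _ => false,
    }
}
```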
2020-06-11 15:11:10 -07:00
Chris Fallin
b0cccf1d87 Merge pull request #1864 from jgouly/bitwise
arm64: Implement SIMD bitwise operations
2020-06-11 11:38:39 -07:00
Chris Fallin
6ba165be01 Merge pull request #1858 from cfallin/fix-scale-b1
Bugfix: scaled addressing mode: round B1 up to one byte.
2020-06-11 11:16:07 -07:00
Joey Gouly
544c5dece5 arm64: Implement SIMD bitwise operations
Copyright (c) 2020, Arm Limited.
2020-06-11 10:58:23 -07:00
Chris Fallin
ed7e410111 Bugfix: scaled addressing mode: round B1 up to one byte.
Issue uncovered by Ben Bouvier during regalloc work.
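A sketch of the rounding issue, assuming the scale of a scaled addressing mode is derived from the accessed type's size in bytes: truncating division of a 1-bit boolean's width gives 0 bytes, while rounding up gives 1. Both helper names are hypothetical.

```rust
/// Byte size via truncating division: a 1-bit type (B1) yields 0.
fn bytes_truncating(bits: u32) -> u32 {
    bits / 8
}

/// Byte size rounded up, so a 1-bit type (B1) occupies one byte.
fn bytes_rounded_up(bits: u32) -> u32 {
    (bits + 7) / 8
}
```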
2020-06-11 10:27:32 -07:00
Chris Fallin
a84c1931a0 Merge pull request #1854 from akirilov-arm/simd_load_splat
Enable the wast::Cranelift::spec::simd::simd_load_splat test for AArch64
2020-06-10 12:11:29 -07:00
Johnnie Birch
48f0b10c7a Add initial scalar FP operations (addss, subss, etc) to x64 backend.
Adds support for addss and subss. This is the first lowering for
SSE floating-point ALU and some move operations. The changes here rename
some data structures and add a couple of new ones to support
SSE-specific operations. The work done here will likely evolve as needed
to support an efficient, intuitive, and consistent framework.
2020-06-10 18:36:57 +02:00
Benjamin Bouvier
5d01603390 mach backend: allow snapshotting IR graphs with the SNAPSHOT_REGALLOC env variable;
This also requires the serde feature, which isn't enabled by default,
so it must be enabled on the cargo command line.
2020-06-10 18:23:04 +02:00
Benjamin Bouvier
46093f6119 Bump regalloc.rs to 0.0.26;
And adapt to regalloc.rs API change to provide the exact number of vregs.
2020-06-10 18:23:04 +02:00
Anton Kirilov
d941034c2e Enable the wast::Cranelift::spec::simd::simd_load_splat test for AArch64
Copyright (c) 2020, Arm Limited.
2020-06-10 15:01:37 +01:00
Chris Fallin
ac87ed12bd Merge pull request #1847 from akirilov-arm/simd_load_extend
Enable the wast::Cranelift::spec::simd::simd_load_extend test for AArch64
2020-06-09 12:29:06 -07:00
Joey Gouly
df2b031b6a arm64: Implement Icmp for I16X8 and I32X4
Copyright (c) 2020, Arm Limited.
2020-06-09 11:07:43 -07:00
Anton Kirilov
7ac19af498 Enable the wast::Cranelift::spec::simd::simd_load_extend test for AArch64
Copyright (c) 2020, Arm Limited.
2020-06-09 18:05:38 +01:00
Chris Fallin
8da71a145c Merge pull request #1802 from akirilov-arm/simd_align
Enable the wast::Cranelift::spec::simd::simd_align test for AArch64
2020-06-09 09:58:26 -07:00
Chris Fallin
02ae1b4464 Merge pull request #1846 from julian-seward1/better-phis
Rewrite `lower_edge` to produce better phi-translations:
2020-06-09 09:56:52 -07:00
Anton Kirilov
51a551fb39 Implement vector element extensions for AArch64
This commit also includes load and extend operations. Both are
prerequisites for enabling further SIMD spec tests.
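To illustrate what a load-and-extend operation computes, here is a scalar model that loads eight bytes and zero- or sign-extends each into a 16-bit lane. The function names are hypothetical, and the code models semantics only, not the AArch64 lowering.

```rust
/// Hypothetical model: load 8 bytes at `addr` and zero-extend each to 16 bits.
fn load8x8_u(mem: &[u8], addr: usize) -> [u16; 8] {
    let mut out = [0u16; 8];
    for (i, lane) in out.iter_mut().enumerate() {
        *lane = mem[addr + i] as u16;
    }
    out
}

/// Hypothetical model: load 8 bytes at `addr` and sign-extend each to 16 bits.
fn load8x8_s(mem: &[u8], addr: usize) -> [i16; 8] {
    let mut out = [0i16; 8];
    for (i, lane) in out.iter_mut().enumerate() {
        // Reinterpret the byte as i8 first so the sign bit is extended.
        *lane = (mem[addr + i] as i8) as i16;
    }
    out
}
```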

Copyright (c) 2020, Arm Limited.
2020-06-09 12:28:49 +01:00
Julian Seward
6d25759c8e Rewrite lower_edge to produce better phi-translations:
* ensure that all const assignments are placed at the end of the sequence.
  This minimises live ranges.

* for the non-const assignments, ignore self-assignments.  This can
  dramatically reduce the total number of moves generated, because any
  self-assignments trigger the overlap-case handling, hence invoking the
  double-copy behaviour in cases where it's not necessary.

It's worth pointing out that self-assignments are common, and are not due to
deficiencies in CLIR optimisation.  Rather, they occur whenever a loop back
edge doesn't modify *all* loop-carried values.  This can easily happen if
the loop has multiple "early" back-edges ("continues" in C parlance).  E.g.:

   loop_header(a, b, c, d, e, f):
      ...
      a_new = ...
      b_new = ...
      if (..) goto loop_header(a_new, b_new, c, d, e, f)

      ...
      c_new = ...
      d_new = ...
      if (..) goto loop_header(a_new, b_new, c_new, d_new, e, f)

      etc

For functions with many live values, this can dramatically reduce the number
of spill moves we throw into the register allocator.

In terms of compilation costs, this ranges from neutral, for functions that
spill minimally or not at all (joey_small, joey_med), to a 7.1% reduction in
instruction count.

In terms of run costs, for one spill-heavy test (bz2 w/ custom timing harness),
instruction counts are reduced by 4.3%, data reads by 12.3% and data writes
by 18.5%.  Note those last two figures include all reads and writes made by the
generated code, not just spills/reloads, so the proportional reduction in
spill/reload traffic must be greater.
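The two rules above can be modelled on a list of parallel assignments (`dst <- src`): drop self-assignments before checking for overlap, and emit constant loads last. The `Src`/`Move` types, `sequence_edge_moves`, and the all-or-nothing overlap handling (temporaries for every move whenever any destination is also read as a source) are assumptions for illustration, not the actual `lower_edge` code.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Src {
    Reg(u32),
    Const(i64),
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Move {
    RegToReg { dst: u32, src: u32 },
    ConstToReg { dst: u32, value: i64 },
}

/// Hypothetical sketch: sequentialize parallel edge assignments.
/// Temporaries are numbered upward from `first_temp`.
fn sequence_edge_moves(assigns: &[(u32, Src)], first_temp: u32) -> Vec<Move> {
    // Register-to-register assignments, with self-assignments dropped.
    let regs: Vec<(u32, u32)> = assigns
        .iter()
        .filter_map(|&(dst, src)| match src {
            Src::Reg(s) if s != dst => Some((dst, s)),
            _ => None,
        })
        .collect();
    let consts = assigns.iter().filter_map(|&(dst, src)| match src {
        Src::Const(value) => Some(Move::ConstToReg { dst, value }),
        _ => None,
    });

    let mut out = Vec::new();
    // Overlap: some destination is also read as a source.
    let overlap = regs.iter().any(|&(d, _)| regs.iter().any(|&(_, s)| s == d));
    if overlap {
        // Double copy: all sources into temps, then temps into destinations.
        for (i, &(_, s)) in regs.iter().enumerate() {
            out.push(Move::RegToReg { dst: first_temp + i as u32, src: s });
        }
        for (i, &(d, _)) in regs.iter().enumerate() {
            out.push(Move::RegToReg { dst: d, src: first_temp + i as u32 });
        }
    } else {
        for &(d, s) in &regs {
            out.push(Move::RegToReg { dst: d, src: s });
        }
    }
    // Constants go last to keep their live ranges short.
    out.extend(consts);
    out
}
```

In this model, keeping the self-assignment `(2, Src::Reg(2))` would make register 2 both a destination and a source and force the double-copy path; dropping it first lets the remaining moves be emitted directly.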
2020-06-09 10:36:32 +02:00
Nick Fitzgerald
fb9f39ce17 Merge pull request #1824 from fitzgen/test-stack-maps
cranelift: Better document and test stack maps
2020-06-08 15:58:20 -07:00
Nick Fitzgerald
6aac4c891e cranelift: Better document and test stack maps 2020-06-08 15:05:20 -07:00
Chris Fallin
e3d89c8a92 Merge pull request #1825 from cfallin/spidermonkey-fixes
Three fixes to various SpiderMonkey-related issues
2020-06-08 13:54:13 -07:00
Chris Fallin
fc2a6f273b Three fixes to various SpiderMonkey-related issues:
- Properly mask constant values down to appropriate width when
  generating a constant value directly in aarch64 backend. This was a
  miscompilation introduced in the new-isel refactor. In combination
  with failure to respect NarrowValueMode, this resulted in a very
  subtle bug when an `i32` constant was used in bit-twiddling logic.

- Add support for `iadd_ifcout` in aarch64 backend as used in explicit
  heap-check mode. With this change, we no longer fail heap-related
  tests with the huge-heap-region mode disabled.

- Remove a panic that was occurring in some tests that are currently
  ignored on aarch64, by simply returning empty/default information in
  `value_label` functionality rather than touching unimplemented APIs.
  This is not a bugfix per se, but removes confusing panic messages from
  `cargo test` output that might otherwise mislead.
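The first fix amounts to keeping only the low bits that belong to the value's type before materializing the constant, assuming the immediate is carried in a 64-bit container that may hold bits above the type's width (for example, an `i32` constant of -1 stored sign-extended). `mask_to_width` is a hypothetical helper.

```rust
/// Hypothetical helper: keep only the low `bits` bits of a 64-bit constant.
fn mask_to_width(value: u64, bits: u32) -> u64 {
    if bits >= 64 {
        value
    } else {
        value & ((1u64 << bits) - 1)
    }
}
```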
2020-06-08 13:02:00 -07:00
whitequark
3796164642 x86_32: legalize br{z,nz}.i64. 2020-06-08 12:52:13 -07:00
whitequark
bc555468a7 cranelift: add i64.{ishl,ushr,ashr} libcalls.
These libcalls are useful for 32-bit platforms.

On x86_32 in particular, commit 4ec16fa0 added support for legalizing
64-bit shifts through SIMD operations. However, that legalization
requires SIMD to be enabled and SSE 4.1 to be supported, which is not
acceptable as a hard requirement.
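As a rough picture of what such libcalls compute on a 32-bit target, here is a sketch of 64-bit shifts over `(lo, hi)` halves held in 32-bit registers. The helper names are hypothetical, and taking the shift amount modulo 64 is an assumption about the intended semantics.

```rust
/// Hypothetical: 64-bit shift left on (lo, hi) 32-bit halves.
fn ishl_i64(lo: u32, hi: u32, amt: u32) -> (u32, u32) {
    let s = amt & 63;
    if s == 0 {
        (lo, hi)
    } else if s < 32 {
        // Bits shifted out of `lo` move into the bottom of `hi`.
        (lo << s, (hi << s) | (lo >> (32 - s)))
    } else {
        (0, lo << (s - 32))
    }
}

/// Hypothetical: 64-bit logical shift right on (lo, hi) halves.
fn ushr_i64(lo: u32, hi: u32, amt: u32) -> (u32, u32) {
    let s = amt & 63;
    if s == 0 {
        (lo, hi)
    } else if s < 32 {
        ((lo >> s) | (hi << (32 - s)), hi >> s)
    } else {
        (hi >> (s - 32), 0)
    }
}

/// Hypothetical: 64-bit arithmetic shift right on (lo, hi) halves.
fn ashr_i64(lo: u32, hi: u32, amt: u32) -> (u32, u32) {
    let s = amt & 63;
    let h = hi as i32; // signed view of the high half, for sign fill
    if s == 0 {
        (lo, hi)
    } else if s < 32 {
        ((lo >> s) | (hi << (32 - s)), (h >> s) as u32)
    } else {
        ((h >> (s - 32)) as u32, (h >> 31) as u32)
    }
}
```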
2020-06-05 12:13:49 -07:00
Chris Fallin
00abfcd943 Merge pull request #1817 from cfallin/issue-1809
Avoid touching encodings in `EncCursor` if using a MachInst backend.
2020-06-04 12:50:39 -07:00
Yury Delendik
6f37204f82 Upgrade gimli to 0.21 (#1819)
* Use gimli 0.21

* rm CFI w Expression

* Don't write .debug_frame twice
2020-06-04 14:34:05 -05:00
Chris Fallin
63a335b7d4 Avoid touching encodings in EncCursor if using a MachInst backend.
`EncCursor` is a variant of `Cursor` that allows updating CLIF while
keeping its encodings up to date, given a particular ISA. However, new
(MachInst) backends don't use the encodings, and the `TargetIsaAdapter`
shim will panic if any encoding-related method is called. This PR avoids
those panics.

Fixes #1809.
2020-06-04 10:53:45 -07:00
Andrew Brown
5db384cd76 Rename opcode: PMULLQ to VPMULLQ 2020-06-03 16:27:57 -07:00
Andrew Brown
1ea09088be Add x86 legalization for imul.i64x2 for non-AVX CPUs
The `convert_i64x2_imul` custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizes `imul.i64x2` to an `x86_pmullq` in this case; if not, it uses a lengthy SSE2-compatible instruction sequence.
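For intuition about the SSE2 fallback, here is a scalar model of the well-known way to build a wrapping 64x64-bit lane multiply from 32x32-to-64-bit unsigned multiplies (what PMULUDQ does per lane), shifts, and adds. Whether `convert_i64x2_imul` emits exactly this decomposition is an assumption; `pmuludq_lane` and `imul_i64x2_model` are hypothetical names.

```rust
/// Scalar model of PMULUDQ on one 64-bit lane: multiply the low 32 bits of
/// each operand as unsigned values, giving a full 64-bit product.
fn pmuludq_lane(a: u64, b: u64) -> u64 {
    (a & 0xFFFF_FFFF) * (b & 0xFFFF_FFFF)
}

/// Hypothetical sketch of a wrapping i64x2 multiply built from pmuludq.
fn imul_i64x2_model(a: [u64; 2], b: [u64; 2]) -> [u64; 2] {
    let mut out = [0u64; 2];
    for i in 0..2 {
        let (x, y) = (a[i], b[i]);
        let lo_lo = pmuludq_lane(x, y); // lo(x) * lo(y)
        let hi_lo = pmuludq_lane(x >> 32, y); // hi(x) * lo(y)
        let lo_hi = pmuludq_lane(x, y >> 32); // lo(x) * hi(y)
        // hi(x) * hi(y) only affects bits >= 64, so it is not needed.
        let cross = hi_lo.wrapping_add(lo_hi) << 32;
        out[i] = lo_lo.wrapping_add(cross);
    }
    out
}
```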
2020-06-03 16:27:57 -07:00
Andrew Brown
b3a6985cd5 Re-organize transform groups for x86 legalization 2020-06-03 16:27:57 -07:00
Andrew Brown
5a32500518 Remove non-existent x86 encoding for sshr_imm.i64x2
This instruction does not exist in the SSE2 feature set; it can be added later with a VEX/EVEX encoding.
2020-06-03 16:27:57 -07:00
Andrew Brown
df171f01b5 Add x86_pmuludq
This instruction multiplies the low 32 bits of each 64-bit lane of two vectors as unsigned integers, producing an i64x2 of full 64-bit products; this is necessary for lowering Wasm's i64x2.mul.
2020-06-03 16:27:57 -07:00
Andrew Brown
40f31375a5 Add TargetIsa::as_any for downcasting to specific ISA implementations
This is necessary when we would like to check specific ISA flags, e.g.
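The general Rust pattern behind an `as_any` method is to expose a trait object as `&dyn Any` so callers can use `downcast_ref` to reach a concrete type. In this sketch, the `Isa` trait, `X86Isa`, `OtherIsa`, the `has_avx512dq` field, and `use_avx512_path` are hypothetical stand-ins, not Cranelift's `TargetIsa` API.

```rust
use std::any::Any;

// Hypothetical stand-in for an ISA trait.
trait Isa {
    /// Expose `self` as `&dyn Any` so callers can downcast to the concrete type.
    fn as_any(&self) -> &dyn Any;
}

struct X86Isa {
    has_avx512dq: bool,
}

impl Isa for X86Isa {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

struct OtherIsa;

impl Isa for OtherIsa {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Checks an x86-specific flag; any non-x86 ISA yields false.
fn use_avx512_path(isa: &dyn Isa) -> bool {
    isa.as_any()
        .downcast_ref::<X86Isa>()
        .map_or(false, |x86| x86.has_avx512dq)
}
```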
2020-06-03 16:27:57 -07:00
Andrew Brown
9ba9fd0f64 Add x86-specific instruction for i64x2 multiplication
Without this special instruction, legalizing to the AVX512 instruction AND the SSE instruction sequence is impossible. This extra instruction would be rendered unnecessary by the x64 backend.
2020-06-03 16:27:57 -07:00
Chris Fallin
fe97659813 Address review comments. 2020-06-03 13:31:34 -07:00
Chris Fallin
615362068f Multi-value return support. 2020-06-03 13:31:34 -07:00
Dan Gohman
a76639c6fb Wasmtime 0.17.0 and Cranelift 0.64.0. (#1805) 2020-06-02 18:51:59 -07:00
Chris Fallin
9fec933056 Merge pull request #1801 from jgouly/cmp-rebase
arm64: add support for I8X16 ICmp
2020-06-02 09:35:41 -07:00
Joey Gouly
90a421193f arm64: add support for I8X16 ICmp
Copyright (c) 2020, Arm Limited.
2020-06-02 16:58:09 +01:00
Benjamin Bouvier
67c7a3ed19 mach backend: reduce the size of the Inst enum down to 32 bytes; 2020-06-02 16:29:05 +02:00
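The commit message does not say how the size was reduced. One common technique, shown here only as an assumption, is to box rarely used large payloads so that every variant of the enum stays small. `InstInline` and `InstBoxed` are hypothetical enums.

```rust
use std::mem::size_of;

// A large payload stored inline makes every instruction as big as the
// largest variant.
#[allow(dead_code)]
enum InstInline {
    Nop,
    AluRRR { rd: u8, rn: u8, rm: u8 },
    JumpTable { targets: [u32; 16] },
}

// Boxing the large payload keeps that variant pointer-sized.
#[allow(dead_code)]
enum InstBoxed {
    Nop,
    AluRRR { rd: u8, rn: u8, rm: u8 },
    JumpTable { targets: Box<[u32; 16]> },
}
```

The trade-off is an extra heap allocation and indirection for the boxed variant, which pays off when that variant is rare compared to the small ones.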
Benjamin Bouvier
e227608510 mach backend: use vectors instead of sets to remember set of uses/defs for calls;
This avoids the set uniqueness (hashing) test, reduces memory
churn when re-mapping virtual registers onto real registers, and is
generally more memory-efficient.
2020-06-02 16:29:05 +02:00
Benjamin Bouvier
cfa0527794 mach backend: have mem_finalize return a SmallVec;
This avoids a spurious reallocation of the SmallVec containing the
load_constants result to a Vec, which appeared in dhat profiles.
2020-06-02 16:29:05 +02:00
Nick Fitzgerald
7c68a10ed6 Merge pull request #1670 from teapotd/win64-pass-by-ref
Implement passing arguments by ref for win64 ABI
2020-06-01 11:13:30 -07:00
Andrew Brown
0dd77d36f8 Rename BinaryImm format to BinaryImm64 2020-05-29 19:56:27 -07:00
Andrew Brown
a27a079d65 Replace ExtractLane format with BinaryImm8
Like https://github.com/bytecodealliance/wasmtime/pull/1762, this changes the name of the `ExtractLane` format to the more general `BinaryImm8` and renames its immediate argument from `lane` to `imm`.
2020-05-29 19:56:27 -07:00