Commit Graph

712 Commits

Author SHA1 Message Date
Chris Fallin
a84c1931a0 Merge pull request #1854 from akirilov-arm/simd_load_splat
Enable the wast::Cranelift::spec::simd::simd_load_splat test for AArch64
2020-06-10 12:11:29 -07:00
Johnnie Birch
48f0b10c7a Add initial scalar FP operations (addss, subss, etc) to x64 backend.
Adds support for addss and subss. This is the first lowering for
sse floating point alu and some move operations. The changes here do
some renaming of data structures and adds a couple of new ones
to support sse specific operations. The work done here will likely
evolve as needed to support an efficient, inituative, and consistent
framework.
2020-06-10 18:36:57 +02:00
Benjamin Bouvier
5d01603390 mach backend: allow snapshotting IR graphs with the SNAPSHOT_REGALLOC env variable;
This also requires the serde feature, which isn't enabled by default,
thus it must be passed as a command-line argument to cargo.
2020-06-10 18:23:04 +02:00
Benjamin Bouvier
46093f6119 Bump regalloc.rs to 0.0.26;
And adapt to regalloc.rs API change to provide the exact number of vregs.
2020-06-10 18:23:04 +02:00
Anton Kirilov
d941034c2e Enable the wast::Cranelift::spec::simd::simd_load_splat test for AArch64
Copyright (c) 2020, Arm Limited.
2020-06-10 15:01:37 +01:00
Chris Fallin
ac87ed12bd Merge pull request #1847 from akirilov-arm/simd_load_extend
Enable the wast::Cranelift::spec::simd::simd_load_extend test for AArch64
2020-06-09 12:29:06 -07:00
Joey Gouly
df2b031b6a arm64: Implement Icmp for I16X8 and I32X4
Copyright (c) 2020, Arm Limited.
2020-06-09 11:07:43 -07:00
Anton Kirilov
7ac19af498 Enable the wast::Cranelift::spec::simd::simd_load_extend test for AArch64
Copyright (c) 2020, Arm Limited.
2020-06-09 18:05:38 +01:00
Chris Fallin
8da71a145c Merge pull request #1802 from akirilov-arm/simd_align
Enable the wast::Cranelift::spec::simd::simd_align test for AArch64
2020-06-09 09:58:26 -07:00
Chris Fallin
02ae1b4464 Merge pull request #1846 from julian-seward1/better-phis
Rewrite `lower_edge` to produce better phi-translations:
2020-06-09 09:56:52 -07:00
Anton Kirilov
51a551fb39 Implement vector element extensions for AArch64
This commit also includes load and extend operations. Both are
prerequisites for enabling further SIMD spec tests.

Copyright (c) 2020, Arm Limited.
2020-06-09 12:28:49 +01:00
Julian Seward
6d25759c8e Rewrite lower_edge to produce better phi-translations:
* ensure that all const assignments are placed at the end of the sequence.
  This minimises live ranges.

* for the non-const assignments, ignore self-assignments.  This can
  dramatically reduce the total number of moves generated, because any
  self-assignments trigger the overlap-case handling, hence invoking the
  double-copy behaviour in cases where it's not necessary.

It's worth pointing out that self-assignments are common, and are not due to
deficiencies in CLIR optimisation.  Rather, they occur whenever a loop back
edge doesn't modify *all* loop-carried values.  This can easily happen if
the loop has multiple "early" back-edges -- "continues" in C parlance.  Eg:

   loop_header(a, b, c, d, e, f):
      ...
      a_new = ...
      b_new = ...
      if (..) goto loop_header(a_new, b_new, c, d, e, f)

      ...
      c_new = ...
      d_new = ...
      if (..) goto loop_header(a_new, b_new, c_new, d_new, e, f)

      etc

For functions with many live values, this can dramatically reduce the number
of spill moves we throw into the register allocator.

In terms of compilation costs, this ranges from neutral for functions which
spill not at all, or minimally (joey_small, joey_med) to a 7.1% reduction in
insn count.

In terms of run costs, for one spill-heavy test (bz2 w/ custom timing harness),
instruction counts are reduced by 4.3%, data reads by 12.3% and data writes
by 18.5%.  Note those last two figures include all reads and writes made by the
generated code, not just spills/reloads, so the proportional reduction in
spill/reload traffic must be greater.
2020-06-09 10:36:32 +02:00
Nick Fitzgerald
fb9f39ce17 Merge pull request #1824 from fitzgen/test-stack-maps
cranelift: Better document and test stack maps
2020-06-08 15:58:20 -07:00
Nick Fitzgerald
6aac4c891e cranelift: Better document and test stack maps 2020-06-08 15:05:20 -07:00
Chris Fallin
e3d89c8a92 Merge pull request #1825 from cfallin/spidermonkey-fixes
Three fixes to various SpiderMonkey-related issues
2020-06-08 13:54:13 -07:00
Chris Fallin
fc2a6f273b Three fixes to various SpiderMonkey-related issues:
- Properly mask constant values down to appropriate width when
  generating a constant value directly in aarch64 backend. This was a
  miscompilation introduced in the new-isel refactor. In combination
  with failure to respect NarrowValueMode, this resulted in a very
  subtle bug when an `i32` constant was used in bit-twiddling logic.

- Add support for `iadd_ifcout` in aarch64 backend as used in explicit
  heap-check mode. With this change, we no longer fail heap-related
  tests with the huge-heap-region mode disabled.

- Remove a panic that was occurring in some tests that are currently
  ignored on aarch64, by simply returning empty/default information in
  `value_label` functionality rather than touching unimplemented APIs.
  This is not a bugfix per-se, but removes confusing panic messages from
  `cargo test` output that might otherwise mislead.
2020-06-08 13:02:00 -07:00
whitequark
3796164642 x86_32: legalize br{z,nz}.i64. 2020-06-08 12:52:13 -07:00
whitequark
bc555468a7 cranelift: add i64.{ishl,ushr,ashr} libcalls.
These libcalls are useful for 32-bit platforms.

On x86_32 in particular, commit 4ec16fa0 added support for legalizing
64-bit shifts through SIMD operations. However, that legalization
requires SIMD to be enabled and SSE 4.1 to be supported, which is not
acceptable as a hard requirement.
2020-06-05 12:13:49 -07:00
Chris Fallin
00abfcd943 Merge pull request #1817 from cfallin/issue-1809
Avoid touching encodings in `EncCursor` if using a MachInst backend.
2020-06-04 12:50:39 -07:00
Yury Delendik
6f37204f82 Upgrade gimli to 0.21 (#1819)
* Use gimli 0.21

* rm CFI w Expression

* Don't write .debug_frame twice
2020-06-04 14:34:05 -05:00
Chris Fallin
63a335b7d4 Avoid touching encodings in EncCursor if using a MachInst backend.
`EncCursor` is a variant of `Cursor` that allows updating CLIF while
keeping its encodings up to date, given a particular ISA. However, new
(MachInst) backends don't use the encodings, and the `TargetIsaAdapter`
shim will panic if any encoding-related method is called. This PR avoids
those panics.

Fixes #1809.
2020-06-04 10:53:45 -07:00
Andrew Brown
5db384cd76 Rename opcode: PMULLQ to VPMULLQ 2020-06-03 16:27:57 -07:00
Andrew Brown
1ea09088be Add x86 legalization for imul.i64x2 for non-AVX CPUs
The `convert_i64x2_imul` custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizes `imul.i64x2` to an `x86_pmullq` in this case; if not, it uses a lengthy SSE2-compatible instruction sequence.
2020-06-03 16:27:57 -07:00
Andrew Brown
b3a6985cd5 Re-organize transform groups for x86 legalization 2020-06-03 16:27:57 -07:00
Andrew Brown
5a32500518 Remove non-existent x86 encoding for sshr_imm.i64x2
This instruction does not exist in the SSE2 feature set; it can be added later with an VEX/EVEX encoding.
2020-06-03 16:27:57 -07:00
Andrew Brown
df171f01b5 Add x86_pmuludq
This instruction multiplies the lower 32 bits of two 64x2 unsigned integers into an i64x2; this is necessary for lowering Wasm's i64x2.mul.
2020-06-03 16:27:57 -07:00
Andrew Brown
40f31375a5 Add TargetIsa::as_any for downcasting to specific ISA implementations
This is necessary when we would like to check specific ISA flags, e.g.
2020-06-03 16:27:57 -07:00
Andrew Brown
9ba9fd0f64 Add x86-specific instruction for i64x2 multiplication
Without this special instruction, legalizing to the AVX512 instruction AND the SSE instruction sequence is impossible. This extra instruction would be rendered unnecessary by the x64 backend.
2020-06-03 16:27:57 -07:00
Chris Fallin
fe97659813 Address review comments. 2020-06-03 13:31:34 -07:00
Chris Fallin
615362068f Multi-value return support. 2020-06-03 13:31:34 -07:00
Dan Gohman
a76639c6fb Wasmtime 0.17.0 and Cranelift 0.64.0. (#1805) 2020-06-02 18:51:59 -07:00
Chris Fallin
9fec933056 Merge pull request #1801 from jgouly/cmp-rebase
arm64: add support for I8X16 ICmp
2020-06-02 09:35:41 -07:00
Joey Gouly
90a421193f arm64: add support for I8X16 ICmp
Copyright (c) 2020, Arm Limited.
2020-06-02 16:58:09 +01:00
Benjamin Bouvier
67c7a3ed19 mach backend: reduce the size of the Inst enum down to 32 bytes; 2020-06-02 16:29:05 +02:00
Benjamin Bouvier
e227608510 mach backend: use vectors instead of sets to remember set of uses/defs for calls;
This avoids the set uniqueness (hashing) test, reduces memory
churn when re-mapping virtual register onto real registers, and is
generally more memory-efficient.
2020-06-02 16:29:05 +02:00
Benjamin Bouvier
cfa0527794 mach backend: have mem_finalize return a SmallVec;
This avoids a spurious reallocation of the SmallVec containing the
load_constants result to a Vec, which appeared in dhat profiles.
2020-06-02 16:29:05 +02:00
Nick Fitzgerald
7c68a10ed6 Merge pull request #1670 from teapotd/win64-pass-by-ref
Implement passing arguments by ref for win64 ABI
2020-06-01 11:13:30 -07:00
Andrew Brown
0dd77d36f8 Rename BinaryImm format to BinaryImm64 2020-05-29 19:56:27 -07:00
Andrew Brown
a27a079d65 Replace ExtractLane format with BinaryImm8
Like https://github.com/bytecodealliance/wasmtime/pull/1762, this change the name of the `ExtractLane` format to the more-general `BinaryImm8` and renames its immediate argument from `lane` to `imm`.
2020-05-29 19:56:27 -07:00
Andrew Brown
7d6e94b952 Replace InsertLane format with TernaryImm8
The InsertLane format has an ordering (`value().imm().value()`) and immediate name (`"lane"`) that make it awkward to use for other instructions. This changes the ordering (`value().value().imm()`) and uses the default name (`"imm"`) throughout the codebase.
2020-05-29 19:56:27 -07:00
teapotd
e430984ac4 Improve bitselect codegen with knowledge of operand origin (#1783)
* Encode vselect using BLEND instructions on x86

* Legalize vselect to bitselect

* Optimize bitselect to vselect for some operands

* Add run tests for bitselect-vselect optimization

* Address review feedback
2020-05-29 19:53:11 -07:00
teapotd
759cc3e751 Implement passing arguments by ref for win64 ABI 2020-05-29 20:12:41 +02:00
Nick Fitzgerald
94380bf2b7 Merge pull request #1510 from teapotd/abi-i128-fix
Always check if struct-return parameter is needed
2020-05-29 10:02:16 -07:00
whitequark
a180b5b393 x86_32: fix stack_addr encoding.
Consider this testcase:

    target i686
    function u0:0() -> i32 system_v {
        ss0 = explicit_slot 0
    block0:
        v2 = stack_addr.i32 ss0
        return v2
    }

Before this commit, in 32-bit mode the x86 backend would generate
incorrect code for stack addresses:

     0:   55                      push    ebp
     1:   89 e5                   mov     ebp, esp
     3:   83 ec 08                sub     esp, 8
     6:   8d 44 24 00             lea     eax, [esp]
     a:   00 00                   add     byte ptr [eax], al
     c:   00 83 c4 08 5d c3       add     byte ptr [ebx - 0x3ca2f73c], al

This happened because the ModRM byte indicated a disp8 encoding, but
the instruction actually used a disp32 encoding. After this commit,
correct code is generated:

     0:   55                      push    ebp
     1:   89 e5                   mov     ebp, esp
     3:   83 ec 08                sub     esp, 8
     6:   8d 84 24 00 00 00 00    lea     eax, [esp]
     d:   83 c4 08                add     esp, 8
    10:   5d                      pop     ebp
    11:   c3                      ret
2020-05-29 09:17:36 -07:00
Nick Fitzgerald
2e7b3ba8de cranelift: Implement serialize/deserialize for stack maps
When the `enable-serde` feature is set.
2020-05-28 11:34:58 -07:00
whitequark
880e692fd4 x86: add encoding for bnot.b1.
Fixes #1743.

Co-authored-by: iximeow <git@iximeow.net>
2020-05-28 08:43:25 -07:00
Chris Fallin
6ead7527af Merge pull request #1748 from akirilov-arm/simd_store
Enable the wast::Cranelift::spec::simd::simd_store test for AArch64
2020-05-26 09:21:36 -07:00
Andrew Brown
c92917de15 Fix typo in sadd_sat instruction definition 2020-05-26 09:55:26 +02:00
teapotd
9e70a64728 Legalize sret call arguments 2020-05-25 20:03:24 +02:00
teapotd
0f55bb4b8d Always check if struct-return parameter is needed 2020-05-25 20:03:24 +02:00