Commit Graph

16 Commits

Author SHA1 Message Date
Trevor Elliott
9386409607 x64: Lower extractlane, scalar_to_vector, and splat in ISLE (#4780)
Lower extractlane, scalar_to_vector and splat in ISLE.

This PR also makes some changes to the SinkableLoad api
* change the return type of sink_load to RegMem as there are more functions available for dealing with RegMem
* add reg_mem_to_reg_mem_imm and register it as an automatic conversion
2022-08-25 09:38:03 -07:00
Trevor Elliott
b8b6f2781e x64: Lower shuffle and swizzle in ISLE (#4772)
Lower `shuffle` and `swizzle` in ISLE.

This PR surfaced a bug with the lowering of `shuffle` when avx512vl and avx512vbmi are enabled: we use `vpermi2b` as the implementation, but panic if the immediate shuffle mask contains any out-of-bounds values. The behavior when the avx512 extensions are not present is that out-of-bounds values are turned into `0` in the result.

I've resolved this by detecting when the shuffle immediate has out-of-bounds indices in the avx512-enabled lowering, and generating an additional mask to zero out the lanes where those indices occur. This brings the avx512 case into line with the semantics of the `shuffle` op: 94bcbe8446/cranelift/codegen/meta/src/shared/instructions.rs (L1495-L1498)
2022-08-24 21:49:51 +00:00
Trevor Elliott
4bdfa76370 x64: Migrate get_pinned_reg, set_pinned_reg, vconst, and raw_bitcast to ISLE (#4763)
https://github.com/bytecodealliance/wasmtime/pull/4763
2022-08-23 16:32:00 -07:00
Chris Fallin
019ebf47b1 x64: fix pretty-printing argument order for XmmRmR instructions. (#4094)
The pretty-printing had swapped dst and src2; this was introduced when
we moved to RA2 (sorry about that! IMHO we should do something to
automate the mapping between regalloc arg collection and pretty
printing/emission).

`src2` comes at the end because it has a variable number of register
mentions; this is in line with how many of the other inst formats work.

Actual emitted code was never incorrect, just the pretty-printing.

Updated test golden outputs look correct to me now, including the one
that we saw was incorrect in #3945.
2022-05-03 10:12:58 -07:00
Chris Fallin
a0318f36f0 Switch Cranelift over to regalloc2. (#3989)
This PR switches Cranelift over to the new register allocator, regalloc2.

See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801)
for a summary of the design changes. This switchover has implications for
core VCode/MachInst types and the lowering pass.

Overall, this change brings improvements to both compile time and speed of
generated code (runtime), as reported in #3942:

```
Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   25% faster                  28% faster
blake3-simd     no diff                     no diff
meshoptimizer   19% faster                  17% faster
pulldown-cmark  17% faster                  no diff
bz2             15% faster                  no diff
SpiderMonkey,   21% faster                  2% faster
  fib(30)
clang.wasm      42% faster                  N/A
```
2022-04-14 10:28:21 -07:00
Chris Fallin
1c014d129a Cranelift: ensure ISA level needed for SIMD is present when SIMD is enabled. (#3816)
Addresses #3809: when we are asked to create a Cranelift backend with
shared flags that indicate support for SIMD, we should check that the
ISA level needed for our SIMD lowerings is present.
2022-02-16 17:29:30 -08:00
Alex Crichton
1ef0abb12c Update lots of isa/*/*.clif tests to precise-output (#3677)
* Update lots of `isa/*/*.clif` tests to `precise-output`

This commit goes through the `aarch64` and `x64` subdirectories and
subjectively changes tests from `test compile` to add `precise-output`.
This then auto-updates all the test expectations so they can be
automatically instead of manually updated in the future. Not all tests
were migrated, largely subject to the whims of myself, mainly looking to
see if the test was looking for specific instructions or just checking
the whole assembly output.

* Filter out `;;` comments from test expctations

Looks like the cranelift parser picks up all comments, not just those
trailing the function, so use a convention where `;;` is used for
human-readable-comments in test cases and `;`-prefixed comments are the
test expectation.
2022-01-10 13:38:23 -06:00
bjorn3
9e34df33b9 Remove the old x86 backend 2021-09-29 16:13:46 +02:00
Chris Fallin
cb48ea406e Switch default to new x86_64 backend.
This PR switches the default backend on x86, for both the
`cranelift-codegen` crate and for Wasmtime, to the new
(`MachInst`-style, `VCode`-based) backend that has been under
development and testing for some time now.

The old backend is still available by default in builds with the
`old-x86-backend` feature, or by requesting `BackendVariant::Legacy`
from the appropriate APIs.

As part of that switch, it adds some more runtime-configurable plumbing
to the testing infrastructure so that tests can be run using the
appropriate backend. `clif-util test` is now capable of parsing a
backend selector option from filetests and instantiating the correct
backend.

CI has been updated so that the old x86 backend continues to run its
tests, just as we used to run the new x64 backend separately.

At some point, we will remove the old x86 backend entirely, once we are
satisfied that the new backend has not caused any unforeseen issues and
we do not need to revert.
2021-04-02 11:35:53 -07:00
Andrew Brown
bb2dd5b68b [machinst x64]: implement load*_zero for x64 2021-01-08 16:21:57 -08:00
Chris Fallin
1dddba649a x64 regalloc register order: put caller-saves (volatiles) first.
The x64 backend currently builds the `RealRegUniverse` in a way that
is generating somewhat suboptimal code. In many blocks, we see uses of
callee-save (non-volatile) registers (r12, r13, r14, rbx) first, even in
very short leaf functions where there are plenty of volatiles to use.
This is leading to unnecessary spills/reloads.

On one (local) test program, a medium-sized C benchmark compiled to Wasm
and run on Wasmtime, I am seeing a ~10% performance improvement with
this change; it will be less pronounced in programs with high register
pressure (there we are likely to use all registers regardless, so the
prologue/epilogue will save/restore all callee-saves), or in programs
with fewer calls, but this is a clear win for small functions and in
many cases removes prologue/epilogue clobber-saves altogether.

Separately, I think the RA's coalescing is tripping up a bit in some
cases; see e.g. the filetest touched by this commit that loads a value
into %rsi then moves to %rax and returns immediately. This is an
orthogonal issue, though, and should be addressed (if worthwhile) in
regalloc.rs.
2020-12-06 22:37:43 -08:00
Andrew Brown
83f182b390 Implement initial emission of constants
This approach suffers from memory-size bloat during compile time due to the desire to de-duplicate the constants emitted and reduce runtime memory-size. As a first step, though, this provides an end-to-end mechanism for constants to be emitted in the MachBuffer islands.
2020-11-05 14:25:02 -08:00
Benjamin Bouvier
e32e6fb612 machinst x64: check SSE requirements for instructions against enabled features; 2020-10-08 09:21:51 +02:00
Andrew Brown
16a2538ecd [machinst x64]: rename Inst::XmmUninitializedValue and document
This approach is not the best but avoids an extra instruction; perhaps at some point, as mentioned in https://github.com/bytecodealliance/wasmtime/pull/2248, we will add the extra instruction or refactor things in such a way that this `Inst` variant is unnecessary.
2020-10-02 08:29:31 -07:00
Andrew Brown
3d9f3bf728 [machinst x64]: port CLIF tests related to comparison and lane operations 2020-10-02 08:29:31 -07:00
Andrew Brown
b7217d454f [machinst x64]: add lane-related CLIF filetests 2020-09-29 08:45:12 -07:00