Commit Graph

3951 Commits

Author SHA1 Message Date
Alex Crichton
65930640f8 Bump Wasmtime to 2.0.0 (#4874)
This commit replaces #4869 and represents the actual version bump that
should have happened had I remembered to bump the in-tree version of
Wasmtime to 1.0.0 prior to the branch-cut date. Alas!
2022-09-06 13:49:56 -05:00
Jamey Sharp
9856664f1f Make DataValue, not Ieee32/64, respect IEEE754 (#4860)
* cranelift-codegen: Remove all uses of DataValue

This type is only used by the interpreter, cranelift-fuzzgen, and
filetests. I haven't found another convenient crate for those to all
depend on where this type can live instead, but this small refactor at
least makes it obvious that code generation does not in any way depend
on the implementation of this type.

* Make DataValue, not Ieee32/64, respect IEEE754

This fixes #4857 by partially reverting #4849.

It turns out that Ieee32 and Ieee64 need bitwise equality semantics so
they can be used as hash-table keys.

Moving the IEEE754 semantics up a layer to DataValue makes sense in
conjunction with #4855, where we introduced a DataValue::bitwise_eq
alternative implementation of equality for those cases where users of
DataValue still want the bitwise equality semantics.

* cranelift-interpreter: Use eq/ord from DataValue

This fixes #4828, again, now that the comparison operators on DataValue
have the right IEEE754 semantics.

* Add regression test from issue #4857
2022-09-03 00:26:14 +00:00
Afonso Bordado
7e45cff459 cranelift: Bitwise compare fuzzgen results (#4855) 2022-09-02 19:34:16 +00:00
Afonso Bordado
3afb711a51 cranelift: Document Ieee{32,64} implementation (#4854) 2022-09-02 19:02:22 +00:00
Afonso Bordado
f30a7eb0c9 cranelift: Implement PartialEq on Ieee{32,64} (#4849)
* cranelift: Add `fcmp` tests

Some of these are disabled on aarch64 due to not being implemented yet.

* cranelift: Implement float PartialEq for Ieee{32,64} (fixes #4828)

Previously `PartialEq` was auto derived. This means that it was implemented in terms of PartialEq in a u32.

This is not correct for floats because `NaN != NaN`.

PartialOrd was manually implemented in 6d50099816, but it seems like it was an oversight to leave PartialEq out until now.

The test suite depends on the previous behaviour so we adjust it to keep comparing bits instead of floats.

* cranelift: Disable `fcmp ord` tests on aarch64

* cranelift: Disable `fcmp ueq` tests on aarch64
2022-09-02 10:42:42 -07:00
Anton Kirilov
48bf078c83 Cranelift AArch64: Fix the atomic memory operations (#4831)
Previously the implementations of the various atomic memory IR operations
ignored the memory operation flags that were passed.

Copyright (c) 2022, Arm Limited.

Co-authored-by: Chris Fallin <chris@cfallin.org>
2022-09-02 09:35:21 -07:00
Anton Kirilov
d2e19b8d74 Cranelift AArch64: Migrate AMode to ISLE (#4832)
Copyright (c) 2022, Arm Limited.

Co-authored-by: Chris Fallin <chris@cfallin.org>
2022-09-02 00:24:46 +00:00
Chris Fallin
385bd0cbf8 x64: fix CvtFloatToUintSeq: do not clobber src. (#4842)
This slipped through the regalloc2 operand code update in #4811: the
CvtFloatToUintSeq pseudo-instruction actually clobbers its source. It
was marked as a "mod" operand in the original and I mistakenly
converted it to a "use" as I had not seen the actual clobber. The
instruction now takes an extra temp and makes a copy of `src` in the
appropriate place.

Fixes #4840.
2022-09-01 22:46:57 +00:00
Afonso Bordado
08e7a7f1a0 cranelift: Add inline stack probing for x64 (#4747)
* cranelift: Add inline stack probe for x64

* cranelift: Cleanups comments

Thanks @jameysharp!
2022-09-01 22:32:54 +00:00
Jamey Sharp
84ac24c23d cranelift: Remove const_addr instruction (fixes #2398) (#4843) 2022-09-01 21:57:37 +00:00
Chris Fallin
ae5fe8a728 aarch64: fix up regalloc2 semantics. (#4830)
This PR removes all uses of modify-operands in the aarch64 backend,
replacing them with reused-input operands instead. This has the nice
effect of removing a bunch of move instructions and more clearly
representing inputs and outputs.

This PR also removes the explicit use of pinned vregs in the aarch64
backend, instead using fixed-register constraints on the operands when
insts or pseudo-inst sequences require certain registers.

This is the second PR in the regalloc-semantics cleanup series; after
the remaining backend (s390x) and the ABI code are cleaned up as well,
we'll be able to simplify the regalloc2 frontend.
2022-09-01 21:25:20 +00:00
Andrew Brown
ac2d4c4818 x64: improve tests for heap_addr (#4841)
* x64: improve tests for `heap_addr`

This change adds Cranelift `compile` tests for the various cases for
`heap_addr`. The idea behind this is to more clearly show what the
penalties are for dynamically- vs statically-allocated memory as well as
turning Spectre mitigations on and off.

* Add test case: "right" size memory with Spectre enabled
2022-09-01 13:59:55 -07:00
Afonso Bordado
2beaf7352f cranelift: Test calling across different calling conventions (#4801)
* cranelift: Test calling across different calling conventions

* cranelift: Use `wasmtime_system_v` calling convention for cross cc tests
2022-08-31 15:16:41 -07:00
Trevor Elliott
dde2c5a3b6 Align functions according to their ISA's requirements (#4826)
Add a function_alignment function to the TargetIsa trait, and use it to align functions when generating objects. Additionally, collect the maximum alignment required for pc-relative constants in functions and pass that value out. Use the max of these two values when padding functions for alignment.

This fixes a bug on x86_64 where rip-relative loads to sse registers could cause a segfault, as functions weren't always guaranteed to be aligned to 16-byte addresses.

Fixes #4812
2022-08-31 14:41:44 -07:00
Nick Fitzgerald
f18a1f1488 Cranelift: Deduplicate ABI signatures during lowering (#4829)
* Cranelift: Deduplicate ABI signatures during lowering

This commit creates the `SigSet` type which interns and deduplicates the ABI
signatures that we create from `ir::Signature`s. The ABI signatures are now
referred to indirectly via a `Sig` (which is a `cranelift_entity` ID), and we
pass around a `SigSet` to anything that needs to access the actual underlying
`SigData` (which is what `ABISig` used to be).

I had to change a couple methods to return a `SmallInstVec` instead of emitting
directly to work around what would otherwise be shared and exclusive borrows of
the lowering context overlapping. I don't expect any of these to heap allocate
in practice.

This does not remove the often-unnecessary allocations caused by
`ensure_struct_return_ptr_is_returned`. That is left for follow up work.

This also opens the door for further shuffling of signature data into more
efficient representations in the future, now that we have `SigSet` to store it
all in one place and it is threaded through all the code. We could potentially
move each signature's parameter and return vectors into one big vector shared
between all signatures, for example, which could cut down on allocations and
shrink the size of `SigData` since those `SmallVec`s have pretty large inline
capacity.

Overall, this refactoring gives a 1-7% speedup for compilation on
`pulldown-cmark`:

```
compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 8754213.66 ± 7526266.23 (confidence = 99%)

  dedupe.so is 1.01x to 1.07x faster than main.so!

  [191003295 234620642.20 280597986] dedupe.so
  [197626699 243374855.86 321816763] main.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [170406200 194299792.68 253001201] dedupe.so
  [172071888 193230743.11 223608329] main.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [3870997347 4437735062.59 5216007266] dedupe.so
  [4019924063 4424595349.24 4965088931] main.so
```

* Use full path instead of import to avoid warnings in some build configurations

Warnings will then cause CI to fail.

* Move `SigSet` into `VCode`
2022-08-31 20:39:32 +00:00
Trevor Elliott
fb8b9838fe Add MInst.XmmUnaryRmRImm to handle rounding instructions (#4823)
Add a new pseudo-instruction, XmmUnaryRmRImm, to handle instructions like roundss that only use their first register argument for the instruction's result. This has the added benefit of allowing the isle wrappers for those instructions to take an XmmMem argument, allowing for more cases where loads may be merged.
2022-08-31 08:29:32 -07:00
Afonso Bordado
cf7cb10036 cranelift: Add some filetests documentation (#4833) 2022-08-31 08:15:10 -07:00
Chris Fallin
186c7c3b89 x64: clean up regalloc-related semantics on several instructions. (#4811)
* x64: clean up regalloc-related semantics on several instructions.

This PR removes all uses of "modify" operands on instructions in the x64
backend, and also removes all uses of "pinned vregs", or vregs that are
explicitly tied to particular physical registers. In place of both of
these mechanisms, which are legacies of the old regalloc design and
supported via compatibility code, the backend now uses operand
constraints. This is more flexible as it allows the regalloc to see the
liveranges and constraints without "reverse-engineering" move instructions.

Eventually, after removing all such uses (including in other backends
and by the ABI code), we can remove the compatibility code in regalloc2,
significantly simplifying its liverange-construction frontend and
thus allowing for higher confidence in correctness as well as possibly a
bit more compilation speed.

Curiously, there are a few extra move instructions now; they are likely
poor splitting decisions and I can try to chase these down later.

* Fix cranelift-codegen tests.

* Review feedback.
2022-08-30 17:21:14 -07:00
Afonso Bordado
3ce3eeb668 cranelift: Register all functions in test file for interpreter (#4817)
* cranelift: Implement `bnot` in interpreter

* cranelift: Register all functions in test file for interpreter

* cranelift: Relax signature checking for bools and vectors
2022-08-30 15:45:21 -07:00
Chris Fallin
1a59b3e6c6 AArch64: port tls_value to ISLE. (#4821) 2022-08-30 16:51:15 +00:00
Trevor Elliott
b033aba61b Move the nop lowering to ISLE, and remove the final return from lower.rs (#4809)
Lower nop in ISLE in the x64 backend, and remove the final Ok(()) from the lower function to assert that all cases that aren't handled in ISLE will panic.
2022-08-30 09:14:20 -07:00
Damian Heaton
3d9d759380 Port fcmp to ISLE (AArch64) (#4819)
Ported the existing implementation of `fcmp` for AArch64 to ISLE.

This also ports the `lower_vector_comparison` method to ISLE.

Copyright (c) 2022 Arm Limited
2022-08-30 09:06:15 -07:00
Chris Fallin
b1fb4d7c35 Fix lowering issue in x64 vany_true: sinking and using original value. (#4815)
The x64 lowring of `vany_true` both sinks mergeable loads and uses the
original register. This PR fixes the lowering to force the value into a
register first. Ideally we should solve the issue by catching this in
the ISLE type system, as described in #4745, but this resolves the issue
for now.

Fixes #4807.
2022-08-29 22:22:12 -07:00
Chris Fallin
2b4b257834 Revert "cranelift: Register all functions in test file for interpreter (#4800)" (#4810)
This reverts commit 500a9f17be.
2022-08-30 01:15:11 +00:00
Chris Fallin
955d4e4ba1 AArch64: port load and store operations to ISLE. (#4785)
This retains `lower_amode` in the handwritten code (@akirilov-arm
reports that there is an upcoming patch to port this), but tweaks it
slightly to take a `Value` rather than an `Inst`.
2022-08-29 17:45:55 -07:00
Afonso Bordado
500a9f17be cranelift: Register all functions in test file for interpreter (#4800)
* cranelift: Implement `bnot` in interpreter

* cranelift: Register all functions in test file for interpreter
2022-08-29 23:39:50 +00:00
Nick Fitzgerald
5392d7cdd7 cranelift: Merge abi and abi_impl modules (#4805) 2022-08-29 23:20:36 +00:00
Jamey Sharp
4882347868 Disable funcref generation for fuzz tests with inputs (#4797)
This fixes #4757, fixes #4758, and fixes new fuzzbugs that are probably
coming after we merged #4667.
2022-08-29 14:30:26 -07:00
Afonso Bordado
07767c3d4a cranelift: Enable i128 shifts (#4783) 2022-08-29 14:30:03 -07:00
Afonso Bordado
7663cc1c3d cranelift: Disable i128 divs on fuzzgen (#4771) 2022-08-29 14:29:51 -07:00
Afonso Bordado
9a8bd5be02 cranelift: Add LibCalls to the interpreter (#4782)
* cranelift: Add libcall handlers to interpreter

* cranelift: Fuzz IshlI64 libcall

* cranelift: Revert back to fuzzing udivi64

* cranelift: Use sdiv as a fuzz libcall

* cranelift: Register Sdiv in fuzzgen

* cranelift: Add multiple libcalls to fuzzer

* cranelift: Register a single libcall handler

* cranelift: Simplify args checking in interpreter

* cranelift: Remove unused LibCalls

* cranelift: Cleanup interpreter libcall types

* cranelift: Fix Interpreter Docs
2022-08-29 13:36:33 -07:00
Chris Fallin
a6eb24bd4f AArch64: port misc ops to ISLE. (#4796)
* Add some precise-output compile tests for aarch64.

* AArch64: port misc ops to ISLE.

- get_pinned_reg / set_pinned_reg
- bitcast
- stack_addr
- extractlane
- insertlane
- vhigh_bits
- iadd_ifcout
- fcvt_low_from_sint
2022-08-29 12:56:39 -07:00
Johnnie Birch
6368c6b188 Modifies fcvt_to_sint and fcvt_to_unit clif to make scalar only (#4794)
Closes #4693.
2022-08-29 08:40:39 -07:00
Jamey Sharp
573ae0c60b cranelift-fuzzgen: use a different namespace (#4795)
Otherwise I get a panic with "Duplicate function with name u0:1 found!"
at fuzz/fuzz_targets/cranelift-fuzzgen.rs:76:10.
2022-08-26 23:55:37 +00:00
Trevor Elliott
25d960f9c4 x64: Lower tlsvalue, sqmul_round_sat, and uunarrow in ISLE (#4793)
Lower tlsvalue, sqmul_round_sat, and uunarrow in ISLE.
2022-08-26 16:33:48 -07:00
Chris Fallin
8e8dfdf5f9 AArch64: Migrate calls and returns to ISLE. (#4788) 2022-08-26 16:26:39 -07:00
Trevor Elliott
ca6d648e37 x64: Ensure that constants are always 16 bytes for XmmMem (#4790)
Ensure that constants generated for the memory case of XmmMem values are always 16 bytes, ensuring that we don't accidantally perform an unaligned load.

Fixes #4761
2022-08-26 20:04:38 +00:00
Afonso Bordado
7a9078d9cc cranelift: Allow call and call_indirect in runtests (#4667)
* cranelift: Change test runner order

Changes the ordering of runtests to run per target and then per function.

This change doesn't do a lot by itself, but helps future refactorings of runtests.

* cranelift: Rename SingleFunctionCompiler to TestCaseCompiler

* cranelift: Skip runtests per target instead of per run

* cranelift: Deduplicate test names

With the upcoming changes to the runtest infrastructure we require unique ExtNames for all tests.

Note that for test names we have a 16 character limit on test names, and must be unique within those 16 characters.

* cranelift: Add TestFileCompiler to runtests

TestFileCompiler allows us to compile the entire file once, and then call the trampolines for each test.

The previous code was compiling the function for each invocation of a test.

* cranelift: Deduplicate ExtName for avg_round tests

* cranelift: Rename functions as they are defined.

The JIT internally only deals with User functions, and cannot link test name funcs.

This also caches trampolines by signature.

* cranelift: Preserve original name when reporting errors.

* cranelift: Rename aarch64 test functions

* cranelift: Add `call` and `call_indirect` tests!

* cranelift: Add pauth runtests for aarch64

* cranelift: Rename duplicate s390x tests

* cranelift: Delete `i128_bricmp_of` function from i128-bricmp

It looks like we forgot to delete it when it was moved to
`i128-bricmp-overflow`, and since it didn't have a run invocation
it was never compiled.

However, s390x does not support this, and panics when lowering.

* cranelift: Add `colocated` call tests

* cranelift: Rename *more* `s390x` tests

* cranelift: Add pauth + sign_return_address call tests

* cranelift: Undeduplicate test names

With the latest main changes we now support *unlimited* length test names.

This commit reverts:
52274676ff631c630f9879dd32e756566d3e700f
7989edc172493547cdf63e180bb58365e8a43a42
25c8a8395527d98976be6a34baa3b0b214776739
792e8cfa8f748077f9d80fe7ee5e958b7124e83b

* cranelift: Add LibCall tests

* cranelift: Revert more test names

These weren't auto reverted by the previous revert.

* cranelift: Disable libcall tests for aarch64

* cranelift: Runtest fibonacci tests

* cranelift: Misc cleanup
2022-08-26 12:42:16 -07:00
Trevor Elliott
c1f9736938 x64: Lower vany_true, vall_true, vhigh_bits, iconcat, and isplit in ISLE (#4787)
Lower vany_true, vall_true, vhigh_bits, iconcat, and isplit in ISLE.
2022-08-26 09:07:22 -07:00
Trevor Elliott
9386409607 x64: Lower extractlane, scalar_to_vector, and splat in ISLE (#4780)
Lower extractlane, scalar_to_vector and splat in ISLE.

This PR also makes some changes to the SinkableLoad api
* change the return type of sink_load to RegMem as there are more functions available for dealing with RegMem
* add reg_mem_to_reg_mem_imm and register it as an automatic conversion
2022-08-25 09:38:03 -07:00
Chris Fallin
c664fb6f70 Update ISLE error message: unknown term in expr, not pattern. (#4775)
This was likely a copy-paste from the `ast::Pattern` case, but here it
is checking a term name in `ast::Expr` and so should say "... in
expression", not "... in pattern".
2022-08-24 22:07:29 +00:00
Trevor Elliott
b8b6f2781e x64: Lower shuffle and swizzle in ISLE (#4772)
Lower `shuffle` and `swizzle` in ISLE.

This PR surfaced a bug with the lowering of `shuffle` when avx512vl and avx512vbmi are enabled: we use `vpermi2b` as the implementation, but panic if the immediate shuffle mask contains any out-of-bounds values. The behavior when the avx512 extensions are not present is that out-of-bounds values are turned into `0` in the result.

I've resolved this by detecting when the shuffle immediate has out-of-bounds indices in the avx512-enabled lowering, and generating an additional mask to zero out the lanes where those indices occur. This brings the avx512 case into line with the semantics of the `shuffle` op: 94bcbe8446/cranelift/codegen/meta/src/shared/instructions.rs (L1495-L1498)
2022-08-24 21:49:51 +00:00
Damian Heaton
94bcbe8446 Port Fcopysign..FcvtToSintSat to ISLE (AArch64) (#4753)
* Port `Fcopysign`..``FcvtToSintSat` to ISLE (AArch64)

Ported the existing implementations of the following opcodes to ISLE on
AArch64:
- `Fcopysign`
  - Also introduced missing support for `fcopysign` on vector values, as
    per the docs.
  - This introduces the vector encoding for the `SLI` machine
    instruction.
- `FcvtToUint`
- `FcvtToSint`
- `FcvtFromUint`
- `FcvtFromSint`
- `FcvtToUintSat`
- `FcvtToSintSat`

Copyright (c) 2022 Arm Limited

* Document helpers and abstract conversion checks
2022-08-24 10:37:14 -07:00
Afonso Bordado
7e3c481f4e cranelift: Avoid lowering VEX instructions with memory encodings (#4768) 2022-08-24 10:35:06 -07:00
Afonso Bordado
d394edcefe x64: Mask shift amounts for small types (#4752)
* x64: Mask shift amounts for small types

* cranelift: Disable i128 shifts in fuzzer again

They are fixed. But we had a bunch of fuzzgen issues come in, and we don't want to accidentaly mark them as fixed

* cranelift: Avoid masking shifts for 32 and 64 bit cases

* cranelift: Add const shift tests and fix them

* cranelift: Remove const `rotl` cases

Now that `put_masked_in_imm8_gpr` works properly we can simplify rotl/rotr
2022-08-24 10:31:38 -07:00
Jamey Sharp
9cb987c678 Don't limit ExternalName::TestName length (#4764)
In order to keep the `ExternalName` enum small, the `TestcaseName`
struct was limited to 17 bytes: a 1 byte length and a 16 byte buffer.
Due to alignment, that made `ExternalName` 20 bytes.

That fixed-size buffer means that the names of functions in Cranelift
filetests are truncated to fit, which limits our ability to give tests
meaningful names. And I think meaningful names are important in tests.

This patch replaces the inline `TestcaseName` buffer with a
heap-allocated slice. We don't care about performance for test names, so
an indirection out to the heap is fine in that case. But we do care
somewhat about the size of `ExternalName` when it's used during
compiles.

On 64-bit systems, `Box<[u8]>` is 16 bytes, so `TestcaseName` gets one
byte smaller. Unfortunately, its alignment is 8 bytes, so `ExternalName`
grows from 20 to 24 bytes.

According to `valgrind --tool=dhat`, this change has very little effect
on compiler performance. Building wasmtime with `--no-default-features
--release`, and compiling the pulldown-cmark benchmark from Sightglass,
I measured these differences between `main` and this patch:

- total number of allocations didn't change (`ExternalName::TestCase` is
  not used in normal compiles)
- 592 more bytes allocated over the process lifetime, out of 171.5MiB
- 320 more bytes allocated at peak heap size, out of 12MiB
- 0.24% more instructions executed
- 16,987 more bytes written
- 12,120 _fewer_ bytes read
2022-08-23 21:17:30 -07:00
Trevor Elliott
4bdfa76370 x64: Migrate get_pinned_reg, set_pinned_reg, vconst, and raw_bitcast to ISLE (#4763)
https://github.com/bytecodealliance/wasmtime/pull/4763
2022-08-23 16:32:00 -07:00
Trevor Elliott
b5f1ab7780 x64: Lower stack_addr, udiv, sdiv, urem, srem, umulhi, smulhi in ISLE (#4741)
Lower stack_addr, udiv, sdiv, urem, srem, umulhi, and smulhi in ISLE.

For udiv, sdiv, urem, and srem I opted to move the original lowering into an extern constructor, as the interactions with rax and rdx for the div instruction didn't seem meaningful to implement in ISLE. However, I'm happy to revisit this choice and move more of the embedding into ISLE.
2022-08-23 11:22:49 -07:00
Damian Heaton
3b68d76905 Port widening ops to ISLE (AArch64) (#4751)
Ported the existing implementations of the following opcodes for AArch64
to ISLE, and implemented support for 64-bit vectors (per the docs):
- `SwidenLow`
- `SwidenHigh`
- `UwidenLow`
- `UwidenHigh`

Also ported `WideningPairwiseDotProductS` as-is.

Copyright (c) 2022 Arm Limited
2022-08-23 09:42:11 -07:00
Damian Heaton
da1fb305a3 Port vconst to ISLE (AArch64) (#4750)
* Port `vconst` to ISLE (AArch64)

Ported the existing implementation of `vconst` to ISLE for AArch64, and
added support for 64-bit vector constants.

Also introduced 64-bit `vconst` support to the interpreter.

Copyright (c) 2022 Arm Limited

* Replace if-chains with match statements

Copyright (c) 2022 Arm Limited
2022-08-23 09:40:11 -07:00