Commit Graph

1275 Commits

Author SHA1 Message Date
Chris Fallin
8dddd6f1f7 Cranelift: Remove ifcmp_sp opcode. (#4578)
This was temporarily added back in #3502 due to a need from Lucet; now
that Lucet is EOL, the opcode is no longer needed and we can remove it.
2022-08-02 13:15:39 -07:00
Chris Fallin
43f1765272 Cranellift: remove Baldrdash support and related features. (#4571)
* Cranellift: remove Baldrdash support and related features.

As noted in Mozilla's bugzilla bug 1781425 [1], the SpiderMonkey team
has recently determined that their current form of integration with
Cranelift is too hard to maintain, and they have chosen to remove it
from their codebase. If and when they decide to build updated support
for Cranelift, they will adopt different approaches to several details
of the integration.

In the meantime, after discussion with the SpiderMonkey folks, they
agree that it makes sense to remove the bits of Cranelift that exist
to support the integration ("Baldrdash"), as they will not need
them. Many of these bits are difficult-to-maintain special cases that
are not actually tested in Cranelift proper: for example, the
Baldrdash integration required Cranelift to emit function bodies
without prologues/epilogues, and instead communicate very precise
information about the expected frame size and layout, then stitched
together something post-facto. This was brittle and caused a lot of
incidental complexity ("fallthrough returns", the resulting special
logic in block-ordering); this is just one example. As another
example, one particular Baldrdash ABI variant processed stack args in
reverse order, so our ABI code had to support both traversal
orders. We had a number of other Baldrdash-specific settings as well
that did various special things.

This PR removes Baldrdash ABI support, the `fallthrough_return`
instruction, and pulls some threads to remove now-unused bits as a
result of those two, with the  understanding that the SpiderMonkey folks
will build new functionality as needed in the future and we can perhaps
find cleaner abstractions to make it all work.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1781425

* Review feedback.

* Fix (?) DWARF debug tests: add `--disable-cache` to wasmtime invocations.

The debugger tests invoke `wasmtime` from within each test case under
the control of a debugger (gdb or lldb). Some of these tests started to
inexplicably fail in CI with unrelated changes, and the failures were
only inconsistently reproducible locally. It seems to be cache related:
if we disable cached compilation on the nested `wasmtime` invocations,
the tests consistently pass.

* Review feedback.
2022-08-02 19:37:56 +00:00
Benjamin Bouvier
ff37c9d8a4 [cranelift] Rejigger the compile API (#4540)
* Move `emit_to_memory` to `MachCompileResult`

This small refactoring makes it clearer to me that emitting to memory
doesn't require anything else from the compilation `Context`. While it's
a trivial change, it's a small public API change that shouldn't cause
too much trouble, and doesn't seem RFC-worthy. Happy to hear different
opinions about this, though!

* hide the MachCompileResult behind a method

* Add a `CompileError` wrapper type that references a `Function`

* Rename MachCompileResult to CompiledCode

* Additionally remove the last unsafe API in cranelift-codegen
2022-08-02 12:05:40 -07:00
Sam Parker
37cd96beff [AArch64] i64x2 support for min/max (#4575)
Also added interpreter support for vector min/max.

Copyright (c) 2022, Arm Limited.
2022-08-02 11:42:05 -07:00
Trevor Elliott
25782b527e x64: Migrate trapif and trapff to ISLE (#4545)
https://github.com/bytecodealliance/wasmtime/pull/4545
2022-08-01 11:24:11 -07:00
Anton Kirilov
a47a82d2e5 Cranelift AArch64: Harden the Spectre mitigations (#4555)
Use the `CSDB` instruction following Arm's recommendation.

Copyright (c) 2022, Arm Limited.
2022-08-01 10:20:48 -07:00
Afonso Bordado
1f058a02c0 cranelift: Add MinGW fma regression tests (#4517)
* cranelift: Add MinGW `fma` regression tests

* cranelift: Fix FMA in interpreter

* cranelift: Add separate `fma` test suite for the interpreter

The interpreter can run `fma.clif` on most platforms, however on
`x86_64-pc-windows-gnu` we use libm which has issues with some inputs.
We should delete `fma-interpreter.clif` and enable the interpreter on
the main `fma.clif` file once those are fixed.
2022-07-29 09:09:37 -05:00
Damian Heaton
5e3bb588a8 Port Fence, IsNull/IsInvalid & Debugtrap to ISLE (AArch64) (#4548)
Ported the existing implementation of the following Opcodes for AArch64
to ISLE:
- `Fence`
- `IsNull`
- `IsInvalid`
- `Debugtrap`

Copyright (c) 2022 Arm Limited
2022-07-28 15:36:13 -07:00
Trevor Elliott
29d4edc76b x64: Migrate call and call_indirect to ISLE (#4542)
https://github.com/bytecodealliance/wasmtime/pull/4542
2022-07-28 13:10:03 -07:00
Afonso Bordado
0508932174 cranelift: Align Scalar and SIMD shift semantics (#4520)
* cranelift: Reorganize test suite

Group some SIMD operations by instruction.

* cranelift: Deduplicate some shift tests

Also, new tests with the mod behaviour

* aarch64: Lower shifts with mod behaviour

* x64: Lower shifts with mod behaviour

* wasmtime: Don't mask SIMD shifts
2022-07-27 17:54:00 +00:00
Afonso Bordado
e121c209fc cranelift: Fix urem/srem in interpreter (#4532) 2022-07-27 10:47:08 -07:00
Anton Kirilov
ead6edb0c5 Cranelift AArch64: Migrate Splat to ISLE (#4521)
Copyright (c) 2022, Arm Limited.
2022-07-26 17:57:15 +00:00
Anton Kirilov
d041c4b376 Cranelift AArch64: Further integral constant fixes (#4530)
Copyright (c) 2022, Arm Limited.
2022-07-26 09:35:06 -07:00
Afonso Bordado
02c3b47db2 x64: Implement SIMD fma (#4474)
* x64: Add VEX Instruction Encoder

This uses a similar builder pattern to the EVEX Encoder.
Does not yet support memory accesses.

* x64: Add FMA Flag

* x64: Implement SIMD `fma`

* x64: Use 4 register Vex Inst

* x64: Reorder VEX pretty print args
2022-07-25 22:01:02 +00:00
Damian Heaton
3ef89b7787 Allow 64-bit vectors and implement for interpreter (#4509)
* Allow 64-bit vectors and implement for interpreter

The AArch64 backend already supports 64-bit vectors; this simply allows
instructions to make use of that.

Implemented support for 64-bit vectors within the interpreter to allow
interpret runtests to use them.

Copyright (c) 2022 Arm Limited

* Disable 64-bit SIMD `iaddpairwise` tests on s390x

Copyright (c) 2022 Arm Limited
2022-07-25 13:00:43 -07:00
Sam Parker
c5ddb4b803 [AArch64] Port SIMD narrowing to ISLE (#4478)
* [AArch64] Port SIMD narrowing to ISLE

Fvdemote, snarrow, unarrow and uunarrow.

Also refactor the aarch64 instructions descriptions to parameterize
on ScalarSize instead of using different opcodes.

The zero_value pure constructor has been introduced and used by the
integer narrow operations and it replaces, and extends, the compare
zero patterns.

Copright (c) 2022, Arm Limited.

* use short 'if' patterns
2022-07-25 12:40:36 -07:00
Ulrich Weigand
dd40bf075a s390x: Enable more runtests, and fix a few bugs (#4516)
This enables more runtests to be executed on s390x.  Doing so
uncovered a two back-end bugs, which are fixed as well:

- The result of cls was always off by one.
- The result of popcnt.i16 has uninitialized high bits.

In addition, I found a bug in the load-op-store.clif test case:
     v3 = heap_addr.i64 heap0, v1, 4
     v4 = iconst.i64 42
     store.i32 v4, v3
This was clearly intended to perform a 32-bit store, but
actually performs a 64-bit store (it seems the type annotation
of the store opcode is ignored, and the type of the operand
is used instead).  That bug did not show any noticable symptoms
on little-endian architectures, but broke on big-endian.
2022-07-25 12:37:06 -07:00
Trevor Elliott
ee7e4f4c6b x64: Port func_addr and symbol_value to ISLE (#4485)
https://github.com/bytecodealliance/wasmtime/pull/4485
2022-07-25 11:11:16 -07:00
Afonso Bordado
446efd3e11 cranelift: Fix icmp_imm for small types in interpreter (#4506) 2022-07-23 00:26:56 +00:00
Afonso Bordado
af62037f62 cranelift: Restrict br_table to i32 indices (#4510)
* cranelift: Restrict `br_table` to `i32` indices

In #4498 it was proposed that we should only accept `i32` indices
to `br_table`. The rationale for this is that larger types lead the
users to a false sense of flexibility (since we don't support jump
tables larger than u32's), and narrower types are not well tested
paths that would be safer if we removed them.

* cranelift: Reduce directly from i128 to i32 in Switch
2022-07-22 23:32:40 +00:00
Damian Heaton
f1a0c40a53 Convert sqrt..nearest to ISLE (AArch64) (#4508)
Converted the existing implementations for the following opcodes to ISLE
on AArch64:
- `sqrt`
- `fneg`
- `fabs`
- `fpromote`
- `fdemote`
- `ceil`
- `floor`
- `trunc`
- `nearest`

Copyright (c) 2022 Arm Limited
2022-07-22 14:48:07 -07:00
Afonso Bordado
d89c262657 cranelift: Implement {u,s}extend.i128 in interpreter (#4505) 2022-07-22 10:47:10 -07:00
Ulrich Weigand
fd639dd044 s390x: Support preserve_frame_pointers flag (#4477)
On s390x, we do not have a frame pointer that can be used to chain
stack frames for easy unwinding.  Instead, our ABI defines a stack
"backchain" mechanism that can be used to the same effect.

This PR uses that backchain mechanism to implement the new
preserve_frame_pointers flags introduced here:
https://github.com/bytecodealliance/wasmtime/pull/4469
2022-07-21 10:09:16 -07:00
Anton Kirilov
2ba4bce5cc Merge pull request from GHSA-7f6x-jwh5-m9r4
Copyright (c) 2022, Arm Limited.
2022-07-20 11:53:56 -05:00
Nick Fitzgerald
22d91a7c84 cranelift: Add a flag for preserving frame pointers (#4469)
Preserving frame pointers -- even inside leaf functions -- makes it easy to
capture the stack of a running program, without requiring any side tables or
metadata (like `.eh_frame` sections). Many sampling profilers and similar tools
walk frame pointers to capture stacks. Enabling this option will play nice with
those tools.
2022-07-20 08:02:21 -07:00
Damian Heaton
00ac18c866 Convert fadd..fmax_pseudo to ISLE (AArch64) (#4452)
Converted the existing implementations for the following Opcodes to ISLE on AArch64:
- `fadd`
- `fsub`
- `fmul`
- `fdiv`
- `fmin`
- `fmax`
- `fmin_pseudo`
- `fmax_pseudo`

Copyright (c) 2022 Arm Limited
2022-07-19 12:03:05 -07:00
Trevor Elliott
b519c975cb x64: Port fdemote and fvdemote to ISLE (#4449)
https://github.com/bytecodealliance/wasmtime/pull/4449
2022-07-18 14:26:23 -07:00
Ulrich Weigand
638dc4e0b3 s390x: Implement full SIMD support (#4427)
This adds full support for all Cranelift SIMD instructions
to the s390x target.  Everything is matched fully via ISLE.

In addition to adding support for many new instructions,
and the lower.isle code to match all SIMD IR patterns,
this patch also adds ABI support for vector types.
In particular, we now need to handle the fact that
vector registers 8 .. 15 are partially callee-saved,
i.e. the high parts of those registers (which correspond
to the old floating-poing registers) are callee-saved,
but the low parts are not.  This is the exact same situation
that we already have on AArch64, and so this patch uses the
same solution (the is_included_in_clobbers callback).

The bulk of the changes are platform-specific, but there are
a few exceptions:

- Added ISLE extractors for the Immediate and Constant types,
  to enable matching the vconst and swizzle instructions.

- Added a missing accessor for call_conv to ABISig.

- Fixed endian conversion for vector types in data_value.rs
  to enable their use in runtests on the big-endian platforms.

- Enabled (nearly) all SIMD runtests on s390x.  [ Two test cases
  remain disabled due to vector shift count semantics, see below. ]

- Enabled all Wasmtime SIMD tests on s390x.

There are three minor issues, called out via FIXMEs below,
which should be addressed in the future, but should not be
blockers to getting this patch merged.  I've opened the
following issues to track them:

- Vector shift count semantics
  https://github.com/bytecodealliance/wasmtime/issues/4424

- is_included_in_clobbers vs. link register
  https://github.com/bytecodealliance/wasmtime/issues/4425

- gen_constant callback
  https://github.com/bytecodealliance/wasmtime/issues/4426

All tests, including all newly enabled SIMD tests, pass
on both z14 and z15 architectures.
2022-07-18 14:00:48 -07:00
Damian Heaton
d792646677 Implement iabs in ISLE (AArch64) (#4399)
* Implement `iabs` in ISLE (AArch64)

Converts the existing implementation of `iabs` for AArch64 into ISLE,
and fixes support for `iabs` on scalar values.

Copyright (c) 2022 Arm Limited.

* Improve scalar `iabs` implementation.

Also introduces `CSNeg` instruction.

Copyright (c) 2022 Arm Limited
2022-07-18 11:12:34 -07:00
Damian Heaton
db7f9ccd2b Convert scalar_to_vector to ISLE (AArch64) (#4401)
* Convert `scalar_to_vector` to ISLE (AArch64)

Converted the exisiting implementation of `scalar_to_vector` for AArch64 to
ISLE.

Copyright (c) 2022 Arm Limited

* Add support for floats and fix FpuExtend

- Added rules to cover `f32 -> f32x4` and `f64 -> f64x2` for
`scalar_to_vector`
- Added tests for `scalar_to_vector` on floats.
- Corrected an invalid instruction emitted by `FpuExtend` on 64-bit
values.

Copyright (c) 2022 Arm Limited
2022-07-18 11:11:54 -07:00
Afonso Bordado
eca0a73453 cranelift: Use requested ISA Flags in run tests (#4450) 2022-07-15 12:09:07 -07:00
Afonso Bordado
80976b6fc7 cranelift: Add fadd/fsub/fmul/fdiv to interpreter (#4446)
Fuzzgen found these as soon as I added float support
2022-07-14 21:53:03 +00:00
Afonso Bordado
4ea46c3ca8 cranelift: Implement table_addr in interpreter (#4433) 2022-07-13 12:53:42 -07:00
Damian Heaton
6c70428735 Convert isplit / iconcat to ISLE (AArch64) (#4402)
Converted the existing implementations for `isplit` and `iconcat` for
AArch64 to ISLE.

Copyright (c) 2022 Arm Limited
2022-07-08 17:12:42 -07:00
Afonso Bordado
16cb287c53 cranelift: Use round_ties_even for nearest in interpreter (#4413)
As @MaxGraey pointed out (thanks!) in #4397, `round` has different
 behavior from `nearest`. And it looks like the native rust
 implementation is still pending stabilization.

 Right now we duplicate the wasmtime implementation, merged in #2171.

 However, we definitely should switch to the rust native version
 when it is available.
2022-07-07 16:36:43 -07:00
Sam Parker
9c43749dfe [RFC] Dynamic Vector Support (#4200)
Introduce a new concept in the IR that allows a producer to create
dynamic vector types. An IR function can now contain global value(s)
that represent a dynamic scaling factor, for a given fixed-width
vector type. A dynamic type is then created by 'multiplying' the
corresponding global value with a fixed-width type. These new types
can be used just like the existing types and the type system has a
set of hard-coded dynamic types, such as I32X4XN, which the user
defined types map onto. The dynamic types are also used explicitly
to create dynamic stack slots, which have no set size like their
existing counterparts. New IR instructions are added to access these
new stack entities.

Currently, during codegen, the dynamic scaling factor has to be
lowered to a constant so the dynamic slots do eventually have a
compile-time known size, as do spill slots.

The current lowering for aarch64 just targets Neon, using a dynamic
scale of 1.

Copyright (c) 2022, Arm Limited.
2022-07-07 12:54:39 -07:00
Afonso Bordado
e9727b9d4b aarch64: Fix i128 of/nof implementations (#4403)
@yuyang-ok reported via zulip that i128 overflow tests were:
1. different from the interpreter implementation
2. wrong on some of the test cases

This fixes both the tests and the aarch64 implementation and adds the
interpreter to the testsuite.
2022-07-07 11:00:58 -07:00
Afonso Bordado
f98076ae88 cranelift: Implement float rounding operations (#4397)
Implements the following operations on the interpreter:
* `ceil`
* `floor`
* `nearest`
* `trunc`
2022-07-06 16:43:54 -07:00
Afonso Bordado
9575ed4eb7 cranelift: Implement global_value in interpreter (#4396) 2022-07-06 15:53:52 -07:00
Afonso Bordado
f9e5f274e8 cranelift: Disable pinned_reg runtests (#4392)
We've had some ABI issues with pinned reg in runtests.

See #4376 for more info.
2022-07-06 15:52:59 -07:00
Afonso Bordado
0f603dd2c5 cranelift: Implement fmin_pseudo/fmax_pseudo in interpreter (#4394) 2022-07-06 14:54:29 -07:00
Afonso Bordado
925891245d cranelift: Fix fmin/fmax when dealing with zeroes (#4373)
`fmin`/`fmax` are defined as returning -0.0 as smaller than 0.0.
This is not how the IEEE754 views these values and the interpreter was
returning the wrong value in these operations since it was just using the
standard IEEE754 comparisons.

This also tries to preserve NaN information by avoiding passing NaN's
through any operation that could canonicalize it.
2022-07-05 12:59:23 -07:00
Sam Parker
d9e0e6a6a9 [AArch64] Port min/max to ISLE (#4374)
Copyright (c) 2022, Arm Limited.
2022-07-05 09:16:45 -07:00
Afonso Bordado
e91f493ff5 cranelift: Add heap support to the interpreter (#3302)
* cranelift: Add heaps to interpreter

* cranelift: Add RunTest Environment mechanism to  test interpret

* cranelift: Remove unused `MemoryError`

* cranelift: Add docs for `State::resolve_global_value`

* cranelift: Rename heap tests

* cranelift: Refactor heap address resolution

* Fix typos and clarify docs (thanks @cfallin)
2022-07-05 09:05:26 -07:00
Afonso Bordado
2003ae99a0 Implement fma/fabs/fneg/fcopysign on the interpreter (#4367)
* cranelift: Implement `fma` on interpreter

* cranelift: Implement `fabs` on interpreter

* cranelift: Fix `fneg` implementation on interpreter

`fneg` was implemented as `0 - x` which is not correct according to the
standard since that operation makes no guarantees on what the output
is when the input is `NaN`. However for `fneg` the output for `NaN`
inputs is fully defined.

* cranelift: Implement `fcopysign` on interpreter
2022-07-05 09:03:04 -07:00
wasmtime-publish
7c428bbd62 Bump Wasmtime to 0.40.0 (#4378)
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
2022-07-05 09:10:52 -05:00
Afonso Bordado
f2e6ff5e70 cranelift: Implement sqrt in interpreter (#4362)
This ignores SIMD for now.
2022-07-01 09:39:11 -07:00
Afonso Bordado
38ecd3744f aarch64: Implement bmask/bextend in ISLE (#4358)
* aarch64: Implement `bmask`/`bextend` in ISLE

* cranelift: Remove vector versions of `bextend`

* aarch64: Cleanup `bmask`/`bextend` documentation
2022-07-01 09:37:18 -07:00
Ulrich Weigand
ec83144c88 s390x: use full vector register file for FP operations (#4360)
This defines the full set of 32 128-bit vector registers on s390x.
(Note that the VRs overlap the existing FPRs.)  In addition, this
adds support to use all 32 vector registers to implement floating-
point operations, by using vector floating-point instructions with
the 'W' bit set to operate only on the first element.

This part of the vector instruction set mostly matches the old FP
instruction set, with two exceptions:

- There is no vector version of the COPY SIGN instruction.  Instead,
  now use a VECTOR SELECT with an appropriate bit mask to implement
  the fcopysign operation.

- There are no vector version of the float <-> int conversion
  instructions where source and target differ in bit size.  Use
  appropriate multiple conversion steps instead.  This also requires
  use of explicit checking to implement correct overflow handling.
  As a side effect, this version now also implements the i8 / i16
  variants of all conversions, which had been missing so far.

For all operations except those two above, we continue to use the
old FP instruction if applicable (i.e. if all operands happen to
have been allocated to the original FP register set), and use the
vector instruction otherwise.
2022-06-30 16:33:39 -07:00
bjorn3
d1446f767d Mark return value as define instead of clobber for TLS pseudoinstructions (#4357) 2022-06-30 10:44:51 -07:00