Commit Graph

1601 Commits

Author SHA1 Message Date
Trevor Elliott
06407dd337 Add a test to prevent x64 Inst size slipping further (#4489)
* Add a test to prevent x64 Inst size slipping further

* Enable the test for 64-bit pointers only
2022-07-21 00:01:33 +00:00
Alex Crichton
2127c3a369 Fix CI for main (#4486)
* Skip new `table_ops` test under emulation

When emulating we already have to disable most pooling-allocator related
tests so this commit carries over that logic to the new fuzz test which
may run some configurations with the pooling allocator depending on the
random input.

* Fix panics in s390x codegen related to aliases

This commit fixes an issue introduced as part of the fix for
GHSA-5fhj-g3p3-pq9g. The `reftyped_vregs` list given to `regalloc2` is
not allowed to have duplicates in it and while the list originally
doesn't have duplicates once aliases are applied the list may have
duplicates. The fix here is to perform another pass to remove duplicates
after the aliases have been processed.
2022-07-20 21:39:59 +00:00
Anton Kirilov
2ba4bce5cc Merge pull request from GHSA-7f6x-jwh5-m9r4
Copyright (c) 2022, Arm Limited.
2022-07-20 11:53:56 -05:00
Alex Crichton
2154c63de9 Merge pull request from GHSA-5fhj-g3p3-pq9g
* Improve cranelift disassembly of stack maps

Print out extra information about stack maps such as their contents and
other related metadata available. Additionally also print out addresses
in hex to line up with the disassembly otherwise printed as well.

* Improve the `table_ops` fuzzer

* Generate more instructions by default
* Fix negative indices appearing in `table.{get,set}`
* Assert that the traps generated are expected to prevent accidental
  other errors reporting a fuzzing success.

* Fix `reftype_vregs` reported to `regalloc2`

This fixes a mistake in the register allocation of Cranelift functions
where functions using reference-typed arguments incorrectly report which
virtual registers are reference-typed values if there are vreg aliases
in play. The fix here is to apply the vreg aliases to the final list of
reftyped regs which is eventually passed to `regalloc2`.

The main consequence of this fix is that functions which previously
accidentally didn't have correct stack maps should now have the missing
stack maps.

* Add a test that `table_ops` gc's eventually

* Add a comment about new alias resolution

* Update crates/fuzzing/src/oracles.rs

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

* Add some comments

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
2022-07-20 11:52:23 -05:00
Nick Fitzgerald
22d91a7c84 cranelift: Add a flag for preserving frame pointers (#4469)
Preserving frame pointers -- even inside leaf functions -- makes it easy to
capture the stack of a running program, without requiring any side tables or
metadata (like `.eh_frame` sections). Many sampling profilers and similar tools
walk frame pointers to capture stacks. Enabling this option will play nice with
those tools.
2022-07-20 08:02:21 -07:00
Jeffrey Charles
d55eb64b9e Enable generating debug symbols on AArch64 (#4468) 2022-07-19 19:12:07 +00:00
Damian Heaton
00ac18c866 Convert fadd..fmax_pseudo to ISLE (AArch64) (#4452)
Converted the existing implementations for the following Opcodes to ISLE on AArch64:
- `fadd`
- `fsub`
- `fmul`
- `fdiv`
- `fmin`
- `fmax`
- `fmin_pseudo`
- `fmax_pseudo`

Copyright (c) 2022 Arm Limited
2022-07-19 12:03:05 -07:00
Ulrich Weigand
b18c9bee15 s390x: Small refactoring of ABI code (#4465)
Remove a bit of duplicated code and eliminate some
unnecessary allocations.
2022-07-19 11:59:30 -07:00
Trevor Elliott
b519c975cb x64: Port fdemote and fvdemote to ISLE (#4449)
https://github.com/bytecodealliance/wasmtime/pull/4449
2022-07-18 14:26:23 -07:00
Ulrich Weigand
638dc4e0b3 s390x: Implement full SIMD support (#4427)
This adds full support for all Cranelift SIMD instructions
to the s390x target.  Everything is matched fully via ISLE.

In addition to adding support for many new instructions,
and the lower.isle code to match all SIMD IR patterns,
this patch also adds ABI support for vector types.
In particular, we now need to handle the fact that
vector registers 8 .. 15 are partially callee-saved,
i.e. the high parts of those registers (which correspond
to the old floating-poing registers) are callee-saved,
but the low parts are not.  This is the exact same situation
that we already have on AArch64, and so this patch uses the
same solution (the is_included_in_clobbers callback).

The bulk of the changes are platform-specific, but there are
a few exceptions:

- Added ISLE extractors for the Immediate and Constant types,
  to enable matching the vconst and swizzle instructions.

- Added a missing accessor for call_conv to ABISig.

- Fixed endian conversion for vector types in data_value.rs
  to enable their use in runtests on the big-endian platforms.

- Enabled (nearly) all SIMD runtests on s390x.  [ Two test cases
  remain disabled due to vector shift count semantics, see below. ]

- Enabled all Wasmtime SIMD tests on s390x.

There are three minor issues, called out via FIXMEs below,
which should be addressed in the future, but should not be
blockers to getting this patch merged.  I've opened the
following issues to track them:

- Vector shift count semantics
  https://github.com/bytecodealliance/wasmtime/issues/4424

- is_included_in_clobbers vs. link register
  https://github.com/bytecodealliance/wasmtime/issues/4425

- gen_constant callback
  https://github.com/bytecodealliance/wasmtime/issues/4426

All tests, including all newly enabled SIMD tests, pass
on both z14 and z15 architectures.
2022-07-18 14:00:48 -07:00
Sam Parker
e5678e8f8d [AArch64] Cleanup dynamic lowering (#4432)
Copyright (c) 2022, Arm Limited.
2022-07-18 11:13:16 -07:00
Damian Heaton
d792646677 Implement iabs in ISLE (AArch64) (#4399)
* Implement `iabs` in ISLE (AArch64)

Converts the existing implementation of `iabs` for AArch64 into ISLE,
and fixes support for `iabs` on scalar values.

Copyright (c) 2022 Arm Limited.

* Improve scalar `iabs` implementation.

Also introduces `CSNeg` instruction.

Copyright (c) 2022 Arm Limited
2022-07-18 11:12:34 -07:00
Damian Heaton
db7f9ccd2b Convert scalar_to_vector to ISLE (AArch64) (#4401)
* Convert `scalar_to_vector` to ISLE (AArch64)

Converted the exisiting implementation of `scalar_to_vector` for AArch64 to
ISLE.

Copyright (c) 2022 Arm Limited

* Add support for floats and fix FpuExtend

- Added rules to cover `f32 -> f32x4` and `f64 -> f64x2` for
`scalar_to_vector`
- Added tests for `scalar_to_vector` on floats.
- Corrected an invalid instruction emitted by `FpuExtend` on 64-bit
values.

Copyright (c) 2022 Arm Limited
2022-07-18 11:11:54 -07:00
Afonso Bordado
eca0a73453 cranelift: Use requested ISA Flags in run tests (#4450) 2022-07-15 12:09:07 -07:00
Afonso Bordado
80976b6fc7 cranelift: Add fadd/fsub/fmul/fdiv to interpreter (#4446)
Fuzzgen found these as soon as I added float support
2022-07-14 21:53:03 +00:00
Alex Crichton
18d9685eb3 Fix pretty print of atomic_rmw clif ops (#4441)
A comma was accidentally placed after the operation itself when no comma
is necessary (only after operands, not the operation itself).
2022-07-13 10:51:07 -05:00
Damian Heaton
6c70428735 Convert isplit / iconcat to ISLE (AArch64) (#4402)
Converted the existing implementations for `isplit` and `iconcat` for
AArch64 to ISLE.

Copyright (c) 2022 Arm Limited
2022-07-08 17:12:42 -07:00
Afonso Bordado
16cb287c53 cranelift: Use round_ties_even for nearest in interpreter (#4413)
As @MaxGraey pointed out (thanks!) in #4397, `round` has different
 behavior from `nearest`. And it looks like the native rust
 implementation is still pending stabilization.

 Right now we duplicate the wasmtime implementation, merged in #2171.

 However, we definitely should switch to the rust native version
 when it is available.
2022-07-07 16:36:43 -07:00
Sam Parker
9c43749dfe [RFC] Dynamic Vector Support (#4200)
Introduce a new concept in the IR that allows a producer to create
dynamic vector types. An IR function can now contain global value(s)
that represent a dynamic scaling factor, for a given fixed-width
vector type. A dynamic type is then created by 'multiplying' the
corresponding global value with a fixed-width type. These new types
can be used just like the existing types and the type system has a
set of hard-coded dynamic types, such as I32X4XN, which the user
defined types map onto. The dynamic types are also used explicitly
to create dynamic stack slots, which have no set size like their
existing counterparts. New IR instructions are added to access these
new stack entities.

Currently, during codegen, the dynamic scaling factor has to be
lowered to a constant so the dynamic slots do eventually have a
compile-time known size, as do spill slots.

The current lowering for aarch64 just targets Neon, using a dynamic
scale of 1.

Copyright (c) 2022, Arm Limited.
2022-07-07 12:54:39 -07:00
Afonso Bordado
e9727b9d4b aarch64: Fix i128 of/nof implementations (#4403)
@yuyang-ok reported via zulip that i128 overflow tests were:
1. different from the interpreter implementation
2. wrong on some of the test cases

This fixes both the tests and the aarch64 implementation and adds the
interpreter to the testsuite.
2022-07-07 11:00:58 -07:00
Damian Heaton
6a5fe20956 Convert swizzle to ISLE (AArch64) (#4400)
Converted the implementation of `swizzle` for AArch64 to ISLE.

Copyright (c) 2022 Arm Limited
2022-07-07 10:29:33 -07:00
Andrew Brown
8629cbc6a4 x64: port atomic_rmw to ISLE (#4389)
* x64: port `atomic_rmw` to ISLE

This change ports `atomic_rmw` to ISLE for the x64 backend. It does not
change the lowering in any way, though it seems possible that the fixed
regs need not be as fixed and that there are opportunities for single
instruction lowerings. It does rename `inst_common::AtomicRmwOp` to
`MachAtomicRmwOp` to disambiguate with the IR enum with the same name.

* x64: remove remaining hardcoded register constraints for `atomic_rmw`

* x64: use `SyntheticAmode` in `AtomicRmwSeq`

* review: add missing reg collector for amode

* review: collect memory registers in the 'late' phase
2022-07-06 23:58:59 +00:00
Afonso Bordado
f98076ae88 cranelift: Implement float rounding operations (#4397)
Implements the following operations on the interpreter:
* `ceil`
* `floor`
* `nearest`
* `trunc`
2022-07-06 16:43:54 -07:00
Sam Parker
e86e628061 ValueDataPacked getter and setter for Type (#4390)
Speeds compilation of the spidermonkey benchmark by ~0.5%.

Copyright (c) 2022, Arm Limited.
2022-07-06 09:40:03 -07:00
Chris Fallin
00f357c028 Cranelift: support 14-bit Type index with some bitpacking. (#4269)
* Cranelift: make `ir::Type` a `u16`.

* Cranelift: pack ValueData back into 64 bits.

After extending `Type` to a `u16`, `ValueData` became 12 bytes rather
than 8. This packs it back down to 8 bytes (64 bits) by stealing two
bits from the `Type` for the enum discriminant (leaving 14 bits for the
type itself).

Performance comparison (3-way between original (`ty-u8`), 16-bit `Type`
(`ty-u16`), and this PR (`ty-packed`)):

```
~/work/sightglass% target/release/sightglass-cli benchmark \
    -e ~/ty-u8.so -e ~/ty-u16.so -e ~/ty-packed.so \
    --iterations-per-process 10 --processes 2 \
    benchmarks-next/spidermonkey/benchmark.wasm

compilation
  benchmarks-next/spidermonkey/benchmark.wasm
    cycles
      [20654406874 21749213920.50 22958520306] /home/cfallin/ty-packed.so
      [22227738316 22584704883.90 22916433748] /home/cfallin/ty-u16.so
      [20659150490 21598675968.60 22588108428] /home/cfallin/ty-u8.so
    nanoseconds
      [5435333269 5723139427.25 6041072883] /home/cfallin/ty-packed.so
      [5848788229 5942729637.85 6030030341] /home/cfallin/ty-u16.so
      [5436002390 5683248226.10 5943626225] /home/cfallin/ty-u8.so
```

So, when compiling SpiderMonkey.wasm, making `Type` 16 bits regresses
performance by 4.5% (5.683s -> 5.723s), while this PR gets 14 bits for a 1.0%
cost (5.683s -> 5.723s). That's still not great, and we can likely do better,
but it's a start.

* Fix test failure: entities to/from u32 via `{from,to}_bits`, not `{from,to}_u32`.
2022-07-05 14:51:02 -07:00
Afonso Bordado
925891245d cranelift: Fix fmin/fmax when dealing with zeroes (#4373)
`fmin`/`fmax` are defined as returning -0.0 as smaller than 0.0.
This is not how the IEEE754 views these values and the interpreter was
returning the wrong value in these operations since it was just using the
standard IEEE754 comparisons.

This also tries to preserve NaN information by avoiding passing NaN's
through any operation that could canonicalize it.
2022-07-05 12:59:23 -07:00
Sam Parker
d9e0e6a6a9 [AArch64] Port min/max to ISLE (#4374)
Copyright (c) 2022, Arm Limited.
2022-07-05 09:16:45 -07:00
Afonso Bordado
2003ae99a0 Implement fma/fabs/fneg/fcopysign on the interpreter (#4367)
* cranelift: Implement `fma` on interpreter

* cranelift: Implement `fabs` on interpreter

* cranelift: Fix `fneg` implementation on interpreter

`fneg` was implemented as `0 - x` which is not correct according to the
standard since that operation makes no guarantees on what the output
is when the input is `NaN`. However for `fneg` the output for `NaN`
inputs is fully defined.

* cranelift: Implement `fcopysign` on interpreter
2022-07-05 09:03:04 -07:00
Afonso Bordado
f2e6ff5e70 cranelift: Implement sqrt in interpreter (#4362)
This ignores SIMD for now.
2022-07-01 09:39:11 -07:00
Afonso Bordado
38ecd3744f aarch64: Implement bmask/bextend in ISLE (#4358)
* aarch64: Implement `bmask`/`bextend` in ISLE

* cranelift: Remove vector versions of `bextend`

* aarch64: Cleanup `bmask`/`bextend` documentation
2022-07-01 09:37:18 -07:00
Ulrich Weigand
ec83144c88 s390x: use full vector register file for FP operations (#4360)
This defines the full set of 32 128-bit vector registers on s390x.
(Note that the VRs overlap the existing FPRs.)  In addition, this
adds support to use all 32 vector registers to implement floating-
point operations, by using vector floating-point instructions with
the 'W' bit set to operate only on the first element.

This part of the vector instruction set mostly matches the old FP
instruction set, with two exceptions:

- There is no vector version of the COPY SIGN instruction.  Instead,
  now use a VECTOR SELECT with an appropriate bit mask to implement
  the fcopysign operation.

- There are no vector version of the float <-> int conversion
  instructions where source and target differ in bit size.  Use
  appropriate multiple conversion steps instead.  This also requires
  use of explicit checking to implement correct overflow handling.
  As a side effect, this version now also implements the i8 / i16
  variants of all conversions, which had been missing so far.

For all operations except those two above, we continue to use the
old FP instruction if applicable (i.e. if all operands happen to
have been allocated to the original FP register set), and use the
vector instruction otherwise.
2022-06-30 16:33:39 -07:00
Sam Parker
a2d49ebf27 Use u32 in Type API (#4280)
Move from passing and returning u8 and u16 values to u32 in many of
the functions. This removes a number of type conversions and gives
a small compilation time speedup, around ~0.7% on my aarch64 machine.

Copyright (c) 2022, Arm Limited.
2022-06-30 12:43:36 -07:00
Ulrich Weigand
95836ba114 s390x: clean up lower.rs (#4355)
Now that lowering is fully done in ISLE, clean up some code remnants
in lower.rs.  In particular, move code to lower/isle.rs where
possible, and inline lower_insn_to_regs into its caller and simplify.
2022-06-30 11:16:59 -07:00
Afonso Bordado
919604b8c5 aarch64: Implement ireduce/breduce in ISLE (#4331)
* aarch64: Implement `ireduce`/`breduce` in ISLE

* cranelift: Remove vector versions of `breduce`/`ireduce`
2022-06-30 11:15:47 -07:00
bjorn3
d1446f767d Mark return value as define instead of clobber for TLS pseudoinstructions (#4357) 2022-06-30 10:44:51 -07:00
Ulrich Weigand
7a9479f77c ISLE: Migrate call and return instructions (#3785)
This adds infrastructure to allow implementing call and return
instructions in ISLE, and migrates the s390x back-end.

To implement ABI details, this patch creates public accessors
for `ABISig` and makes them accessible in ISLE.  All actual
code generation is then done in ISLE rules, following the
information provided by that signature.

[ Note that the s390x back end never requires multiple slots for
a single argument - the infrastructure to handle this should
already be present, however. ]

To implement loops in ISLE rules, this patch uses regular tail
recursion, employing a `Range` data structure holding a range
of integers to be looped over.
2022-06-29 14:22:50 -07:00
Sam Parker
fb61774df2 [AArch64] Port AtomicLoad and AtomicStore to ISLE (#4301)
Copyright (c) 2022, Arm Limited.
2022-06-29 12:12:48 -07:00
Chris Fallin
2034c8aa45 Cranelift: add a config option for alias analysis and redundant-load elimination. (#4349)
This allows for experiments as in here [1] and also generally gives an
option to anyone who is concerned that the extra optimization may be
counterproductive or take too much time. The optimization remains
enabled by default.

[1]
https://github.com/bytecodealliance/wasmtime/pull/4163#issuecomment-1169303683
2022-06-28 15:25:47 -07:00
Chris Fallin
b2e28b917a Cranelift: update to latest regalloc2: (#4324)
- Handle call instructions' clobbers with the clobbers API, using RA2's
  clobbers bitmask (bytecodealliance/regalloc2#58) rather than clobbers
  list;

- Pull in changes from bytecodealliance/regalloc2#59 for much more sane
  edge-case behavior w.r.t. liverange splitting.
2022-06-28 09:01:59 -07:00
Afonso Bordado
42d4f97b78 cranelift: Fix cls for small types on aarch64 (#4305)
The previous `cls` code was producing wrong results when fed with a -1 i8.

The fix here is to sign extend instead of zero extending since we want
to keep the sign bit as one in order for it to be counted correctly
in the cls instruction

This also merges the interpreter only tests now that aarch64
correctly supports this instruction
2022-06-27 15:55:02 -07:00
Afonso Bordado
aef53784ec aarch64: Implement bint in ISLE (#4319) 2022-06-27 15:50:46 -07:00
Chris Fallin
5c2c285dd7 Cranelift/x64: fix register allocator metadata for 8-bit divides. (#4332)
`idiv` on x86-64 only reads `rdx`/`edx`/`dx`/`dl` for divides with width
greater than 8 bits; for an 8-bit divide, it reads the whole 16-bit
divisor from `ax`, as our CISC ancestors intended. This PR fixes the
metadata to avoid a regalloc panic (due to undefined `rdx`) in this
case. Does not affect Wasmtime or other Wasm-frontend embedders.
2022-06-27 12:31:06 -07:00
Alex Crichton
dc2fe0ac67 x64: Fix codegen for the i8x16.swizzle instruction (#4318)
This commit fixes a mistake in the `Swizzle` opcode implementation in
the x64 backend of Cranelift. Previously an input register was casted to
a writable register and then modified, which I believe instructions are
not supposed to do. This was discovered as part of my investigation
into #4315.
2022-06-27 13:20:31 -05:00
Alex Crichton
8bb07523e2 x64: Fix codegen for the select instruction with v128 (#4317)
This commit fixes a bug in the previous codegen for the `select`
instruction when the operations of the `select` were of the `v128` type.
Previously teh `XmmCmove` instruction only stored an `OperandSize` of 32
or 64 for a 64 or 32-bit move, but this was also used for these 128-bit
types which meant that when used the wrong move instruction was
generated. The fix applied here is to store the whole `Type` being moved
so the 128-bit variant can be selected as well.
2022-06-27 11:02:40 -07:00
Anton Kirilov
25a588c35f Cranelift AArch64: Use an allocated encoding for Udf (#4281)
Preserve the current behaviour when code is generated for SpiderMonkey.

Copyright (c) 2022, Arm Limited.
2022-06-22 15:03:28 +01:00
Trevor Elliott
7e0bb465d0 X64: port the rest of icmp to ISLE (#4254)
Finish migrating icmp to ISLE for x64
2022-06-13 16:34:11 -07:00
Benjamin Bouvier
43d4f0b93b Serialize BlockNode's cold field too when serializing a Layout (#4265)
This fixes a bug when the `cold` field would not be serialized, since
we're using a custom (de)serializer for `Layout`. This is now properly
handled by adding a boolean in the serialized stream.

This was caught during the work on #4155, as this would result in cache
mismatches between a function and itself.
2022-06-13 12:04:37 -07:00
Anton Kirilov
c15c3061ca CFI improvements to the AArch64 fiber implementation (#4195)
Now the fiber implementation on AArch64 authenticates function
return addresses and includes the relevant BTI instructions, except
on macOS.

Also, change the locations of the saved FP and LR registers on the
fiber stack to make them compliant with the Procedure Call Standard
for the Arm 64-bit Architecture.

Copyright (c) 2022, Arm Limited.
2022-06-09 09:17:12 -05:00
Chris Fallin
ed9db962de x64 backend: fix cmpxchg (don't return RealReg as result). (#4243)
The current lowering helper for `cmpxchg` returns the literal RealReg
`rax` as its result. However, this breaks a number of invariants, and
eventually causes a regalloc panic if used as a blockparam arg (pinned
vregs cannot be used in this way).

In general we have to return regular vregs, not a RealReg, as results of
instructions during lowering. However #4223 added a helper for
`x64_cmpxchg` that returns a literal `rax`.

Fortunately we can do the right thing here by just giving a fresh vreg
to the instruction; the regalloc constraints mean that this vreg is
constrained to `rax` at the instruction (at its def/late point), so the
generator of the instruction need not worry about `rax` here.
2022-06-08 06:13:31 -07:00
Trevor Elliott
bc3c4fa206 X64: port fvpromote to ISLE (#4242) 2022-06-07 17:18:23 -07:00