Commit Graph

1156 Commits

Author SHA1 Message Date
uint256_t
b00455135e Cranelift: Implement 'iabs' for scalar types on x86_64 (#5527)
* Implement 'iabs' for scalar types on x86_64

* Small fix
2023-01-05 21:33:12 -08:00
Trevor Elliott
36e5bdfd0e Fuzz multiple targets in cranelift-icache (#5482)
Fuzz additional targets in the cranelift-icache target. The list of targets fuzzed is controlled by the targets enabled in fuzz/Cargo.toml.

This PR also reworks how instruction disabling is done in function generator, moving the deny-list to a function to make the decision at runtime instead of compile time.
2023-01-05 18:49:23 +00:00
Trevor Elliott
e2e98f694f Remove lower_br_fcmp from the riscv64 backend (#5519)
Remove the lower_br_fcmp function from the riscv64 backend. This PR only affects the emit implementation for FloatRound, replacing the uses of lower_br_fcmp with direct uses of FpuRRR and CondBr.

Any changes in behavior here should be already covered by the runtests for ceil, floor, trunc, and nearest.
2023-01-04 14:22:35 -08:00
Trevor Elliott
5d429e46e8 Remove the MInst::TrapFf constructor from the riscv64 backend (#5515)
Remove the MInst::TrapFf instruction in the riscv64 backend. It was only used in two places in the emit case for FloatRound, and was easily replaced with a combination of FpuRRR and TrapIf.
2023-01-04 13:34:46 -08:00
Trevor Elliott
b2d5afdf83 riscv64: Implement fcmp in ISLE (#5512)
Rework the compilation of fcmp in the riscv64 backend to be in ISLE, removing the need for the dedicated Fcmp instruction. This change is motivated by #5500, which showed that the riscv64 backend was generating branch instructions in the middle of a basic block.

We can't remove lower_br_fcmp quite yet as it's used in a few places in the emit module, but it's now no longer reachable from the ISLE lowerings.

Fixes #5500
2023-01-04 11:52:00 -08:00
Afonso Bordado
52ba72f341 riscv64: Fix masking on iabs (#5505)
* cranelift: Add `iabs.i128` runtest

* riscv64: Fix incorrect extension in iabs

When lowering iabs, we were accidentally comparing the unextended value
this caused the instruction to misbehave with certain top bits.

This commit also adds a zbb lowering that does not use jumps.
2023-01-03 17:37:25 -08:00
Afonso Bordado
7e94704264 riscv64: Add masking for small types when lowering select (#5504)
When lowering `select+icmp` we have an optimization that allows us to
avoid materializing the icmp result.

We were accidentally not masking the high bits for i8 and i16 in this case.

Issue #5498 reported this as an illegal instruction but what was happening
there was that the invalid select caused a division by zero.
2023-01-03 19:59:14 +00:00
Afonso Bordado
c9c7d4991c riscv64: Fix br-table segfault with zero sized jump tables (#5508)
We had a off-by-one bounds check error when checking if we should
jump to the default block in a br-table. Instead of always jumping
to the default block when we have a jump table with 0 targets we
would try to compute an offset past the end of the table.

This sometimes would not crash, but it would crash if the there was
no block after the br_table, thus adding a cold block would cause a
segfault.

The actual fix is quite simple, do not count the default block
as a jump table entry when computing the limits.

This commit also does a bunch of cleanup and adding some comments
to the br_table emission code.
2023-01-03 10:22:48 -08:00
KarelPeeters
320d67fe8d Cranelift: include return values in instruction pretty print output. (#5489) 2023-01-03 09:06:47 -08:00
Mrmaxmeier
fe992c2627 Cranelift: aarch64: lower umin.i64 and friends (#5495)
* Cranelift: aarch64: lower umin.i64 and friends

* fuzzgen: Enable integer-min/max for aarch64
2022-12-29 18:03:31 -08:00
Chris Fallin
03463458e4 Cranelift: fix branch-of-icmp/fcmp regression: look through uextend. (#5487)
In #5031, we removed `bool` types from CLIF, using integers instead for
"truthy" values. This greatly simplified the IR, and was generally an
improvement.

However, because x86's `SETcc` instruction sets only the low 8 bits of a
register, we chose to use `i8` types as the result of `icmp` and `fcmp`,
to avoid the need for a masking operation when materializing the result.

Unfortunately this means that uses of truthy values often now have
`uextend` operations, especially when coming from Wasm (where truthy
values are naturally `i32`-typed). For example, where we previously had
`(brz (icmp ...))`, we now have `(brz (uextend (icmp ...)))`.

It's arguable whether or not we should switch to `i32` truthy values --
in most cases we can avoid materializing a value that's immediately used
for a branch or select, so a mask would in most cases be unnecessary,
and it would be a win at the IR level -- but irrespective of that, this
change *did* regress our generated code quality: our backends had
patterns for e.g. `(brz (icmp ...))` but not with the `uextend`, so we
were *always* materializing truthy values. Many blocks thus ended with
"cmp; setcc; cmp; test; branch" rather than "cmp; branch".

In #5391 we noticed this and fixed it on x64, but it was a general
problem on aarch64 and riscv64 as well. This PR introduces a
`maybe_uextend` extractor that "looks through" uextends, and uses it
where we consume truthy values, thus fixing the regression.  This PR
also adds compile filetests to ensure we don't regress again.

The riscv64 backend has not been updated here because doing so appears
to trigger another issue in its branch handling; fixing that is TBD.
2022-12-22 01:43:44 -08:00
Trevor Elliott
fac4a915a3 Assert that we only use virtual registers with moves (#5440)
Assert that we never see real registers as arguments to move instructions in VCodeBuilder::collect_operands.

Also fix a bug in the riscv64 backend that was discovered by these assertions: the lowerings of get_stack_pointer and get_frame_pointer were using physical registers 8 and 2 directly. The solution was similar to other backends: add a move instruction specifically for moving out of physical registers, whose source operand is opaque to regalloc2.
2022-12-20 18:22:47 -08:00
Saúl Cabrera
962a911163 cranelift-codegen: Add support for immediate to memory moves in x64 (#5461)
This change adds support for immediate to memory moves in x64 which
are needed by Winch for zeroing local slots.

This change follows the guideline in `isa/x64/inst/emit` and uses
other instructions (immediate to register moves) as a base for the
test cases.

The instruction encoding expectation was derived by assembling each
instruction and inspecting the assembly with `objdump`.
2022-12-19 21:54:45 +00:00
Chris Fallin
22439f7b39 support select_spectre_guard and select on i128 conditions on all platforms. (#5460)
Fixes #5199.
Fixes #5200.
Fixes #5452.
Fixes #5453.

On riscv64, there is apparently an autoconversion from `ValueRegs` to
`Reg` that takes just the low register [0], and removing this conversion
causes 48 errors. As a result of this, `select` with an `i128` condition
was silently miscompiling, testing only the low 64 bits. We should
remove this autoconversion to ensure we aren't missing any other silent
truncations, but for now this PR just adds the explicit `I128` logic for
`select` / `select_spectre_guard`.

[0]
d9fdbfd50e/cranelift/codegen/src/isa/riscv64/inst.isle (L1762)
2022-12-16 14:18:22 -08:00
Trevor Elliott
9dc4f1a83c s390x: Move the value out of the casloop_val_reg with mov_preg (#5430)
The casloop_emit function in the s390x backend was using the fixed non-allocatable register %r0 directly with move instructions, which produced a panic in the regalloc2 checker (#5425). This PR changes the casloop_result function to use mov_preg instead of copy_reg to fetch the result, as it's not viewed by regalloc2 as a move.

Fixes #5425
2022-12-14 13:06:35 -08:00
Ulrich Weigand
f0af622208 Simplify LowerBackend interface (#5432)
* Refactor lower_branch to have Unit result

Branches cannot have any output, so it is more straightforward
to have the ISLE term return Unit instead of InstOutput.

Also provide a new `emit_side_effect` term to simplify
implementation of `lower_branch` rules with Unit result.

* Simplify LowerBackend interface

Move all remaining asserts from the LowerBackend::lower and
::lower_branch_group into the common call site.

Change return value of ::lower to Option<InstOutput>, and
return value of ::lower_branch_group to Option<()> to match
ISLE term signature.

Only pass the first branch into ::lower_branch_group and
rename it to ::lower_branch.

As a result of all those changes, LowerBackend routines
now consists solely to calls to the corresponding ISLE
routines.
2022-12-14 00:48:25 +00:00
Ulrich Weigand
299be327d5 Simplify "unimplemented" operation error message (#5429)
Now that all operations are implemented in ISLE, simplify Rust
code by providing a generic error message if any operation is
not implemented in ISLE.  Done across all targets.
2022-12-13 15:22:49 -08:00
Jamey Sharp
eba6b76511 aarch64: Use unsigned constants where appropriate (#5423)
The Rust type expected in these locations is unsigned, but these
constants are negative. ISLE currently emits a Rust expression with
extra type conversions in order to make this work as intended.

However, across all backends, only these three aarch64 constants use
this particular "feature" of ISLE, and I want to make it go away.
2022-12-13 23:08:40 +00:00
Ulrich Weigand
df923f18ca Remove MachInst::gen_constant (#5427)
* aarch64: constant generation cleanup

Add support for MOVZ and MOVN generation via ISLE.
Handle f32const, f64const, and nop instructions via ISLE.
No longer call Inst::gen_constant from lower.rs.

* riscv64: constant generation cleanup

Handle f32const, f64const, and nop instructions via ISLE.

* s390x: constant generation cleanup

Fix rule priorities for "imm" term.
Only handle 32-bit stack offsets; no longer use load_constant64.

* x64: constant generation cleanup

No longer call Inst::gen_constant from lower.rs or abi.rs.

* Refactor LowerBackend::lower to return InstOutput

No longer write to the per-insn output registers; instead, return
an InstOutput vector of temp registers holding the outputs.

This will allow calling LowerBackend::lower multiple times for
the same instruction, e.g. to rematerialize constants.

When emitting the primary copy of the instruction during lowering,
writing to the per-insn registers is now done in lower_clif_block.

As a result, the ISLE lower_common routine is no longer needed.
In addition, the InsnOutput type and all code related to it
can be removed as well.

* Refactor IsleContext to hold a LowerBackend reference

Remove the "triple", "flags", and "isa_flags" fields that are
copied from LowerBackend to each IsleContext, and instead just
hold a reference to LowerBackend in IsleContext.

This will allow calling LowerBackend::lower from within callbacks
in src/machinst/isle.rs, e.g. to rematerialize constants.

To avoid having to pass LowerBackend references through multiple
functions, eliminate the lower_insn_to_regs subroutines in those
targets that still have them, and just inline into the main
lower routine.  This also eliminates lower_inst.rs on aarch64
and riscv64.

Replace all accesses to the removed IsleContext fields by going
through the LowerBackend reference.

* Remove MachInst::gen_constant

This addresses the problem described in issue
https://github.com/bytecodealliance/wasmtime/issues/4426
that targets currently have to duplicate code to emit
constants between the ISLE logic and the gen_constant
callback.

After the various cleanups in earlier patches in this series,
the only remaining user of get_constant is put_value_in_regs
in Lower.  This can now be removed, and instead constant
rematerialization can be performed in the put_in_regs ISLE
callback by simply directly calling LowerBackend::lower
on the instruction defining the constant (using a different
output register).

Since the check for egraph mode is now no longer performed in
put_value_in_regs, the Lower::flags member becomes obsolete.

Care needs to be taken that other calls directly to the
Lower::put_value_in_regs routine now handle the fact that
no more rematerialization is performed.  All such calls in
target code already historically handle constants themselves.
The remaining call site in the ISLE gen_call_common helper
can be redirected to the ISLE put_in_regs callback.

The existing target implementations of gen_constant are then
unused and can be removed.  (In some target there may still
be further opportunities to remove duplication between ISLE
and some local Rust code - this can be left to future patches.)
2022-12-13 13:00:04 -08:00
Chris Fallin
92ce79366c riscv64: remove valueregs_2_reg extractor. (#5426)
This extractor had a side-effect of invoking `put_in_regs`, which is not
supposed to be invoked until the pattern-matching commits to evaluating
a rule right-hand side (i.e., cannot backtrack). In this case the
side-effect was mostly benign (in theory it could have caused additional
values to be computed needlessly), but in general we should be careful
to keep side-effects out of the left-hand side to enable further
optimizations and work on islec.

The implicit conversion from `Value` to `Reg` turns out to be enough to
make the rules in question work, so we can simply remove the use of the
extractor in this case.
2022-12-13 11:47:20 -08:00
Trevor Elliott
a5ecb5e647 x64: Share a zero in the ushr translation on x64 to free up a register (#5424)
Share a zero value in the translation of ushr for i128. This increases the lifetime of the value by a few instructions, and reduces the number of registers used in the translation by one, which seems like an acceptable trade-off.
2022-12-12 18:15:43 -08:00
Chris Fallin
9397ea1abe Cranelift: implement general select_spectre_guard fallbacks. (#5420)
When adding some optimization rules for `icmp` in the egraph
infrastructure, we ended up creating a path to legal CLIF but with
patterns unsupported by three of our four backends: specifically,
`select_spectre_guard` with a general truthy input, rather than an
`icmp`.

In #5206 we discussed replacing `select_spectre_guard` with something
more specific, and that could still be a long-term solution here, but
doing so now would interfere with ongoing refactoring of heap access
lowering, so I've opted not to do so. (In that issue I was concerned
about complexity and didn't see the need but with this fuzzbug I'm
starting to feel a bit differently; maybe we should remove this
non-orthogonal op in the long run.)

Fixes #5417.
2022-12-12 17:13:34 -08:00
Saúl Cabrera
7adf3cacc5 cranelift-codegen: Prepare cranelift codegen for usage from Winch (#5413)
This commit prepares the x64 pieces from cranelift codegen to be consumed by
Winch for binary emission. This change doesn't introduce or modifies
functionality it makes the necessary pieces for binary emission public.

This change also improves documentation where applicable.
2022-12-12 09:01:06 -08:00
Timothy Chen
122872fb0c Remove references for sig (#5414) 2022-12-12 08:46:23 -08:00
Timothy Chen
8035945502 Reduce sig data size by changing sized spaces (#5402)
* Reduce sig sizes

* Fix test

* Change compute_args_loc to return u32
2022-12-11 15:32:30 -08:00
Ulrich Weigand
e913cf3647 Remove IFLAGS/FFLAGS types (#5406)
All instructions using the CPU flags types (IFLAGS/FFLAGS) were already
removed.  This patch completes the cleanup by removing all remaining
instructions that define values of CPU flags types, as well as the
types themselves.

Specifically, the following features are removed:
- The IFLAGS and FFLAGS types and the SpecialType category.
- Special handling of IFLAGS and FFLAGS in machinst/isle.rs and
  machinst/lower.rs.
- The ifcmp, ifcmp_imm, ffcmp, iadd_ifcin, iadd_ifcout, iadd_ifcarry,
  isub_ifbin, isub_ifbout, and isub_ifborrow instructions.
- The writes_cpu_flags instruction property.
- The flags verifier pass.
- Flags handling in the interpreter.

All of these features are currently unused; no functional change
intended by this patch.

This addresses https://github.com/bytecodealliance/wasmtime/issues/3249.
2022-12-09 13:42:03 -08:00
Jamey Sharp
8726eeefb3 cranelift-isle: Add "partial" flag for constructors (#5392)
* cranelift-isle: Add "partial" flag for constructors

Instead of tying fallibility of constructors to whether they're either
internal or pure, this commit assumes all constructors are infallible
unless tagged otherwise with a "partial" flag.

Internal constructors without the "partial" flag are not allowed to use
constructors which have the "partial" flag on the right-hand side of any
rules, because they have no way to report last-minute match failures.

Multi-constructors should never be "partial"; they report match failures
with an empty iterator instead. In turn this means you can't use partial
constructors on the right-hand side of internal multi-constructor rules.
However, you can use the same constructors on the left-hand side with
`if` or `if-let` instead.

In many cases, ISLE can already trivially prove that an internal
constructor always returns `Some`. With this commit, those cases are
largely unchanged, except for removing all the `Option`s and `Some`s
from the generated code for those terms.

However, for internal non-partial constructors where ISLE could not
prove that, it now emits an `unreachable!` panic as the last-resort,
instead of returning `None` like it used to do. Among the existing
backends, here's how many constructors have these panic cases:

- x64: 14% (53/374)
- aarch64: 15% (41/277)
- riscv64: 23% (26/114)
- s390x: 47% (268/567)

It's often possible to rewrite rules so that ISLE can tell the panic can
never be hit. Just ensure that there's a lowest-priority rule which has
no constraints on the left-hand side.

But in many of these constructors, it's difficult to statically prove
the unhandled cases are unreachable because that's only down to
knowledge about how they're called or other preconditions.

So this commit does not try to enforce that all terms have a last-resort
fallback rule.

* Check term flags while translating expressions

Instead of doing it in a separate pass afterward.

This involved threading all the term flags (pure, multi, partial)
through the recursive `translate_expr` calls, so I extracted the flags
to a new struct so they can all be passed together.

* Validate multi-term usage

Now that I've threaded the flags through `translate_expr`, it's easy to
check this case too, so let's just do it.

* Extract `ReturnKind` to use in `ExternalSig`

There are only three legal states for the combination of `multi` and
`infallible`, so replace those fields of `ExternalSig` with a
three-state enum.

* Remove `Option` wrapper from multi-extractors too

If we'd had any external multi-constructors this would correct their
signatures as well.

* Update ISLE tests

* Tag prelude constructors as pure where appropriate

I believe the only reason these weren't marked `pure` before was because
that would have implied that they're also partial. Now that those two
states are specified separately we apply this flag more places.

* Fix my changes to aarch64 `lower_bmask` and `imm` terms
2022-12-07 17:16:03 -08:00
Chris Fallin
8c55b81300 Optimizations to egraph framework (#5391)
* Optimizations to egraph framework:

- Save elaborated results by canonical value, not latest value (union
  value). Previously we were artificially skipping and re-elaborating
  some values we already had because we were not finding them in the
  map.

- Make some changes to handling of icmp results: when icmp became
  I8-typed (when bools went away), many uses became `(uextend $I32 (icmp
  $I8 ...))`, and so patterns in lowering backends were no longer
  matching.

  This PR includes an x64-specific change to match `(brz (uextend (icmp
  ...)))` and similarly for `brnz`, but it also takes advantage of the
  ability to write rules easily in the egraph mid-end to rewrite selects
  with icmp inputs appropriately.

- Extend constprop to understand selects in the egraph mid-end.

With these changes, bz2.wasm sees a ~1% speedup, and spidermonkey.wasm
with a fib.js input sees a 16.8% speedup:

```
$ time taskset 1 target/release/wasmtime run --allow-precompiled --dir=. ./spidermonkey.base.cwasm ./fib.js
1346269
taskset 1 target/release/wasmtime run --allow-precompiled --dir=.  ./fib.js  2.14s user 0.01s system 99% cpu 2.148 total
$ time taskset 1 target/release/wasmtime run --allow-precompiled --dir=. ./spidermonkey.egraphs.cwasm ./fib.js
1346269
taskset 1 target/release/wasmtime run --allow-precompiled --dir=.  ./fib.js  1.78s user 0.01s system 99% cpu 1.788 total
```

* Review feedback.
2022-12-07 13:23:13 -08:00
Trevor Elliott
c5379051c4 Enable the ssa verifier in debug builds (#5354)
Enable regalloc2's SSA verifier in debug builds to check for any outstanding reuse of virtual registers in def constraints. As fuzzing enables debug_assertions, this will enable the SSA verifier when fuzzing as well.
2022-12-07 12:22:51 -08:00
Nick Fitzgerald
f0c4b6f3a1 Cranelift: Implement iadd_cout on x64 for 32- and 64-bit integers (#5285)
* Split the `iadd_cout` runtests by type

* Implement `iadd_cout` for 32- and 64-bit values on x64

* Delete trailing whitespace in `riscv/lower.isle`
2022-12-07 19:54:14 +00:00
Jamey Sharp
29b23d41b6 ISLE rule cleanups (#5389)
* cranelift-codegen: Use ISLE matching, not same_value

The `same_value` function just wrapped an equality test into an external
constructor, but we can do that with ISLE's equality constraints
instead.

* riscv64: Remove custom condition-code tests

The `lower_icmp` term exists solely to decide whether to sign-extend or
zero-extend the comparison operands, based on whether the condition code
requires a signed comparison. It additionally tested whether the
condition code was == or !=, but produced the same result as for other
unsigned comparisons.

We already have `signed_cond_code` in the ISLE prelude, which classifies
the total-ordering condition codes according to whether they're signed.
It also lumps == and != in the "unsigned" camp, as desired.

So this commit uses the existing method from the prelude instead of
riscv64-local definitions.

Because this version has no constraints on the left-hand side of the
rule in the unsigned case, ISLE generates Rust that always returns
`Some`. That shows that the current use of `unwrap` is justified, at the
only Rust-side call-site of `constructor_lower_icmp`, which is in
cranelift/codegen/src/isa/riscv64/lower/isle.rs.

* ISLE prelude: make offset32 infallible

This extractor always returns `Some`, so it doesn't need to be fallible.
2022-12-07 02:55:59 +00:00
Trevor Elliott
293bb5b334 riscv64: Only emit jumps at the end of basic blocks (#5381)
This PR fixes two bugs in the riscv64 backend, where branch instructions were emitted in the middle of a basic block:

Constant emission, where the constants are inlined into the vcode and are jumped over at runtime,
The BrTableCheck pseudo-instruction, which was always emitted before a BrTable instruction, and would handle jumping to the default label.
The first bug was resolved by introducing two new psuedo instructions, LoadConst32 and LoadConst64. Both of these instructions serve to delay the original encoding to emission time, after regalloc2 has run.

The second bug was fixed by removing the BrTableCheck instruction. As it was always emitted directly before BrTable, it was easier to remove it and merge the two into a single instruction.
2022-12-06 10:54:10 -08:00
Trevor Elliott
353a681671 Avoid reusing a register during constant loading (#5379)
Avoid reusing a register when loading a constant, allocating a temporary instead.
2022-12-05 18:37:53 -08:00
Trevor Elliott
7d28d586da riscv64: Don't reuse registers when loading constants (#5376)
Rework the constant loading functions in the riscv64 backend to generate fresh temporaries instead of reusing the destination register.
2022-12-05 16:51:52 -08:00
Trevor Elliott
817c2b205c riscv64: Use a temporary when translating shift amount (#5375)
Use a temporary when translating the shift amount, instead of reusing the destination register.
2022-12-05 20:54:14 +00:00
Trevor Elliott
b475b9bd19 Terminate blocks with a single branch in riscv64 (#5374)
Ensure that we're terminating blocks with a single branch instruction, when testing I128 values against zero.
2022-12-05 20:13:28 +00:00
Trevor Elliott
6aea8e0d7e Don't reuse destination registers when lowering splat on aarch64 (#5370) 2022-12-05 08:18:49 -08:00
Trevor Elliott
2e9b0802ab aarch64: Rework amode compilation to produce SSA code (#5369)
Rework the compilation of amodes in the aarch64 backend to stop reusing registers and instead generate fresh virtual registers for intermediates. This resolves some SSA checker violations with the aarch64 backend, and as a nice side-effect removes some unnecessary movs in the generated code.
2022-12-02 01:23:15 +00:00
Trevor Elliott
d54a27d0ea Allocate temporary intermediates when loading constants on aarch64 (#5366)
As loading constants on aarch64 can take up to 4 instructions, we need to plumb through some additional registers. Rather than pass a fixed list of registers in, pass an allocation function.
2022-12-01 22:29:36 +00:00
Trevor Elliott
37c3c5b1e0 Remove an unnecessary debug trace (#5359) 2022-11-30 20:37:20 -08:00
Trevor Elliott
c16f2956db Allocate a temporary for 64-bit constant loads in the s390x backend (#5357)
Avoid reusing a destination virtual register for 64-bit constants in the s390x backend. This change addresses a case identified by the regalloc2 ssa validator, as the destination register was written to twice when constants were generated via the MachInst::gen_constant function.
2022-11-30 17:01:14 -08:00
Trevor Elliott
d8dbabfe6b Don't reuse registers in the x64 div lowering (#5356)
Introduce a temporary for an intermediate value in the lowering of div in the x64 backend. Additionally, add a src argument to the shift_r smart constructor, which is why the diff got larger than just the div lowering.
2022-11-30 22:44:59 +00:00
Trevor Elliott
87b63174b1 Don't reuse registers in make_i64x2_from_lanes (#5355)
Avoid reusing output registers in make_i64x2_from_lanes by threading the output name instead, and using smart constructors for x64_pinsrd instead of constructing the instructions directly.
2022-11-30 14:37:01 -08:00
Alex Crichton
830885383f Implement inline stack probes for AArch64 (#5353)
* Turn off probestack by default in Cranelift

The probestack feature is not implemented for the aarch64 and s390x
backends and currently the on-by-default status requires the aarch64 and
s390x implementations to be a stub. Turning off probestack by default
allows the s390x and aarch64 backends to panic with an error message to
avoid providing a false sense of security. When the probestack option is
implemented for all backends, however, it may be reasonable to
re-enable.

* aarch64: Improve codegen for AMode fallback

Currently the final fallback for finalizing an `AMode` will generate
both a constant-loading instruction as well as an `add` instruction to
the base register into the same temporary. This commit improves the
codegen by removing the `add` instruction and folding the final add into
the finalized `AMode`. This changes the `extendop` used but both
registers are 64-bit so shouldn't be affected by the extending
operation.

* aarch64: Implement inline stack probes

This commit implements inline stack probes for the aarch64 backend in
Cranelift. The support here is modeled after the x64 support where
unrolled probes are used up to a particular threshold after which a loop
is generated. The instructions here are similar in spirit to x64 except
that unlike x64 the stack pointer isn't modified during the unrolled
loop to avoid needing to re-adjust it back up at the end of the loop.

* Enable inline probestack for AArch64 and Riscv64

This commit enables inline probestacks for the AArch64 and Riscv64
architectures in the same manner that x86_64 has it enabled now. Some
more testing was additionally added since on Unix platforms we should be
guaranteed that Rust's stack overflow message is now printed too.

* Enable probestack for aarch64 in cranelift-fuzzgen

* Address review comments

* Remove implicit stack overflow traps from x64 backend

This commit removes implicit `StackOverflow` traps inserted by the x64
backend for stack-based operations. This was historically required when
stack overflow was detected with page faults but Wasmtime no longer
requires that since it's not suitable for wasm modules which call host
functions. Additionally no other backend implements this form of
implicit trap-code additions so this is intended to synchronize the
behavior of all the backends.

This fixes a test added prior for aarch64 to properly abort the process
instead of accidentally being caught by Wasmtime.

* Fix a style issue
2022-11-30 12:30:00 -06:00
Timothy Chen
67fc5389b0 Remove sig data arg and ret fields to reduce size (#5319)
* Remove sig data arg and ret fields to reduce size

* Update cranelift/codegen/src/machinst/abi.rs

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Update cranelift/codegen/src/machinst/abi.rs

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix offsets

* Add comment

Co-authored-by: Jamey Sharp <jamey@minilop.net>
2022-11-30 07:19:41 -08:00
Afonso Bordado
ec342c20e3 cranelift: Add iadd_cout lowerings for aarch64 (#5177)
* cranelift: Add `iadd_cout`/`isub_bout`  i128 tests

* aarch64: Add `iadd_cout` lowerings

* fuzzgen: Add `iadd_cout`
2022-11-29 10:58:44 -08:00
Trevor Elliott
a5a0645aff Don't allow reuse_def constraints in the s390x Loop instruction (#5336) 2022-11-28 17:52:11 -08:00
Trevor Elliott
368004428a Fix rule shadowing instances in x64 and aarch64 backends (#5334)
Fix shadowing identified in #5322 for imul and swiden_high/swiden_low/uwiden_high/uwiden_low combinations in the x64 backend, and remove some redundant rules from the aarch64 dynamic neon ruleset. Additionally, add tests to the x64 backend showing that the imul specializations are firing.
2022-11-28 15:48:34 -08:00
Timothy Chen
48ee42efc2 Refactor Sigdata methods with sigset (#5307)
* Refactor sigdata methods

* Update cranelift/codegen/src/machinst/abi.rs

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Address comments

Co-authored-by: Jamey Sharp <jamey@minilop.net>
2022-11-22 09:03:51 -08:00
Nick Fitzgerald
d0d3245a35 Cranelift: Add heap_load and heap_store instructions (#5300)
* Cranelift: Define `heap_load` and `heap_store` instructions

* Cranelift: Implement interpreter support for `heap_load` and `heap_store`

* Cranelift: Add a suite runtests for `heap_{load,store}`

There are so many knobs we can twist for heaps and I wanted to exhaustively test
all of them, so I wrote a script to generate the tests. I've checked in the
script in case we want to make any changes in the future, but I don't think it
is worth adding this to CI to check that scripts are up to date or anything like
that.

* Review feedback
2022-11-21 23:00:39 +00:00