Commit Graph

1387 Commits

Author SHA1 Message Date
Alex Crichton
72962c9f08 Add some minor souper-harvested optimizations (#5735)
I was playing around with souper recently on some wasms I had lying
around and these are some optimization opportunities that popped out
which seemed easy-enough to add to the egraph-based optimizations.
2023-02-07 14:06:24 -06:00
Chris Fallin
75ae976adc egraphs: fix accidental remat of call. (#5726)
In the provided test case in #5716, the result of a call was then
added to 0. We have a rewrite rule that sets the remat-bit on any add
of a value and a constant, because these frequently appear (e.g. from
address offset calculations) and this can frequently reduce register
pressure (one long-lived base vs. many long-lived base+offset values).
Separately, we have an algebraic rule that `x+0` rewrites to `x`.

The result of this was that we had an eclass with the remat bit set on
the add, but the add was also union'd into the call. We pick the
latter during extraction, because it's cheaper not to do the add at
all; but we still get the remat bit, and try to remat a call (!),
which blows up later.

This PR fixes the logic to look up the "best value" for a value (i.e.,
whatever extraction determined), and look up the remat bit on *that*
node, not the canonical node.

(Why did the canonical node become the iadd and not the call? Because
the former had a lower value-number, as an accident of IR
construction; we don't impose any requirements on the input CLIF's
value-number ordering, and I don't think this breaks any of the
important acyclic properties, even though there is technically a
dependence from a lower-numbered to a higher-numbered node. In essence
one can think of them as having "virtual numbers" in any true
topologically-sorted order, and the only place the actual integer
indices matter should be in choosing the "canonical ID", which is just
used for dedup'ing, modulo this bug.)

Fixes #5716.
2023-02-06 23:36:16 +00:00
Alex Crichton
de0e0bea3f Legalize b{and,or,xor}_not into component instructions (#5709)
* Remove trailing whitespace in `lower.isle` files

* Legalize the `band_not` instruction into simpler form

This commit legalizes the `band_not` instruction into `band`-of-`bnot`,
or two instructions. This is intended to assist with egraph-based
optimizations where the `band_not` instruction doesn't have to be
specifically included in other bit-operation-patterns.

Lowerings of the `band_not` instruction have been moved to a
specialization of the `band` instruction.

* Legalize `bor_not` into components

Same as prior commit, but for the `bor_not` instruction.

* Legalize bxor_not into bxor-of-bnot

Same as prior commits. I think this also ended up fixing a bug in the
s390x backend where `bxor_not x y` was actually translated as `bnot
(bxor x y)` by accident given the test update changes.

* Simplify not-fused operands for riscv64

Looks like some delegated-to rules have special-cases for "if this
feature is enabled use the fused instruction" so move the clause for
testing the feature up to the lowering phase to help trigger other rules
if the feature isn't enabled. This should make the riscv64 backend more
consistent with how other backends are implemented.

* Remove B{and,or,xor}Not from cost of egraph metrics

These shouldn't ever reach egraphs now that they're legalized away.

* Add an egraph optimization for `x^-1 => ~x`

This adds a simplification node to translate xor-against-minus-1 to a
`bnot` instruction. This helps trigger various other optimizations in
the egraph implementation and also various backend lowering rules for
instructions. This is chiefly useful as wasm doesn't have a `bnot`
equivalent, so it's encoded as `x^-1`.

* Add a wasm test for end-to-end bitwise lowerings

Test that end-to-end various optimizations are being applied for input
wasm modules.

* Specifically don't self-update rustup on CI

I forget why this was here originally, but this is failing on Windows
CI. In general there's no need to update rustup, so leave it as-is.

* Cleanup some aarch64 lowering rules

Previously a 32/64 split was necessary due to the `ALUOp` being different
but that's been refactored away no so there's no longer any need for
duplicate rules.

* Narrow a x64 lowering rule

This previously made more sense when it was `band_not` and rarely used,
but be more specific in the type-filter on this rule that it's only
applicable to SIMD types with lanes.

* Simplify xor-against-minus-1 rule

No need to have the commutative version since constants are already
shuffled right for egraphs

* Optimize band-of-bnot when bnot is on the left

Use some more rules in the egraph algebraic optimizations to
canonicalize band/bor/bxor with a `bnot` operand to put the operand on
the right. That way the lowerings in the backends only have to list the
rule once, with the operand on the right, to optimize both styles of
input.

* Add commutative lowering rules

* Update cranelift/codegen/src/isa/x64/lower.isle

Co-authored-by: Jamey Sharp <jamey@minilop.net>

---------

Co-authored-by: Jamey Sharp <jamey@minilop.net>
2023-02-06 13:53:40 -06:00
Trevor Elliott
6d8f2be9e1 Use andn for band_not when bmi1 is present (#5701)
We can use the andn instruction for the lowering of band_not on x64 when bmi1 is available.
2023-02-03 16:23:18 -08:00
Nick Fitzgerald
e18d4cb711 Cranelift: Introduce support for return_call in the interpreter (#5697)
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
2023-02-03 15:53:54 -08:00
Nick Fitzgerald
72c8513411 Cranelift: Correctly wrap shifts in constant propagation (#5695)
Fixes #5690
Fixes #5696

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
2023-02-03 00:12:57 +00:00
Jun Ryung Ju
9cd4146939 Implemented b{and,or,xor}_not bitops for ty_int_ref_scalar_64 type. (#5604)
* Implemented `b{and,or,xor}_not` bitops for ty_int_ref_scalar_64 type.

* Added tests.
2023-02-01 21:57:18 -08:00
Jamey Sharp
ac4d28f4dd Constant-fold icmp instructions (#5666)
We found examples of icmp instructions with both operands constant in
spidermonkey.wasm.
2023-02-01 21:55:36 +00:00
Nick Fitzgerald
bdfb746548 Cranelift: Introduce the return_call and return_call_indirect instructions (#5679)
* Cranelift: Introduce the `tail` calling convention

This is an unstable-ABI calling convention that we will eventually use to
support Wasm tail calls.

Co-Authored-By: Jamey Sharp <jsharp@fastly.com>

* Cranelift: Introduce the `return_call` and `return_call_indirect` instructions

These will be used to implement tail calls for Wasm and any other language
targeting CLIF. The `return_call_indirect` instruction differs from the Wasm
instruction of the same name by taking a native address callee rather than a
Wasm function index.

Co-Authored-By: Jamey Sharp <jsharp@fastly.com>

* Cranelift: Implement verification rules for `return_call[_indirect]`

They must:

* have the same return types between the caller and callee,
* have the same calling convention between caller and callee,
* and that calling convention must support tail calls.

Co-Authored-By: Jamey Sharp <jsharp@fastly.com>

* cargo fmt

---------

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
2023-02-01 21:20:35 +00:00
Nick Fitzgerald
ffbbfbffce Cranelift: Rewrite or(and(x, y), not(y)) => or(x, not(y)) again (#5684)
This rewrite was introduced in #5676 and then reverted in #5682 due to a footgun
where we accidentally weren't actually checking the `y == !z` precondition. This
commit fixes the precondition check. It also fixes the arithmetic to be
correctly masked to the value type's width.

This reverts commit 268f6bfc1d.
2023-02-01 20:53:22 +00:00
yuyang
cb3b6c621f fix rotl.i16 with i128 shift value. (#5611)
* fix issue 5523.

* fix.

* add missing issue file.

* fix issue.

* fix duplicate shamt_128.

* issue 5523 add test target,and fix some wrong comment.

* fix output file.

* enable llvm_abi_extensions for regression test file.
2023-02-01 03:44:13 +00:00
Trevor Elliott
268f6bfc1d Revert "Cranelift: Rewrite or(and(x, y), not(y)) => or(x, not(y)) (#5676)" (#5682)
This reverts commit 8c9eb9939b.

Fixes #5680
2023-02-01 02:53:23 +00:00
yuyang
0c66a1bba7 Fix issue 5528 (#5605)
* fix parameter error.

* fix float convert to i8 and i16   should extract sign bit.

* add missing regression test file.

* using tmp register.

* float convert i8 will consume more instructions.

* fix worse inst emit size.

* fix worst_case_size.
2023-01-31 15:37:36 -08:00
Nick Fitzgerald
8c9eb9939b Cranelift: Rewrite or(and(x, y), not(y)) => or(x, not(y)) (#5676)
Co-authored-by: Rainy Sinclair <844493+itsrainy@users.noreply.github.com>
2023-01-31 22:44:45 +00:00
Nick Fitzgerald
253e28ca4f Cranelift: Rewrite (x>>k)<<k into masking off the bottom k bits (#5673)
* Cranelift: Rewrite `(x>>k)<<k` into masking off the bottom `k` bits

* Add a runtest for exercising our rewrite of `(x >> k) << k` into masking
2023-01-31 21:11:12 +00:00
Nick Fitzgerald
7aa240e0f2 Cranelift: constant propagate shifts (#5671)
Thanks to Souper for pointing out we weren't doing this!
2023-01-31 12:06:53 -08:00
Nick Fitzgerald
c9d1c068bc Cranelift: Add egraph rule to rewrite x * C ==> x << log2(C) when C is a power of two (#5647) 2023-01-31 18:04:17 +00:00
Trevor Elliott
a5698cedf8 cranelift: Remove brz and brnz (#5630)
Remove the brz and brnz instructions, as their behavior is now redundant with brif.
2023-01-30 20:34:56 +00:00
yuyang
77cf547f41 fix issue 5569. (#5612)
* add regression test file.

* fix issute5569.

* enable code length check.
2023-01-30 10:01:33 -08:00
Jamey Sharp
915801551b Delete old cranelift-preopt crate (#5642)
Most of these optimizations are in the egraph `cprop.isle` rules now,
making a separate crate unnecessary.

Also I think the `udiv` optimizations here are straight-up wrong (doing
signed instead of unsigned division, and panicking instead of preserving
traps on division by zero) so I'm guessing this crate isn't seriously
used anywhere.

At the least, bjorn3 confirms that cg_clif doesn't use this, and I've
verified that Wasmtime doesn't either.

Closes #1090.
2023-01-26 21:32:33 +00:00
Trevor Elliott
7926808e8e riscv64: improve unordered comparison generated code (#5636)
Improve the generated code for unordered floating point comparisons by negating the comparison and inveritng the branches. This allows us to pick the unordered versions, which generate significantly better code.
2023-01-25 17:28:28 -08:00
Trevor Elliott
b58a197d33 cranelift: Add a conditional branch instruction with two targets (#5446)
Add a conditional branch instruction with two targets: brif. This instruction will eventually replace brz and brnz, as it encompasses the behavior of both.

This PR also changes the InstructionData layout for instruction formats that hold BlockCall values, taking the same approach we use for Value arguments. This allows branch_destination to return a slice to the BlockCall values held in the instruction, rather than requiring that we pattern match on InstructionData to fetch the then/else blocks.

Function generation for fuzzing has been updated to generate uses of brif, and I've run the cranelift-fuzzgen target locally for hours without triggering any new failures.
2023-01-24 14:37:16 -08:00
Jamey Sharp
fef9f64d2c x86: Test paired udiv/urem (#5573)
Ideally these pairs of CLIF instructions should emit a single x86
instruction, but they don't today. This test will tell us if somebody
fixes that.

Similar tests might make sense for imul/umulhi as well as signed
versions, but I haven't tried that.
2023-01-23 11:44:27 -08:00
yuyang
7e10bd1f58 fix issue #5497 #5524 #5526. (#5595)
* fix issue 5497.

* fix issue 5524

* fix issue 5497 5524 5526.

* some clif change because of reg alloc.
2023-01-20 14:06:26 -08:00
yuyang
299b8187f8 fix issue 5525. (#5603)
* fix issue 5525.

* reg alloc changed.
2023-01-20 09:53:54 -08:00
Chris Fallin
1faff8c2ce Enable egraph-based optimization by default. (#5587)
This PR follows up on #5382 and #5391, which rebuilt the egraph-based optimization framework to be more performant, by enabling it by default.

Based on performance results in #5382 (my measurements on SpiderMonkey and bjorn3's independent confirmation with cg_clif), it seems that this is reasonable to enable. Now that we have been fuzzing compiler configurations with egraph opts (#5388) for 6 weeks, having fixed a few fuzzbugs that came up (#5409, #5420, #5438) and subsequently received no further reports from OSS-Fuzz, I believe it is stable enough to rely on.

This PR enables `use_egraphs`, and also normalizes its meaning: previously it forced optimization (it basically meant "turn on the egraph optimization machinery"), now it runs egraph opts if the opt level indicates (it means "use egraphs to optimize if we are going to optimize"). The conditionals in the top-level pass driver are a little subtle, but will get simpler once we can remove the non-egraph path (which we plan to do eventually!).

Fixes #5181.
2023-01-19 15:46:53 -08:00
Chris Fallin
704f5a5772 Cranelift/egraph mid-end: support merging effectful-but-idempotent ops (#5594)
* Support mergeable-but-side-effectful (idempotent) operations in general in the egraph's GVN.

This mirrors the similar change made in #5534.

* Add tests for egraph case.
2023-01-19 11:51:19 -08:00
Kevin Rizzo
da03ff47f1 winch: Adding support for integration tests (#5588)
* Adding in the foundations for Winch `filetests`

This commit adds two new crates into the Winch workspace:
`filetests` and `test-macros`. The intent is to mimic the
structure of Cranelift `filetests`, but in a simpler way.

* Updates to documentation

This commits adds a high level document to outline how to test Winch
through the `winch-tools` utility. It also updates some inline
documentation which gets propagated to the CLI.

* Updating test-macro to use a glob instead of only a flat directory
2023-01-19 07:34:48 -05:00
Afonso Bordado
3ae373b073 cranelift: Disable select rule for i128 types on riscv64 (#5584)
* fuzzgen: Disable some selects for RISC-V

* cranelift: Force disable gen_select_reg rule for i128 values
2023-01-17 10:01:23 -08:00
Afonso Bordado
82494661c1 cranelift: Add atomic_{load,store} and fence to the interpreter (#5503)
* cranelift: Add `fence` to interpreter

* cranelift: Add `atomic_{load,store}` to the interpreter

* fuzzgen: Add `atomic_{load,store}`

* Update cranelift/fuzzgen/src/function_generator.rs

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* fuzzgen: Use type size as the alignment size.

Co-authored-by: Jamey Sharp <jamey@minilop.net>
2023-01-12 08:36:04 -08:00
Afonso Bordado
9556cb190f cranelift: Forbid argument extensions for floats and SIMD vectors (#5536)
* fuzzgen: Generate argument extensions only for integer argumetns

* cranelift: Add verifier check for argument extensions
2023-01-10 10:26:30 -08:00
Alexa VanHattum
44913825b5 cranelift: fix register for srem.i8 on x86_64 (#5540)
* Change register written to in specific srem case. Add regression test as filetest case. Fixes #5470

* Add another test case, newline

* Update comment
2023-01-06 22:18:16 +00:00
Sam Sartor
1efa3d6f8b Add clif-util compile option to output object file (#5493)
* add clif-util compile option to output object file

* switch from a box to a borrow

* update objectmodule tests to use borrowed isa

* put targetisa into an arc
2023-01-06 12:53:48 -08:00
uint256_t
b00455135e Cranelift: Implement 'iabs' for scalar types on x86_64 (#5527)
* Implement 'iabs' for scalar types on x86_64

* Small fix
2023-01-05 21:33:12 -08:00
Nick Fitzgerald
c50bdf600e Cranelift: GVN all idempotently trapping but otherwise pure instructions (#5534) 2023-01-05 15:08:06 -08:00
Afonso Bordado
ee6a909ccb cranelift: Cleanup SIMD icmp tests (#5530)
* cranelift: Enable more SIMD tests

* cranelift: Reorganize icmp tests

* cranelift: Enable SIMD icmp tests for unsigned ops

* cranelift: Cleanup trailing newlines
2023-01-05 09:19:03 -08:00
Nick Fitzgerald
f4a2d5337a Cranelift: GVN uadd_overflow_trap (#5520)
* Switch duplicate loads w/ dynamic memories test to `min_size = 0`

This test was accidentally hitting a special case for bounds checks for when we
know that `offset + access_size < min_size` and can skip some steps. This
commit changes the `min_size` of the memory to zero so that we are forced to do
fully general bounds checks.

* Cranelift: Mark `uadd_overflow_trap` as okay for GVN

Although this improves our test sequence for duplicate loads with dynamic
memories, it unfortunately doesn't have any effect on sightglass benchmarks:

```
instantiation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm

  No difference in performance.

  [34448 35607.23 37158] gvn_uadd_overflow_trap.so
  [34566 35734.05 36585] main.so

instantiation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [44101 60449.62 92712] gvn_uadd_overflow_trap.so
  [44011 60436.37 92690] main.so

instantiation :: instructions-retired :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [35595 36675.72 38153] gvn_uadd_overflow_trap.so
  [35440 36670.42 37993] main.so

compilation :: instructions-retired :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [17370195 17405125.62 17471222] gvn_uadd_overflow_trap.so
  [17369324 17404859.43 17470725] main.so

execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [7055720520 7055886880.32 7056265930] gvn_uadd_overflow_trap.so
  [7055719554 7055843809.33 7056193289] main.so

compilation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [683589861 683767276.00 684098366] gvn_uadd_overflow_trap.so
  [683590024 683767998.02 684097885] main.so

execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm

  No difference in performance.

  [46436883 46437135.10 46437823] gvn_uadd_overflow_trap.so
  [46436883 46437087.67 46437785] main.so

compilation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm

  No difference in performance.

  [126522461 126565812.58 126647044] gvn_uadd_overflow_trap.so
  [126522176 126565757.75 126647522] main.so

execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [653010531 653010533.03 653010544] gvn_uadd_overflow_trap.so
  [653010531 653010533.18 653010537] main.so
```

* cranelift-codegen-meta: Rename `side_effects_okay_for_gvn` to `side_effects_idempotent`

* cranelift-filetests: Ensure there is a trailing newline for blessed Wasm tests
2023-01-04 22:03:16 -08:00
Nick Fitzgerald
937601c7c3 Cranelift: GVN spectre guards and run redundant load elimination twice (#5517)
* Cranelift: Make spectre guards GVN-able

While these instructions have a side effect that is otherwise invisible to the
optimizer, the side effect in question is idempotent, so it can be de-duplicated
by GVN.

* Cranelift: Run redundant load replacement and GVN twice

This allows us to actually replace redundant Wasm loads with dynamic memories.

While this improves our hand-crafted test sequences, it doesn't seem to have any
improvement on sightglass benchmarks run with dynamic memories, however it also
isn't a hit to compilation times, so seems generally good to land anyways:

```
$ cargo run --release -- benchmark -e ~/scratch/once.so -e ~/scratch/twice.so -m insts-retired --processes 20 --iterations-per-process 3 --engine-flags="--static-memory-maximum-size 0" -- benchmarks/default.suite
compilation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [683595240 683768610.53 684097577] once.so
  [683597068 700115966.83 1664907164] twice.so

instantiation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [44107 60411.07 92785] once.so
  [44138 59552.32 92097] twice.so

compilation :: instructions-retired :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [17369916 17404839.78 17471458] once.so
  [17369935 17625713.87 30700150] twice.so

compilation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm

  No difference in performance.

  [126523640 126566170.80 126648265] once.so
  [126523076 127174580.30 163145149] twice.so

instantiation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm

  No difference in performance.

  [34569 35686.25 36513] once.so
  [34651 35749.97 36953] twice.so

instantiation :: instructions-retired :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [35146 36639.10 37707] once.so
  [34472 36580.82 38431] twice.so

execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [7055720115 7055841324.82 7056180024] once.so
  [7055717681 7055877095.85 7056225217] twice.so

execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm

  No difference in performance.

  [46436881 46437081.28 46437691] once.so
  [46436883 46437127.68 46437766] twice.so

execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [653010530 653010533.27 653010539] once.so
  [653010531 653010532.95 653010538] twice.so
```
2023-01-04 20:05:43 +00:00
Trevor Elliott
b2d5afdf83 riscv64: Implement fcmp in ISLE (#5512)
Rework the compilation of fcmp in the riscv64 backend to be in ISLE, removing the need for the dedicated Fcmp instruction. This change is motivated by #5500, which showed that the riscv64 backend was generating branch instructions in the middle of a basic block.

We can't remove lower_br_fcmp quite yet as it's used in a few places in the emit module, but it's now no longer reachable from the ISLE lowerings.

Fixes #5500
2023-01-04 11:52:00 -08:00
Nick Fitzgerald
d1920f5a2d cranelift: Add wasm tests for duplicate loads (#5514)
* cranelift-filetests: Add the ability to test optimized CLIF in Wasm tests

* cranelift: Add Wasm tests for identical loads, back to back
2023-01-04 18:52:32 +00:00
Afonso Bordado
52ba72f341 riscv64: Fix masking on iabs (#5505)
* cranelift: Add `iabs.i128` runtest

* riscv64: Fix incorrect extension in iabs

When lowering iabs, we were accidentally comparing the unextended value
this caused the instruction to misbehave with certain top bits.

This commit also adds a zbb lowering that does not use jumps.
2023-01-03 17:37:25 -08:00
Afonso Bordado
7e94704264 riscv64: Add masking for small types when lowering select (#5504)
When lowering `select+icmp` we have an optimization that allows us to
avoid materializing the icmp result.

We were accidentally not masking the high bits for i8 and i16 in this case.

Issue #5498 reported this as an illegal instruction but what was happening
there was that the invalid select caused a division by zero.
2023-01-03 19:59:14 +00:00
Afonso Bordado
c9c7d4991c riscv64: Fix br-table segfault with zero sized jump tables (#5508)
We had a off-by-one bounds check error when checking if we should
jump to the default block in a br-table. Instead of always jumping
to the default block when we have a jump table with 0 targets we
would try to compute an offset past the end of the table.

This sometimes would not crash, but it would crash if the there was
no block after the br_table, thus adding a cold block would cause a
segfault.

The actual fix is quite simple, do not count the default block
as a jump table entry when computing the limits.

This commit also does a bunch of cleanup and adding some comments
to the br_table emission code.
2023-01-03 10:22:48 -08:00
Mrmaxmeier
fe992c2627 Cranelift: aarch64: lower umin.i64 and friends (#5495)
* Cranelift: aarch64: lower umin.i64 and friends

* fuzzgen: Enable integer-min/max for aarch64
2022-12-29 18:03:31 -08:00
Chris Fallin
03463458e4 Cranelift: fix branch-of-icmp/fcmp regression: look through uextend. (#5487)
In #5031, we removed `bool` types from CLIF, using integers instead for
"truthy" values. This greatly simplified the IR, and was generally an
improvement.

However, because x86's `SETcc` instruction sets only the low 8 bits of a
register, we chose to use `i8` types as the result of `icmp` and `fcmp`,
to avoid the need for a masking operation when materializing the result.

Unfortunately this means that uses of truthy values often now have
`uextend` operations, especially when coming from Wasm (where truthy
values are naturally `i32`-typed). For example, where we previously had
`(brz (icmp ...))`, we now have `(brz (uextend (icmp ...)))`.

It's arguable whether or not we should switch to `i32` truthy values --
in most cases we can avoid materializing a value that's immediately used
for a branch or select, so a mask would in most cases be unnecessary,
and it would be a win at the IR level -- but irrespective of that, this
change *did* regress our generated code quality: our backends had
patterns for e.g. `(brz (icmp ...))` but not with the `uextend`, so we
were *always* materializing truthy values. Many blocks thus ended with
"cmp; setcc; cmp; test; branch" rather than "cmp; branch".

In #5391 we noticed this and fixed it on x64, but it was a general
problem on aarch64 and riscv64 as well. This PR introduces a
`maybe_uextend` extractor that "looks through" uextends, and uses it
where we consume truthy values, thus fixing the regression.  This PR
also adds compile filetests to ensure we don't regress again.

The riscv64 backend has not been updated here because doing so appears
to trigger another issue in its branch handling; fixing that is TBD.
2022-12-22 01:43:44 -08:00
Ayomide Bamidele
b47e644c3d Remove vconcat and vsplit clif instructions (#5465)
Fixes #5463.

* remove vsplit instruction

* remove vconcat instruction

* remove unsused half/double vector helper functions

* remove unused operand constraints

* delete + inline Type::half_vector method
2022-12-20 00:41:55 +00:00
Ayomide Bamidele
93ae9078c5 Implement vsplit in cranelift interpreter (#5462)
* Add vsplit testfile

* Add vsplit implementation
2022-12-16 23:14:56 +00:00
Chris Fallin
22439f7b39 support select_spectre_guard and select on i128 conditions on all platforms. (#5460)
Fixes #5199.
Fixes #5200.
Fixes #5452.
Fixes #5453.

On riscv64, there is apparently an autoconversion from `ValueRegs` to
`Reg` that takes just the low register [0], and removing this conversion
causes 48 errors. As a result of this, `select` with an `i128` condition
was silently miscompiling, testing only the low 64 bits. We should
remove this autoconversion to ensure we aren't missing any other silent
truncations, but for now this PR just adds the explicit `I128` logic for
`select` / `select_spectre_guard`.

[0]
d9fdbfd50e/cranelift/codegen/src/isa/riscv64/inst.isle (L1762)
2022-12-16 14:18:22 -08:00
Nick Fitzgerald
c0b587ac5f Remove heaps from core Cranelift, push them into cranelift-wasm (#5386)
* cranelift-wasm: translate Wasm loads into lower-level CLIF operations

Rather than using `heap_{load,store,addr}`.

* cranelift: Remove the `heap_{addr,load,store}` instructions

These are now legalized in the `cranelift-wasm` frontend.

* cranelift: Remove the `ir::Heap` entity from CLIF

* Port basic memory operation tests to .wat filetests

* Remove test for verifying CLIF heaps

* Remove `heap_addr` from replace_branching_instructions_and_cfg_predecessors.clif test

* Remove `heap_addr` from readonly.clif test

* Remove `heap_addr` from `table_addr.clif` test

* Remove `heap_addr` from the simd-fvpromote_low.clif test

* Remove `heap_addr` from simd-fvdemote.clif test

* Remove `heap_addr` from the load-op-store.clif test

* Remove the CLIF heap runtest

* Remove `heap_addr` from the global_value.clif test

* Remove `heap_addr` from fpromote.clif runtests

* Remove `heap_addr` from fdemote.clif runtests

* Remove `heap_addr` from memory.clif parser test

* Remove `heap_addr` from reject_load_readonly.clif test

* Remove `heap_addr` from reject_load_notrap.clif test

* Remove `heap_addr` from load_readonly_notrap.clif test

* Remove `static-heap-without-guard-pages.clif` test

Will be subsumed when we port `make-heap-load-store-tests.sh` to generating
`.wat` tests.

* Remove `static-heap-with-guard-pages.clif` test

Will be subsumed when we port `make-heap-load-store-tests.sh` over to `.wat`
tests.

* Remove more heap tests

These will be subsumed by porting `make-heap-load-store-tests.sh` over to `.wat`
tests.

* Remove `heap_addr` from `simple-alias.clif` test

* Remove `heap_addr` from partial-redundancy.clif test

* Remove `heap_addr` from multiple-blocks.clif test

* Remove `heap_addr` from fence.clif test

* Remove `heap_addr` from extends.clif test

* Remove runtests that rely on heaps

Heaps are not a thing in CLIF or the interpreter anymore

* Add generated load/store `.wat` tests

* Enable memory-related wasm features in `.wat` tests

* Remove CLIF heap from fcmp-mem-bug.clif test

* Add a mode for compiling `.wat` all the way to assembly in filetests

* Also generate WAT to assembly tests in `make-load-store-tests.sh`

* cargo fmt

* Reinstate `f{de,pro}mote.clif` tests without the heap bits

* Remove undefined doc link

* Remove outdated SVG and dot file from docs

* Add docs about `None` returns for base address computation helpers

* Factor out `env.heap_access_spectre_mitigation()` to a local

* Expand docs for `FuncEnvironment::heaps` trait method

* Restore f{de,pro}mote+load clif runtests with stack memory
2022-12-15 00:26:45 +00:00
Nick Fitzgerald
be710df237 Cranelift: Add .wat to assembly test support and generate Wasm load/store tests for all ISAs (#5439)
* cranelift-filetest: Add the ability to test `.wat` to assembly

* Make the load/store test case generator script use `.wat` tests

And generate tests that exercise both Wasm-to-CLIF lowering and Wasm all the way
to assembly.

* Remove old versions of generated load/store tests

* Add new generated load/store tests

* Fix filename reference in script
2022-12-14 21:13:43 +00:00