Commit Graph

1523 Commits

Author SHA1 Message Date
Afonso Bordado
480c45b854 fuzzgen: Initial SIMD support (#5885)
* fuzzgen: Initial SIMD support

* riscv64: Address PR Feedback

Thanks!
2023-02-28 11:33:11 +00:00
Afonso Bordado
ae881407cd cranelift-jit: Implement RISC-V Call relocation (#5835) 2023-02-28 11:14:50 +00:00
Alex Crichton
f2dce812c3 x64: Sink constant loads into xmm instructions (#5880)
A number of places in the x64 backend make use of 128-bit constants for
various wasm SIMD-related instructions although most of them currently
use the `x64_xmm_load_const` helper to load the constant into a
register. Almost all xmm instructions, however, enable using a memory
operand which means that these loads can be folded into instructions to
help reduce register pressure. Automatic conversions were added for a
`VCodeConstant` into an `XmmMem` value and then explicit loads were all
removed in favor of forwarding the `XmmMem` value directly to the
underlying instruction. Note that some instances of `x64_xmm_load_const`
remain since they're used in contexts where load sinking won't work
(e.g. they're the first operand, not the second for non-commutative
instructions).
2023-02-27 22:02:42 +00:00
Alex Crichton
9b86a0b9b1 Remove the widening_pairwise_dot_product_s clif instruction (#5889)
This was added for the wasm SIMD proposal but I've been poking around at
this recently and the instruction can instead be represented by its
component parts with the same semantics I believe. This commit removes
the instruction and instead represents it with the existing
`iadd_pairwise` instruction (among others) and updates backends to with
new pattern matches to have the same codegen as before.

This interestingly entirely removed the codegen rule with no replacement
on the AArch64 backend as the existing rules all existed to produce the
same codegen.
2023-02-27 18:43:43 +00:00
Jamey Sharp
6cf7155052 Cranelift: Generalize (x << k) >> k optimization (#5746)
* Generalize unsigned `(x << k) >> k` optimization

Split the existing rule into three parts:
- A dual of the rule for `(x >> k) << k` that is only valid for unsigned
  shifts.
- Known-bits analysis for `(band (uextend x) k)`.
- A new rule for converting `sextend` to `uextend` if the sign-extended
  bits are masked out anyway.

The first two together cover the existing rule.

* Generalize signed `(x << k) >> k` optimization

* Review comments

* Generalize sign-extending shifts further

The shifts can be eliminated even if the shift amount isn't exactly
equal to the difference in bit-widths between the narrow and wide types.

* Add filetests
2023-02-27 17:34:46 +00:00
yuyang
3864286596 fix issue 5714. (#5845)
* fix issue 5714.

* add target for regression test.

* remove x86_64 test because of not implemented.
2023-02-26 16:25:38 +00:00
Jan-Justin van Tonder
66cb13cb4b cranelift: Add atomic_cas to interpreter (#5875)
As per issue #5818, atomic_cas was implemented without specific regard for thread safety.
2023-02-25 14:36:49 +00:00
Afonso Bordado
e9095050be cranelift-interpreter: Implement call_indirect and return_call_indirect (#5877)
* cranelift-interpreter: Implement `call_indirect`

* cranelift: Fix typo

* riscv64: Enable `call_indirect` tests
2023-02-25 13:16:59 +00:00
Afonso Bordado
36e92add6f riscv64: Move is_null/is_invalid to ISLE (#5874)
* riscv64: Move `is_null`/`is_invalid` to ISLE

* riscv64: Fix `is_invalid` codegen

* Implement review suggestions

Thanks!

Co-authored-by: Jamey Sharp <jamey@minilop.net>

---------

Co-authored-by: Jamey Sharp <jamey@minilop.net>
2023-02-25 12:48:44 +00:00
Jamey Sharp
7d790fcdfe x64: Only branch once in br_table (#5850)
This uses the `cmov`, which was previously necessary for Spectre
mitigation, to clamp the table index instead of zeroing it. By then
placing the default target as the last entry in the table, we can use
just one branch instruction in all cases.

Since there isn't a bounds-check branch any more, this sequence no
longer needs Spectre mitigation. And since we don't need to be careful
about preserving flags, half the instructions can be removed from this
pseudoinstruction and emitted as regular instructions instead.

This is a net savings of three bytes in the encoding of x64's br_table
pseudoinstruction. The generated code can sometimes be longer overall
because the blocks are emitted in a slightly different order.

My benchmark results show a very small effect on runtime performance
with this change.

The spidermonkey benchmark in Sightglass runs "1.01x faster" than main
by instructions retired, but with no significant difference in CPU
cycles. I think that means it rarely hit the default case in any
br_table instructions it executed.

The pulldown-cmark benchmark in Sightglass runs "1.01x faster" than main
by CPU cycles, but main runs "1.00x faster" by instructions retired. I
think that means this benchmark hit the default case a significant
amount of the time, so it executes a few more instructions per br_table,
but maybe the branches were predicted better.
2023-02-24 04:46:38 +00:00
Trevor Elliott
c5d9d5b10f Remove module-level code generation tests (#5870)
* Remove module-level code generation tests

* Add cold block tests for each backend

* Better cold block tests
2023-02-24 01:19:26 +00:00
Alex Crichton
3fc3bc9ec8 x64: Fill out more AVX instructions (#5849)
* x64: Fill out more AVX instructions

This commit fills out more AVX instructions for SSE counterparts
currently used. Many of these instructions do not benefit from the
3-operand form that AVX uses but instead benefit from being able to use
`XmmMem` instead of `XmmMemAligned` which may be able to avoid some
extra temporary registers in some cases.

* Review comments
2023-02-23 22:31:31 +00:00
Trevor Elliott
8abfe928d6 Reuse the DominatorTree postorder travesal in BlockLoweringOrder (#5843)
* Rework the blockorder module to reuse the dom tree's cfg postorder

* Update domtree tests

* Treat br_table with an empty jump table as multiple block exits

* Bless tests

* Change branch_idx to succ_idx and fix the comment
2023-02-23 22:05:20 +00:00
Ulrich Weigand
4314210162 s390x: Fix implementation of {s,u}{min,max} (#5864)
When expanding a min/max operation to a pair of icmp + select,
do not attempt to expand the input value operands twice, as
this might fail with memory operands.

Fixes https://github.com/bytecodealliance/wasmtime/issues/5859.
2023-02-23 20:01:51 +00:00
Jan-Justin van Tonder
0521155896 cranelift: Add atomic_rmw to interpreter (#5817) (#5856)
As per the linked issue, atomic_rmw was implemented without specific regard for thread safety.
Additionally, the relevant filetest (atomic-rmw-little.clif) was enabled and altered to fix an
inccorrect call to test function `%atomic_rmw_and_i64` after setting up test function
`%atomic_rmw_and_i32`.
2023-02-23 10:24:56 +00:00
Afonso Bordado
f6c6bc2155 riscv64: Improve signed and zero extend codegen (#5844)
* riscv64: Remove unused code

* riscv64: Group extend rules

* riscv64: Remove more unused rules

* riscv64: Cleanup existing extension rules

* riscv64: Move the existing Extend rules to ISLE

* riscv64: Use `sext.w` when extending

* riscv64: Remove duplicate extend tests

* riscv64: Use `zbb` instructions when extending values

* riscv64: Use `zbkb` extensions when zero extending

* riscv64: Enable additional tests for extend i128

* riscv64: Fix formatting for `Inst::Extend`

* riscv64: Reverse register for pack

* riscv64: Misc Cleanups

* riscv64: Cleanup extend rules
2023-02-22 17:41:14 +00:00
Afonso Bordado
6e6a1034d7 riscv64: Add bitmanip extension flags (#5847) 2023-02-21 22:12:44 +00:00
Alex Crichton
bd3dcd313d x64: Add more fma instruction lowerings (#5846)
The relaxed-simd proposal for WebAssembly adds a fused-multiply-add
operation for `v128` types so I was poking around at Cranelift's
existing support for its `fma` instruction. I was also poking around at
the x86_64 ISA's offerings for the FMA operation and ended up with this
PR that improves the lowering of the `fma` instruction on the x64
backend in a number of ways:

* A libcall-based fallback is now provided for `f32x4` and `f64x2` types
  in preparation for eventual support of the relaxed-simd proposal.
  These encodings are horribly slow, but it's expected that if FMA
  semantics must be guaranteed then it's the best that can be done
  without the `fma` feature. Otherwise it'll be up to producers (e.g.
  Wasmtime embedders) whether wasm-level FMA operations should be FMA or
  multiply-then-add.

* In addition to the existing `vfmadd213*` instructions opcodes were
  added for `vfmadd132*`. The `132` variant is selected based on which
  argument can have a sinkable load.

* Any argument in the `fma` CLIF instruction can now have a
  `sinkable_load` and it'll generate a single FMA instruction.

* All `vfnmadd*` opcodes were added as well. These are pattern-matched
  where one of the arguments to the CLIF instruction is an `fneg`. I
  opted to not add a new CLIF instruction here since it seemed like
  pattern matching was easy enough but I'm also not intimately familiar
  with the semantics here so if that's the preferred approach I can do
  that too.
2023-02-21 20:51:22 +00:00
Alex Crichton
d82ebcc102 x64: Enable load-coalescing for SSE/AVX instructions (#5841)
* x64: Enable load-coalescing for SSE/AVX instructions

This commit unlocks the ability to fold loads into operands of SSE and
AVX instructions. This is beneficial for both function size when it
happens in addition to being able to reduce register pressure.
Previously this was not done because most SSE instructions require
memory to be aligned. AVX instructions, however, do not have alignment
requirements.

The solution implemented here is one recommended by Chris which is to
add a new `XmmMemAligned` newtype wrapper around `XmmMem`. All SSE
instructions are now annotated as requiring an `XmmMemAligned` operand
except for a new new instruction styles used specifically for
instructions that don't require alignment (e.g.  `movdqu`, `*sd`, and
`*ss` instructions). All existing instruction helpers continue to take
`XmmMem`, however. This way if an AVX lowering is chosen it can be used
as-is. If an SSE lowering is chosen, however, then an automatic
conversion from `XmmMem` to `XmmMemAligned` kicks in. This automatic
conversion only fails for unaligned addresses in which case a load
instruction is emitted and the operand becomes a temporary register
instead. A number of prior `Xmm` arguments have now been converted to
`XmmMem` as well.

One change from this commit is that loading an unaligned operand for an
SSE instruction previously would use the "correct type" of load, e.g.
`movups` for f32x4 or `movup` for f64x2, but now the loading happens in
a context without type information so the `movdqu` instruction is
generated. According to [this stack overflow question][question] it
looks like modern processors won't penalize this "wrong" choice of type
when the operand is then used for f32 or f64 oriented instructions.

Finally this commit improves some reuse of logic in the `put_in_*_mem*`
helper to share code with `sinkable_load` and avoid duplication. With
this in place some various ISLE rules have been updated as well.

In the tests it can be seen that AVX-instructions are now automatically
load-coalesced and use memory operands in a few cases.

[question]: https://stackoverflow.com/questions/40854819/is-there-any-situation-where-using-movdqu-and-movupd-is-better-than-movups

* Fix tests

* Fix move-and-extend to be unaligned

These don't have alignment requirements like other xmm instructions as
well. Additionally add some ISA tests to ensure that their output is
tested.

* Review comments
2023-02-21 19:10:19 +00:00
Alex Crichton
c65de1f1b1 x64: Remove conditional SseOpcode::uses_src1 (#5842)
This is a follow-up to comments in #5795 to remove some cruft in the x64
instruction model to ensure that the shape of an `Inst` reflects what's
going to happen in regalloc and encoding. This accessor was used to
handle `round*`, `pextr*`, and `pshufb` instructions. The `round*` ones
had already moved to the appropriate `XmmUnary*` variant and `pshufb`
was additionally moved over to that variant as well.

The `pextr*` instructions got a new `Inst` variant and additionally had
their constructors slightly modified to no longer require the type as
input. The encoding for these instructions now automatically handles the
various type-related operands through a new `SseOpcode::Pextrq` operand
to represent 64-bit movements.
2023-02-21 18:17:07 +00:00
Afonso Bordado
0f51338def riscv64: Clear the top 32bits in the br_table index (#5831)
We were unintentionally relying on these to be zeroed when jumping.
2023-02-21 18:05:51 +00:00
Alex Crichton
c26a65a854 x64: Add most remaining AVX lowerings (#5819)
* x64: Add most remaining AVX lowerings

This commit goes through `inst.isle` and adds a corresponding AVX
lowering for most SSE lowerings. I opted to skip instructions where the
SSE lowering didn't read/modify a register, such as `roundps`. I think
that AVX will benefit these instructions when there's load-merging since
AVX doesn't require alignment, but I've deferred that work to a future
PR.

Otherwise though in this PR I think all (or almost all) of the 3-operand
forms of AVX instructions are supported with their SSE counterparts.
This should ideally improve codegen slightly by removing register
pressure and the need for `movdqa` between registers. I've attempted to
ensure that there's at least one codegen test for all the new instructions.

As a side note, the recent capstone integration into `precise-output`
tests helped me catch a number of encoding bugs much earlier than
otherwise, so I've found that incredibly useful in tests!

* Move `vpinsr*` instructions to their own variant

Use true `XmmMem` and `GprMem` types in the instruction as well to get
more type-level safety for what goes where.

* Remove `Inst::produces_const` accessor

Instead of conditionally defining regalloc and various other operations
instead add dedicated `MInst` variants for operations which are intended
to produce a constant to have more clear interactions with regalloc and
printing and such.

* Fix tests

* Register traps in `MachBuffer` for load-folding ops

This adds a missing `add_trap` to encoding of VEX instructions with
memory operands to ensure that if they cause a segfault that there's
appropriate metadata for Wasmtime to understand that the instruction
could in fact trap. This fixes a fuzz test case found locally where v8
trapped and Wasmtime didn't catch the signal and crashed the fuzzer.
2023-02-20 15:11:52 +00:00
Alex Crichton
453330b2db x64: Add rudimentary support for some AVX instructions (#5795)
* x64: Add rudimentary support for some AVX instructions

I was poking around Spidermonkey's wasm backend and saw that the various
assembler functions used are all `v*`-prefixed which look like they're
intended for use with AVX instructions. I looked at Cranelift and it
currently doesn't have support for many AVX-based instructions, so I
figured I'd take a crack at it!

The support added here is a bit of a mishmash when viewed alone, but my
general goal was to take a single instruction from the SIMD proposal for
WebAssembly and migrate all of its component instructions to AVX. I, by
random chance, picked a pretty complicated instruction of `f32x4.min`.
This wasm instruction is implemented on x64 with 4 unique SSE
instructions and ended up being a pretty good candidate.

Further digging about AVX-vs-SSE shows that there should be two major
benefits to using AVX over SSE:

* Primarily AVX instructions largely use a three-operand form where two
  input registers are operated with and an output register is also
  specified. This is in contrast to SSE's predominant
  one-register-is-input-but-also-output pattern. This should help free
  up the register allocator a bit and additionally remove the need for
  movement between registers.

* As #4767 notes the memory-based operations of VEX-encoded instructions
  (aka AVX instructions) do not have strict alignment requirements which
  means we would be able to sink loads and stores into individual
  instructions instead of having separate instructions.

So I set out on my journey to implement the instructions used by
`f32x4.min`. The first few were fairly easy. The machinst backends are
already of the shape "take these inputs and compute the output" where
the x86 requirement of a register being both input and output is
postprocessed in. This means that the `inst.isle` creation helpers for
SSE instructions were already of the correct form to use AVX. I chose to
add new `rule` branches for the instruction creation helpers, for
example `x64_andnps`. The new `rule` conditionally only runs if AVX is
enabled and emits an AVX instruction instead of an SSE instruction for
achieving the same goal. This means that no lowerings of clif
instructions were modified, instead just new instructions are being
generated.

The VEX encoding was previously not heavily used in Cranelift. The only
current user are the FMA-style instructions that Cranelift has at this
time. These FMA instructions have one extra operand than `vandnps`, for
example, so I split the existing `XmmRmRVex` into a few more variants to
fit the shape of the instructions that needed generating for
`f32x4.min`. This was accompanied then with more AVX opcode definitions,
more emission support, etc.

Upon implementing all of this it turned out that the test suite was
failing on my machine due to the memory-operand encodings of VEX
instructions not being supported. I didn't explicitly add those in
myself but some preexisting RIP-relative addressing was leaking into the
new instructions with existing tests. I opted to go ahead and fill out
the memory addressing modes of VEX encoding to get the tests passing
again.

All-in-all this PR adds new instructions to the x64 backend for a number
of AVX instructions, updates 5 existing instruction producers to use AVX
instructions conditionally, implements VEX memory operands, and adds
some simple tests for the new output of `f32x4.min`. The existing
runtest for `f32x.min` caught a few intermediate bugs along the way and
I additionally added a plain `target x86_64` to that runtest to ensure
that it executes with and without AVX to test the various lowerings.
I'll also note that this, and future support, should be well-fuzzed
through Wasmtime's fuzzing which may explicitly disable AVX support
despite the machine having access to AVX, so non-AVX lowerings should be
well-tested into the future.

It's also worth mentioning that I am not an AVX or VEX or x64 expert.
Implementing the memory operand part for VEX was the hardest part of
this PR and while I think it should be good someone else should
definitely double-check me. Additionally I haven't added many
instructions to the x64 backend yet so I may have missed obvious places
to tests or such, so am happy to follow-up with anything to be more
thorough if necessary.

Finally I should note that this is just the tip of the iceberg when it
comes to AVX. My hope is to get some of the idioms sorted out to make it
easier for future PRs to add one-off instruction lowerings or such.

* Review feedback
2023-02-17 01:29:55 +00:00
Chris Fallin
c7e2571866 egraphs: disable GVN of effectful idempotent ops (temporarily). (#5808)
This is a short-term fix to the same bug that #5800 is addressing
(#5796), but with less risk: it simply turns off GVN'ing of effectful
but idempotent ops. Because we have an upcoming release, and this is a
miscompile (albeit to do with trapping behavior), we would like to make
the simplest possible fix that avoids the bug, and backport it. I will
then rebase #5800 on top of a revert of this followed by the more
complete fix.
2023-02-16 21:29:03 +00:00
Alex Crichton
cae3b26623 x64: Improve codegen for vectors with constant shift amounts (#5797)
I stumbled across this working on #5795 and figured this was a nice
opportunity to improve the codegen here.
2023-02-16 20:47:59 +00:00
Trevor Elliott
80c147d9c0 Rework br_table to use BlockCall (#5731)
Rework br_table to use BlockCall, allowing us to avoid adding new nodes during ssa construction to hold block arguments. Additionally, many places where we previously matched on InstructionData to extract branch destinations can be replaced with a use of branch_destination or branch_destination_mut.
2023-02-16 09:23:27 -08:00
Trevor Elliott
cc073593a4 Fix block label printing in precise-output tests (#5798)
As a follow-up to #5780, disassemble the regions identified by bb_starts, falling back on disassembling the whole buffer. This ensures that instructions like br_table that introduce a lot of constants don't throw off capstone for the remainder of the function.

---------

Co-authored-by: Jamey Sharp <jamey@minilop.net>
2023-02-16 02:35:26 +00:00
Trevor Elliott
f04decc4a1 Use capstone to validate precise-output tests (#5780)
Use the capstone library to disassemble precise-output tests, in addition to pretty-printing their vcode.
2023-02-15 16:35:10 -08:00
Afonso Bordado
eabd43a178 aarch64: Support GOT Relative relocations in PIC mode (#5550)
* cranelift: Add `adrp` encoding to AArch64 backend

* cranelift: Support GOT Symbol References in AArch64

* cranelift: Add MachO GOT relocations

* cranelift: Do not mark the GOT PageOffset12 MachO relocation as relative
2023-02-15 15:19:18 -08:00
Afonso Bordado
76539ef9f2 cranelift: Optimize select+icmp into {s,u}{min,max} (#5546)
* cranelift: Optimize `select+icmp` into `{s,u}{min,max}`

* cranelift: Add generic egraph icmp reverse rule

* cranelift: Optimize `vselect+icmp` into `{s,u}{min,max}`

* cranelift: Optimize some `vselect+fcmp` into `f{min,max}_pseudo`

* cranelift: Add inverted forms of min/max rules
2023-02-15 15:06:21 -08:00
Trevor Elliott
f0137c2618 x64: Fix the formatting for andn (#5789)
* Print AluRmRVex instructions with the destination last
* Update andn tests
2023-02-15 11:16:59 -08:00
Ulrich Weigand
305000d14b s390x: Fix instruction encoding and disassembly format bugs (#5786)
- Fix encoding of the AHY instruction.
- Fix disassembly format of FIEBR, FIDBR, and LEDBRA instructions.
2023-02-15 08:36:44 -08:00
Ulrich Weigand
e10094dcd6 s390x: Support scalar min/max clif instructions (#5762)
We don't have ISA instructions for that, so simply expand them
to icmp + select.

Also enable fuzzing for those clif instructions now.
2023-02-15 11:45:09 +00:00
Nick Fitzgerald
6df3bbbe60 Cranelift: Collapse double extends into a single extend (#5772) 2023-02-13 22:43:17 +00:00
Alex Crichton
a0a97f5e8f Add (bnot (bxor x y)) lowerings for s390x/aarch64 (#5763)
* Add (bnot (bxor x y)) lowerings for s390x/aarch64

I originally thought that s390x's original lowering in #5709, but as was
rightfully pointed out `(bnot (bxor x y))` is equivalent to
`(bxor x (bnot y))` so the special lowering for one should apply as a
special lowering for the other. For the s390x and aarch64 backend that
have already have a fused lowering of the bxor/bnot add a lowering
additionally for the bnot/bxor combination.

* Add bnot(bxor(..)) tests for s390x 128-bit sizes
2023-02-13 15:41:18 +00:00
Trevor Elliott
15fe9c7c93 Inline jump tables in parsed br_table instructions (#5755)
As jump tables are used by at most one br_table instruction, inline their definition in those instructions instead of requiring them to be declared as function-level metadata.
2023-02-09 14:24:04 -08:00
Alex Crichton
72962c9f08 Add some minor souper-harvested optimizations (#5735)
I was playing around with souper recently on some wasms I had lying
around and these are some optimization opportunities that popped out
which seemed easy-enough to add to the egraph-based optimizations.
2023-02-07 14:06:24 -06:00
Chris Fallin
75ae976adc egraphs: fix accidental remat of call. (#5726)
In the provided test case in #5716, the result of a call was then
added to 0. We have a rewrite rule that sets the remat-bit on any add
of a value and a constant, because these frequently appear (e.g. from
address offset calculations) and this can frequently reduce register
pressure (one long-lived base vs. many long-lived base+offset values).
Separately, we have an algebraic rule that `x+0` rewrites to `x`.

The result of this was that we had an eclass with the remat bit set on
the add, but the add was also union'd into the call. We pick the
latter during extraction, because it's cheaper not to do the add at
all; but we still get the remat bit, and try to remat a call (!),
which blows up later.

This PR fixes the logic to look up the "best value" for a value (i.e.,
whatever extraction determined), and look up the remat bit on *that*
node, not the canonical node.

(Why did the canonical node become the iadd and not the call? Because
the former had a lower value-number, as an accident of IR
construction; we don't impose any requirements on the input CLIF's
value-number ordering, and I don't think this breaks any of the
important acyclic properties, even though there is technically a
dependence from a lower-numbered to a higher-numbered node. In essence
one can think of them as having "virtual numbers" in any true
topologically-sorted order, and the only place the actual integer
indices matter should be in choosing the "canonical ID", which is just
used for dedup'ing, modulo this bug.)

Fixes #5716.
2023-02-06 23:36:16 +00:00
Alex Crichton
de0e0bea3f Legalize b{and,or,xor}_not into component instructions (#5709)
* Remove trailing whitespace in `lower.isle` files

* Legalize the `band_not` instruction into simpler form

This commit legalizes the `band_not` instruction into `band`-of-`bnot`,
or two instructions. This is intended to assist with egraph-based
optimizations where the `band_not` instruction doesn't have to be
specifically included in other bit-operation-patterns.

Lowerings of the `band_not` instruction have been moved to a
specialization of the `band` instruction.

* Legalize `bor_not` into components

Same as prior commit, but for the `bor_not` instruction.

* Legalize bxor_not into bxor-of-bnot

Same as prior commits. I think this also ended up fixing a bug in the
s390x backend where `bxor_not x y` was actually translated as `bnot
(bxor x y)` by accident given the test update changes.

* Simplify not-fused operands for riscv64

Looks like some delegated-to rules have special-cases for "if this
feature is enabled use the fused instruction" so move the clause for
testing the feature up to the lowering phase to help trigger other rules
if the feature isn't enabled. This should make the riscv64 backend more
consistent with how other backends are implemented.

* Remove B{and,or,xor}Not from cost of egraph metrics

These shouldn't ever reach egraphs now that they're legalized away.

* Add an egraph optimization for `x^-1 => ~x`

This adds a simplification node to translate xor-against-minus-1 to a
`bnot` instruction. This helps trigger various other optimizations in
the egraph implementation and also various backend lowering rules for
instructions. This is chiefly useful as wasm doesn't have a `bnot`
equivalent, so it's encoded as `x^-1`.

* Add a wasm test for end-to-end bitwise lowerings

Test that end-to-end various optimizations are being applied for input
wasm modules.

* Specifically don't self-update rustup on CI

I forget why this was here originally, but this is failing on Windows
CI. In general there's no need to update rustup, so leave it as-is.

* Cleanup some aarch64 lowering rules

Previously a 32/64 split was necessary due to the `ALUOp` being different
but that's been refactored away no so there's no longer any need for
duplicate rules.

* Narrow a x64 lowering rule

This previously made more sense when it was `band_not` and rarely used,
but be more specific in the type-filter on this rule that it's only
applicable to SIMD types with lanes.

* Simplify xor-against-minus-1 rule

No need to have the commutative version since constants are already
shuffled right for egraphs

* Optimize band-of-bnot when bnot is on the left

Use some more rules in the egraph algebraic optimizations to
canonicalize band/bor/bxor with a `bnot` operand to put the operand on
the right. That way the lowerings in the backends only have to list the
rule once, with the operand on the right, to optimize both styles of
input.

* Add commutative lowering rules

* Update cranelift/codegen/src/isa/x64/lower.isle

Co-authored-by: Jamey Sharp <jamey@minilop.net>

---------

Co-authored-by: Jamey Sharp <jamey@minilop.net>
2023-02-06 13:53:40 -06:00
Trevor Elliott
6d8f2be9e1 Use andn for band_not when bmi1 is present (#5701)
We can use the andn instruction for the lowering of band_not on x64 when bmi1 is available.
2023-02-03 16:23:18 -08:00
Nick Fitzgerald
e18d4cb711 Cranelift: Introduce support for return_call in the interpreter (#5697)
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
2023-02-03 15:53:54 -08:00
Nick Fitzgerald
72c8513411 Cranelift: Correctly wrap shifts in constant propagation (#5695)
Fixes #5690
Fixes #5696

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
2023-02-03 00:12:57 +00:00
Jun Ryung Ju
9cd4146939 Implemented b{and,or,xor}_not bitops for ty_int_ref_scalar_64 type. (#5604)
* Implemented `b{and,or,xor}_not` bitops for ty_int_ref_scalar_64 type.

* Added tests.
2023-02-01 21:57:18 -08:00
Jamey Sharp
ac4d28f4dd Constant-fold icmp instructions (#5666)
We found examples of icmp instructions with both operands constant in
spidermonkey.wasm.
2023-02-01 21:55:36 +00:00
Nick Fitzgerald
bdfb746548 Cranelift: Introduce the return_call and return_call_indirect instructions (#5679)
* Cranelift: Introduce the `tail` calling convention

This is an unstable-ABI calling convention that we will eventually use to
support Wasm tail calls.

Co-Authored-By: Jamey Sharp <jsharp@fastly.com>

* Cranelift: Introduce the `return_call` and `return_call_indirect` instructions

These will be used to implement tail calls for Wasm and any other language
targeting CLIF. The `return_call_indirect` instruction differs from the Wasm
instruction of the same name by taking a native address callee rather than a
Wasm function index.

Co-Authored-By: Jamey Sharp <jsharp@fastly.com>

* Cranelift: Implement verification rules for `return_call[_indirect]`

They must:

* have the same return types between the caller and callee,
* have the same calling convention between caller and callee,
* and that calling convention must support tail calls.

Co-Authored-By: Jamey Sharp <jsharp@fastly.com>

* cargo fmt

---------

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
2023-02-01 21:20:35 +00:00
Nick Fitzgerald
ffbbfbffce Cranelift: Rewrite or(and(x, y), not(y)) => or(x, not(y)) again (#5684)
This rewrite was introduced in #5676 and then reverted in #5682 due to a footgun
where we accidentally weren't actually checking the `y == !z` precondition. This
commit fixes the precondition check. It also fixes the arithmetic to be
correctly masked to the value type's width.

This reverts commit 268f6bfc1d.
2023-02-01 20:53:22 +00:00
yuyang
cb3b6c621f fix rotl.i16 with i128 shift value. (#5611)
* fix issue 5523.

* fix.

* add missing issue file.

* fix issue.

* fix duplicate shamt_128.

* issue 5523 add test target,and fix some wrong comment.

* fix output file.

* enable llvm_abi_extensions for regression test file.
2023-02-01 03:44:13 +00:00
Trevor Elliott
268f6bfc1d Revert "Cranelift: Rewrite or(and(x, y), not(y)) => or(x, not(y)) (#5676)" (#5682)
This reverts commit 8c9eb9939b.

Fixes #5680
2023-02-01 02:53:23 +00:00
yuyang
0c66a1bba7 Fix issue 5528 (#5605)
* fix parameter error.

* fix float convert to i8 and i16   should extract sign bit.

* add missing regression test file.

* using tmp register.

* float convert i8 will consume more instructions.

* fix worse inst emit size.

* fix worst_case_size.
2023-01-31 15:37:36 -08:00
Nick Fitzgerald
8c9eb9939b Cranelift: Rewrite or(and(x, y), not(y)) => or(x, not(y)) (#5676)
Co-authored-by: Rainy Sinclair <844493+itsrainy@users.noreply.github.com>
2023-01-31 22:44:45 +00:00