Commit Graph

4327 Commits

Author SHA1 Message Date
Afonso Bordado
853ff787f3 fuzzgen: Refactor name and signature generation (#5764)
* fuzzgen: Move cranelift type generation into CraneliftArbitrary

* fuzzgen: Deduplicate DataValue generation

* fuzzgen: Remove unused code

* fuzzgen: Pass allowed function calls into `FunctionGenerator`
2023-02-17 20:48:12 +00:00
Afonso Bordado
a7bd65d116 fuzzgen: Allow inline stackprobes for riscv64 (#5822) 2023-02-17 20:47:39 +00:00
Trevor Elliott
a139ed6d56 Fix the postorder traversal in the DominatorTree (#5821)
Fix the postorder traversal computed by the `DominatorTree`. It was
recording nodes in the wrong order depending on the order child nodes
were visited. Consider the following program:

```
function %foo2(i8) -> i8 {
block0(v0: i8):
    brif v0, block1, block2

block1:
    return v0

block2:
    jump block1
}
```

The postorder produced by the previous implementation was:

```
block2
block1
block0
```

Which is incorrect, as `block1` is branched to by `block2`. Changing the
branch order in the function would also change the postorder result,
yielding the expected order with `block1` emitted first.

The problem was that when pushing successor nodes onto the stack, the
old implementation would also mark them SEEN. This would then prevent
them from being pushed on the stack again in the future, which is
incorrect as they might be visited by other nodes that have not yet been
pushed. This causes nodes to potentially show up later in the postorder
traversal than they should.

This PR reworks the implementation of `DominatorTree::compute` to
produce an order where `block1` is always returned first, regardless of
the branch order in the original program.

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
2023-02-17 20:39:04 +00:00
Berkus Decker
c8fa1b845f Fix typo (#5814) 2023-02-17 15:08:07 +00:00
Alex Crichton
453330b2db x64: Add rudimentary support for some AVX instructions (#5795)
* x64: Add rudimentary support for some AVX instructions

I was poking around Spidermonkey's wasm backend and saw that the various
assembler functions used are all `v*`-prefixed which look like they're
intended for use with AVX instructions. I looked at Cranelift and it
currently doesn't have support for many AVX-based instructions, so I
figured I'd take a crack at it!

The support added here is a bit of a mishmash when viewed alone, but my
general goal was to take a single instruction from the SIMD proposal for
WebAssembly and migrate all of its component instructions to AVX. I, by
random chance, picked a pretty complicated instruction of `f32x4.min`.
This wasm instruction is implemented on x64 with 4 unique SSE
instructions and ended up being a pretty good candidate.

Further digging about AVX-vs-SSE shows that there should be two major
benefits to using AVX over SSE:

* Primarily AVX instructions largely use a three-operand form where two
  input registers are operated with and an output register is also
  specified. This is in contrast to SSE's predominant
  one-register-is-input-but-also-output pattern. This should help free
  up the register allocator a bit and additionally remove the need for
  movement between registers.

* As #4767 notes the memory-based operations of VEX-encoded instructions
  (aka AVX instructions) do not have strict alignment requirements which
  means we would be able to sink loads and stores into individual
  instructions instead of having separate instructions.

So I set out on my journey to implement the instructions used by
`f32x4.min`. The first few were fairly easy. The machinst backends are
already of the shape "take these inputs and compute the output" where
the x86 requirement of a register being both input and output is
postprocessed in. This means that the `inst.isle` creation helpers for
SSE instructions were already of the correct form to use AVX. I chose to
add new `rule` branches for the instruction creation helpers, for
example `x64_andnps`. The new `rule` conditionally only runs if AVX is
enabled and emits an AVX instruction instead of an SSE instruction for
achieving the same goal. This means that no lowerings of clif
instructions were modified, instead just new instructions are being
generated.

The VEX encoding was previously not heavily used in Cranelift. The only
current user are the FMA-style instructions that Cranelift has at this
time. These FMA instructions have one extra operand than `vandnps`, for
example, so I split the existing `XmmRmRVex` into a few more variants to
fit the shape of the instructions that needed generating for
`f32x4.min`. This was accompanied then with more AVX opcode definitions,
more emission support, etc.

Upon implementing all of this it turned out that the test suite was
failing on my machine due to the memory-operand encodings of VEX
instructions not being supported. I didn't explicitly add those in
myself but some preexisting RIP-relative addressing was leaking into the
new instructions with existing tests. I opted to go ahead and fill out
the memory addressing modes of VEX encoding to get the tests passing
again.

All-in-all this PR adds new instructions to the x64 backend for a number
of AVX instructions, updates 5 existing instruction producers to use AVX
instructions conditionally, implements VEX memory operands, and adds
some simple tests for the new output of `f32x4.min`. The existing
runtest for `f32x.min` caught a few intermediate bugs along the way and
I additionally added a plain `target x86_64` to that runtest to ensure
that it executes with and without AVX to test the various lowerings.
I'll also note that this, and future support, should be well-fuzzed
through Wasmtime's fuzzing which may explicitly disable AVX support
despite the machine having access to AVX, so non-AVX lowerings should be
well-tested into the future.

It's also worth mentioning that I am not an AVX or VEX or x64 expert.
Implementing the memory operand part for VEX was the hardest part of
this PR and while I think it should be good someone else should
definitely double-check me. Additionally I haven't added many
instructions to the x64 backend yet so I may have missed obvious places
to tests or such, so am happy to follow-up with anything to be more
thorough if necessary.

Finally I should note that this is just the tip of the iceberg when it
comes to AVX. My hope is to get some of the idioms sorted out to make it
easier for future PRs to add one-off instruction lowerings or such.

* Review feedback
2023-02-17 01:29:55 +00:00
Trevor Elliott
d711872d63 Refactor collect_branches_and_targets to not need a smallvec (#5803)
* Refactor collect_branches_and_targets to not need a smallvec

Basic blocks are terminated by at most one branch instruction now, so we
can use that assumption in `collect_branches_and_targets` to return the
last instruction we saw instead.

* Review comments
2023-02-16 21:30:17 +00:00
Chris Fallin
c7e2571866 egraphs: disable GVN of effectful idempotent ops (temporarily). (#5808)
This is a short-term fix to the same bug that #5800 is addressing
(#5796), but with less risk: it simply turns off GVN'ing of effectful
but idempotent ops. Because we have an upcoming release, and this is a
miscompile (albeit to do with trapping behavior), we would like to make
the simplest possible fix that avoids the bug, and backport it. I will
then rebase #5800 on top of a revert of this followed by the more
complete fix.
2023-02-16 21:29:03 +00:00
Alex Crichton
cae3b26623 x64: Improve codegen for vectors with constant shift amounts (#5797)
I stumbled across this working on #5795 and figured this was a nice
opportunity to improve the codegen here.
2023-02-16 20:47:59 +00:00
Trevor Elliott
80c147d9c0 Rework br_table to use BlockCall (#5731)
Rework br_table to use BlockCall, allowing us to avoid adding new nodes during ssa construction to hold block arguments. Additionally, many places where we previously matched on InstructionData to extract branch destinations can be replaced with a use of branch_destination or branch_destination_mut.
2023-02-16 09:23:27 -08:00
Chris Fallin
c15c4ed23d Cranelift: upgrade to regalloc2 0.6.1. (#5799)
* Cranelift: upgrade to regalloc2 0.6.1.

Fixes #5791 by pulling in bytecodealliance/regalloc2#113.

* Add cargo-vet entry for regalloc2 0.6.1.
2023-02-16 03:22:58 +00:00
Trevor Elliott
cc073593a4 Fix block label printing in precise-output tests (#5798)
As a follow-up to #5780, disassemble the regions identified by bb_starts, falling back on disassembling the whole buffer. This ensures that instructions like br_table that introduce a lot of constants don't throw off capstone for the remainder of the function.

---------

Co-authored-by: Jamey Sharp <jamey@minilop.net>
2023-02-16 02:35:26 +00:00
Trevor Elliott
f04decc4a1 Use capstone to validate precise-output tests (#5780)
Use the capstone library to disassemble precise-output tests, in addition to pretty-printing their vcode.
2023-02-15 16:35:10 -08:00
Afonso Bordado
eabd43a178 aarch64: Support GOT Relative relocations in PIC mode (#5550)
* cranelift: Add `adrp` encoding to AArch64 backend

* cranelift: Support GOT Symbol References in AArch64

* cranelift: Add MachO GOT relocations

* cranelift: Do not mark the GOT PageOffset12 MachO relocation as relative
2023-02-15 15:19:18 -08:00
Trevor Elliott
aba239e9b8 Fix handling of jumps in bugpoint (#5794)
Fixes #5792
2023-02-15 15:07:03 -08:00
Afonso Bordado
76539ef9f2 cranelift: Optimize select+icmp into {s,u}{min,max} (#5546)
* cranelift: Optimize `select+icmp` into `{s,u}{min,max}`

* cranelift: Add generic egraph icmp reverse rule

* cranelift: Optimize `vselect+icmp` into `{s,u}{min,max}`

* cranelift: Optimize some `vselect+fcmp` into `f{min,max}_pseudo`

* cranelift: Add inverted forms of min/max rules
2023-02-15 15:06:21 -08:00
Trevor Elliott
f0137c2618 x64: Fix the formatting for andn (#5789)
* Print AluRmRVex instructions with the destination last
* Update andn tests
2023-02-15 11:16:59 -08:00
Alex Crichton
0037b71b11 Use xmm_rm_r more frequently in x64 backend (#5787)
This updates the signatures of the `xmm_rm_r` helper function and then
updates existing users and migrates other users to the helper now that
the type information is no longer required.
2023-02-15 09:03:19 -08:00
Ulrich Weigand
305000d14b s390x: Fix instruction encoding and disassembly format bugs (#5786)
- Fix encoding of the AHY instruction.
- Fix disassembly format of FIEBR, FIDBR, and LEDBRA instructions.
2023-02-15 08:36:44 -08:00
Ulrich Weigand
e10094dcd6 s390x: Support scalar min/max clif instructions (#5762)
We don't have ISA instructions for that, so simply expand them
to icmp + select.

Also enable fuzzing for those clif instructions now.
2023-02-15 11:45:09 +00:00
Alphyr
cb150d37ce Update dependencies (#5513) 2023-02-14 19:45:15 +00:00
Nick Fitzgerald
6df3bbbe60 Cranelift: Collapse double extends into a single extend (#5772) 2023-02-13 22:43:17 +00:00
Trevor Elliott
19f337e29b Move the default block to the front of the underlying jump table storage (#5770)
The new api on JumpTableData makese it easy to keep the default label first, and that shrinks the diff in #5731 a bit.
2023-02-13 20:50:29 +00:00
Alex Crichton
a0a97f5e8f Add (bnot (bxor x y)) lowerings for s390x/aarch64 (#5763)
* Add (bnot (bxor x y)) lowerings for s390x/aarch64

I originally thought that s390x's original lowering in #5709, but as was
rightfully pointed out `(bnot (bxor x y))` is equivalent to
`(bxor x (bnot y))` so the special lowering for one should apply as a
special lowering for the other. For the s390x and aarch64 backend that
have already have a fused lowering of the bxor/bnot add a lowering
additionally for the bnot/bxor combination.

* Add bnot(bxor(..)) tests for s390x 128-bit sizes
2023-02-13 15:41:18 +00:00
Trevor Elliott
d99783fc91 Move default blocks into jump tables (#5756)
Move the default block off of the br_table instrution, and into the JumpTable that it references.
2023-02-10 08:53:30 -08:00
Trevor Elliott
15fe9c7c93 Inline jump tables in parsed br_table instructions (#5755)
As jump tables are used by at most one br_table instruction, inline their definition in those instructions instead of requiring them to be declared as function-level metadata.
2023-02-09 14:24:04 -08:00
bjorn3
202d3af16a Remove the unused sigid argument purpose (#5753) 2023-02-09 09:18:39 -08:00
Amanieu d'Antras
a2d356d45e Add JITBuilder::with_flags constructor (#5751)
This allows custom flags to be set (e.g. `opt-level`) while still
leaving most of of the boilerplate to select the native target to
the `JITBuilder`.
2023-02-09 02:49:17 +00:00
Ayomide Bamidele
9637840b4b Update AMD and generic x86 CPU presets to match LLVM (#5575)
* Add x86 AMD and generic presets

* Fix typos

* Add zen2 definition
2023-02-08 15:56:45 -08:00
Trevor Elliott
34ec4b4e44 Reuse inst_predicates::visit_block_succs in more places (#5750)
Following up from #5730, replace some explicit matching over branch instructions with a use of inst_predicates::visit_block_succs.
2023-02-08 15:42:24 -08:00
Trevor Elliott
b0b3f67cb0 Move jump tables to the DataFlowGraph (#5745)
Move the storage for jump tables off of FunctionStencil and onto DataFlowGraph. This change is in service of #5731, making it easier to access the jump table data in the context of helpers like inst_values.
2023-02-07 21:21:35 -08:00
Jamey Sharp
7bf89683e9 Generalize and/or/xor optimizations (#5744)
* Generalize `n ^ !n` optimization to more types

* Generalize `x & -1` optimization to more types

Also mark the `x & x` rewrite to `subsume`.

* Cranelift: Optimize x|!x and x&!x to constants

These cases are much like the existing x^!x rules.
2023-02-08 02:18:36 +00:00
Trevor Elliott
d71c9458dc Make DataFlowGraph::blocks public (#5740)
Similar to when we exposed the DataFlowGraph::insts field through a restrictive newtype, expose DataFlowGraph::blocks through an interface that allows a restrictive set of operations. This field being public now allows us to avoid a rematch in ssa construction, and simplifies the implementation of adding a block argument to a block referenced by a br_table instruction.
2023-02-07 17:11:14 -08:00
Jamey Sharp
f3b408d5e2 Algebraic opts: Reuse iconst 0 from LHS (#5724)
We don't need to spend time going through the GVN map to dedup a
newly-constructed `iconst 0` when we already matched that value on the
left-hand side of these rules.

Also, mark these rules as subsuming any others since we can't do better
than reducing an expression to a constant.
2023-02-08 00:11:07 +00:00
Trevor Elliott
116e5a665f Bump regalloc2 to 0.6.0 (#5742)
* Bump regalloc2
* Certify regalloc2 0.6.0
2023-02-07 15:57:49 -08:00
Trevor Elliott
3343cf80e9 Add assertions for matches that used to use analyze_branch (#5733)
Following up from #5730, add debug assertions to ensure that new branch instructions don't slip through matches that used to use analyze_branch.
2023-02-07 14:51:18 -08:00
Trevor Elliott
2c8425998b Refactor matches that used to consume BranchInfo (#5734)
Explicitly borrow the instruction data, and use a mutable borrow to avoid rematch.
2023-02-07 13:29:42 -08:00
Raekye
fdd4a778fc Fix ABI of jitted function in cranelift-jit example. (#5736) 2023-02-07 21:16:05 +00:00
Alex Crichton
72962c9f08 Add some minor souper-harvested optimizations (#5735)
I was playing around with souper recently on some wasms I had lying
around and these are some optimization opportunities that popped out
which seemed easy-enough to add to the egraph-based optimizations.
2023-02-07 14:06:24 -06:00
Chris Fallin
673b448cfe Cranelift DFG: make inst clone deep-clone varargs lists. (#5727)
When investigating #5716, I found that rematerialization of a `call`, in
addition to blowing up for other reasons, caused aliasing of the varargs
list (the `EntityList` in the `ListPool`), such that editing the args of
the second copy of the call instruction inadvertently updated the first
as well.

This PR modifies `DataFlowGraph::clone_inst` so that it always clones
the varargs list if present. This shouldn't have any functional impact
on Cranelift today, because we don't rematerialize any instructions with
varargs; but it's important to get it right to avoid a bug later!
2023-02-07 01:21:09 +00:00
Trevor Elliott
c8a6adf825 Remove analyze_branch and BranchInfo (#5730)
We don't have overlap in behavior for branch instructions anymore, so we can remove analyze_branch and instead match on the InstructionData directly.

Co-authored-by: Jamey Sharp <jamey@minilop.net>
2023-02-06 17:06:57 -08:00
Chris Fallin
75ae976adc egraphs: fix accidental remat of call. (#5726)
In the provided test case in #5716, the result of a call was then
added to 0. We have a rewrite rule that sets the remat-bit on any add
of a value and a constant, because these frequently appear (e.g. from
address offset calculations) and this can frequently reduce register
pressure (one long-lived base vs. many long-lived base+offset values).
Separately, we have an algebraic rule that `x+0` rewrites to `x`.

The result of this was that we had an eclass with the remat bit set on
the add, but the add was also union'd into the call. We pick the
latter during extraction, because it's cheaper not to do the add at
all; but we still get the remat bit, and try to remat a call (!),
which blows up later.

This PR fixes the logic to look up the "best value" for a value (i.e.,
whatever extraction determined), and look up the remat bit on *that*
node, not the canonical node.

(Why did the canonical node become the iadd and not the call? Because
the former had a lower value-number, as an accident of IR
construction; we don't impose any requirements on the input CLIF's
value-number ordering, and I don't think this breaks any of the
important acyclic properties, even though there is technically a
dependence from a lower-numbered to a higher-numbered node. In essence
one can think of them as having "virtual numbers" in any true
topologically-sorted order, and the only place the actual integer
indices matter should be in choosing the "canonical ID", which is just
used for dedup'ing, modulo this bug.)

Fixes #5716.
2023-02-06 23:36:16 +00:00
bjorn3
16afefdab1 Some refactorings to the ISLE parser (#5693)
* Use is_ascii_digit and is_ascii_hexdigit in the ISLE lexer

* Use range pattern in ISLE lexer

* Use a couple of shorthands in the ISLE parser

* Use parse_ident instead of symbol + str_to_ident

* Introduce token eating api

This is a non-fatal version of the take api

* Rename take to expect and add expect_ prefixes to several methods

* Review comments
2023-02-06 15:11:25 -08:00
Trevor Elliott
e9c05622c0 Keep reachable jump tables (#5721)
Instead of identifying unused branch tables by looking for unused blocks inside of them, track used branch tables while traversing reachable blocks. This introduces an extra allocation of an EntitySet to track the used jump tables, but as those are few and this function runs once per ir::Function, the allocation seems reasonable.
2023-02-06 14:10:47 -08:00
Jamey Sharp
65c1f654f2 Cranelift: Only build iconst for ints <= 64 bits (#5723)
I audited the egraph "algebraic" optimization rules for any which
construct an `iconst` on the right-hand side of the rule. In these cases
we need to constrain the type passed to `iconst` to be both `fits_in_64`
and `ty_int`, because `iconst` is not defined on other types.
2023-02-06 14:10:29 -08:00
Alex Crichton
de0e0bea3f Legalize b{and,or,xor}_not into component instructions (#5709)
* Remove trailing whitespace in `lower.isle` files

* Legalize the `band_not` instruction into simpler form

This commit legalizes the `band_not` instruction into `band`-of-`bnot`,
or two instructions. This is intended to assist with egraph-based
optimizations where the `band_not` instruction doesn't have to be
specifically included in other bit-operation-patterns.

Lowerings of the `band_not` instruction have been moved to a
specialization of the `band` instruction.

* Legalize `bor_not` into components

Same as prior commit, but for the `bor_not` instruction.

* Legalize bxor_not into bxor-of-bnot

Same as prior commits. I think this also ended up fixing a bug in the
s390x backend where `bxor_not x y` was actually translated as `bnot
(bxor x y)` by accident given the test update changes.

* Simplify not-fused operands for riscv64

Looks like some delegated-to rules have special-cases for "if this
feature is enabled use the fused instruction" so move the clause for
testing the feature up to the lowering phase to help trigger other rules
if the feature isn't enabled. This should make the riscv64 backend more
consistent with how other backends are implemented.

* Remove B{and,or,xor}Not from cost of egraph metrics

These shouldn't ever reach egraphs now that they're legalized away.

* Add an egraph optimization for `x^-1 => ~x`

This adds a simplification node to translate xor-against-minus-1 to a
`bnot` instruction. This helps trigger various other optimizations in
the egraph implementation and also various backend lowering rules for
instructions. This is chiefly useful as wasm doesn't have a `bnot`
equivalent, so it's encoded as `x^-1`.

* Add a wasm test for end-to-end bitwise lowerings

Test that end-to-end various optimizations are being applied for input
wasm modules.

* Specifically don't self-update rustup on CI

I forget why this was here originally, but this is failing on Windows
CI. In general there's no need to update rustup, so leave it as-is.

* Cleanup some aarch64 lowering rules

Previously a 32/64 split was necessary due to the `ALUOp` being different
but that's been refactored away no so there's no longer any need for
duplicate rules.

* Narrow a x64 lowering rule

This previously made more sense when it was `band_not` and rarely used,
but be more specific in the type-filter on this rule that it's only
applicable to SIMD types with lanes.

* Simplify xor-against-minus-1 rule

No need to have the commutative version since constants are already
shuffled right for egraphs

* Optimize band-of-bnot when bnot is on the left

Use some more rules in the egraph algebraic optimizations to
canonicalize band/bor/bxor with a `bnot` operand to put the operand on
the right. That way the lowerings in the backends only have to list the
rule once, with the operand on the right, to optimize both styles of
input.

* Add commutative lowering rules

* Update cranelift/codegen/src/isa/x64/lower.isle

Co-authored-by: Jamey Sharp <jamey@minilop.net>

---------

Co-authored-by: Jamey Sharp <jamey@minilop.net>
2023-02-06 13:53:40 -06:00
Afonso Bordado
99c3936616 fuzzgen: Enable rotl for riscv64 (#5715)
It was fixed in #5611
2023-02-06 18:27:31 +00:00
Jamey Sharp
23e1d6b5e3 egraphs/cprop: Don't extend constants to i128 (#5717)
Fixes #5711.
2023-02-06 17:34:21 +00:00
wasmtime-publish
482f541101 Bump Wasmtime to 7.0.0 (#5712)
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
2023-02-06 09:10:19 -06:00
Jamey Sharp
97381792ac Generalize u/sextend constant folding to all types (#5706)
Also move these optimization rules to cprop.isle; it's where all the
other similar rules are.

Like the other cprop rules, these can subsume any other rules. We can't
do better than reducing an expression to a constant.

The new i64_sextend_imm64 and u64_uextend_imm64 constructors are useful
helpers to clean up other code. I applied them to `imm64_icmp` while I
was here, as well as using the existing `ty_mask` helper to clean up
`imm64_masked`.
2023-02-03 17:29:21 -08:00
Trevor Elliott
6d8f2be9e1 Use andn for band_not when bmi1 is present (#5701)
We can use the andn instruction for the lowering of band_not on x64 when bmi1 is available.
2023-02-03 16:23:18 -08:00