Commit Graph

10928 Commits

Author SHA1 Message Date
Alex Crichton
9b86a0b9b1 Remove the widening_pairwise_dot_product_s clif instruction (#5889)
This was added for the wasm SIMD proposal but I've been poking around at
this recently and the instruction can instead be represented by its
component parts with the same semantics I believe. This commit removes
the instruction and instead represents it with the existing
`iadd_pairwise` instruction (among others) and updates backends to with
new pattern matches to have the same codegen as before.

This interestingly entirely removed the codegen rule with no replacement
on the AArch64 backend as the existing rules all existed to produce the
same codegen.
2023-02-27 18:43:43 +00:00
Jamey Sharp
6cf7155052 Cranelift: Generalize (x << k) >> k optimization (#5746)
* Generalize unsigned `(x << k) >> k` optimization

Split the existing rule into three parts:
- A dual of the rule for `(x >> k) << k` that is only valid for unsigned
  shifts.
- Known-bits analysis for `(band (uextend x) k)`.
- A new rule for converting `sextend` to `uextend` if the sign-extended
  bits are masked out anyway.

The first two together cover the existing rule.

* Generalize signed `(x << k) >> k` optimization

* Review comments

* Generalize sign-extending shifts further

The shifts can be eliminated even if the shift amount isn't exactly
equal to the difference in bit-widths between the narrow and wide types.

* Add filetests
2023-02-27 17:34:46 +00:00
Volker Mische
6f64e39dda Fix function call on component instance (#5887)
The exported function in the instance is not called directly by
its name, but by `call_<the-name>`.
2023-02-27 15:10:56 +00:00
yuyang
3864286596 fix issue 5714. (#5845)
* fix issue 5714.

* add target for regression test.

* remove x86_64 test because of not implemented.
2023-02-26 16:25:38 +00:00
Jan-Justin van Tonder
66cb13cb4b cranelift: Add atomic_cas to interpreter (#5875)
As per issue #5818, atomic_cas was implemented without specific regard for thread safety.
2023-02-25 14:36:49 +00:00
Afonso Bordado
e9095050be cranelift-interpreter: Implement call_indirect and return_call_indirect (#5877)
* cranelift-interpreter: Implement `call_indirect`

* cranelift: Fix typo

* riscv64: Enable `call_indirect` tests
2023-02-25 13:16:59 +00:00
Afonso Bordado
36e92add6f riscv64: Move is_null/is_invalid to ISLE (#5874)
* riscv64: Move `is_null`/`is_invalid` to ISLE

* riscv64: Fix `is_invalid` codegen

* Implement review suggestions

Thanks!

Co-authored-by: Jamey Sharp <jamey@minilop.net>

---------

Co-authored-by: Jamey Sharp <jamey@minilop.net>
2023-02-25 12:48:44 +00:00
Dan Gohman
67e2e57b02 Allow WASI preopen file descriptors to be closed. (#5828)
Early on in WASI, we weren't sure whether we should allow preopens to be
closed, so conservatively, we disallowed them. Among other things, this
protected assumptions in wasi-libc that it can hold onto preopen file
descriptors and rely on them always being open.

However now, I think it makes sense to relax this restriction. wasi-libc
itself doesn't expose the preopen file descriptors, so users shouldn't
ever be closing them naively, unless they have wild closes. And
toolchains other than wasi-libc may want to close preopens as a way to
drop priveleges once the main file handles are opened.
2023-02-24 21:06:38 +00:00
Alex Crichton
fb2cbec34a Add vet entries for coredump support (#5878)
* Update the `num_cpus` crate

Audits for this update provided from our import from Mozilla.

* Add vet entries for coredump support
2023-02-24 18:26:39 +00:00
Trevor Elliott
4c88acbb89 Test all backends when a runtest is modified (#5872)
* Test all backends when a runtest is modified

* Check that this triggers all backend tests

* Revert "Check that this triggers all backend tests"

This reverts commit 1d12536d04f5a3b01fa5420f407960d7ab81da8f.
2023-02-24 15:39:37 +00:00
Jamey Sharp
5cfb461945 Only emit ISLE/egraph terms for single-value insts (#5848)
For instructions with no results (such as branches and stores) or
instructions with multiple results (such as add with carry), we have
assertions checking that an optimization rule doesn't try to match on
or construct such instructions.

When we generate terms for matching or constructing instructions, the
terms for these instructions are guaranteed to panic if they're ever
used. So let's just not generate them.

In the future we may wish to generate terms with different types for
these instructions, to make them usable in ISLE rules for optimization
that fall outside our current egraph constraints.
2023-02-24 15:38:48 +00:00
Ryan Levick
6d6bd0ea1c Result alias for convienient use of anyhow::Error without depending on anyhow (#5853)
* Add a Result type alias

* Refer to the type in top-level docs

* Use this inside the documentation for the bindgen! macro

* Fix tests

* Address small PR feedback

* Simply re-export anyhow types
2023-02-24 15:37:34 +00:00
Jamey Sharp
7d790fcdfe x64: Only branch once in br_table (#5850)
This uses the `cmov`, which was previously necessary for Spectre
mitigation, to clamp the table index instead of zeroing it. By then
placing the default target as the last entry in the table, we can use
just one branch instruction in all cases.

Since there isn't a bounds-check branch any more, this sequence no
longer needs Spectre mitigation. And since we don't need to be careful
about preserving flags, half the instructions can be removed from this
pseudoinstruction and emitted as regular instructions instead.

This is a net savings of three bytes in the encoding of x64's br_table
pseudoinstruction. The generated code can sometimes be longer overall
because the blocks are emitted in a slightly different order.

My benchmark results show a very small effect on runtime performance
with this change.

The spidermonkey benchmark in Sightglass runs "1.01x faster" than main
by instructions retired, but with no significant difference in CPU
cycles. I think that means it rarely hit the default case in any
br_table instructions it executed.

The pulldown-cmark benchmark in Sightglass runs "1.01x faster" than main
by CPU cycles, but main runs "1.00x faster" by instructions retired. I
think that means this benchmark hit the default case a significant
amount of the time, so it executes a few more instructions per br_table,
but maybe the branches were predicted better.
2023-02-24 04:46:38 +00:00
Trevor Elliott
c5d9d5b10f Remove module-level code generation tests (#5870)
* Remove module-level code generation tests

* Add cold block tests for each backend

* Better cold block tests
2023-02-24 01:19:26 +00:00
Alex Crichton
f91640ffab Fix a panic due to a race in unpark and park (#5871)
* Remove globals from parking spot tests

Use `std:🧵:scope` to keep everything local to just the tests.

* Fix a panic due to a race in `unpark` and `park`

This commit fixes a panic in the `ParkingSpot` implementation where an
`unpark` signal may not get acknowledged when a waiter times out,
causing the waiter to remove itself from the internal map but panic
thinking that it missed an unpark signal.

The fix in this commit is to consume unpark signals when a timeout
happens. This can lead to another possible race I've detailed in the
comments which I believe is allowed by the specification of park/unpark
in wasm.

* Update crates/runtime/src/parking_spot.rs

Co-authored-by: Andrew Brown <andrew.brown@intel.com>

---------

Co-authored-by: Andrew Brown <andrew.brown@intel.com>
2023-02-23 23:20:05 +00:00
Alex Crichton
3fc3bc9ec8 x64: Fill out more AVX instructions (#5849)
* x64: Fill out more AVX instructions

This commit fills out more AVX instructions for SSE counterparts
currently used. Many of these instructions do not benefit from the
3-operand form that AVX uses but instead benefit from being able to use
`XmmMem` instead of `XmmMemAligned` which may be able to avoid some
extra temporary registers in some cases.

* Review comments
2023-02-23 22:31:31 +00:00
Trevor Elliott
8abfe928d6 Reuse the DominatorTree postorder travesal in BlockLoweringOrder (#5843)
* Rework the blockorder module to reuse the dom tree's cfg postorder

* Update domtree tests

* Treat br_table with an empty jump table as multiple block exits

* Bless tests

* Change branch_idx to succ_idx and fix the comment
2023-02-23 22:05:20 +00:00
Ulrich Weigand
4314210162 s390x: Fix implementation of {s,u}{min,max} (#5864)
When expanding a min/max operation to a pair of icmp + select,
do not attempt to expand the input value operands twice, as
this might fail with memory operands.

Fixes https://github.com/bytecodealliance/wasmtime/issues/5859.
2023-02-23 20:01:51 +00:00
Afonso Bordado
fc080c739e fuzzgen: Add AtomicRMW (#5861) 2023-02-23 18:34:28 +00:00
Ulrich Weigand
9719147f91 s390x: Fix integer overflow during negation (#5866)
Use wrapping_neg in i{64,32,16}_from_negated_value to avoid Rust
aborts due to integer overflow.  The resulting INT_MIN is already
handled correctly in subsequent operations.

Fixes https://github.com/bytecodealliance/wasmtime/issues/5863.
2023-02-23 16:32:10 +00:00
Alex Crichton
761e44bd36 Fix running WASI tests in isolation (#5865)
Closes #5860
2023-02-23 16:04:15 +00:00
Noa
4f7746da60 Have StoreContext::data return &'a T (#5855) 2023-02-23 15:32:35 +00:00
Andrew Brown
f6b16a7178 wasi-threads: fix use of wait in test (#5858)
As @yamt points out [here], the `wait`/`notify` pairing used in this
manual WAT test was not effective. The `wait` always immediately
returned, meaning that the main thread essentially spins until a counter
is atomically incremented. This is fine for test correctness, but was
not the original intent, which was lost in a refactoring. This change
uses the `$i` local to keep track of the counter value we expect to see
for the `wait`, so that the `wait`/`notify` pair actually waits as
expected.

[here]: https://github.com/bytecodealliance/wasmtime/pull/5484#discussion_r1101200012
2023-02-23 15:23:58 +00:00
Jan-Justin van Tonder
0521155896 cranelift: Add atomic_rmw to interpreter (#5817) (#5856)
As per the linked issue, atomic_rmw was implemented without specific regard for thread safety.
Additionally, the relevant filetest (atomic-rmw-little.clif) was enabled and altered to fix an
inccorrect call to test function `%atomic_rmw_and_i64` after setting up test function
`%atomic_rmw_and_i32`.
2023-02-23 10:24:56 +00:00
Afonso Bordado
f6c6bc2155 riscv64: Improve signed and zero extend codegen (#5844)
* riscv64: Remove unused code

* riscv64: Group extend rules

* riscv64: Remove more unused rules

* riscv64: Cleanup existing extension rules

* riscv64: Move the existing Extend rules to ISLE

* riscv64: Use `sext.w` when extending

* riscv64: Remove duplicate extend tests

* riscv64: Use `zbb` instructions when extending values

* riscv64: Use `zbkb` extensions when zero extending

* riscv64: Enable additional tests for extend i128

* riscv64: Fix formatting for `Inst::Extend`

* riscv64: Reverse register for pack

* riscv64: Misc Cleanups

* riscv64: Cleanup extend rules
2023-02-22 17:41:14 +00:00
Afonso Bordado
6e6a1034d7 riscv64: Add bitmanip extension flags (#5847) 2023-02-21 22:12:44 +00:00
Alex Crichton
bd3dcd313d x64: Add more fma instruction lowerings (#5846)
The relaxed-simd proposal for WebAssembly adds a fused-multiply-add
operation for `v128` types so I was poking around at Cranelift's
existing support for its `fma` instruction. I was also poking around at
the x86_64 ISA's offerings for the FMA operation and ended up with this
PR that improves the lowering of the `fma` instruction on the x64
backend in a number of ways:

* A libcall-based fallback is now provided for `f32x4` and `f64x2` types
  in preparation for eventual support of the relaxed-simd proposal.
  These encodings are horribly slow, but it's expected that if FMA
  semantics must be guaranteed then it's the best that can be done
  without the `fma` feature. Otherwise it'll be up to producers (e.g.
  Wasmtime embedders) whether wasm-level FMA operations should be FMA or
  multiply-then-add.

* In addition to the existing `vfmadd213*` instructions opcodes were
  added for `vfmadd132*`. The `132` variant is selected based on which
  argument can have a sinkable load.

* Any argument in the `fma` CLIF instruction can now have a
  `sinkable_load` and it'll generate a single FMA instruction.

* All `vfnmadd*` opcodes were added as well. These are pattern-matched
  where one of the arguments to the CLIF instruction is an `fneg`. I
  opted to not add a new CLIF instruction here since it seemed like
  pattern matching was easy enough but I'm also not intimately familiar
  with the semantics here so if that's the preferred approach I can do
  that too.
2023-02-21 20:51:22 +00:00
Alex Crichton
d82ebcc102 x64: Enable load-coalescing for SSE/AVX instructions (#5841)
* x64: Enable load-coalescing for SSE/AVX instructions

This commit unlocks the ability to fold loads into operands of SSE and
AVX instructions. This is beneficial for both function size when it
happens in addition to being able to reduce register pressure.
Previously this was not done because most SSE instructions require
memory to be aligned. AVX instructions, however, do not have alignment
requirements.

The solution implemented here is one recommended by Chris which is to
add a new `XmmMemAligned` newtype wrapper around `XmmMem`. All SSE
instructions are now annotated as requiring an `XmmMemAligned` operand
except for a new new instruction styles used specifically for
instructions that don't require alignment (e.g.  `movdqu`, `*sd`, and
`*ss` instructions). All existing instruction helpers continue to take
`XmmMem`, however. This way if an AVX lowering is chosen it can be used
as-is. If an SSE lowering is chosen, however, then an automatic
conversion from `XmmMem` to `XmmMemAligned` kicks in. This automatic
conversion only fails for unaligned addresses in which case a load
instruction is emitted and the operand becomes a temporary register
instead. A number of prior `Xmm` arguments have now been converted to
`XmmMem` as well.

One change from this commit is that loading an unaligned operand for an
SSE instruction previously would use the "correct type" of load, e.g.
`movups` for f32x4 or `movup` for f64x2, but now the loading happens in
a context without type information so the `movdqu` instruction is
generated. According to [this stack overflow question][question] it
looks like modern processors won't penalize this "wrong" choice of type
when the operand is then used for f32 or f64 oriented instructions.

Finally this commit improves some reuse of logic in the `put_in_*_mem*`
helper to share code with `sinkable_load` and avoid duplication. With
this in place some various ISLE rules have been updated as well.

In the tests it can be seen that AVX-instructions are now automatically
load-coalesced and use memory operands in a few cases.

[question]: https://stackoverflow.com/questions/40854819/is-there-any-situation-where-using-movdqu-and-movupd-is-better-than-movups

* Fix tests

* Fix move-and-extend to be unaligned

These don't have alignment requirements like other xmm instructions as
well. Additionally add some ISA tests to ensure that their output is
tested.

* Review comments
2023-02-21 19:10:19 +00:00
Alex Crichton
c65de1f1b1 x64: Remove conditional SseOpcode::uses_src1 (#5842)
This is a follow-up to comments in #5795 to remove some cruft in the x64
instruction model to ensure that the shape of an `Inst` reflects what's
going to happen in regalloc and encoding. This accessor was used to
handle `round*`, `pextr*`, and `pshufb` instructions. The `round*` ones
had already moved to the appropriate `XmmUnary*` variant and `pshufb`
was additionally moved over to that variant as well.

The `pextr*` instructions got a new `Inst` variant and additionally had
their constructors slightly modified to no longer require the type as
input. The encoding for these instructions now automatically handles the
various type-related operands through a new `SseOpcode::Pextrq` operand
to represent 64-bit movements.
2023-02-21 18:17:07 +00:00
Alex Crichton
e6a5ec3fde x64: Tidy up some handling of sinkable loads (#5840)
This commit refactors a bit about how sinkable loads are handled in the
x64 backend. The intention is to bring most handling around sinkable
loads up to date with the current state of the backend since things have
changed since these were originally introduced, namely automatic
conversions between types in ISLE. For example the `Value` type can be
automatically converted to `RegMem` to perform load sinking, but some
rules are still explicitly doing matching themselves.

Here I've removed explicit handling of immediates and sinkable loads
when they're the right-hand-side of an operation. These cases are
already handle by the "base case" when converting a `Value` to a
`RegMemImm`. Instead only rules explicitly for left-hand-side immediates
and sinkable loads remain. This helps cut down on the number of explicit
rules needed.

Additionally in the same manner that `Value` can be automatically
converted to `RegMem` I've added automatic conversions from
`SinkableLoad` to `RegMem` and the various other newtypes. This helps
cut down a bit on rule verbosity where `sink_load_*` is largely no
longer necessary.
2023-02-21 18:15:08 +00:00
Afonso Bordado
0f51338def riscv64: Clear the top 32bits in the br_table index (#5831)
We were unintentionally relying on these to be zeroed when jumping.
2023-02-21 18:05:51 +00:00
Saúl Cabrera
4d954f5c0e winch: Add support for <i32|i64>.rem_* WebAssembly instructions (#5823)
This commit adds support for i32 and i64 remainder instructions for
x64.
2023-02-20 17:52:06 +00:00
Alex Crichton
c26a65a854 x64: Add most remaining AVX lowerings (#5819)
* x64: Add most remaining AVX lowerings

This commit goes through `inst.isle` and adds a corresponding AVX
lowering for most SSE lowerings. I opted to skip instructions where the
SSE lowering didn't read/modify a register, such as `roundps`. I think
that AVX will benefit these instructions when there's load-merging since
AVX doesn't require alignment, but I've deferred that work to a future
PR.

Otherwise though in this PR I think all (or almost all) of the 3-operand
forms of AVX instructions are supported with their SSE counterparts.
This should ideally improve codegen slightly by removing register
pressure and the need for `movdqa` between registers. I've attempted to
ensure that there's at least one codegen test for all the new instructions.

As a side note, the recent capstone integration into `precise-output`
tests helped me catch a number of encoding bugs much earlier than
otherwise, so I've found that incredibly useful in tests!

* Move `vpinsr*` instructions to their own variant

Use true `XmmMem` and `GprMem` types in the instruction as well to get
more type-level safety for what goes where.

* Remove `Inst::produces_const` accessor

Instead of conditionally defining regalloc and various other operations
instead add dedicated `MInst` variants for operations which are intended
to produce a constant to have more clear interactions with regalloc and
printing and such.

* Fix tests

* Register traps in `MachBuffer` for load-folding ops

This adds a missing `add_trap` to encoding of VEX instructions with
memory operands to ensure that if they cause a segfault that there's
appropriate metadata for Wasmtime to understand that the instruction
could in fact trap. This fixes a fuzz test case found locally where v8
trapped and Wasmtime didn't catch the signal and crashed the fuzzer.
2023-02-20 15:11:52 +00:00
wasmtime-publish
ad128b6811 Update release date of Wasmtime 6.0.0 (#5836)
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
2023-02-20 15:10:06 +00:00
Afonso Bordado
1e6c94bec1 cranelift-object: Make sections read only by default (#5619)
This changes the default section type to be `ReadOnlyDataWithRel` instead of `Data`.
On COFF types the CRT initializers do not run unless their section is read only.

The new SectionKind makes these sections read only for COFF and MachO, but leaves it as Writable as required by ELF.
2023-02-18 12:23:24 +00:00
Jamey Sharp
539c42e590 Audit object crate update to 0.30.3 (#5827)
This audit is needed for #5619. I'm going ahead and updating Cargo.toml
and Cargo.lock at the same time because no source code changes are
required for this update.
2023-02-18 00:36:21 +00:00
Saúl Cabrera
7ec925122d winch: Add support for the <i32|i64>.div_* instructions (#5807)
* Refactor the structure and responsibilities of `CodeGenContext`

This commit refactors how the `CodeGenContext` is used throughout the code
generation process, making it easier to pass it around when more flexibility is
desired in the MacroAssembler to perform the lowering of certain instructions.

As of this change, the responsibility of the `CodeGenContext` is to provide an
interface for operations that require an orchestration between the register
allocator, the value stack and function's frame. The MacroAssembler is removed
from the CodeGenContext as is passed as a dependency where needed, effectly
using it as an independent code generation interface only.

By giving more responsibilities to the `CodeGenContext` we can clearly separate
the concerns of the register allocator, which previously did more than it
should (e.g. popping values and spilling).

This change ultimately allows passing in the `CodeGenContext` to the
`MacroAssembler` when a given instruction cannot be generically described
through a common interface. Allowing each implementation to decide the best way
to lower a particular instruction.

* winch: Add support for the WebAssembly `<i32|i64>.div_*` instructions

Given that some architectures have very specific requirements on how to handle
division, this change uses `CodeGenContext` as a dependency to the `div`
MacroAssembler instruction to ensure that each implementation can decide on how to lower the
division. This approach also allows -- in architectures where division can be
expressed as an ordinary binary operation -- to rely on the
`CodeGenContext::i32_binop` or `CodeGenContext::i64_binop` helpers.
2023-02-17 22:42:03 +00:00
Afonso Bordado
853ff787f3 fuzzgen: Refactor name and signature generation (#5764)
* fuzzgen: Move cranelift type generation into CraneliftArbitrary

* fuzzgen: Deduplicate DataValue generation

* fuzzgen: Remove unused code

* fuzzgen: Pass allowed function calls into `FunctionGenerator`
2023-02-17 20:48:12 +00:00
Afonso Bordado
a7bd65d116 fuzzgen: Allow inline stackprobes for riscv64 (#5822) 2023-02-17 20:47:39 +00:00
Trevor Elliott
a139ed6d56 Fix the postorder traversal in the DominatorTree (#5821)
Fix the postorder traversal computed by the `DominatorTree`. It was
recording nodes in the wrong order depending on the order child nodes
were visited. Consider the following program:

```
function %foo2(i8) -> i8 {
block0(v0: i8):
    brif v0, block1, block2

block1:
    return v0

block2:
    jump block1
}
```

The postorder produced by the previous implementation was:

```
block2
block1
block0
```

Which is incorrect, as `block1` is branched to by `block2`. Changing the
branch order in the function would also change the postorder result,
yielding the expected order with `block1` emitted first.

The problem was that when pushing successor nodes onto the stack, the
old implementation would also mark them SEEN. This would then prevent
them from being pushed on the stack again in the future, which is
incorrect as they might be visited by other nodes that have not yet been
pushed. This causes nodes to potentially show up later in the postorder
traversal than they should.

This PR reworks the implementation of `DominatorTree::compute` to
produce an order where `block1` is always returned first, regardless of
the branch order in the original program.

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
2023-02-17 20:39:04 +00:00
Peter Huene
4fc768df36 Fix incorrect shadowing of world option in component bindgen macro options. (#5813) 2023-02-17 15:08:54 +00:00
Berkus Decker
c8fa1b845f Fix typo (#5814) 2023-02-17 15:08:07 +00:00
Fuu
db9efcb099 Correct some spelling errors in a comment (#5812) 2023-02-17 02:21:44 +00:00
Alex Crichton
453330b2db x64: Add rudimentary support for some AVX instructions (#5795)
* x64: Add rudimentary support for some AVX instructions

I was poking around Spidermonkey's wasm backend and saw that the various
assembler functions used are all `v*`-prefixed which look like they're
intended for use with AVX instructions. I looked at Cranelift and it
currently doesn't have support for many AVX-based instructions, so I
figured I'd take a crack at it!

The support added here is a bit of a mishmash when viewed alone, but my
general goal was to take a single instruction from the SIMD proposal for
WebAssembly and migrate all of its component instructions to AVX. I, by
random chance, picked a pretty complicated instruction of `f32x4.min`.
This wasm instruction is implemented on x64 with 4 unique SSE
instructions and ended up being a pretty good candidate.

Further digging about AVX-vs-SSE shows that there should be two major
benefits to using AVX over SSE:

* Primarily AVX instructions largely use a three-operand form where two
  input registers are operated with and an output register is also
  specified. This is in contrast to SSE's predominant
  one-register-is-input-but-also-output pattern. This should help free
  up the register allocator a bit and additionally remove the need for
  movement between registers.

* As #4767 notes the memory-based operations of VEX-encoded instructions
  (aka AVX instructions) do not have strict alignment requirements which
  means we would be able to sink loads and stores into individual
  instructions instead of having separate instructions.

So I set out on my journey to implement the instructions used by
`f32x4.min`. The first few were fairly easy. The machinst backends are
already of the shape "take these inputs and compute the output" where
the x86 requirement of a register being both input and output is
postprocessed in. This means that the `inst.isle` creation helpers for
SSE instructions were already of the correct form to use AVX. I chose to
add new `rule` branches for the instruction creation helpers, for
example `x64_andnps`. The new `rule` conditionally only runs if AVX is
enabled and emits an AVX instruction instead of an SSE instruction for
achieving the same goal. This means that no lowerings of clif
instructions were modified, instead just new instructions are being
generated.

The VEX encoding was previously not heavily used in Cranelift. The only
current user are the FMA-style instructions that Cranelift has at this
time. These FMA instructions have one extra operand than `vandnps`, for
example, so I split the existing `XmmRmRVex` into a few more variants to
fit the shape of the instructions that needed generating for
`f32x4.min`. This was accompanied then with more AVX opcode definitions,
more emission support, etc.

Upon implementing all of this it turned out that the test suite was
failing on my machine due to the memory-operand encodings of VEX
instructions not being supported. I didn't explicitly add those in
myself but some preexisting RIP-relative addressing was leaking into the
new instructions with existing tests. I opted to go ahead and fill out
the memory addressing modes of VEX encoding to get the tests passing
again.

All-in-all this PR adds new instructions to the x64 backend for a number
of AVX instructions, updates 5 existing instruction producers to use AVX
instructions conditionally, implements VEX memory operands, and adds
some simple tests for the new output of `f32x4.min`. The existing
runtest for `f32x.min` caught a few intermediate bugs along the way and
I additionally added a plain `target x86_64` to that runtest to ensure
that it executes with and without AVX to test the various lowerings.
I'll also note that this, and future support, should be well-fuzzed
through Wasmtime's fuzzing which may explicitly disable AVX support
despite the machine having access to AVX, so non-AVX lowerings should be
well-tested into the future.

It's also worth mentioning that I am not an AVX or VEX or x64 expert.
Implementing the memory operand part for VEX was the hardest part of
this PR and while I think it should be good someone else should
definitely double-check me. Additionally I haven't added many
instructions to the x64 backend yet so I may have missed obvious places
to tests or such, so am happy to follow-up with anything to be more
thorough if necessary.

Finally I should note that this is just the tip of the iceberg when it
comes to AVX. My hope is to get some of the idioms sorted out to make it
easier for future PRs to add one-off instruction lowerings or such.

* Review feedback
2023-02-17 01:29:55 +00:00
Alex Crichton
f8ca67cdc6 Handle failures in the determine step (#5810)
I saw some PRs fail this step earlier today due to rate limits but it
ended up not failing the entire PR's CI due to it not being listed in
the final set of dependencies, so add it there.
2023-02-17 00:40:05 +00:00
Alex Crichton
44fa189575 Run labeler triage once an hour (#5811)
Instead of every 5 minutes since this seems to eat through the rate
limits pretty quickly if it ends up running.
2023-02-16 23:21:49 +00:00
Trevor Elliott
d711872d63 Refactor collect_branches_and_targets to not need a smallvec (#5803)
* Refactor collect_branches_and_targets to not need a smallvec

Basic blocks are terminated by at most one branch instruction now, so we
can use that assumption in `collect_branches_and_targets` to return the
last instruction we saw instead.

* Review comments
2023-02-16 21:30:17 +00:00
Chris Fallin
c7e2571866 egraphs: disable GVN of effectful idempotent ops (temporarily). (#5808)
This is a short-term fix to the same bug that #5800 is addressing
(#5796), but with less risk: it simply turns off GVN'ing of effectful
but idempotent ops. Because we have an upcoming release, and this is a
miscompile (albeit to do with trapping behavior), we would like to make
the simplest possible fix that avoids the bug, and backport it. I will
then rebase #5800 on top of a revert of this followed by the more
complete fix.
2023-02-16 21:29:03 +00:00
Alex Crichton
3ce439ce57 Update PR tests slightly and when what runs where (#5805)
* Don't run LLDB tests on PRs

These take an extra minute or so, so only run them on the full test
suite of a merge instead of on all PRs as well.

* Add a test for the x64 isa files

This guarantees that if cranelift's x64 backend is modified that the
tests will be run on a PR, even if other backends were also modified.
2023-02-16 21:17:19 +00:00
Alex Crichton
65633db244 Only run deny/vet on CI if Cargo.lock changes (#5806)
These mostly only validate changes to `Cargo.lock` so skip these checks
by default on PRs which generally never need to trigger them. If
`Cargo.lock` changes, however, then run them for PRs.
2023-02-16 21:17:05 +00:00