Commit Graph

3994 Commits

Author SHA1 Message Date
yuyang-ok
cdecc858b4 add riscv64 backend for cranelift. (#4271)
Add a RISC-V 64 (`riscv64`, RV64GC) backend.

Co-authored-by: yuyang <756445638@qq.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
Co-authored-by: Afonso Bordado <afonsobordado@az8.co>
2022-09-27 17:30:31 -07:00
Jamey Sharp
9715d91c50 Resolve aliases before checking for unique values (#4966)
At control-flow join points, cranelift-frontend's SSA builder currently
checks to see if only one definition of a variable reaches the current
block. If so, it can eliminate the corresponding block parameter and use
the original def directly. It implements this by turning the block
parameter into an alias for the original value.

However, it didn't resolve aliases during this check, except after it
had already determined that there was only one definition.

Resolving aliases first instead allows it to detect that more block
parameters are redundant. And as more block parameters get converted to
aliases, later blocks can see common definitions from further away, so
this has a compounding effect.

This also merges a special case, where there's exactly one unique
non-sentinel definition but it's actually an alias for the sentinel,
into the general case where all definitions are from the sentinel. As a
result there's only one case that has to introduce a definition of the
variable to zero.

According to `valgrind --tool=dhat`, this is a significant memory
savings. On the pulldown-cmark benchmark from Sightglass:

- 15.3% (1.9MiB) less memory allocated at maximum heap
- 4.1% (6.7MiB) less memory allocated in total
- 9.8% (57MiB) fewer bytes read
- 12.6% (36MiB) fewer bytes written
- 5.4% fewer instructions retired
- 1.04x faster by instructions retired (per Sightglass/perf)
- 1.03x to 1.04x faster by CPU cycles (per Sightglass/perf)
- 1.03 ± 0.01 times faster by CPU time (per hyperfine)
- 1.04x faster by cache accesses (per Sightglass/perf)

On the bz2 benchmark:

- 1.06x faster by instructions retired (per Sightglass/perf)
- 1.05x faster by CPU cycles (per Sightglass/perf)
- 1.04 ± 0.01 times faster by CPU time (per hyperfine)
- 1.02x to 1.03x faster by cache accesses (per Sightglass/perf)

Even on the largest benchmark in Sightglass (spidermonkey.wasm), this is
a measurable improvement:

- 1.03x faster by instructions retired (per Sightglass/perf)
- 1.02x faster by CPU cycles (per Sightglass/perf)
- 1.02 ± 0.00 times faster by CPU time (per hyperfine)

There was no significant difference in cache misses for any benchmark,
according to Sightglass/perf.
2022-09-27 13:59:37 -07:00
Alex Crichton
29c7de7340 Update wasm-tools dependencies (#4970)
* Update wasm-tools dependencies

This update brings in a number of features such as:

* The component model binary format and AST has been slightly adjusted
  in a few locations. Names are dropped from parameters/results now in
  the internal representation since they were not used anyway. At this
  time the ability to bind a multi-return function has not been exposed.

* The `wasmparser` validator pass will now share allocations with prior
  functions, providing what's probably a very minor speedup for Wasmtime
  itself.

* The text format for many component-related tests now requires named
  parameters.

* Some new relaxed-simd instructions are updated to be ignored.

I hope to have a follow-up to expose the multi-return ability to the
embedding API of components.

* Update audit information for new crates
2022-09-27 13:12:34 -05:00
Afonso Bordado
10deb9b7fe fuzzgen: Add fcvt_* ops (#4958) 2022-09-27 09:52:58 -07:00
Afonso Bordado
65a3af72c7 fuzzgen: Statistics framework (#4868)
* cranelift: Add non user trap codes function

* cranelift: Add Fuzzgen stats

* cranelift: Use `once_cell` and cleanup some stuff

* fuzzgen: Remove total_inputs metric

* fuzzgen: Filter empty trap codes
2022-09-27 16:04:57 +00:00
Alex Crichton
ee2ef5bdd0 Tidy up some more deps with workspace dependencies (#4965)
I missed these in #4905 and noticed just now but wanted to handle them
to ensure we uniformly handle deps in the workspace.
2022-09-26 17:27:35 -07:00
Alex Crichton
7b311004b5 Leverage Cargo's workspace inheritance feature (#4905)
* Leverage Cargo's workspace inheritance feature

This commit is an attempt to reduce the complexity of the Cargo
manifests in this repository with Cargo's workspace-inheritance feature
becoming stable in Rust 1.64.0. This feature allows specifying fields in
the root workspace `Cargo.toml` which are then reused throughout the
workspace. For example this PR shares definitions such as:

* All of the Wasmtime-family of crates now use `version.workspace =
  true` to have a single location which defines the version number.
* All crates use `edition.workspace = true` to have one default edition
  for the entire workspace.
* Common dependencies are listed in `[workspace.dependencies]` to avoid
  typing the same version number in a lot of different places (e.g. the
  `wasmparser = "0.89.0"` is now in just one spot.

Currently the workspace-inheritance feature doesn't allow having two
different versions to inherit, so all of the Cranelift-family of crates
still manually specify their version. The inter-crate dependencies,
however, are shared amongst the root workspace.

This feature can be seen as a method of "preprocessing" of sorts for
Cargo manifests. This will help us develop Wasmtime but shouldn't have
any actual impact on the published artifacts -- everything's dependency
lists are still the same.

* Fix wasi-crypto tests
2022-09-26 11:30:01 -05:00
bjorn3
af226d37c2 [AArch64] Fix incorrect regalloc constraints for atomic_cas (#4959)
* [AArch64] Fix incorrect regalloc constraints for atomic_cas

* Update test for latest Cranelift changes
2022-09-26 16:05:57 +00:00
Damian Heaton
3a2b32bf4d Port branches to ISLE (AArch64) (#4943)
* Port branches to ISLE (AArch64)

Ported the existing implementations of the following opcodes for AArch64
to ISLE:
- `Brz`
- `Brnz`
- `Brif`
- `Brff`
- `BrIcmp`
- `Jump`
- `BrTable`

Copyright (c) 2022 Arm Limited

* Remove dead code

Copyright (c) 2022 Arm Limited
2022-09-26 09:45:32 +01:00
Jamey Sharp
bd870a9d6c Shrink all SmallVecs by 8 bytes (#4951)
We weren't using the "union" cargo feature for the smallvec crate, which
reduces the size of a SmallVec by one machine word. This feature
requires Rust 1.49 but we already require much newer versions.

When using Wasmtime to compile pulldown-cmark from Sightglass, this
saves a decent amount of memory allocations and writes. According to
`valgrind --tool=dhat`:

- 6.2MiB (3.69%) less memory allocated over the program's lifetime
- 0.5MiB (4.13%) less memory allocated at maximum heap size
- 5.5MiB (1.88%) fewer bytes written to
- 0.44% fewer instructions executed

Sightglass reports a statistically significant runtime improvement too:

compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 24379323.60 ± 20051394.04 (confidence = 99%)

  shrink-abiarg-0406da67c.so is 1.01x to 1.13x faster than main-be690a468.so!

  [227506364 355007998.78 423280514] main-be690a468.so
  [227686018 330628675.18 406025344] shrink-abiarg-0406da67c.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 360151622.56 ± 278294316.90 (confidence = 99%)

  shrink-abiarg-0406da67c.so is 1.01x to 1.07x faster than main-be690a468.so!

  [8709162212 8911001926.44 9535111576] main-be690a468.so
  [5058015392 8550850303.88 9282148438] shrink-abiarg-0406da67c.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

  Δ = 6936570.28 ± 6897696.38 (confidence = 99%)

  shrink-abiarg-0406da67c.so is 1.00x to 1.08x faster than main-be690a468.so!

  [155810934 175260571.20 234737344] main-be690a468.so
  [119128240 168324000.92 257451074] shrink-abiarg-0406da67c.so
2022-09-23 16:32:13 -07:00
Afonso Bordado
bb6a8a717a fuzzgen: Avoid int_divz traps (#4932)
* fuzzgen: Insert `int_divz` sequence

* fuzzgen: matches!
2022-09-23 10:19:42 -07:00
Jamey Sharp
6e76e925f4 Avoid quadratic behavior in can_optimize_var_lookup (#4939)
* cranelift-frontend: Avoid quadratic behavior

Fixes #4923.

* Improve comments and debug assertions

* Improve comments

One thing that's especially neat about this PR is that, unlike the
`can_optimize_var_lookup` graph traversal, `update_predecessor_cycle`
doesn't need to keep track of all the blocks it has visited in order to
detect cycles. However, the reasons why are subtle and need careful
documentation.

Also neat: We've previously tried keeping either a HashSet or a
SecondaryMap around to re-use the same heap allocation for the `visited`
set, which needs space linear in the number of blocks. After this PR,
we're still using space that's linear in the number of blocks to store
the `in_predecessor_cycle` flag, but that flag fits inside existing
padding in `SSABlockData`, so it's a net savings in memory consumption.

* Avoid quadratic behavior in `update_predecessor_cycle`

So far I hadn't really eliminated the quadratic behavior from
`can_optimize_var_lookup`. I just moved it to happen when the CFG is
modified instead, and switched to indexing directly into the vector of
blocks instead of going through a HashSet. I suspect the latter change
is always a win, but the former is only an improvement assuming that
`use_var` is called more often than `declare_block_predecessor`.

But @cfallin pointed out that it feels like we should be able to do
better by taking advantage of the knowledge that once a block is sealed,
its predecessors can't change any more.

That's not completely trivial to do because changes to the property we
care about propagate toward successors, and we're only keeping pointers
to predecessors. Still, as long as frontends follow the existing
recommendation to seal blocks as soon as possible, maintaining a
conservative approximation using only local information works fine in
practice.

This significantly limits the situations where this graph traversal
could visit a lot of the CFG.

* Review comments
2022-09-23 16:41:22 +00:00
Chris Fallin
19bd8687ac Upgrade to regalloc2 0.4.1. (#4945)
* Upgrade to regalloc2 0.4.1.

Incorporates bytecodealliance/regalloc2#85, which fixes a fuzzbug
related to constraints and liverange splits.

* Add audit of regalloc2 upgrade.
2022-09-23 00:00:06 +00:00
Damian Heaton
3f8cccfb59 Port flag-based ops to ISLE (AArch64) (#4942)
Ported the existing implementations of the following opcodes for AArch64
to ISLE:
- `Trueif`
- `Trueff`
- `Trapif`
- `Trapff`
- `Select`
- `Selectif`
- `SelectifSpectreGuard`

Copyright (c) 2022 Arm Limited
2022-09-22 15:44:32 -07:00
Chris Fallin
89abd80c3c Add the aegraph (acyclic e-graph) implementation crate. (#4909)
* Add the aegraph (acyclic egraph) implementation crate.

* fix crate-dep version for cranelift-entity (rebase error)

* Review feedback.

* Fix link in Markdown doc comment.

* Doc link fix again.

* add cranelift-egraph to publish list.
2022-09-22 00:33:27 +00:00
Chris Fallin
b652ce2fb1 ISLE: add support for multi-extractors and multi-constructors. (#4908)
* ISLE: add support for multi-extractors and multi-constructors.

This support allows for rules that process multiple matching values per
extractor call on the left-hand side, and as a result, can produce
multiple values from the constructor whose body they define.

This is useful in situations where we are matching on an input data
structure that can have multiple "nodes" for a given value or ID, for
example in an e-graph.

* Review feedback: all multi-ctors and multi-etors return iterators; no `Vec` case.

* Add additional warning suppressions to generated-code toplevels to be consistent with new islec output.
2022-09-21 23:36:50 +00:00
Trevor Elliott
b167172715 Add an overlap checker to ISLE (#4906)
https://github.com/bytecodealliance/wasmtime/pull/4906

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
2022-09-21 14:07:59 -07:00
Nick Fitzgerald
b2d13ebd46 Revert "Memoize can_optimize_var_lookup (#4924)" (#4937)
This reverts commit 562bb25360.
2022-09-21 10:30:18 -07:00
Damian Heaton
352c7595c6 Improve fcvt_to_{u,s}int_sat lowering (AArch64) (#4913)
Improved the instruction lowering for the following opcodes on AArch64,
and introduced support for converting to integers less than 32-bits wide
as per the docs:
- `FcvtToSintSat`
- `FcvtToUintSat`

Copyright (c) 2022 Arm Limited
2022-09-21 10:16:09 -07:00
Damian Heaton
e786bda002 Vector bitcast support (AArch64 & Interpreter) (#4820)
* Vector bitcast support (AArch64 & Interpreter)

Implemented support for `bitcast` on vector values for AArch64 and the
interpreter.

Also corrected the verifier to ensure that the size, in bits, of the input and
output types match for a `bitcast`, per the docs.

Copyright (c) 2022 Arm Limited

* `I128` same-type bitcast support

Copyright (c) 2022 Arm Limited

* Directly return input for 64-bit GPR<=>GPR bitcast

Copyright (c) 2022 Arm Limited
2022-09-21 09:20:28 -07:00
Chris Fallin
05cbd667c7 Cranelift: use regalloc2 constraints on caller side of ABI code. (#4892)
* Cranelift: use regalloc2 constraints on caller side of ABI code.

This PR updates the shared ABI code and backends to use register-operand
constraints rather than explicit pinned-vreg moves for register
arguments and return values.

The s390x backend was not updated, because it has its own implementation
of ABI code. Ideally we could converge back to the code shared by x64
and aarch64 (which didn't exist when s390x ported calls to ISLE, so the
current situation is underestandable, to be clear!). I'll leave this for
future work.

This PR exposed several places where regalloc2 needed to be a bit more
flexible with constraints; it requires regalloc2#74 to be merged and
pulled in.

* Update to regalloc2 0.3.3.

In addition to version bump, this required removing two asserts as
`SpillSlot`s no longer carry their class (so we can't assert that they
have the correct class).

* Review comments.

* Filetest updates.

* Add cargo-vet audit for regalloc2 0.3.2 -> 0.3.3 upgrade.

* Update to regalloc2 0.4.0.
2022-09-21 01:17:04 +00:00
Damian Heaton
cae7c196bb Interpreter: Implement floating point conversions (#4884)
* Interpreter: Implement floating point conversions

Implemented the following opcodes for the interpreter:
- `FcvtToUint`
- `FcvtToSint`
- `FcvtToUintSat`
- `FcvtToSintSat`
- `FcvtFromUint`
- `FcvtFromSint`
- `FcvtLowFromSint`
- `FvpromoteLow`
- `Fvdemote`

Copyright (c) 2022 Arm Limited

* Fix `I128` bounds checks for `FcvtTo{U,S}int{_,Sat}`

Copyright (c) 2022 Arm Limited

* Fix broken test

Copyright (c) 2022 Arm Limited
2022-09-20 11:10:20 -07:00
Adam Bratschi-Kaye
562bb25360 Memoize can_optimize_var_lookup (#4924)
* Memoize `can_optimize_var_lookup`

`can_optimize_var_lookup` can have quadratic behavior if there is a chain
of blocks each containing a `local.get` instruction because each run can
walk up the entire chain. This change memoizes the results of
`can_optimize_var_lookup` so that we can stop following the chain of
predecessors when we hit a block that has previously been handled
(making the operation linear again).
2022-09-19 10:18:11 -07:00
Afonso Bordado
09f46e351e fuzzgen: Mostly Forward Branching (#4894)
* cranelift: Test Forward branching

* fuzzgen: Separate terminators

* fuzzgen: Avoid generating jumptables if we have no valid targets

* fuzzgen: Forward Jump Tables

* fuzzgen: Cleanup some feedback

Thanks @jameysharp!

* fuzzgen: Cleanup block generation

Thanks @jameysharp!

* fuzzgen: Style Cleanups

These were accidentally reverted in a rebase

* fuzzgen: Prevent block0 from being targeted for branches

* fuzzgen: Add jump tables sorting TODO
2022-09-15 11:29:50 -07:00
Trevor Elliott
9d99eff6f9 Flatten and patterns in ISLE (#4915)
Flatten nested and patterns into a single vector in the ISLE front-end.
2022-09-15 17:40:37 +00:00
Afonso Bordado
2db7d7a8e0 fuzzgen: Disable verifier after NaN Canonicalization (#4914)
* fuzzgen: Disable verifier after NaN Canonicalization

We are currently running the verifier twice, once after the nan canonicalization pass, and again when JIT compiling the code.

The verifier first runs in the NaN Canonicalization pass. If it fails it prevents us from getting a nice `cargo fuzz fmt` test case.

So disable the verifier there, but ensure its enabled when JIT compiling.

* fuzzgen: Force enable verifier in cranelift-icache

This is already the default, but since we no longer run the verifier in `fuzzgen` its important to ensure that it runs in the fuzz targets.
2022-09-15 17:18:15 +00:00
Afonso Bordado
d0b98aa25f cranelift: Prepare fuzzgen for AArch64 (#4867)
* cranelift: Re-enable some shift operations

* fuzzgen: Disable Some FloatCC's for AArch64

* cranelift: Disable i128 divs on aarch64

* cranelift: Centralize IntCC selection
2022-09-14 12:23:25 -07:00
Damian Heaton
e9b08b856d Port icmp to ISLE (AArch64) (#4898)
* Port `icmp` to ISLE (AArch64)

Ported the existing implementation of `icmp` (and, by extension, the
`lower_icmp` function) to ISLE for AArch64.

Copyright (c) 2022 Arm Limited

* Allow 'producer chains', eliminating `Nop0`s

Copyright (c) 2022 Arm Limited
2022-09-13 08:56:50 -07:00
Trevor Elliott
024cad7e3d Remove function_alignment from ObjectBuilder (#4888)
Removes the function_alignment field from ObjectBuilder and ObjectModule. Alignment information is now provided either by the Module trait for minimum function alignment requirements, or on FunctionInfo for fucntion specific alignment requirements.
2022-09-12 10:15:21 -07:00
Trevor Elliott
ad09c273c6 Don't merge loads for xmm registers (#4891)
Do not merge loads for xmm registers, as alignment requirements currently aren't satisfied with clif lowered from wasm.

Fixes #4890
2022-09-12 10:14:35 -07:00
Afonso Bordado
bb3aae740a fuzzgen: Panic on failed NaN Canonicalization pass (#4896)
This should never fail anyway, but it's good to know that we
aren't accidentally ignoring an input
2022-09-12 09:08:48 -07:00
Chris Fallin
96bfd4e8c0 s390x: update some regalloc metadata to remove use of reg_mod. (#4856)
* s390x: update some regalloc metadata to remove use of `reg_mod`.

This is a step toward ultimately removing modify-operands, which along
with removal of pinned vregs, lets us move to a completely
constraint-based and fully-SSA regalloc input and get some nice
advantages eventually.

There are still a few uses of `mod` operands and pinned vregs remaining,
especially around the "regpair" abstraction. Those proved to be a bit
trickier to update though, so will have to be done separately.

* Review feedback: restore two-arg pretty-print form.

* Review feedback.
2022-09-09 18:43:36 -05:00
Chris Fallin
2986f6b0ff ABI: implement register arguments with constraints. (#4858)
* ABI: implement register arguments with constraints.

Currently, Cranelift's ABI code emits a sequence of moves from physical
registers into vregs at the top of the function body, one for every
register-carried argument.

For a number of reasons, we want to move to operand constraints instead,
and remove the use of explicitly-named "pinned vregs"; this allows for
better regalloc in theory, as it removes the need to "reverse-engineer"
the sequence of moves.

This PR alters the ABI code so that it generates a single "args"
pseudo-instruction as the first instruction in the function body. This
pseudo-inst defs all register arguments, and constrains them to the
appropriate registers at the def-point. Subsequently the regalloc can
move them wherever it needs to.

Some care was taken not to have this pseudo-inst show up in
post-regalloc disassemblies, but the change did cause a general regalloc
"shift" in many tests, so the precise-output updates are a bit noisy.
Sorry about that!

A subsequent PR will handle the other half of the ABI code, namely, the
callsite case, with a similar preg-to-constraint conversion.

* Update based on review feedback.

* Review feedback.
2022-09-08 18:03:14 -07:00
Chris Fallin
13c7846815 Cranelift: add a vreg limit check to correctly return an error on too-large inputs. (#4882)
Previously, Cranelift panicked (via a a panic in regalloc2) when the
virtual-register limit of 2M (2^21) was reached. This resulted in a
perplexing and unhelpful failure when the user provided a too-large
input (such as the Wasm module in #4865).

This PR adds an explicit check when allocating vregs that fails with a
"code too large" error when the limit is hit, producing output such as
(on the minimized testcase from #4865):

```
Error: failed to compile wasm function 3785 at offset 0xa3f3

Caused by:
    Compilation error: Code for function is too large
```

Fixes #4865.
2022-09-08 10:04:59 -07:00
Anton Kirilov
d8b290898c Initial forward-edge CFI implementation (#3693)
* Initial forward-edge CFI implementation

Give the user the option to start all basic blocks that are targets
of indirect branches with the BTI instruction introduced by the
Branch Target Identification extension to the Arm instruction set
architecture.

Copyright (c) 2022, Arm Limited.

* Refactor `from_artifacts` to avoid second `make_executable` (#1)

This involves "parsing" twice but this is parsing just the header of an
ELF file so it's not a very intensive operation and should be ok to do
twice.

* Address the code review feedback

Copyright (c) 2022, Arm Limited.

Co-authored-by: Alex Crichton <alex@alexcrichton.com>
2022-09-08 09:35:58 -05:00
Trevor Elliott
caad14826c Rework the ISA flag checking extractors for x64 (#4878)
Using fallible extractors that produce no values for flag checks means
that it's not possible to pattern match cases where those flags are
false. This change reworks the existing flag-checking extractors to be
infallible, returning the flag's boolean value from the context instead.
2022-09-07 13:49:35 -07:00
Andrew Brown
f063082474 x64: remove Inst::XmmLoadConst (#4876)
This is a cherry-pick of a long-ago commit, 2d46637. The original
message reads:

> Now that `SyntheticAmode` can refer to constants, there is no longer a
> need for a separate instruction format--standard load instructions will
> work.

Since then, the transition to ISLE and the use of `XmmLoadConst` in many
more places makes this change a larger diff than the original. The basic
idea is the same, though: the extra indirection of `Inst::XMmLoadConst`
is removed and replaced by a direct use of `VCodeConstant` as a
`SyntheticAmode`. This has no effect on codegen, but the CLIF output is
now clearer in that the actual instruction is displayed (e.g., `movdqu`)
instead of a made-up instruction (`load_const`).
2022-09-07 12:52:13 -07:00
Jamey Sharp
e694a6f5d4 Allocate less while constructing cranelift-fuzzgen tests (#4863)
* Improve panic message if typevar_operand is None

* cranelift-fuzzgen: Don't allocate for each choice

I don't think the performance of test-case generation is at all
important here. I'm actually doing this in preparation for a bigger
refactor where I want to be able to borrow the list of valid choices for
a given opcode without worrying about lifetimes.

* cranelift-fuzzgen: Remove next_func_index

It's only used locally within `generate_funcrefs`, so it doesn't need to
be in the FunctionBuilder struct.

Also there's already a local counter that I think is good enough for
this. As far as I know, the function indexes only need to be distinct,
not contiguous.

* cranelift-fuzzgen: Separate resources from config

The function-global variables, blocks, etc that are generated before
generating instructions are all owned collections without any lifetime
parameters. By contrast, the Unstructured and Config are both borrowed.
Separating them will make it easier to borrow from the owned resources.
2022-09-07 12:19:55 -07:00
Afonso Bordado
f57b4412ec cranelift: Implement missing i128 rotates on AArch64 (#4866) 2022-09-07 11:11:47 -07:00
Anton Kirilov
dd07e354b4 Cranelift AArch64: Fix the get_return_address lowering (#4851)
The previous implementation assumed that nothing had clobbered the
LR register since the current function had started executing, so
it would be incorrect for a non-leaf function, for example, that
contains the `get_return_address` operation right after a call.
The operation is valid only if the `preserve_frame_pointers` flag
is enabled, which implies that the presence of a frame record on
the stack is guaranteed.

Copyright (c) 2022, Arm Limited.
2022-09-07 11:09:22 -07:00
Afonso Bordado
e977f6a79d cranelift: Generate Store and Loads in fuzzgen (#4824) 2022-09-07 11:00:19 -07:00
Jamey Sharp
b8b2fadea8 cranelift-fuzzgen: Consume all trailing fuzz input (#4862)
But don't keep going once we've consumed it all.
2022-09-07 08:46:39 -07:00
Jamey Sharp
3d6d49daba cranelift: Remove of/nof overflow flags from icmp (#4879)
* cranelift: Remove of/nof overflow flags from icmp

Neither Wasmtime nor cg-clif use these flags under any circumstances.
From discussion on #3060 I see it's long been unclear what purpose these
flags served.

Fixes #3060, fixes #4406, and fixes #4875... by deleting all the code
that could have been buggy.

This changes the cranelift-fuzzgen input format by removing some IntCC
options, so I've gone ahead and enabled I128 icmp tests at the same
time. Since only the of/nof cases were failing before, I expect these to
work.

* Restore trapif tests

It's still useful to validate that iadd_ifcout's iflags result can be
forwarded correctly to trapif, and for that purpose it doesn't really
matter what condition code is checked.
2022-09-07 08:38:41 -07:00
Alex Crichton
65930640f8 Bump Wasmtime to 2.0.0 (#4874)
This commit replaces #4869 and represents the actual version bump that
should have happened had I remembered to bump the in-tree version of
Wasmtime to 1.0.0 prior to the branch-cut date. Alas!
2022-09-06 13:49:56 -05:00
Jamey Sharp
9856664f1f Make DataValue, not Ieee32/64, respect IEEE754 (#4860)
* cranelift-codegen: Remove all uses of DataValue

This type is only used by the interpreter, cranelift-fuzzgen, and
filetests. I haven't found another convenient crate for those to all
depend on where this type can live instead, but this small refactor at
least makes it obvious that code generation does not in any way depend
on the implementation of this type.

* Make DataValue, not Ieee32/64, respect IEEE754

This fixes #4857 by partially reverting #4849.

It turns out that Ieee32 and Ieee64 need bitwise equality semantics so
they can be used as hash-table keys.

Moving the IEEE754 semantics up a layer to DataValue makes sense in
conjunction with #4855, where we introduced a DataValue::bitwise_eq
alternative implementation of equality for those cases where users of
DataValue still want the bitwise equality semantics.

* cranelift-interpreter: Use eq/ord from DataValue

This fixes #4828, again, now that the comparison operators on DataValue
have the right IEEE754 semantics.

* Add regression test from issue #4857
2022-09-03 00:26:14 +00:00
Afonso Bordado
7e45cff459 cranelift: Bitwise compare fuzzgen results (#4855) 2022-09-02 19:34:16 +00:00
Afonso Bordado
3afb711a51 cranelift: Document Ieee{32,64} implementation (#4854) 2022-09-02 19:02:22 +00:00
Afonso Bordado
f30a7eb0c9 cranelift: Implement PartialEq on Ieee{32,64} (#4849)
* cranelift: Add `fcmp` tests

Some of these are disabled on aarch64 due to not being implemented yet.

* cranelift: Implement float PartialEq for Ieee{32,64} (fixes #4828)

Previously `PartialEq` was auto derived. This means that it was implemented in terms of PartialEq in a u32.

This is not correct for floats because `NaN != NaN`.

PartialOrd was manually implemented in 6d50099816, but it seems like it was an oversight to leave PartialEq out until now.

The test suite depends on the previous behaviour so we adjust it to keep comparing bits instead of floats.

* cranelift: Disable `fcmp ord` tests on aarch64

* cranelift: Disable `fcmp ueq` tests on aarch64
2022-09-02 10:42:42 -07:00
Anton Kirilov
48bf078c83 Cranelift AArch64: Fix the atomic memory operations (#4831)
Previously the implementations of the various atomic memory IR operations
ignored the memory operation flags that were passed.

Copyright (c) 2022, Arm Limited.

Co-authored-by: Chris Fallin <chris@cfallin.org>
2022-09-02 09:35:21 -07:00
Anton Kirilov
d2e19b8d74 Cranelift AArch64: Migrate AMode to ISLE (#4832)
Copyright (c) 2022, Arm Limited.

Co-authored-by: Chris Fallin <chris@cfallin.org>
2022-09-02 00:24:46 +00:00