Commit Graph

1724 Commits

Author SHA1 Message Date
Nick Fitzgerald
4a66c3b855 Cranelift: Remove duplicate IR signature legalizations (#5093)
The `SigData::from_func_sig` constructor will already ensure that the struct
return pointer is returned, so this is a purely unnecessary call.

Note that this is not a performance speed up, since
`ensure_struct_return_ptr_is_returned` doesn't do any significant work if the
signature is already legalized.
2022-10-21 13:08:44 -07:00
Ulrich Weigand
9dadba60a0 s390x: use constraints for call arguments and return values (#5092)
Use the regalloc constraint-based CallArgList / CallRetList
mechanism instead of directly using physregs in instructions.
2022-10-21 11:01:22 -07:00
Chris Fallin
86e77953f8 Fix some egraph-related issues. (#5088)
This fixes #5086 by addressing two separate issues:

- The `ValueDataPacked::set_type()` helper had an embarrassing bitfield-manipulation bug that would mangle the rest of a `ValueDef` when setting its type. This is not normally used, only when the egraph elaboration fills in types after-the-fact on a multi-value node.
- The lowering rules for `isplit` on aarch64 and s390x were dispatching on the first output type, rather than the input type. When only the second output is used (as in the example in #5086), the first output type actually remains `INVALID` (and this is fine because it's never used).
2022-10-21 10:24:48 -07:00
Trevor Elliott
d9753fac2b Remove uses of reg_mod from s390x (#5073)
Remove uses of reg_mod from the s390x backend. This required moving away from using r0/r1 as the result registers from a few different pseudo instructions, standardizing instead on r2/r3. That change was necessary as regalloc2 will not correctly allocate registers that aren't listed in the allocatable set, which r0/r1 are not.

Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
2022-10-21 09:22:16 -07:00
Chris Fallin
c392e461a3 egraphs: a few miscellaneous compile-time optimizations. (#5072)
* egraphs: a few miscellaneous compile-time optimizations.

These optimizations together are worth about a 2% compile-time
reduction, as measured on one core with spidermonkey.wasm as an input,
using `hyperfine` on `wasmtime compile`.

The changes included are:
- Some better pre-allocation (blockparams and side-effects concatenated
  list vecs);
- Avoiding the indirection of storing list-of-types for every Pure and
  Inst node, when almost all nodes produce only a single result;
  instead, store arity and single type if it exists, and allow result
  projection nodes to fill in types otherwise;
- Pack the `MemoryState` enum into one `u32` (this together with the
  above removal of the type slice allows `Node` to
  shrink from 48 bytes to 32 bytes);
- always-inline an accessor (`entry` on `CtxHash`) that wasn't
  (`always(inline)` appears to be load-bearing, rather than just
  `inline`);
- Split the update-analysis path into two hotpaths, one for the union
  case and one for the new-node case (and the former can avoid
  recomputing for the contained node when replacing a node with
  node-and-child eclass entry).

* Review feedback.

* Fix test build.

* Fix to lowering when unused output with invalid type is present.
2022-10-19 11:05:00 -07:00
Trevor Elliott
32a7593c94 cranelift: Remove booleans (#5031)
Remove the boolean types from cranelift, and the associated instructions breduce, bextend, bconst, and bint. Standardize on using 1/0 for the return value from instructions that produce scalar boolean results, and -1/0 for boolean vector elements.

Fixes #3205

Co-authored-by: Afonso Bordado <afonso360@users.noreply.github.com>
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
2022-10-17 16:00:27 -07:00
Afonso Bordado
766ecb561e fuzzgen: Always generate reachable blocks (#5034)
* fuzzgen: Always reachable blocks

* fuzzgen: Rename BlockTerminator

* fuzzgen: Rename `finalize_block`

* fuzzgen: Use `cloned` instead of map clone

Thanks @jameysharp!

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* fuzzgen: `rustfmt`

* fuzzgen: Document paramless targets

* fuzzgen:  Add `BlockTerminatorKind`

* fuzzen: Update BrTable/Switch comment

* fuzzgen: Minor cleanup

Co-authored-by: Jamey Sharp <jamey@minilop.net>
2022-10-17 12:51:20 -07:00
Chris Fallin
1aaea279e5 egraph opts: fix uextend-of-i32. (#5061)
This is a simple error in the const-prop rules: uextend was not
masking iconst's u64 immediate when extending from i32 to
i64. Arguably an iconst.i32 should not have nonzero bits in the upper
32 of its immediate, but that's a separate design question. For now,
if our invariant is that the upper bits are ignored, then it is
required to mask the bits when const-evaling a `uextend`.

Fixes #5047.
2022-10-17 12:45:49 -07:00
Nick Fitzgerald
03d77d4d6b Cranelift: Derive Copy for InstructionData (#5043)
* Cranelift: Derive `Copy` for `InstructionData`

And update `clone` calls to be copies.

* Add a test for `InstructionData`'s size
2022-10-12 07:58:27 -07:00
Chris Fallin
2be12a5167 egraph-based midend: draw the rest of the owl (productionized). (#4953)
* egraph-based midend: draw the rest of the owl.

* Rename `egg` submodule of cranelift-codegen to `egraph`.

* Apply some feedback from @jsharp during code walkthrough.

* Remove recursion from find_best_node by doing a single pass.

Rather than recursively computing the lowest-cost node for a given
eclass and memoizing the answer at each eclass node, we can do a single
forward pass; because every eclass node refers only to earlier nodes,
this is sufficient. The behavior may slightly differ from the earlier
behavior because we cannot short-circuit costs to zero once a node is
elaborated; but in practice this should not matter.

* Make elaboration non-recursive.

Use an explicit stack instead (with `ElabStackEntry` entries,
alongside a result stack).

* Make elaboration traversal of the domtree non-recursive/stack-safe.

* Work analysis logic in Cranelift-side egraph glue into a general analysis framework in cranelift-egraph.

* Apply static recursion limit to rule application.

* Fix aarch64 wrt dynamic-vector support -- broken rebase.

* Topo-sort cranelift-egraph before cranelift-codegen in publish script, like the comment instructs me to!

* Fix multi-result call testcase.

* Include `cranelift-egraph` in `PUBLISHED_CRATES`.

* Fix atomic_rmw: not really a load.

* Remove now-unnecessary PartialOrd/Ord derivations.

* Address some code-review comments.

* Review feedback.

* Review feedback.

* No overlap in mid-end rules, because we are defining a multi-constructor.

* rustfmt

* Review feedback.

* Review feedback.

* Review feedback.

* Review feedback.

* Remove redundant `mut`.

* Add comment noting what rules can do.

* Review feedback.

* Clarify comment wording.

* Update `has_memory_fence_semantics`.

* Apply @jameysharp's improved loop-level computation.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix suggestion commit.

* Fix off-by-one in new loop-nest analysis.

* Review feedback.

* Review feedback.

* Review feedback.

* Use `Default`, not `std::default::Default`, as per @fitzgen

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

* Apply @fitzgen's comment elaboration to a doc-comment.

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

* Add stat for hitting the rewrite-depth limit.

* Some code motion in split prelude to make the diff a little clearer wrt `main`.

* Take @jameysharp's suggested `try_into()` usage for blockparam indices.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Take @jameysharp's suggestion to avoid double-match on load op.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix suggestion (add import).

* Review feedback.

* Fix stack_load handling.

* Remove redundant can_store case.

* Take @jameysharp's suggested improvement to FuncEGraph::build() logic

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Tweaks to FuncEGraph::build() on top of suggestion.

* Take @jameysharp's suggested clarified condition

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Clean up after suggestion (unused variable).

* Fix loop analysis.

* loop level asserts

* Revert constant-space loop analysis -- edge cases were incorrect, so let's go with the simple thing for now.

* Take @jameysharp's suggestion re: result_tys

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix up after suggestion

* Take @jameysharp's suggestion to use fold rather than reduce

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fixup after suggestion

* Take @jameysharp's suggestion to remove elaborate_eclass_use's return value.

* Clarifying comment in terminator insts.

Co-authored-by: Jamey Sharp <jamey@minilop.net>
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
2022-10-11 18:15:53 -07:00
Nick Fitzgerald
e2f1ced0b6 Cranelift: Make Opcode represented as a u8 instead of u16 and remove vestigial conversion impls (#5042)
* Cranelift: Make `Opcode` represented as a `u8` instead of `u16`

* Cranelift: Remove unused conversion impls for `Opcode`

These are vestigial, left over from Peepmatic.
2022-10-11 12:57:12 -07:00
Jun Ryung Ju
39fbff92c3 cranelift: Added fp and, or, xor, not ops to interpreter. (#4999)
* cranelift: Added fp and, or, xor, not ops to interpreter.

* Formatting.

* Removed archtecture dependent test on float-bitops.
2022-10-06 18:24:45 -07:00
Chris Fallin
e95ffe4413 Fix StructReturn handling: properly mark the clobber, and offset actual rets. (#5023)
* Fix StructReturn handling: properly mark the clobber, and offset actual rets.

The legalization of `StructReturn` was causing issues in the new
call-handling code: the `StructReturn` ret was included in the `SigData` as
if it were an actual CLIF-level return value, but it is not.

Prior to using regalloc constraints for return values, we
unconditionally included rax (or the architecture's usual return
register) as a def, so it would be properly handled as "clobbered" by
the regalloc. With the new scheme, we include defs on the call only for
CLIF-level outputs. Callees with `StructReturn` args were thus not known
to clobber the return-value register, and values might be corrupted.

This PR updates the code to include a `StructReturn` ret as a clobber
rather than a returned value in the relevant spots. I observed it
causing saves/restores of rax in some CLIF that @bjorn3 provided me, but
I was having difficulty minimizing this into a test-case that I would be
comfortable including as a precise-output case (including the whole
thing verbatim would lock down a bunch of other irrelevant details and
cause test-update noise later). If we can find a more minimized example
I'm happy to include it as a filetest.

Fixes #5018.
2022-10-07 00:14:38 +00:00
Jamey Sharp
04b30acad9 Misc cleanups (#5014)
* Replace resize+copy_from_slice with extend_from_slice

Vec::resize initializes the new space, which is wasted effort if we're
just going to call `copy_from_slice` on it immediately afterward. Using
`extend_from_slice` is simpler, and very slightly faster.

If the new size were bigger than the buffer we're copying from, then it
would make sense to initialize the excess. But it isn't: it's always
exactly the same size.

* Move helpers from Context to CompiledCode

These methods only use information from Context::compiled_code, so they
should live on CompiledCode instead.

* Remove an unnecessary #[cfg_attr]

There are other uses of `#[allow(clippy::too_many_arguments)]` in this
file, so apparently it doesn't need to be guarded by the "cargo-clippy"
feature.

* Fix a few comments

Two of these were wrong/misleading:

- `FunctionBuilder::new` does not clear the provided func_ctx. It does
  debug-assert that the context is already clear, but I don't think
  that's worth a comment.

- `switch_to_block` does not "create values for the arguments." That's
  done by the combination of `append_block_params_for_function_params`
  and `declare_wasm_parameters`.

* wasmtime-cranelift: Misc cleanups

The main change is to use the `CompiledCode` reference we already had
instead of getting it out of `Context` repeatedly. This removes a bunch
of `unwrap()` calls.

* wasmtime-cranelift: Factor out uncached compile
2022-10-05 10:35:59 -07:00
Trevor Elliott
a209cb63f5 ISLE: Enable the overlap checker (#5011)
This PR turns the overlap checker on by default, requiring the use of priorities to resolve overlap between rules.
2022-10-04 21:56:49 +00:00
yuyang
07584f6ac8 fix issue 4996. (#5003) 2022-10-04 11:18:42 -07:00
Trevor Elliott
c9ff14e00b Resolve overlap in the s390x backend (#5002)
Resolve overlap in the s390x backend by adding rule priorities to disambiguate rule order.
2022-10-03 17:06:10 -07:00
Trevor Elliott
ab4be2bdd1 ISLE: Resolve overlaps in the aarch64 backend (#4988) 2022-09-30 12:57:50 -07:00
Trevor Elliott
c1d6ca48a7 ISLE: Resolve overlap in the riscv64 backend (#4982)
Resolve overlap in the RiscV64 backend by adding priorities to rules. Additionally, one test updated as a result of this work, as a peephole optimization for addition with immediates fires now.
2022-09-29 17:22:25 -07:00
Trevor Elliott
46e42601eb ISLE: Always default the priority to 0 (#4983)
* Always default priorities to 0 in the ISLE IR
* Fix a hidden overlap in the x64 backend
2022-09-29 15:32:29 -07:00
Trevor Elliott
7d5b2b5bb2 ISLE: Add the overlap_errors pragma (#4981)
* Add the overlap_errors pragma to ISLE
* Enable overlap errors in the x64 backend
2022-09-29 12:40:39 -07:00
Trevor Elliott
db06e4e622 ISLE: Resolve remaining x64 overlap errors (#4977)
Resolve overlap errors with the x64 backend.
2022-09-29 10:09:37 -07:00
Trevor Elliott
faf31f6216 ISLE: Resolve overlap in prelude.isle and x64/inst.isle (#4941)
Resolve overlap in the ISLE prelude and the x64 inst module by introducing new types that allow better sharing of extractor resuls, or falling back on priorities.
2022-09-28 10:54:39 -07:00
Ulrich Weigand
83bb6fd4c8 s390x: Fix regalloc checker error (#4973)
For ShiftRR and VecShiftRR, if shift_reg is zero_reg(), the
instruction does not actually use any register value.

Fixes #4969
2022-09-28 08:25:23 -07:00
yuyang-ok
cdecc858b4 add riscv64 backend for cranelift. (#4271)
Add a RISC-V 64 (`riscv64`, RV64GC) backend.

Co-authored-by: yuyang <756445638@qq.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
Co-authored-by: Afonso Bordado <afonsobordado@az8.co>
2022-09-27 17:30:31 -07:00
Afonso Bordado
65a3af72c7 fuzzgen: Statistics framework (#4868)
* cranelift: Add non user trap codes function

* cranelift: Add Fuzzgen stats

* cranelift: Use `once_cell` and cleanup some stuff

* fuzzgen: Remove total_inputs metric

* fuzzgen: Filter empty trap codes
2022-09-27 16:04:57 +00:00
bjorn3
af226d37c2 [AArch64] Fix incorrect regalloc constraints for atomic_cas (#4959)
* [AArch64] Fix incorrect regalloc constraints for atomic_cas

* Update test for latest Cranelift changes
2022-09-26 16:05:57 +00:00
Damian Heaton
3a2b32bf4d Port branches to ISLE (AArch64) (#4943)
* Port branches to ISLE (AArch64)

Ported the existing implementations of the following opcodes for AArch64
to ISLE:
- `Brz`
- `Brnz`
- `Brif`
- `Brff`
- `BrIcmp`
- `Jump`
- `BrTable`

Copyright (c) 2022 Arm Limited

* Remove dead code

Copyright (c) 2022 Arm Limited
2022-09-26 09:45:32 +01:00
Damian Heaton
3f8cccfb59 Port flag-based ops to ISLE (AArch64) (#4942)
Ported the existing implementations of the following opcodes for AArch64
to ISLE:
- `Trueif`
- `Trueff`
- `Trapif`
- `Trapff`
- `Select`
- `Selectif`
- `SelectifSpectreGuard`

Copyright (c) 2022 Arm Limited
2022-09-22 15:44:32 -07:00
Chris Fallin
b652ce2fb1 ISLE: add support for multi-extractors and multi-constructors. (#4908)
* ISLE: add support for multi-extractors and multi-constructors.

This support allows for rules that process multiple matching values per
extractor call on the left-hand side, and as a result, can produce
multiple values from the constructor whose body they define.

This is useful in situations where we are matching on an input data
structure that can have multiple "nodes" for a given value or ID, for
example in an e-graph.

* Review feedback: all multi-ctors and multi-etors return iterators; no `Vec` case.

* Add additional warning suppressions to generated-code toplevels to be consistent with new islec output.
2022-09-21 23:36:50 +00:00
Damian Heaton
352c7595c6 Improve fcvt_to_{u,s}int_sat lowering (AArch64) (#4913)
Improved the instruction lowering for the following opcodes on AArch64,
and introduced support for converting to integers less than 32-bits wide
as per the docs:
- `FcvtToSintSat`
- `FcvtToUintSat`

Copyright (c) 2022 Arm Limited
2022-09-21 10:16:09 -07:00
Damian Heaton
e786bda002 Vector bitcast support (AArch64 & Interpreter) (#4820)
* Vector bitcast support (AArch64 & Interpreter)

Implemented support for `bitcast` on vector values for AArch64 and the
interpreter.

Also corrected the verifier to ensure that the size, in bits, of the input and
output types match for a `bitcast`, per the docs.

Copyright (c) 2022 Arm Limited

* `I128` same-type bitcast support

Copyright (c) 2022 Arm Limited

* Directly return input for 64-bit GPR<=>GPR bitcast

Copyright (c) 2022 Arm Limited
2022-09-21 09:20:28 -07:00
Chris Fallin
05cbd667c7 Cranelift: use regalloc2 constraints on caller side of ABI code. (#4892)
* Cranelift: use regalloc2 constraints on caller side of ABI code.

This PR updates the shared ABI code and backends to use register-operand
constraints rather than explicit pinned-vreg moves for register
arguments and return values.

The s390x backend was not updated, because it has its own implementation
of ABI code. Ideally we could converge back to the code shared by x64
and aarch64 (which didn't exist when s390x ported calls to ISLE, so the
current situation is underestandable, to be clear!). I'll leave this for
future work.

This PR exposed several places where regalloc2 needed to be a bit more
flexible with constraints; it requires regalloc2#74 to be merged and
pulled in.

* Update to regalloc2 0.3.3.

In addition to version bump, this required removing two asserts as
`SpillSlot`s no longer carry their class (so we can't assert that they
have the correct class).

* Review comments.

* Filetest updates.

* Add cargo-vet audit for regalloc2 0.3.2 -> 0.3.3 upgrade.

* Update to regalloc2 0.4.0.
2022-09-21 01:17:04 +00:00
Damian Heaton
e9b08b856d Port icmp to ISLE (AArch64) (#4898)
* Port `icmp` to ISLE (AArch64)

Ported the existing implementation of `icmp` (and, by extension, the
`lower_icmp` function) to ISLE for AArch64.

Copyright (c) 2022 Arm Limited

* Allow 'producer chains', eliminating `Nop0`s

Copyright (c) 2022 Arm Limited
2022-09-13 08:56:50 -07:00
Trevor Elliott
ad09c273c6 Don't merge loads for xmm registers (#4891)
Do not merge loads for xmm registers, as alignment requirements currently aren't satisfied with clif lowered from wasm.

Fixes #4890
2022-09-12 10:14:35 -07:00
Chris Fallin
96bfd4e8c0 s390x: update some regalloc metadata to remove use of reg_mod. (#4856)
* s390x: update some regalloc metadata to remove use of `reg_mod`.

This is a step toward ultimately removing modify-operands, which along
with removal of pinned vregs, lets us move to a completely
constraint-based and fully-SSA regalloc input and get some nice
advantages eventually.

There are still a few uses of `mod` operands and pinned vregs remaining,
especially around the "regpair" abstraction. Those proved to be a bit
trickier to update though, so will have to be done separately.

* Review feedback: restore two-arg pretty-print form.

* Review feedback.
2022-09-09 18:43:36 -05:00
Chris Fallin
2986f6b0ff ABI: implement register arguments with constraints. (#4858)
* ABI: implement register arguments with constraints.

Currently, Cranelift's ABI code emits a sequence of moves from physical
registers into vregs at the top of the function body, one for every
register-carried argument.

For a number of reasons, we want to move to operand constraints instead,
and remove the use of explicitly-named "pinned vregs"; this allows for
better regalloc in theory, as it removes the need to "reverse-engineer"
the sequence of moves.

This PR alters the ABI code so that it generates a single "args"
pseudo-instruction as the first instruction in the function body. This
pseudo-inst defs all register arguments, and constrains them to the
appropriate registers at the def-point. Subsequently the regalloc can
move them wherever it needs to.

Some care was taken not to have this pseudo-inst show up in
post-regalloc disassemblies, but the change did cause a general regalloc
"shift" in many tests, so the precise-output updates are a bit noisy.
Sorry about that!

A subsequent PR will handle the other half of the ABI code, namely, the
callsite case, with a similar preg-to-constraint conversion.

* Update based on review feedback.

* Review feedback.
2022-09-08 18:03:14 -07:00
Chris Fallin
13c7846815 Cranelift: add a vreg limit check to correctly return an error on too-large inputs. (#4882)
Previously, Cranelift panicked (via a a panic in regalloc2) when the
virtual-register limit of 2M (2^21) was reached. This resulted in a
perplexing and unhelpful failure when the user provided a too-large
input (such as the Wasm module in #4865).

This PR adds an explicit check when allocating vregs that fails with a
"code too large" error when the limit is hit, producing output such as
(on the minimized testcase from #4865):

```
Error: failed to compile wasm function 3785 at offset 0xa3f3

Caused by:
    Compilation error: Code for function is too large
```

Fixes #4865.
2022-09-08 10:04:59 -07:00
Anton Kirilov
d8b290898c Initial forward-edge CFI implementation (#3693)
* Initial forward-edge CFI implementation

Give the user the option to start all basic blocks that are targets
of indirect branches with the BTI instruction introduced by the
Branch Target Identification extension to the Arm instruction set
architecture.

Copyright (c) 2022, Arm Limited.

* Refactor `from_artifacts` to avoid second `make_executable` (#1)

This involves "parsing" twice but this is parsing just the header of an
ELF file so it's not a very intensive operation and should be ok to do
twice.

* Address the code review feedback

Copyright (c) 2022, Arm Limited.

Co-authored-by: Alex Crichton <alex@alexcrichton.com>
2022-09-08 09:35:58 -05:00
Trevor Elliott
caad14826c Rework the ISA flag checking extractors for x64 (#4878)
Using fallible extractors that produce no values for flag checks means
that it's not possible to pattern match cases where those flags are
false. This change reworks the existing flag-checking extractors to be
infallible, returning the flag's boolean value from the context instead.
2022-09-07 13:49:35 -07:00
Andrew Brown
f063082474 x64: remove Inst::XmmLoadConst (#4876)
This is a cherry-pick of a long-ago commit, 2d46637. The original
message reads:

> Now that `SyntheticAmode` can refer to constants, there is no longer a
> need for a separate instruction format--standard load instructions will
> work.

Since then, the transition to ISLE and the use of `XmmLoadConst` in many
more places makes this change a larger diff than the original. The basic
idea is the same, though: the extra indirection of `Inst::XMmLoadConst`
is removed and replaced by a direct use of `VCodeConstant` as a
`SyntheticAmode`. This has no effect on codegen, but the CLIF output is
now clearer in that the actual instruction is displayed (e.g., `movdqu`)
instead of a made-up instruction (`load_const`).
2022-09-07 12:52:13 -07:00
Jamey Sharp
e694a6f5d4 Allocate less while constructing cranelift-fuzzgen tests (#4863)
* Improve panic message if typevar_operand is None

* cranelift-fuzzgen: Don't allocate for each choice

I don't think the performance of test-case generation is at all
important here. I'm actually doing this in preparation for a bigger
refactor where I want to be able to borrow the list of valid choices for
a given opcode without worrying about lifetimes.

* cranelift-fuzzgen: Remove next_func_index

It's only used locally within `generate_funcrefs`, so it doesn't need to
be in the FunctionBuilder struct.

Also there's already a local counter that I think is good enough for
this. As far as I know, the function indexes only need to be distinct,
not contiguous.

* cranelift-fuzzgen: Separate resources from config

The function-global variables, blocks, etc that are generated before
generating instructions are all owned collections without any lifetime
parameters. By contrast, the Unstructured and Config are both borrowed.
Separating them will make it easier to borrow from the owned resources.
2022-09-07 12:19:55 -07:00
Afonso Bordado
f57b4412ec cranelift: Implement missing i128 rotates on AArch64 (#4866) 2022-09-07 11:11:47 -07:00
Anton Kirilov
dd07e354b4 Cranelift AArch64: Fix the get_return_address lowering (#4851)
The previous implementation assumed that nothing had clobbered the
LR register since the current function had started executing, so
it would be incorrect for a non-leaf function, for example, that
contains the `get_return_address` operation right after a call.
The operation is valid only if the `preserve_frame_pointers` flag
is enabled, which implies that the presence of a frame record on
the stack is guaranteed.

Copyright (c) 2022, Arm Limited.
2022-09-07 11:09:22 -07:00
Jamey Sharp
3d6d49daba cranelift: Remove of/nof overflow flags from icmp (#4879)
* cranelift: Remove of/nof overflow flags from icmp

Neither Wasmtime nor cg-clif use these flags under any circumstances.
From discussion on #3060 I see it's long been unclear what purpose these
flags served.

Fixes #3060, fixes #4406, and fixes #4875... by deleting all the code
that could have been buggy.

This changes the cranelift-fuzzgen input format by removing some IntCC
options, so I've gone ahead and enabled I128 icmp tests at the same
time. Since only the of/nof cases were failing before, I expect these to
work.

* Restore trapif tests

It's still useful to validate that iadd_ifcout's iflags result can be
forwarded correctly to trapif, and for that purpose it doesn't really
matter what condition code is checked.
2022-09-07 08:38:41 -07:00
Jamey Sharp
9856664f1f Make DataValue, not Ieee32/64, respect IEEE754 (#4860)
* cranelift-codegen: Remove all uses of DataValue

This type is only used by the interpreter, cranelift-fuzzgen, and
filetests. I haven't found another convenient crate for those to all
depend on where this type can live instead, but this small refactor at
least makes it obvious that code generation does not in any way depend
on the implementation of this type.

* Make DataValue, not Ieee32/64, respect IEEE754

This fixes #4857 by partially reverting #4849.

It turns out that Ieee32 and Ieee64 need bitwise equality semantics so
they can be used as hash-table keys.

Moving the IEEE754 semantics up a layer to DataValue makes sense in
conjunction with #4855, where we introduced a DataValue::bitwise_eq
alternative implementation of equality for those cases where users of
DataValue still want the bitwise equality semantics.

* cranelift-interpreter: Use eq/ord from DataValue

This fixes #4828, again, now that the comparison operators on DataValue
have the right IEEE754 semantics.

* Add regression test from issue #4857
2022-09-03 00:26:14 +00:00
Afonso Bordado
7e45cff459 cranelift: Bitwise compare fuzzgen results (#4855) 2022-09-02 19:34:16 +00:00
Afonso Bordado
3afb711a51 cranelift: Document Ieee{32,64} implementation (#4854) 2022-09-02 19:02:22 +00:00
Afonso Bordado
f30a7eb0c9 cranelift: Implement PartialEq on Ieee{32,64} (#4849)
* cranelift: Add `fcmp` tests

Some of these are disabled on aarch64 due to not being implemented yet.

* cranelift: Implement float PartialEq for Ieee{32,64} (fixes #4828)

Previously `PartialEq` was auto derived. This means that it was implemented in terms of PartialEq in a u32.

This is not correct for floats because `NaN != NaN`.

PartialOrd was manually implemented in 6d50099816, but it seems like it was an oversight to leave PartialEq out until now.

The test suite depends on the previous behaviour so we adjust it to keep comparing bits instead of floats.

* cranelift: Disable `fcmp ord` tests on aarch64

* cranelift: Disable `fcmp ueq` tests on aarch64
2022-09-02 10:42:42 -07:00
Anton Kirilov
48bf078c83 Cranelift AArch64: Fix the atomic memory operations (#4831)
Previously the implementations of the various atomic memory IR operations
ignored the memory operation flags that were passed.

Copyright (c) 2022, Arm Limited.

Co-authored-by: Chris Fallin <chris@cfallin.org>
2022-09-02 09:35:21 -07:00