Commit Graph

991 Commits

Author SHA1 Message Date
Sam Parker
010e028d67 [AArch64] Port AtomicCAS to isle (#4140)
Copyright (c) 2022, Arm Limited.
2022-05-25 09:19:24 +01:00
Chris Fallin
b830c3cf93 Pull in regalloc2 v0.2.0, with no more separate scratch registers. (#4182)
RA2 recently removed the need for a dedicated scratch register for
cyclic moves (bytecodealliance/regalloc2#51). This has moderate positive
performance impact on function bodies that were register-constrained, as
it means that one more register is available. In Sightglass, I measured
+5-8% on `blake3-scalar`, at least among current benchmarks.
2022-05-23 12:51:04 -07:00
Benjamin Bouvier
6e828df632 Remove unused SourceLoc in many Mach data structures (#4180)
* Remove unused srcloc in MachReloc

* Remove unused srcloc in MachTrap

* Use `into_iter` on array in bench code to suppress a warning

* Remove unused srcloc in MachCallSite
2022-05-23 09:27:28 -07:00
Chris Fallin
32622b3e6f Cranelift: fix use of pinned reg with SysV calling convention. (#4176)
Previously, the pinned register (enabled by the `enable_pinned_reg`
Cranelift setting and used via the `get_pinned_reg` and `set_pinned_reg`
CLIF ops) was only used when Cranelift was embedded in SpiderMonkey, in
order to support a pinned heap register. SpiderMonkey has its own
calling convention in Cranelift (named after the integration layer,
"Baldrdash").

However, the feature is more general, and should be usable with the
default system calling convention too, e.g. SysV or Windows Fastcall.

This PR fixes the ABI code to properly treat the pinned register as a
globally allocated register -- and hence an implicit input and output to
every function, not saved/restored in the prologue/epilogue -- for SysV
on x86-64 and aarch64, and Fastcall on x86-64.

Fixes #4170.
2022-05-23 09:18:51 -07:00
Andrew Brown
e898cb750a x64: remove TODO for i128 load (#4159)
This work has already been finished in a previous PR.
2022-05-17 17:43:43 -07:00
Anton Kirilov
edf07a8da6 Cranelift AArch64: Migrate Bitselect and Vselect to ISLE (#4139)
Copyright (c) 2022, Arm Limited.
2022-05-16 09:39:28 -07:00
Ulrich Weigand
0243a16679 s390x: Fix bitwise operations (#4146)
Current codegen had a number of logic errors confusing
NAND with AND WITH COMPLEMENT, and NOR with OR WITH COMPLEMENT.

Add support for the missing z15 instructions and fix logic.
2022-05-12 10:05:22 -07:00
Chris Fallin
5d671952ee Cranelift: do not check in generated ISLE code; regenerate on every compile. (#4143)
This PR fixes #4066: it modifies the Cranelift `build.rs` workflow to
invoke the ISLE DSL compiler on every compilation, rather than only
when the user specifies a special "rebuild ISLE" feature.

The main benefit of this change is that it vastly simplifies the mental
model required of developers, and removes a bunch of failure modes
we have tried to work around in other ways. There is now just one
"source of truth", the ISLE source itself, in the repository, and so there
is no need to understand a special "rebuild" step and how to handle
merge errors. There is no special process needed to develop the compiler
when modifying the DSL. And there is no "noise" in the git history produced
by constantly-regenerated files.

The two main downsides we discussed in #4066 are:
- Compile time could increase, by adding more to the "meta" step before the main build;
- It becomes less obvious where the source definitions are (everything becomes
  more "magic"), which makes exploration and debugging harder.

This PR addresses each of these concerns:

1. To maintain reasonable compile time, it includes work to cut down the
   dependencies of the `cranelift-isle` crate to *nothing* (only the Rust stdlib),
   in the default build. It does this by putting the error-reporting bits
   (`miette` crate) under an optional feature, and the logging (`log` crate) under
   a feature-controlled macro, and manually writing an `Error` impl rather than
   using `thiserror`. This completely avoids proc macros and the `syn` build slowness.

   The user can still get nice errors out of `miette`: this is enabled by specifying
   a Cargo feature `--features isle-errors`.

2. To allow the user to optionally inspect the generated source, which nominally
   lives in a hard-to-find path inside `target/` now, this PR adds a feature `isle-in-source-tree`
   that, as implied by the name, moves the target for ISLE generated source into
   the source tree, at `cranelift/codegen/isle_generated_source/`. It seems reasonable
   to do this when an explicit feature (opt-in) is specified because this is how ISLE regeneration
   currently works as well. To prevent surprises, if the feature is *not* specified, the
   build fails if this directory exists.
2022-05-11 22:25:24 -07:00
Chris Fallin
eb435f3057 x64: use constant pool for u64 constants rather than movabs. (#4088)
* Allow emitting u64 constants into constant pool.

* Use constant pool for constants on x64 that do not fit in a simm32 and are needed as a RegMem or RegMemImm.

* Fix rip-relative addressing bug in pinsrd emission.
2022-05-10 09:21:05 -07:00
Benjamin Bouvier
71fc16bbeb Narrow allow(dead_code) declarations (#4116)
* Narrow `allow(dead_code)` declarations

Having module wide `allow(dead_code)` may hide some code that's really
dead. In this commit I just narrowed the declarations to the specific
enum variants that were not used (as it seems reasonable to keep them
and their handling in all the matches, for future use). And the compiler
found more dead code that I think we can remove safely in the short
term.

With this, the only files annotated with a module-wide
`allow(dead_code)` are isle-generated files.

* resurrect some functions as test helpers
2022-05-10 12:02:52 +02:00
Chris Fallin
2af8d1e93c Cranelift/ISLE: re-apply prio-trie fix, this time with fixed fix. (#4117)
* ISLE compiler: fix priority-trie interval bug. (#4093)

This PR fixes a bug in the ISLE compiler related to rule priorities.

An important note first: the bug did not affect the correctness of the
Cranelift backends, either in theory (because the rules should be
correct applied in any order, even contrary to the stated priorities)
or in practice (because the generated code actually does not change at
all with the DSL compiler fix, only with a separate minimized bug
example).

The issue was a simple swap of `min` for `max` (see first
commit). This is the minimal fix, I think, to get a correct
priority-trie with the minimized bug example in this commit.

However, while debugging this, I started to convince myself that the
complexity of merging multiple priority ranges using the sort of
hybrid interval tree / string-matching trie data structure was
unneeded. The original design was built with the assumption we might
have a bunch of different priority levels, and would need the
efficiency of merging where possible. But in practice we haven't used
priorities this way: the vast majority of lowering rules exist at the
default (priority 0), and just a few overrides are explicitly at prio
1, 2 or (rarely) 3.

So, it turns out to be a lot simpler to label trie edges with (prio,
symbol) rather than (prio-range, symbol), and delete the whole mess of
interval-splitting logic on insertion. It's easier (IMHO) to convince
oneself that the resulting insertion algorithm is correct.

I was worried that this might impact the size of the generated Rust
code or its runtime, but In fact, to my initial surprise (but it makes
sense given the above "rarely used" factor), the generated code with
this compiler fix is *exactly the same*. I rebuilt with `--features
rebuild-isle,all-arch` but... there were no diffs to commit! This is
to me the simplest evidence that we didn't really need that
complexity.

* Fix earlier commit from #4093: properly sort trie.

This commit fixes an in-hindsight-obvious bug in #4093: the trie's edges
must be sorted recursively, not just at the top level.

With this fix, the generated code differs only in one cosmetic way (a
let-binding moves) but otherwise is the same.

This includes @fitzgen's fix to the CI (from the revert in #4102) that
deletes manifests to actually check that the checked-in source is
consistent with the checked-in compiler. The force-rebuild step is now
in a shell script for convenience: anyone hacking on the ISLE compiler
itself can use this script to more easily rebuild everything.

* Add note to build.rs to remind to update force-rebuild-isle.sh
2022-05-09 16:36:48 -07:00
Chris Fallin
019ebf47b1 x64: fix pretty-printing argument order for XmmRmR instructions. (#4094)
The pretty-printing had swapped dst and src2; this was introduced when
we moved to RA2 (sorry about that! IMHO we should do something to
automate the mapping between regalloc arg collection and pretty
printing/emission).

`src2` comes at the end because it has a variable number of register
mentions; this is in line with how many of the other inst formats work.

Actual emitted code was never incorrect, just the pretty-printing.

Updated test golden outputs look correct to me now, including the one
that we saw was incorrect in #3945.
2022-05-03 10:12:58 -07:00
Chris Fallin
f85047b084 Rework x64 addressing-mode lowering to be slightly more flexible. (#4080)
This PR refactors the x64 backend address-mode lowering to use an
incremental-build approach, where it considers each node in a tree of
`iadd`s that feed into a load/store address and, at each step, builds
the best possible `Amode`. It will combine an arbitrary number of
constant offsets (an extension beyond the current rules), and can
capture a left-shifted (scaled) index in any position of the tree
(another extension).

This doesn't have any measurable performance improvement on our Wasm
benchmarks in Sightglass, unfortunately, because the IR lowered from
wasm32 will do address computation in 32 bits and then `uextend` it to
add to the 64-bit heap base. We can't quite lift the 32-bit adds to 64
bits because this loses the wraparound semantics.

(We could label adds as "expected not to overflow", and allow *those* to
be lifted to 64 bit operations; wasm32 heap address computation should
fit this.  This is `add nuw` (no unsigned wrap) in LLVM IR terms. That's
likely my next step.)

Nevertheless, (i) this generalizes the cases we can handle, which should
be a good thing, all other things being equal (and in this case, no
compile time impact was measured); and (ii) might benefit non-Wasm
frontends.
2022-05-02 16:20:39 -07:00
Chris Fallin
61dc38c065 Implement Spectre mitigations for table accesses and br_tables. (#4092)
Currently, we have partial Spectre mitigation: we protect heap accesses
with dynamic bounds checks. Specifically, we guard against errant
accesses on the misspeculated path beyond the bounds-check conditional
branch by adding a conditional move that is also dependent on the
bounds-check condition. This data dependency on the condition is not
speculated and thus will always pick the "safe" value (in the heap case,
a NULL address) on the misspeculated path, until the pipeline flushes
and recovers onto the correct path.

This PR uses the same technique both for table accesses -- used to
implement Wasm tables -- and for jumptables, used to implement Wasm
`br_table` instructions.

In the case of Wasm tables, the cmove picks the table base address on
the misspeculated path. This is equivalent to reading the first table
entry. This prevents loads of arbitrary data addresses on the
misspeculated path.

In the case of `br_table`, the cmove picks index 0 on the misspeculated
path. This is safer than allowing a branch to an address loaded from an
index under misspeculation (i.e., it preserves control-flow integrity
even under misspeculation).

The table mitigation is controlled by a Cranelift setting, on by
default. The br_table mitigation is always on, because it is part of the
single lowering pseudoinstruction. In both cases, the impact should be
minimal: a single extra cmove in a (relatively) rarely-used operation.

The table mitigation is architecture-independent (happens during
legalization); the br_table mitigation has been implemented for both x64
and aarch64. (I don't know enough about s390x to implement this
confidently there, but would happily review a PR to do the same on that
platform.)
2022-05-02 11:19:16 -07:00
Chris Fallin
03793b71a7 ISLE: remove all uses of argument polarity, and remove it from the language. (#4091)
This PR removes "argument polarity": the feature of ISLE extractors that lets them take
inputs aside from the value to be matched.

Cases that need this expressivity have been subsumed by #4072 with if-let clauses;
we can now finally remove this misfeature of the language, which has caused significant
confusion and has always felt like a bit of a hack.

This PR (i) removes the feature from the ISLE compiler; (ii) removes it from the reference
documentation; and (iii) refactors away all uses of the feature in our three existing
backends written in ISLE.
2022-05-02 09:52:12 -07:00
Chris Fallin
eceb433b28 Remove =x uses from ISLE, and remove support from the DSL compiler. (#4078)
This is a follow-up on #4074: now that we have the simplified syntax, we
can remove the old, redundant syntax.
2022-04-28 11:17:08 -07:00
Sam Parker
12b4374cd5 [AArch64] Port atomic rmw to ISLE (#4021)
Also fix and extend the current implementation:
- AtomicRMWOp::Clr != AtomicRmwOp::And, as the input needs to be
  inverted first.
- Inputs to the cmp for the RMWLoop case are sign-extended when
  needed.
- Lower Xchg to Swp.
- Lower Sub to Add with a negated input.
- Added more runtests.

Copyright (c) 2022, Arm Limited.
2022-04-27 13:13:59 -07:00
Chris Fallin
dd45f44511 x64 backend: add lowerings with load-op-store fusion. (#4071)
x64 backend: add lowerings with load-op-store fusion.

These lowerings use the `OP [mem], reg` forms (or in AT&T syntax, `OP
%reg, (mem)`) -- i.e., x86 instructions that load from memory, perform
an ALU operation, and store the result, all in one instruction. Using
these instruction forms, we can merge three CLIF ops together: a load,
an arithmetic operation, and a store.
2022-04-26 18:58:26 -07:00
Chris Fallin
164bfeaf7e x64 backend: migrate stores, and remainder of loads (I128 case), to ISLE. (#4069) 2022-04-26 09:50:46 -07:00
Chris Fallin
f384938a10 x64 backend: fix a load-op merging bug with integer min/max. (#4068)
The recent work in #4061 introduced a notion of "unique uses" for CLIF
values that both simplified the load-op merging rules and allowed
loads to merge in some more places.

Unfortunately there's one factor that PR didn't account for: a unique
use at the CLIF level could become a multiple-use at the VCode level,
when a lowering uses a value multiple times!

Making this less error-prone in general is hard, because we don't know
the lowering in VCode until it's emitted, so we can't ahead-of-time
know that a value will be used multiple times and prevent its
merging. But we *can* know in the lowerings themselves when we're
doing this. At least we get a panic from regalloc when we get this
wrong; no bad code (uninitialized register being read) should ever
come from a backend bug like this.

This is still a bit less than ideal, but for now the fix is: in
`cmp_and_choose` in the x64 backend (which compares values, then
picks one or the other with a cmove), explicitly put values in
registers.

Fixes #4067 (thanks @Mrmaxmeier for the report!).
2022-04-25 10:32:09 -07:00
Chris Fallin
e4b7c8a737 Cranelift: fix #3953: rework single/multiple-use logic in lowering. (#4061)
* Cranelift: fix #3953: rework single/multiple-use logic in lowering.

This PR addresses the longstanding issue with loads trying to merge
into compares on x86-64, and more generally, with the lowering
framework falsely recognizing "single uses" of one op by
another (which would normally allow merging of side-effecting ops like
loads) when there is *indirect* duplication.

To fix this, we replace the direct `value_uses` count with a
transitive notion of uniqueness (not unlike Rust's `&`/`&mut` and how
a `&mut` downgrades to `&` when accessed through another `&`!). A
value is used multiple times transitively if it has multiple direct
uses, or is used by another op that is used multiple times
transitively.

The canonical example of badness is:

```
    v1 := load
    v2 := ifcmp v1, ...
    v3 := selectif v2, ...
    v4 := selectif v2, ...
```

both `v3` and `v4` effectively merge the `ifcmp` (`v2`), so even
though the use of `v1` is "unique", it is codegenned twice. This is
why we ~~can't have nice things~~ can't merge loads into
compares (#3953).

There is quite a subtle and interesting design space around this
problem and how we might solve it. See the long doc-comment on
`ValueUseState` in this PR for more justification for the particular
design here. In particular, this design deliberately simplifies a bit
relative to an "optimal" solution: some uses can *become* unique
depending on merging, but we don't design our data structures for such
updates because that would require significant extra costly
tracking (some sort of transitive refcounting). For example, in the
above, if `selectif` somehow did not merge `ifcmp`, then we would only
codegen the `ifcmp` once into its result register (and use that
register twice); then the load *is* uniquely used, and could be
merged. But that requires transitioning from "multiple use" back to
"unique use" with careful tracking as we do pattern-matching, which
I've chosen to make out-of-scope here for now. In practice, I don't
think it will matter too much (and we can always improve later).

With this PR, we can now re-enable load-op merging for compares. A
subsequent commit does this.

* Update x64 backend to allow load-op merging for `cmp`.

* Update filetests.

* Add test for cmp-mem merging on x64.

* Comment fixes.

* Rework ValueUseState analysis for better performance.

* Update s390x filetest: iadd_ifcout cannot merge loads anymore because it has multiple outputs (ValueUseState limitation)

* Address review comments.
2022-04-22 18:00:48 -07:00
Johnnie Birch
6a36a1d15d X64: Port Sqrt to ISLE (#4065) 2022-04-22 00:42:22 -07:00
Chris Fallin
5774e068b7 Cranelift: fix regalloc2 integration bug wrt blockparam branch args. (#4042)
Previously, the block successor accumulation and the blockparam branch
arg setup were decoupled. The lowering backend implicitly specified
the order of successor edges via its `MachTerminator` enum on the last
instruction in the block, while the `Lower` toplevel
machine-independent driver set up blockparam branch args in the edge
order seen in CLIF.

In some cases, these orders did not match -- for example, when the
conditional branch depended on an FP condition that was implemented by
swapping taken/not-taken edges and inverting the condition code.

This PR refactors the successor handling to be centralized in `Lower`
rather than flow through the terminator `MachInst`, and adds a
successor block and its blockparam args at the same time, ensuring the
orders match.
2022-04-18 09:53:57 -07:00
Sam Parker
cf533a8041 [AArch64] Merge Fcmp32 and Fcmp64 (#4032)
Copyright (c) 2022, Arm Limited.
2022-04-14 15:39:43 -07:00
Sam Parker
682ef7b470 [AArch64] Refactor Mov instructions (#4033)
Merge Mov32 and Mov64 into a single instruction parameterized by a new
OperandSize field. Also combine the Mov[K,N,Z] into a single instruction
with a new opcode to select between the operations.

Copyright (c) 2022, Arm Limited.
2022-04-14 14:51:12 -07:00
Sam Parker
dd442a4d2f [AArch64] Merge 32- and 64-bit FPUOp1 (#4031)
Copyright (c) 2022, Arm Limited.
2022-04-14 14:00:48 -07:00
Sam Parker
7c0ea28fc8 [AArch64] Merge 32- and 64-bit FPUOp2 (#4029)
And remove the unused saturating add/sub opcodes.

Copyright (c) 2022, Arm Limited.
2022-04-14 13:07:00 -07:00
Sam Parker
e142f587a7 [AArch64] Refactor ALUOp3 (#3950)
As well as adding generic pattern for msub along with runtests
for madd and msub.

Copyright (c) 2022, Arm Limited.
2022-04-14 12:16:56 -07:00
Chris Fallin
a0318f36f0 Switch Cranelift over to regalloc2. (#3989)
This PR switches Cranelift over to the new register allocator, regalloc2.

See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801)
for a summary of the design changes. This switchover has implications for
core VCode/MachInst types and the lowering pass.

Overall, this change brings improvements to both compile time and speed of
generated code (runtime), as reported in #3942:

```
Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   25% faster                  28% faster
blake3-simd     no diff                     no diff
meshoptimizer   19% faster                  17% faster
pulldown-cmark  17% faster                  no diff
bz2             15% faster                  no diff
SpiderMonkey,   21% faster                  2% faster
  fib(30)
clang.wasm      42% faster                  N/A
```
2022-04-14 10:28:21 -07:00
Andrew Brown
7a55779c6b x64: fix miscompilation of select.i128 (#4017)
Issue #3963 identified a miscompilation with select in which the second
in the pair of `CMOV`s (one pair per `i128` register) used the wrong
flag. This change fixes the error in the x64 ISLE helper function
emitting these `CMOV` instructions.
2022-04-12 09:56:57 -07:00
Andrew Brown
f62199da8c x64: port load to ISLE (#3993)
This change moves the majority of the lowerings for CLIF's `load`
instruction over to ISLE. To do so, it also migrates the previous
mechanism for creating an `Amode` (`lower_to_amode`) to several ISLE
rules (see `to_amode`).
2022-04-07 18:31:22 -07:00
Andrew Brown
bd6fe11ca9 cranelift: remove load_complex and store_complex (#3976)
This change removes all variants of `load*_complex` and `store*_complex`
from Cranelift; this is a breaking change to the instructions exposed by
CLIF. The complete list of instructions removed is: `load_complex`,
`store_complex`, `uload8_complex`, `sload8_complex`, `istore8_complex`,
`sload8_complex`, `uload16_complex`, `sload16_complex`,
`istore16_complex`, `uload32_complex`, `sload32_complex`,
`istore32_complex`, `uload8x8_complex`, `sload8x8_complex`,
`sload16x4_complex`, `uload16x4_complex`, `uload32x2_complex`,
`sload32x2_complex`.

The rationale for this removal is that the Cranelift backend now has the
ability to pattern-match multiple upstream additions in order to
calculate the address to access. Previously, this was not possible so
the `*_complex` instructions were needed. Over time, these instructions
have fallen out of use in this repository, making the additional
overhead of maintaining them a chore.
2022-03-31 10:05:10 -07:00
Andrew Brown
e8dd13cf87 x64: port the remainder of select to ISLE (#3973)
Previous changes had ported the difficult "`select` based on an `fcmp`"
patterns to ISLE; this completes porting of `select` by moving over the
final two kinds of patterns:
 - `select` based on an `icmp`
 - `select` based on a value
2022-03-30 13:32:26 -07:00
Andrew Brown
5d8dd648d7 x64: port fcmp to ISLE (#3967)
* x64: port scalar `fcmp` to ISLE

Implement the CLIF lowering for the `fcmp` to ISLE. This adds a new
type-matcher, `ty_scalar_float`, for detecting uses of `F32` and `F64`.

* isle: rename `vec128` to `ty_vec12`

This refactoring changes the name of the `vec128` matcher function to
follow the `ty_*` convention of the other type matchers. It also makes
the helper an inline function call.

* x64: port vector `fcmp` to ISLE
2022-03-29 15:41:49 -07:00
Andrew Brown
4d5bd5f90e x64: fix register allocation panic due to load-coalesced value (#3954)
Fuzz testing identified a lowering case for CLIF's `icmp` in which the
double use of a loaded operand resulted in a register allocation error.
This change manually adds `put_in_xmm` to avoid load-coalescing these
values and includes a CLIF filetest to trigger this issue. Closes #3951.

I opened #3953 to discuss a way in which this kind of mistake (i.e.,
forgetting to add `put_in_*` in certain situations) could be avoided.
2022-03-21 18:46:27 -07:00
Andrew Brown
3bfbb3226e x64: prefix all machine instructions with x64_ (#3947)
This change is refactoring only--it should have no logic changes. As
discussed previously, prefixing all machine code instructions with
`x64_` will make it easier to identify what parts of the ISLE code
correspond to single instructions and what parts rely on helpers that
may emit more than one instruction.
2022-03-18 17:53:15 -07:00
Andrew Brown
5fa104205d x64: improve generation of i128 icmp (#3946)
Previously, we used the flags of `AND` for `SETcc`. This change uses
`TEST` instead, which discards the AND result but sets the flags needed
for `SETcc`. This reduces register pressure slightly for this sequence.
2022-03-18 16:36:31 -07:00
Andrew Brown
e92cbfb283 x64: port icmp to ISLE (#3886)
* x64: port GPR-held `icmp` to ISLE
* x64: port equality `icmp` for i128 type
* x64: port `icmp` for vector types
* x64: rename from_intcc to intcc_to_cc
2022-03-18 11:22:09 -07:00
Chris Fallin
58062b5efe x64 backend: fix fpcmp to avoid load-op merging. (#3934)
The `fpcmp` helper in the x64 backend uses `put_in_xmm_mem` for one of
its operands, which allows the compiler to merge a load with the compare
instruction (`ucomiss` or `ucomisd`).

Unfortunately, as we saw in #2576 for the integer-compare case, this
does not work with our lowering algorithm because compares can be
lowered more than once (unlike all other instructions) to reproduce the
flags where needed. Merging a load into an op that executes more than
once is invalid in general (the two loads may observe different values,
which violates the original program semantics because there was only one
load originally).

This does not result in a miscompilation, but instead will cause a panic
at regalloc time because the register that should have been defined by
the separate load is never written (the load is never emitted
separately).

I think this (very subtle, easy to miss) condition was unfortunately not
ported over when we moved the logic in #3682.

The existing fcmp-of-load test in `cmp-mem-bug` (from #2576) does not
seem to trigger it, for a reason I haven't fully deduced. I just added
the verbatim function body (happens to come from `clang.wasm`) that
triggers the bug as a test.

Discovered while bringing up regalloc2 support. It's pretty unlikely to
hit by chance, which is why I think none of our fuzzing has hit it yet.
2022-03-16 09:48:20 -07:00
FreddieLiardet
13b9396931 Add vector compare to 0 optims (#3887)
Signed-off-by: Freddie Liardet <frederick.liardet@arm.com>
2022-03-09 16:20:06 -08:00
Chris Fallin
26ce9a3853 Fix uextend on x64 for non-i32-source cases. (#3906)
In #3849, I moved uextend over to ISLE in the x64 backend. Unfortunately, the lowering patterns had a bug in the i32-to-i64 special case (when we know the generating instruction zeroes the upper 32 bits): it wasn't actually special casing for an i32 source! This meant that e.g. zero extends of the results of i8 adds did not work properly.

This PR fixes the bug and updates the runtest for extends significantly to cover the narrow-value cases.

No security impact to Wasm as Wasm does not use narrow integer types.

Thanks @bjorn3 for reporting!
2022-03-09 11:10:59 -08:00
Chris Fallin
cd173cfe8e ISLE: port fmin, fmax, fmin_pseudo, fmax_pseudo on x64. (#3856) 2022-02-28 14:40:26 -08:00
Chris Fallin
d9dfc44c32 ISLE: port more ops on x64 to lowering patterns. (#3855) 2022-02-28 13:28:42 -08:00
Chris Fallin
90a081a731 ISLE: port extend/reduce opcodes on x64. (#3849) 2022-02-28 11:49:28 -08:00
Chris Fallin
24f145cd1e Migrate clz, ctz, popcnt, bitrev, is_null, is_invalid on x64 to ISLE. (#3848) 2022-02-28 09:45:13 -08:00
Ulrich Weigand
b064e60087 ISLE: Re-implement ValueSlice (#3784)
The current definition of `ValueSlice` is not usable, since any call to
a constructor returning a `ValueSlice` will extend the mutable borrow
on the context taken by the constructor call, with the result that it
cannot be passed to any other constructor ever.

Re-implement `ValueSlice` as a pair of a `ValueList` identifer plus an
offset into the list.  This type can simply be copied without requiring
a borrow on the context.
2022-02-24 15:24:40 -08:00
Ulrich Weigand
07d615d3f7 ISLE: Lowering of multi-output instructions (#3783)
This changes the output of the `lower` constructor from a
`ValueRegs` to a new `InstOutput` type, which is a vector
of `ValueRegs`.

Code in `lower_common` is updated to use this new type to
handle instructions with multiple outputs.  All back-ends
are updated to use the new type.
2022-02-24 14:03:06 -08:00
Chris Fallin
e8881b2cc0 ISLE lowering rules: make use of implicit conversions. (#3847)
This PR makes use of the new implicit-conversion feature of the ISLE DSL
that was introduced in #3807 in order to make the lowering rules
significantly simpler and more concise.

The basic idea is to eliminate the repetitive and mechanical use of
terms that convert from one type to another when there is only one real
way to do the conversion -- for example, to go from a `WritableReg` to a
`Reg`, the only sensible way is to use `writable_reg_to_reg`.

This PR generally takes any term of the form "A_to_B" and makes it an
automatic conversion, as well as some others that are similar in spirit.

The notable exception to the pure-value-convsion category is the
`put_in_reg` family of operations, which actually do have side-effects.
However, as noted in the doc additions in #3807, this is fine as long as
the side-effects are idempotent. And on balance, making `put_in_reg`
automatic is a significant clarity win -- together with other operand
converters, it enables rules like:

```
;; Add two registers.
(rule (lower (has_type (fits_in_64 ty)
                       (iadd x y)))
      (add ty x y))
```

There may be other converters that we could define to make the rules
even simpler; we can make such improvements as we think of them, but
this should be a good start!
2022-02-23 16:14:38 -08:00
Sam Parker
5b7df72bce [AArch64] Merge 32- and 64-bit BitOps (#3840)
Copyright (c) 2022, Arm Limited.
2022-02-23 11:36:23 -08:00
Sam Parker
d307a4ab9a [AArch64] Improve AtomicRMWLoop (#3839)
Add more tests, use accurate disassembly, respect data sizes and
simplify the Xchg implementation.

Copyright (c) 2022, Arm Limited
2022-02-23 10:47:59 -08:00