Commit Graph

171 Commits

Author SHA1 Message Date
Trevor Elliott
b077854b57 Generate SSA code from returns (#5172)
Modify return pseudo-instructions to have pairs of registers: virtual and real. This allows us to constrain the virtual registers to the real ones specified by the abi, instead of directly emitting moves to those real registers.
2022-11-08 16:00:49 -08:00
Alex Crichton
22159848c5 Fix instruction size test for Rust 1.65.0 (#5188)
Looks like Rust generously shrank our `enum` in 1.65.0, so update the
test assertion to pass CI.
2022-11-03 16:53:51 +00:00
11evan
4ca9e82bd1 cranelift: Add Bswap instruction (#1092) (#5147)
Adds Bswap to the Cranelift IR. Implements the Bswap instruction
in the x64 and aarch64 codegen backends. Cranelift users can now:
```
builder.ins().bswap(value)
```
to get a native byteswap instruction.

* x64: implements the 32- and 64-bit bswap instruction, following
the pattern set by similar unary instrutions (Neg and Not) - it
only operates on a dst register, but is parameterized with both
a src and dst which are expected to be the same register.

As x64 bswap instruction is only for 32- or 64-bit registers,
the 16-bit swap is implemented as a rotate left by 8.

Updated x64 RexFlags type to support emitting for single-operand
instructions like bswap

* aarch64: Bswap gets emitted as aarch64 rev16, rev32,
or rev64 instruction as appropriate.

* s390x: Bswap was already supported in backend, just had to add
a bit of plumbing

* For completeness, added bswap to the interpreter as well.

* added filetests and runtests for each ISA

* added bswap to fuzzgen, thanks to afonso360 for the code there

* 128-bit swaps are not yet implemented, that can be done later
2022-10-31 19:30:00 +00:00
Trevor Elliott
32a7593c94 cranelift: Remove booleans (#5031)
Remove the boolean types from cranelift, and the associated instructions breduce, bextend, bconst, and bint. Standardize on using 1/0 for the return value from instructions that produce scalar boolean results, and -1/0 for boolean vector elements.

Fixes #3205

Co-authored-by: Afonso Bordado <afonso360@users.noreply.github.com>
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
2022-10-17 16:00:27 -07:00
Chris Fallin
05cbd667c7 Cranelift: use regalloc2 constraints on caller side of ABI code. (#4892)
* Cranelift: use regalloc2 constraints on caller side of ABI code.

This PR updates the shared ABI code and backends to use register-operand
constraints rather than explicit pinned-vreg moves for register
arguments and return values.

The s390x backend was not updated, because it has its own implementation
of ABI code. Ideally we could converge back to the code shared by x64
and aarch64 (which didn't exist when s390x ported calls to ISLE, so the
current situation is underestandable, to be clear!). I'll leave this for
future work.

This PR exposed several places where regalloc2 needed to be a bit more
flexible with constraints; it requires regalloc2#74 to be merged and
pulled in.

* Update to regalloc2 0.3.3.

In addition to version bump, this required removing two asserts as
`SpillSlot`s no longer carry their class (so we can't assert that they
have the correct class).

* Review comments.

* Filetest updates.

* Add cargo-vet audit for regalloc2 0.3.2 -> 0.3.3 upgrade.

* Update to regalloc2 0.4.0.
2022-09-21 01:17:04 +00:00
Chris Fallin
2986f6b0ff ABI: implement register arguments with constraints. (#4858)
* ABI: implement register arguments with constraints.

Currently, Cranelift's ABI code emits a sequence of moves from physical
registers into vregs at the top of the function body, one for every
register-carried argument.

For a number of reasons, we want to move to operand constraints instead,
and remove the use of explicitly-named "pinned vregs"; this allows for
better regalloc in theory, as it removes the need to "reverse-engineer"
the sequence of moves.

This PR alters the ABI code so that it generates a single "args"
pseudo-instruction as the first instruction in the function body. This
pseudo-inst defs all register arguments, and constrains them to the
appropriate registers at the def-point. Subsequently the regalloc can
move them wherever it needs to.

Some care was taken not to have this pseudo-inst show up in
post-regalloc disassemblies, but the change did cause a general regalloc
"shift" in many tests, so the precise-output updates are a bit noisy.
Sorry about that!

A subsequent PR will handle the other half of the ABI code, namely, the
callsite case, with a similar preg-to-constraint conversion.

* Update based on review feedback.

* Review feedback.
2022-09-08 18:03:14 -07:00
Andrew Brown
f063082474 x64: remove Inst::XmmLoadConst (#4876)
This is a cherry-pick of a long-ago commit, 2d46637. The original
message reads:

> Now that `SyntheticAmode` can refer to constants, there is no longer a
> need for a separate instruction format--standard load instructions will
> work.

Since then, the transition to ISLE and the use of `XmmLoadConst` in many
more places makes this change a larger diff than the original. The basic
idea is the same, though: the extra indirection of `Inst::XMmLoadConst`
is removed and replaced by a direct use of `VCodeConstant` as a
`SyntheticAmode`. This has no effect on codegen, but the CLIF output is
now clearer in that the actual instruction is displayed (e.g., `movdqu`)
instead of a made-up instruction (`load_const`).
2022-09-07 12:52:13 -07:00
Chris Fallin
385bd0cbf8 x64: fix CvtFloatToUintSeq: do not clobber src. (#4842)
This slipped through the regalloc2 operand code update in #4811: the
CvtFloatToUintSeq pseudo-instruction actually clobbers its source. It
was marked as a "mod" operand in the original and I mistakenly
converted it to a "use" as I had not seen the actual clobber. The
instruction now takes an extra temp and makes a copy of `src` in the
appropriate place.

Fixes #4840.
2022-09-01 22:46:57 +00:00
Afonso Bordado
08e7a7f1a0 cranelift: Add inline stack probing for x64 (#4747)
* cranelift: Add inline stack probe for x64

* cranelift: Cleanups comments

Thanks @jameysharp!
2022-09-01 22:32:54 +00:00
Trevor Elliott
fb8b9838fe Add MInst.XmmUnaryRmRImm to handle rounding instructions (#4823)
Add a new pseudo-instruction, XmmUnaryRmRImm, to handle instructions like roundss that only use their first register argument for the instruction's result. This has the added benefit of allowing the isle wrappers for those instructions to take an XmmMem argument, allowing for more cases where loads may be merged.
2022-08-31 08:29:32 -07:00
Chris Fallin
186c7c3b89 x64: clean up regalloc-related semantics on several instructions. (#4811)
* x64: clean up regalloc-related semantics on several instructions.

This PR removes all uses of "modify" operands on instructions in the x64
backend, and also removes all uses of "pinned vregs", or vregs that are
explicitly tied to particular physical registers. In place of both of
these mechanisms, which are legacies of the old regalloc design and
supported via compatibility code, the backend now uses operand
constraints. This is more flexible as it allows the regalloc to see the
liveranges and constraints without "reverse-engineering" move instructions.

Eventually, after removing all such uses (including in other backends
and by the ABI code), we can remove the compatibility code in regalloc2,
significantly simplifying its liverange-construction frontend and
thus allowing for higher confidence in correctness as well as possibly a
bit more compilation speed.

Curiously, there are a few extra move instructions now; they are likely
poor splitting decisions and I can try to chase these down later.

* Fix cranelift-codegen tests.

* Review feedback.
2022-08-30 17:21:14 -07:00
Trevor Elliott
25d960f9c4 x64: Lower tlsvalue, sqmul_round_sat, and uunarrow in ISLE (#4793)
Lower tlsvalue, sqmul_round_sat, and uunarrow in ISLE.
2022-08-26 16:33:48 -07:00
Trevor Elliott
c1f9736938 x64: Lower vany_true, vall_true, vhigh_bits, iconcat, and isplit in ISLE (#4787)
Lower vany_true, vall_true, vhigh_bits, iconcat, and isplit in ISLE.
2022-08-26 09:07:22 -07:00
Trevor Elliott
9386409607 x64: Lower extractlane, scalar_to_vector, and splat in ISLE (#4780)
Lower extractlane, scalar_to_vector and splat in ISLE.

This PR also makes some changes to the SinkableLoad api
* change the return type of sink_load to RegMem as there are more functions available for dealing with RegMem
* add reg_mem_to_reg_mem_imm and register it as an automatic conversion
2022-08-25 09:38:03 -07:00
Trevor Elliott
b8b6f2781e x64: Lower shuffle and swizzle in ISLE (#4772)
Lower `shuffle` and `swizzle` in ISLE.

This PR surfaced a bug with the lowering of `shuffle` when avx512vl and avx512vbmi are enabled: we use `vpermi2b` as the implementation, but panic if the immediate shuffle mask contains any out-of-bounds values. The behavior when the avx512 extensions are not present is that out-of-bounds values are turned into `0` in the result.

I've resolved this by detecting when the shuffle immediate has out-of-bounds indices in the avx512-enabled lowering, and generating an additional mask to zero out the lanes where those indices occur. This brings the avx512 case into line with the semantics of the `shuffle` op: 94bcbe8446/cranelift/codegen/meta/src/shared/instructions.rs (L1495-L1498)
2022-08-24 21:49:51 +00:00
Trevor Elliott
b5f1ab7780 x64: Lower stack_addr, udiv, sdiv, urem, srem, umulhi, smulhi in ISLE (#4741)
Lower stack_addr, udiv, sdiv, urem, srem, umulhi, and smulhi in ISLE.

For udiv, sdiv, urem, and srem I opted to move the original lowering into an extern constructor, as the interactions with rax and rdx for the div instruction didn't seem meaningful to implement in ISLE. However, I'm happy to revisit this choice and move more of the embedding into ISLE.
2022-08-23 11:22:49 -07:00
Trevor Elliott
80c77da334 x64: Lower bitcast, fabs, and fneg in ISLE (#4729)
* Add tests for bitcast

* Migrate bitcast to ISLE

* Add tests for fabs

* Lower fabs in ISLE

* Add tests for fneg

* Lower fneg in ISLE
2022-08-18 17:59:23 -07:00
Trevor Elliott
3c1490dd59 x64: Lower fcvt_to_{u,s}int{,_sat} in ISLE (#4704)
https://github.com/bytecodealliance/wasmtime/pull/4704
2022-08-16 09:03:50 -07:00
Trevor Elliott
498e7156b4 Remove the handling of cmpps in produces_const (#4714)
https://github.com/bytecodealliance/wasmtime/pull/4714
2022-08-15 15:48:01 -07:00
Trevor Elliott
1d0f6fa4fb Fix a bug in produces_const (#4709)
https://github.com/bytecodealliance/wasmtime/pull/4709
2022-08-15 19:00:33 +00:00
Nick Fitzgerald
f0c60f46a8 Cranelift: Remove ABICallee trait (#4701)
* Cranelift: Remove `ABICallee` trait

It has only one implementation: the `ABICalleeImpl` struct. By using that
directly we can avoid unnecessary layers of generics and abstractions as well as
a couple `Box`es that were previously putting the single implementation into a
`Box<dyn>`.

* Cranelift: Rename `ABICalleeImpl` to `AbiCallee`

* Fix comments as per review

* Rename `AbiCallee` to `Callee`
2022-08-15 18:27:05 +00:00
Benjamin Bouvier
8a9b1a9025 Implement an incremental compilation cache for Cranelift (#4551)
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime. 

After the suggestion of Chris, `Function` has been split into mostly two parts:

- on the one hand, `FunctionStencil` contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`.
- on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on.

Most changes are here to accomodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache:

- most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location, it's automatically detected instead the first time a non-default `SourceLoc` is set.
- user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`.
- some refactorings have been made for function names:
  - `ExternalName` was used as the type for a `Function`'s name; while it thus allowed `ExternalName::Libcall` in this place, this would have been quite confusing to use it there. Instead, a new enum `UserFuncName` is introduced for this name, that's either a user-defined function name (the above `UserExternalName`) or a test case name.
  - The future of `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with.

The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.

A basic fuzz target has been introduced that tries to do the bare minimum:

- check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function.
- check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache.
  - This last check is less efficient and less likely to happen, so probably should be rethought a bit.

Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.

Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement. 

Fixes #4155.
2022-08-12 16:47:43 +00:00
Afonso Bordado
3ea1813173 x64: Add native lowering for scalar fma (#4539)
Use `vfmadd213{ss,sd}` for these lowerings.
2022-08-11 22:48:16 +00:00
Trevor Elliott
0c2e0494bd x64: Lower fcvt_from_uint in ISLE (#4684)
* Add a test for the existing behavior of fcvt_from_unit

* Migrate the I8, I16, I32 cases of fcvt_from_uint

* Implement the I64 case of fcvt_from_uint

* Add a test for the existing behavior of fcvt_from_uint.f64x2

* Migrate fcvt_from_uint.f64x2 to ISLE

* Lower the last case of `fcvt_from_uint`

* Add a test for `fcvt_from_uint`

* Finish lowering fcmp_from_uint

* Format
2022-08-11 12:28:41 -07:00
Afonso Bordado
c5bc368cfe cranelift: Add COFF TLS Support (#4546)
* cranelift: Implement COFF TLS Relocations

* cranelift: Emit SecRel relocations

* cranelift: Handle _tls_index symbol in backend
2022-08-11 09:33:40 -07:00
Trevor Elliott
0c2a48f682 x64: Migrate selectif and selectif_spectre_guard to ISLE (#4619)
https://github.com/bytecodealliance/wasmtime/pull/4619
2022-08-05 09:36:11 -07:00
Trevor Elliott
dc8362ceec x64: Finish migrating brz and brnz to ISLE (#4614)
https://github.com/bytecodealliance/wasmtime/pull/4614
2022-08-04 12:58:43 -07:00
Trevor Elliott
1fc11bbe51 x64: Migrate brff and I128 branching instructions to ISLE (#4599)
https://github.com/bytecodealliance/wasmtime/pull/4599
2022-08-04 08:58:50 -07:00
Nick Fitzgerald
42bba452a6 Cranelift: Add instructions for getting the current stack/frame/return pointers (#4573)
* Cranelift: Add instructions for getting the current stack/frame pointers and return address

This is the initial part of https://github.com/bytecodealliance/wasmtime/issues/4535

* x64: Remove `Amode::RbpOffset` and use `Amode::ImmReg` instead

We just special case getting operands from `Amode`s now.

* Fix s390x `get_return_address`; require `preserve_frame_pointers=true`

* Assert that `Amode::ImmRegRegShift` doesn't use rbp/rsp

* Handle non-allocatable registers in Amode::with_allocs

* Use "stack" instead of "r15" on s390x

* r14 is an allocatable register on s390x, so it shouldn't be used with `MovPReg`
2022-08-02 14:37:17 -07:00
Chris Fallin
43f1765272 Cranellift: remove Baldrdash support and related features. (#4571)
* Cranellift: remove Baldrdash support and related features.

As noted in Mozilla's bugzilla bug 1781425 [1], the SpiderMonkey team
has recently determined that their current form of integration with
Cranelift is too hard to maintain, and they have chosen to remove it
from their codebase. If and when they decide to build updated support
for Cranelift, they will adopt different approaches to several details
of the integration.

In the meantime, after discussion with the SpiderMonkey folks, they
agree that it makes sense to remove the bits of Cranelift that exist
to support the integration ("Baldrdash"), as they will not need
them. Many of these bits are difficult-to-maintain special cases that
are not actually tested in Cranelift proper: for example, the
Baldrdash integration required Cranelift to emit function bodies
without prologues/epilogues, and instead communicate very precise
information about the expected frame size and layout, then stitched
together something post-facto. This was brittle and caused a lot of
incidental complexity ("fallthrough returns", the resulting special
logic in block-ordering); this is just one example. As another
example, one particular Baldrdash ABI variant processed stack args in
reverse order, so our ABI code had to support both traversal
orders. We had a number of other Baldrdash-specific settings as well
that did various special things.

This PR removes Baldrdash ABI support, the `fallthrough_return`
instruction, and pulls some threads to remove now-unused bits as a
result of those two, with the  understanding that the SpiderMonkey folks
will build new functionality as needed in the future and we can perhaps
find cleaner abstractions to make it all work.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1781425

* Review feedback.

* Fix (?) DWARF debug tests: add `--disable-cache` to wasmtime invocations.

The debugger tests invoke `wasmtime` from within each test case under
the control of a debugger (gdb or lldb). Some of these tests started to
inexplicably fail in CI with unrelated changes, and the failures were
only inconsistently reproducible locally. It seems to be cache related:
if we disable cached compilation on the nested `wasmtime` invocations,
the tests consistently pass.

* Review feedback.
2022-08-02 19:37:56 +00:00
Trevor Elliott
25782b527e x64: Migrate trapif and trapff to ISLE (#4545)
https://github.com/bytecodealliance/wasmtime/pull/4545
2022-08-01 11:24:11 -07:00
Benjamin Bouvier
8d0224341c cranelift: Introduce a feature to enable trace logs (#4484)
* Don't use `log::trace` directly but a feature-enabled `trace` macro
* Don't emit disassembly based on the log level
2022-08-01 11:19:15 +02:00
Trevor Elliott
7ac6134894 x64: Shrink Inst from 72 to 48 bytes (#4514)
https://github.com/bytecodealliance/wasmtime/pull/4514
2022-07-27 10:39:22 -07:00
Afonso Bordado
02c3b47db2 x64: Implement SIMD fma (#4474)
* x64: Add VEX Instruction Encoder

This uses a similar builder pattern to the EVEX Encoder.
Does not yet support memory accesses.

* x64: Add FMA Flag

* x64: Implement SIMD `fma`

* x64: Use 4 register Vex Inst

* x64: Reorder VEX pretty print args
2022-07-25 22:01:02 +00:00
Trevor Elliott
06407dd337 Add a test to prevent x64 Inst size slipping further (#4489)
* Add a test to prevent x64 Inst size slipping further

* Enable the test for 64-bit pointers only
2022-07-21 00:01:33 +00:00
Andrew Brown
8629cbc6a4 x64: port atomic_rmw to ISLE (#4389)
* x64: port `atomic_rmw` to ISLE

This change ports `atomic_rmw` to ISLE for the x64 backend. It does not
change the lowering in any way, though it seems possible that the fixed
regs need not be as fixed and that there are opportunities for single
instruction lowerings. It does rename `inst_common::AtomicRmwOp` to
`MachAtomicRmwOp` to disambiguate with the IR enum with the same name.

* x64: remove remaining hardcoded register constraints for `atomic_rmw`

* x64: use `SyntheticAmode` in `AtomicRmwSeq`

* review: add missing reg collector for amode

* review: collect memory registers in the 'late' phase
2022-07-06 23:58:59 +00:00
bjorn3
d1446f767d Mark return value as define instead of clobber for TLS pseudoinstructions (#4357) 2022-06-30 10:44:51 -07:00
Chris Fallin
b2e28b917a Cranelift: update to latest regalloc2: (#4324)
- Handle call instructions' clobbers with the clobbers API, using RA2's
  clobbers bitmask (bytecodealliance/regalloc2#58) rather than clobbers
  list;

- Pull in changes from bytecodealliance/regalloc2#59 for much more sane
  edge-case behavior w.r.t. liverange splitting.
2022-06-28 09:01:59 -07:00
Chris Fallin
5c2c285dd7 Cranelift/x64: fix register allocator metadata for 8-bit divides. (#4332)
`idiv` on x86-64 only reads `rdx`/`edx`/`dx`/`dl` for divides with width
greater than 8 bits; for an 8-bit divide, it reads the whole 16-bit
divisor from `ax`, as our CISC ancestors intended. This PR fixes the
metadata to avoid a regalloc panic (due to undefined `rdx`) in this
case. Does not affect Wasmtime or other Wasm-frontend embedders.
2022-06-27 12:31:06 -07:00
Alex Crichton
8bb07523e2 x64: Fix codegen for the select instruction with v128 (#4317)
This commit fixes a bug in the previous codegen for the `select`
instruction when the operations of the `select` were of the `v128` type.
Previously teh `XmmCmove` instruction only stored an `OperandSize` of 32
or 64 for a 64 or 32-bit move, but this was also used for these 128-bit
types which meant that when used the wrong move instruction was
generated. The fix applied here is to store the whole `Type` being moved
so the 128-bit variant can be selected as well.
2022-06-27 11:02:40 -07:00
Andrew Brown
816aae6aca x64: port some atomics to ISLE (#4212)
* x64: port `fence` to ISLE
* x64: port `atomic_load` to ISLE
* x64: port `atomic_store` to ISLE
2022-06-02 14:13:10 -07:00
Benjamin Bouvier
6e828df632 Remove unused SourceLoc in many Mach data structures (#4180)
* Remove unused srcloc in MachReloc

* Remove unused srcloc in MachTrap

* Use `into_iter` on array in bench code to suppress a warning

* Remove unused srcloc in MachCallSite
2022-05-23 09:27:28 -07:00
Chris Fallin
019ebf47b1 x64: fix pretty-printing argument order for XmmRmR instructions. (#4094)
The pretty-printing had swapped dst and src2; this was introduced when
we moved to RA2 (sorry about that! IMHO we should do something to
automate the mapping between regalloc arg collection and pretty
printing/emission).

`src2` comes at the end because it has a variable number of register
mentions; this is in line with how many of the other inst formats work.

Actual emitted code was never incorrect, just the pretty-printing.

Updated test golden outputs look correct to me now, including the one
that we saw was incorrect in #3945.
2022-05-03 10:12:58 -07:00
Chris Fallin
61dc38c065 Implement Spectre mitigations for table accesses and br_tables. (#4092)
Currently, we have partial Spectre mitigation: we protect heap accesses
with dynamic bounds checks. Specifically, we guard against errant
accesses on the misspeculated path beyond the bounds-check conditional
branch by adding a conditional move that is also dependent on the
bounds-check condition. This data dependency on the condition is not
speculated and thus will always pick the "safe" value (in the heap case,
a NULL address) on the misspeculated path, until the pipeline flushes
and recovers onto the correct path.

This PR uses the same technique both for table accesses -- used to
implement Wasm tables -- and for jumptables, used to implement Wasm
`br_table` instructions.

In the case of Wasm tables, the cmove picks the table base address on
the misspeculated path. This is equivalent to reading the first table
entry. This prevents loads of arbitrary data addresses on the
misspeculated path.

In the case of `br_table`, the cmove picks index 0 on the misspeculated
path. This is safer than allowing a branch to an address loaded from an
index under misspeculation (i.e., it preserves control-flow integrity
even under misspeculation).

The table mitigation is controlled by a Cranelift setting, on by
default. The br_table mitigation is always on, because it is part of the
single lowering pseudoinstruction. In both cases, the impact should be
minimal: a single extra cmove in a (relatively) rarely-used operation.

The table mitigation is architecture-independent (happens during
legalization); the br_table mitigation has been implemented for both x64
and aarch64. (I don't know enough about s390x to implement this
confidently there, but would happily review a PR to do the same on that
platform.)
2022-05-02 11:19:16 -07:00
Chris Fallin
dd45f44511 x64 backend: add lowerings with load-op-store fusion. (#4071)
x64 backend: add lowerings with load-op-store fusion.

These lowerings use the `OP [mem], reg` forms (or in AT&T syntax, `OP
%reg, (mem)`) -- i.e., x86 instructions that load from memory, perform
an ALU operation, and store the result, all in one instruction. Using
these instruction forms, we can merge three CLIF ops together: a load,
an arithmetic operation, and a store.
2022-04-26 18:58:26 -07:00
Chris Fallin
5774e068b7 Cranelift: fix regalloc2 integration bug wrt blockparam branch args. (#4042)
Previously, the block successor accumulation and the blockparam branch
arg setup were decoupled. The lowering backend implicitly specified
the order of successor edges via its `MachTerminator` enum on the last
instruction in the block, while the `Lower` toplevel
machine-independent driver set up blockparam branch args in the edge
order seen in CLIF.

In some cases, these orders did not match -- for example, when the
conditional branch depended on an FP condition that was implemented by
swapping taken/not-taken edges and inverting the condition code.

This PR refactors the successor handling to be centralized in `Lower`
rather than flow through the terminator `MachInst`, and adds a
successor block and its blockparam args at the same time, ensuring the
orders match.
2022-04-18 09:53:57 -07:00
Chris Fallin
a0318f36f0 Switch Cranelift over to regalloc2. (#3989)
This PR switches Cranelift over to the new register allocator, regalloc2.

See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801)
for a summary of the design changes. This switchover has implications for
core VCode/MachInst types and the lowering pass.

Overall, this change brings improvements to both compile time and speed of
generated code (runtime), as reported in #3942:

```
Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   25% faster                  28% faster
blake3-simd     no diff                     no diff
meshoptimizer   19% faster                  17% faster
pulldown-cmark  17% faster                  no diff
bz2             15% faster                  no diff
SpiderMonkey,   21% faster                  2% faster
  fib(30)
clang.wasm      42% faster                  N/A
```
2022-04-14 10:28:21 -07:00
Chris Fallin
cd173cfe8e ISLE: port fmin, fmax, fmin_pseudo, fmax_pseudo on x64. (#3856) 2022-02-28 14:40:26 -08:00
Chris Fallin
24f145cd1e Migrate clz, ctz, popcnt, bitrev, is_null, is_invalid on x64 to ISLE. (#3848) 2022-02-28 09:45:13 -08:00
Andrew Brown
f87c61176a x64: port select to ISLE (#3682)
* x64: port `select` using an FP comparison to ISLE

This change includes quite a few interlocking parts, required mainly by
the current x64 conventions in ISLE:
 - it adds a way to emit a `cmove` with multiple OR-ing conditions;
   because x64 ISLE cannot currently safely emit a comparison followed
   by several jumps, this adds `MachInst::CmoveOr` and
   `MachInst::XmmCmoveOr` macro instructions. Unfortunately, these macro
   instructions hide the multi-instruction sequence in `lower.isle`
 - to properly keep track of what instructions consume and produce
   flags, @cfallin added a way to pass around variants of
   `ConsumesFlags` and `ProducesFlags`--these changes affect all
   backends
 - then, to lower the `fcmp + select` CLIF, this change adds several
   `cmove*_from_values` helpers that perform all of the awkward
   conversions between `Value`, `ValueReg`, `Reg`, and `Gpr/Xmm`; one
   upside is that now these lowerings have much-improved documentation
   explaining why the various `FloatCC` and `CC` choices are made the
   the way they are.

Co-authored-by: Chris Fallin <chris@cfallin.org>
2022-02-23 10:03:16 -08:00