Commit Graph

168 Commits

Author SHA1 Message Date
T0b1-iOS
569089e473 Add {u,s}{add,sub,mul}_overflow instructions (#5784)
* add `{u,s}{add,sub,mul}_overflow` with interpreter

* add `{u,s}{add,sub,mul}_overflow` for x64

* add `{u,s}{add,sub,mul}_overflow` for aarch64

* 128bit filetests for `{u,s}{add,sub,mul}_overflow`

* `{u,s}{add,sub,mul}_overflow` emit tests for x64

* `{u,s}{add,sub,mul}_overflow` emit tests for aarch64

* Initial review changes

* add `with_flags_extended` helper

* add `with_flags_chained` helper
2023-04-11 20:16:04 +00:00
Remo Senekowitsch
7eb8914090 Chaos mode MVP: Skip branch optimization in MachBuffer (#6039)
* fuzz: Add chaos mode control plane

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* fuzz: Skip branch optimization with chaos mode

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* fuzz: Rename chaos engine -> control plane

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* chaos mode: refactoring ControlPlane to be passed through the call stack by reference

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Remo Senekowitsch <contact@remsle.dev>

* fuzz: annotate chaos todos

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* fuzz: cleanup control plane

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* fuzz: remove control plane from compiler context

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* fuzz: move control plane into emit state

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* fuzz: fix remaining compiler errors

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* fix tests

* refactor emission state ctrl plane accessors

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* centralize conditional compilation of chaos mode

Also cleanup a few straggling dependencies on cranelift-control
that aren't needed anymore.

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* add cranelift-control to published crates

prtest:full

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

* add cranelift-control to public crates

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>

---------

Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>
Co-authored-by: Remo Senekowitsch <contact@remsle.dev>
2023-04-05 19:28:46 +00:00
Alex Crichton
967543eb43 aarch64: Add more lowerings for the CLIF fma (#6150)
This commit adds new lowerings to the AArch64 backend of the
element-based `fmla` and `fmls` instructions. These instructions have
one of the multiplicands as an implicit broadcast of a single lane of
another register and can help remove `shuffle` or `dup` instructions
that would otherwise be used to implement them.
2023-04-05 17:22:55 +00:00
Nathan Whitaker
c3decdf910 cranelift: Implement TLS on aarch64 Mach-O (Apple Silicon) (#5434)
* Implement TLS on Aarch64 Mach-O

* Add aarch64 macho TLS filetest

* Address review comments

- `Aarch64` instead of `AArch64` in comments
- Remove unnecessary guard in tls_value lowering
- Remove unnecessary regalloc metadata in emission

* Use x1 as temporary register in emission

- Instead of passing in a temporary register to use when emitting
the TLS code, just use `x1`, as it's already in the clobber set.
This also keeps the size of `aarch64::inst::Inst` at 32 bytes.
- Update filetest accordingly

* Update aarch64 mach-o TLS filetest
2023-03-24 17:54:01 +00:00
Alex Crichton
a3b21031d4 Add a MachBuffer::defer_trap method (#6011)
* Add a `MachBuffer::defer_trap` method

This commit adds a new method to `MachBuffer` to defer trap opcodes to
the end of a function in a similar manner to how constants are deferred
to the end of the function. This is useful for backends which frequently
use `TrapIf`-style opcodes. Currently a jump is emitted which skips the
next instruction, a trap, and then execution continues normally. While
there isn't any pressing problem with this construction the trap opcode
is in the middle of the instruction stream as opposed to "off on the
side" despite rarely being taken.

With this method in place all the backends (except riscv64 since I
couldn't figure it out easily enough) have a new lowering of their
`TrapIf` opcode. Now a trap is deferred, which returns a label, and then
that label is jumped to when executing the trap. A fixup is then
recorded in `MachBuffer` to get patched later on during emission, or at
the end of the function. Subsequently all `TrapIf` instructions
translate to a single branch plus a single trap at the end of the
function.

I've additionally further updated some more lowerings in the x64 backend
which were explicitly using traps to instead use `TrapIf` where
applicable to avoid jumping over traps mid-function. Other backends
didn't appear to have many jump-over-the-next-trap patterns.

Lots of tests have had their expectations updated here which should
reflect all the traps being sunk to the end of functions.

* Print trap code on all platforms

* Emit traps before constants

* Preserve source location information for traps

* Fix test expectations

* Attempt to fix s390x

The MachBuffer was registering trap codes with the first byte of the
trap, but the SIGILL handler was expecting it to be registered with the
last byte of the trap. Exploit that SIGILL is always represented with a
2-byte instruction and always march 2-backwards for SIGILL, continuing
to march backwards 1 byte for SIGFPE-generating instructions.

* Back out s390x changes

* Back out more s390x bits

* Review comments
2023-03-20 21:24:47 +00:00
Alex Crichton
03b5dbb3e0 aarch64: Use VCodeConstant for f64/v128 constants (#5997)
* aarch64: Translate float and splat lowering to ISLE

I was looking into `constant_f128` and its fallback lowering into memory
and to get familiar with the code I figured it'd be good to port some
Rust logic to ISLE. This commit ports the `constant_{f128,f64,f32}`
helpers into ISLE from Rust as well as the `splat_const` helper which
ended up being closely related.

Tests reflect a number of regalloc changes that happened but also namely
one major difference is that in the lowering of `f32` a 32-bit immediate
is created now instead of a 64-bit immediate (in a GP register before
it's moved into a FP register). This semantically has no change but the
generated code is slightly different in a few minor cases.

* aarch64: Load f64/v128 constants from a pool

This commit removes the `LoadFpuConst64` and `LoadFpuConst128`
pseudo-instructions from the AArch64 backend which internally loaded a
nearby constant and then jumped over it. Constants now go through the
`VCodeConstant` infrastructure which gets placed at the end of the
function similar to how x64 works. Some minor support was added in as
well to add a new addressing mode for a `MachLabel`-relative load.
2023-03-13 19:33:52 +00:00
Alex Crichton
52896e020d aarch64: Add specialized shuffle lowerings (#5977)
* aarch64: Add `shuffle` lowerings for the `uzp{1,2}` instructions

This commit uses the same style of patterns in the x64 backend to start
adding specific lowerings of the Cranelift `shuffle` instruction to
particular AArch64 instructions.

* aarch64: Add `shuffle` lowerings to the `zip{1,2}` instructions

These instructions match the `punpck*` family of instructions on x64 and
should help provide more efficient lowerings than the current `shuffle`
fallback.

* aarch64: Add `shuffle` lowerings for `trn{1,2}`

Along the lines of prior commits adds specific patterns to lowering for
individual AArch64 instructions available.

* aarch64: Add a `shuffle` lowering for the `ext` instruction

This instruction will more-or-less concatenate two 128-bit vector
registers to create a 256-bit value, shift it right, and then take the
lower 128-bits into the destination. This can be modeled with a
`shuffle` of consecutive bytes so this adds a lowering rule to generate
this instruction.

* aarch64: Add `shuffle` special case for `dup`

This commit adds special cases for Cranelift's `shuffle` on AArch64 when
the lowering can be represented with a `dup` instruction which
broadcasts one vector's lane into all lanes of the destination.

* aarch64: Add `shuffle` specializations for `rev` instructions

This commit adds shuffle mask specializations for the `rev{16,32,64}`
family of instructions on AArch64 which can be used to reverse bytes,
16-bit values, or 32-bit values within larger values.

* Fix tests

* Add doc-comments in ISLE
2023-03-10 21:37:13 +00:00
Alex Crichton
9984e959cd aarch64: Add support for the fmls instruction (#5895)
This commit adds lowerings to the AArch64 backend for the `fmls`
instruction which is intended to be leveraged in the relaxed-simd
proposal for WebAssembly. This should hopefully allow for a
teeny-bit-more efficient codegen for this operator instead of using the
`fmla` instruction plus a negation instruction.
2023-03-02 05:45:58 +00:00
Afonso Bordado
eabd43a178 aarch64: Support GOT Relative relocations in PIC mode (#5550)
* cranelift: Add `adrp` encoding to AArch64 backend

* cranelift: Support GOT Symbol References in AArch64

* cranelift: Add MachO GOT relocations

* cranelift: Do not mark the GOT PageOffset12 MachO relocation as relative
2023-02-15 15:19:18 -08:00
Saúl Cabrera
f0979af157 cranelift-codegen: Prepare aarch64 for usage from Winch (#5570)
This commit exposes the necessary aarch64 pieces to be used by Winch for binary emission.
2023-01-13 19:46:25 +00:00
Trevor Elliott
d54a27d0ea Allocate temporary intermediates when loading constants on aarch64 (#5366)
As loading constants on aarch64 can take up to 4 instructions, we need to plumb through some additional registers. Rather than pass a fixed list of registers in, pass an allocation function.
2022-12-01 22:29:36 +00:00
Alex Crichton
830885383f Implement inline stack probes for AArch64 (#5353)
* Turn off probestack by default in Cranelift

The probestack feature is not implemented for the aarch64 and s390x
backends and currently the on-by-default status requires the aarch64 and
s390x implementations to be a stub. Turning off probestack by default
allows the s390x and aarch64 backends to panic with an error message to
avoid providing a false sense of security. When the probestack option is
implemented for all backends, however, it may be reasonable to
re-enable.

* aarch64: Improve codegen for AMode fallback

Currently the final fallback for finalizing an `AMode` will generate
both a constant-loading instruction as well as an `add` instruction to
the base register into the same temporary. This commit improves the
codegen by removing the `add` instruction and folding the final add into
the finalized `AMode`. This changes the `extendop` used but both
registers are 64-bit so shouldn't be affected by the extending
operation.

* aarch64: Implement inline stack probes

This commit implements inline stack probes for the aarch64 backend in
Cranelift. The support here is modeled after the x64 support where
unrolled probes are used up to a particular threshold after which a loop
is generated. The instructions here are similar in spirit to x64 except
that unlike x64 the stack pointer isn't modified during the unrolled
loop to avoid needing to re-adjust it back up at the end of the loop.

* Enable inline probestack for AArch64 and Riscv64

This commit enables inline probestacks for the AArch64 and Riscv64
architectures in the same manner that x86_64 has it enabled now. Some
more testing was additionally added since on Unix platforms we should be
guaranteed that Rust's stack overflow message is now printed too.

* Enable probestack for aarch64 in cranelift-fuzzgen

* Address review comments

* Remove implicit stack overflow traps from x64 backend

This commit removes implicit `StackOverflow` traps inserted by the x64
backend for stack-based operations. This was historically required when
stack overflow was detected with page faults but Wasmtime no longer
requires that since it's not suitable for wasm modules which call host
functions. Additionally no other backend implements this form of
implicit trap-code additions so this is intended to synchronize the
behavior of all the backends.

This fixes a test added prior for aarch64 to properly abort the process
instead of accidentally being caught by Wasmtime.

* Fix a style issue
2022-11-30 12:30:00 -06:00
Trevor Elliott
a007e02bd2 Add fixed_nonallocatable constraints when appropriate (#5253)
Plumb the set of allocatable registers through the OperandCollector and use it validate uses of fixed-nonallocatable registers, like %rsp on x86_64.
2022-11-15 12:49:17 -08:00
Trevor Elliott
0367fbc2d4 cranelift: Rework pinned register lowering (#5249)
Rework pinned register lowering to avoid the use of pinned virtual registers, instead using the MovFromPReg and MovToPReg pseudo instructions.
2022-11-10 16:19:25 -08:00
11evan
4ca9e82bd1 cranelift: Add Bswap instruction (#1092) (#5147)
Adds Bswap to the Cranelift IR. Implements the Bswap instruction
in the x64 and aarch64 codegen backends. Cranelift users can now:
```
builder.ins().bswap(value)
```
to get a native byteswap instruction.

* x64: implements the 32- and 64-bit bswap instruction, following
the pattern set by similar unary instrutions (Neg and Not) - it
only operates on a dst register, but is parameterized with both
a src and dst which are expected to be the same register.

As x64 bswap instruction is only for 32- or 64-bit registers,
the 16-bit swap is implemented as a rotate left by 8.

Updated x64 RexFlags type to support emitting for single-operand
instructions like bswap

* aarch64: Bswap gets emitted as aarch64 rev16, rev32,
or rev64 instruction as appropriate.

* s390x: Bswap was already supported in backend, just had to add
a bit of plumbing

* For completeness, added bswap to the interpreter as well.

* added filetests and runtests for each ISA

* added bswap to fuzzgen, thanks to afonso360 for the code there

* 128-bit swaps are not yet implemented, that can be done later
2022-10-31 19:30:00 +00:00
Chris Fallin
2986f6b0ff ABI: implement register arguments with constraints. (#4858)
* ABI: implement register arguments with constraints.

Currently, Cranelift's ABI code emits a sequence of moves from physical
registers into vregs at the top of the function body, one for every
register-carried argument.

For a number of reasons, we want to move to operand constraints instead,
and remove the use of explicitly-named "pinned vregs"; this allows for
better regalloc in theory, as it removes the need to "reverse-engineer"
the sequence of moves.

This PR alters the ABI code so that it generates a single "args"
pseudo-instruction as the first instruction in the function body. This
pseudo-inst defs all register arguments, and constrains them to the
appropriate registers at the def-point. Subsequently the regalloc can
move them wherever it needs to.

Some care was taken not to have this pseudo-inst show up in
post-regalloc disassemblies, but the change did cause a general regalloc
"shift" in many tests, so the precise-output updates are a bit noisy.
Sorry about that!

A subsequent PR will handle the other half of the ABI code, namely, the
callsite case, with a similar preg-to-constraint conversion.

* Update based on review feedback.

* Review feedback.
2022-09-08 18:03:14 -07:00
Anton Kirilov
d8b290898c Initial forward-edge CFI implementation (#3693)
* Initial forward-edge CFI implementation

Give the user the option to start all basic blocks that are targets
of indirect branches with the BTI instruction introduced by the
Branch Target Identification extension to the Arm instruction set
architecture.

Copyright (c) 2022, Arm Limited.

* Refactor `from_artifacts` to avoid second `make_executable` (#1)

This involves "parsing" twice but this is parsing just the header of an
ELF file so it's not a very intensive operation and should be ok to do
twice.

* Address the code review feedback

Copyright (c) 2022, Arm Limited.

Co-authored-by: Alex Crichton <alex@alexcrichton.com>
2022-09-08 09:35:58 -05:00
Anton Kirilov
48bf078c83 Cranelift AArch64: Fix the atomic memory operations (#4831)
Previously the implementations of the various atomic memory IR operations
ignored the memory operation flags that were passed.

Copyright (c) 2022, Arm Limited.

Co-authored-by: Chris Fallin <chris@cfallin.org>
2022-09-02 09:35:21 -07:00
Anton Kirilov
d2e19b8d74 Cranelift AArch64: Migrate AMode to ISLE (#4832)
Copyright (c) 2022, Arm Limited.

Co-authored-by: Chris Fallin <chris@cfallin.org>
2022-09-02 00:24:46 +00:00
Chris Fallin
ae5fe8a728 aarch64: fix up regalloc2 semantics. (#4830)
This PR removes all uses of modify-operands in the aarch64 backend,
replacing them with reused-input operands instead. This has the nice
effect of removing a bunch of move instructions and more clearly
representing inputs and outputs.

This PR also removes the explicit use of pinned vregs in the aarch64
backend, instead using fixed-register constraints on the operands when
insts or pseudo-inst sequences require certain registers.

This is the second PR in the regalloc-semantics cleanup series; after
the remaining backend (s390x) and the ABI code are cleaned up as well,
we'll be able to simplify the regalloc2 frontend.
2022-09-01 21:25:20 +00:00
Chris Fallin
1a59b3e6c6 AArch64: port tls_value to ISLE. (#4821) 2022-08-30 16:51:15 +00:00
Damian Heaton
3d9d759380 Port fcmp to ISLE (AArch64) (#4819)
Ported the existing implementation of `fcmp` for AArch64 to ISLE.

This also ports the `lower_vector_comparison` method to ISLE.

Copyright (c) 2022 Arm Limited
2022-08-30 09:06:15 -07:00
Damian Heaton
94bcbe8446 Port Fcopysign..FcvtToSintSat to ISLE (AArch64) (#4753)
* Port `Fcopysign`..``FcvtToSintSat` to ISLE (AArch64)

Ported the existing implementations of the following opcodes to ISLE on
AArch64:
- `Fcopysign`
  - Also introduced missing support for `fcopysign` on vector values, as
    per the docs.
  - This introduces the vector encoding for the `SLI` machine
    instruction.
- `FcvtToUint`
- `FcvtToSint`
- `FcvtFromUint`
- `FcvtFromSint`
- `FcvtToUintSat`
- `FcvtToSintSat`

Copyright (c) 2022 Arm Limited

* Document helpers and abstract conversion checks
2022-08-24 10:37:14 -07:00
Damian Heaton
3b68d76905 Port widening ops to ISLE (AArch64) (#4751)
Ported the existing implementations of the following opcodes for AArch64
to ISLE, and implemented support for 64-bit vectors (per the docs):
- `SwidenLow`
- `SwidenHigh`
- `UwidenLow`
- `UwidenHigh`

Also ported `WideningPairwiseDotProductS` as-is.

Copyright (c) 2022 Arm Limited
2022-08-23 09:42:11 -07:00
Anton Kirilov
1481721c9d Enable back-edge CFI by default on macOS (#4720)
Also, adjust the tests that are executed on that platform. Finally,
fix a bug with obtaining backtraces when back-edge CFI is enabled.

Copyright (c) 2022, Arm Limited.
2022-08-17 15:06:20 -05:00
Nick Fitzgerald
f0c60f46a8 Cranelift: Remove ABICallee trait (#4701)
* Cranelift: Remove `ABICallee` trait

It has only one implementation: the `ABICalleeImpl` struct. By using that
directly we can avoid unnecessary layers of generics and abstractions as well as
a couple `Box`es that were previously putting the single implementation into a
`Box<dyn>`.

* Cranelift: Rename `ABICalleeImpl` to `AbiCallee`

* Fix comments as per review

* Rename `AbiCallee` to `Callee`
2022-08-15 18:27:05 +00:00
Afonso Bordado
863cbc345c cranelift: Fix icmp.i128 eq for aarch64 (#4706)
* cranelift: Fix `icmp.i128 eq` for aarch64

* cranelift: Use ccmp in `icmp.i128 eq` for aarch64
2022-08-15 11:11:22 -07:00
Benjamin Bouvier
8a9b1a9025 Implement an incremental compilation cache for Cranelift (#4551)
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime. 

After the suggestion of Chris, `Function` has been split into mostly two parts:

- on the one hand, `FunctionStencil` contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`.
- on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on.

Most changes are here to accomodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache:

- most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location, it's automatically detected instead the first time a non-default `SourceLoc` is set.
- user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`.
- some refactorings have been made for function names:
  - `ExternalName` was used as the type for a `Function`'s name; while it thus allowed `ExternalName::Libcall` in this place, this would have been quite confusing to use it there. Instead, a new enum `UserFuncName` is introduced for this name, that's either a user-defined function name (the above `UserExternalName`) or a test case name.
  - The future of `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with.

The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.

A basic fuzz target has been introduced that tries to do the bare minimum:

- check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function.
- check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache.
  - This last check is less efficient and less likely to happen, so probably should be rethought a bit.

Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.

Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement. 

Fixes #4155.
2022-08-12 16:47:43 +00:00
Damian Heaton
e463890f26 Port AvgRound & SqmulRoundSat to ISLE (AArch64) (#4639)
Ported the existing implementations of the following opcodes on AArch64
to ISLE:
- `AvgRound`
  - Also introduced support for `i64x2` vectors, as per the docs.
- `SqmulRoundSat`

Copyright (c) 2022 Arm Limited
2022-08-08 11:35:43 -07:00
Damian Heaton
47a67d752b Split Fmla and Bsl out into new VecRRRMod op (#4638)
Separates the following opcodes for AArch64 into a separate `VecALUModOp` enum,
which is emitted via the `VecRRRMod` instruction. This separates vector ALU
instructions which modify a register from instructions which write to a new register:
- `Bsl`
- `Fmla`

Addresses [a discussion](https://github.com/bytecodealliance/wasmtime/pull/4608#discussion_r937975581) in #4608.

Copyright (c) 2022 Arm Limited
2022-08-08 11:33:13 -07:00
Chris Fallin
c5e3c0cafb AArch64: don't assert inst within worst-case size when island emitted. (#4627)
We assert after emitting each instruction that its size was less than
the "worst-case size", which is used to determine when we need to
proactively emit an island so pending branch fixups don't go out of
bounds. However, the `EmitIsland` pseudo-inst itself can cause an
arbitrarily large island to be emitted; this should not have to fit
within the worst-case size (because island size is explicitly accounted
for by the threshold computation). This PR fixes the assert accordingly.

Fixes #4626.
2022-08-05 17:27:56 -07:00
Damian Heaton
eb332b8369 Convert fma, valltrue & vanytrue to ISLE (AArch64) (#4608)
* Convert `fma`, `valltrue` & `vanytrue` to ISLE (AArch64)

Ported the existing implementations of the following opcodes to ISLE on
AArch64:
- `fma`
  - Introduced missing support for `fma` on vector values, as per the
    docs.
- `valltrue`
- `vanytrue`

Also fixed `fcmp` on scalar values in the interpreter, and enabled
interpreter tests in `simd-fma.clif`.

This introduces the `FMLA` machine instruction.

Copyright (c) 2022 Arm Limited

* Add comments for `Fmla` and `Bsl`

Copyright (c) 2022 Arm Limited
2022-08-05 09:47:56 -07:00
Anton Kirilov
a897742593 Initial back-edge CFI implementation (#3606)
Give the user the option to sign and to authenticate function
return addresses with the operations introduced by the Pointer
Authentication extension to the Arm instruction set architecture.

Copyright (c) 2021, Arm Limited.
2022-08-03 11:08:29 -07:00
Nick Fitzgerald
42bba452a6 Cranelift: Add instructions for getting the current stack/frame/return pointers (#4573)
* Cranelift: Add instructions for getting the current stack/frame pointers and return address

This is the initial part of https://github.com/bytecodealliance/wasmtime/issues/4535

* x64: Remove `Amode::RbpOffset` and use `Amode::ImmReg` instead

We just special case getting operands from `Amode`s now.

* Fix s390x `get_return_address`; require `preserve_frame_pointers=true`

* Assert that `Amode::ImmRegRegShift` doesn't use rbp/rsp

* Handle non-allocatable registers in Amode::with_allocs

* Use "stack" instead of "r15" on s390x

* r14 is an allocatable register on s390x, so it shouldn't be used with `MovPReg`
2022-08-02 14:37:17 -07:00
Chris Fallin
43f1765272 Cranellift: remove Baldrdash support and related features. (#4571)
* Cranellift: remove Baldrdash support and related features.

As noted in Mozilla's bugzilla bug 1781425 [1], the SpiderMonkey team
has recently determined that their current form of integration with
Cranelift is too hard to maintain, and they have chosen to remove it
from their codebase. If and when they decide to build updated support
for Cranelift, they will adopt different approaches to several details
of the integration.

In the meantime, after discussion with the SpiderMonkey folks, they
agree that it makes sense to remove the bits of Cranelift that exist
to support the integration ("Baldrdash"), as they will not need
them. Many of these bits are difficult-to-maintain special cases that
are not actually tested in Cranelift proper: for example, the
Baldrdash integration required Cranelift to emit function bodies
without prologues/epilogues, and instead communicate very precise
information about the expected frame size and layout, then stitched
together something post-facto. This was brittle and caused a lot of
incidental complexity ("fallthrough returns", the resulting special
logic in block-ordering); this is just one example. As another
example, one particular Baldrdash ABI variant processed stack args in
reverse order, so our ABI code had to support both traversal
orders. We had a number of other Baldrdash-specific settings as well
that did various special things.

This PR removes Baldrdash ABI support, the `fallthrough_return`
instruction, and pulls some threads to remove now-unused bits as a
result of those two, with the  understanding that the SpiderMonkey folks
will build new functionality as needed in the future and we can perhaps
find cleaner abstractions to make it all work.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1781425

* Review feedback.

* Fix (?) DWARF debug tests: add `--disable-cache` to wasmtime invocations.

The debugger tests invoke `wasmtime` from within each test case under
the control of a debugger (gdb or lldb). Some of these tests started to
inexplicably fail in CI with unrelated changes, and the failures were
only inconsistently reproducible locally. It seems to be cache related:
if we disable cached compilation on the nested `wasmtime` invocations,
the tests consistently pass.

* Review feedback.
2022-08-02 19:37:56 +00:00
Anton Kirilov
a47a82d2e5 Cranelift AArch64: Harden the Spectre mitigations (#4555)
Use the `CSDB` instruction following Arm's recommendation.

Copyright (c) 2022, Arm Limited.
2022-08-01 10:20:48 -07:00
Benjamin Bouvier
8d0224341c cranelift: Introduce a feature to enable trace logs (#4484)
* Don't use `log::trace` directly but a feature-enabled `trace` macro
* Don't emit disassembly based on the log level
2022-08-01 11:19:15 +02:00
Anton Kirilov
ead6edb0c5 Cranelift AArch64: Migrate Splat to ISLE (#4521)
Copyright (c) 2022, Arm Limited.
2022-07-26 17:57:15 +00:00
Sam Parker
c5ddb4b803 [AArch64] Port SIMD narrowing to ISLE (#4478)
* [AArch64] Port SIMD narrowing to ISLE

Fvdemote, snarrow, unarrow and uunarrow.

Also refactor the aarch64 instructions descriptions to parameterize
on ScalarSize instead of using different opcodes.

The zero_value pure constructor has been introduced and used by the
integer narrow operations and it replaces, and extends, the compare
zero patterns.

Copright (c) 2022, Arm Limited.

* use short 'if' patterns
2022-07-25 12:40:36 -07:00
Damian Heaton
d792646677 Implement iabs in ISLE (AArch64) (#4399)
* Implement `iabs` in ISLE (AArch64)

Converts the existing implementation of `iabs` for AArch64 into ISLE,
and fixes support for `iabs` on scalar values.

Copyright (c) 2022 Arm Limited.

* Improve scalar `iabs` implementation.

Also introduces `CSNeg` instruction.

Copyright (c) 2022 Arm Limited
2022-07-18 11:12:34 -07:00
Damian Heaton
db7f9ccd2b Convert scalar_to_vector to ISLE (AArch64) (#4401)
* Convert `scalar_to_vector` to ISLE (AArch64)

Converted the exisiting implementation of `scalar_to_vector` for AArch64 to
ISLE.

Copyright (c) 2022 Arm Limited

* Add support for floats and fix FpuExtend

- Added rules to cover `f32 -> f32x4` and `f64 -> f64x2` for
`scalar_to_vector`
- Added tests for `scalar_to_vector` on floats.
- Corrected an invalid instruction emitted by `FpuExtend` on 64-bit
values.

Copyright (c) 2022 Arm Limited
2022-07-18 11:11:54 -07:00
Sam Parker
9c43749dfe [RFC] Dynamic Vector Support (#4200)
Introduce a new concept in the IR that allows a producer to create
dynamic vector types. An IR function can now contain global value(s)
that represent a dynamic scaling factor, for a given fixed-width
vector type. A dynamic type is then created by 'multiplying' the
corresponding global value with a fixed-width type. These new types
can be used just like the existing types and the type system has a
set of hard-coded dynamic types, such as I32X4XN, which the user
defined types map onto. The dynamic types are also used explicitly
to create dynamic stack slots, which have no set size like their
existing counterparts. New IR instructions are added to access these
new stack entities.

Currently, during codegen, the dynamic scaling factor has to be
lowered to a constant so the dynamic slots do eventually have a
compile-time known size, as do spill slots.

The current lowering for aarch64 just targets Neon, using a dynamic
scale of 1.

Copyright (c) 2022, Arm Limited.
2022-07-07 12:54:39 -07:00
Sam Parker
fb61774df2 [AArch64] Port AtomicLoad and AtomicStore to ISLE (#4301)
Copyright (c) 2022, Arm Limited.
2022-06-29 12:12:48 -07:00
Anton Kirilov
25a588c35f Cranelift AArch64: Use an allocated encoding for Udf (#4281)
Preserve the current behaviour when code is generated for SpiderMonkey.

Copyright (c) 2022, Arm Limited.
2022-06-22 15:03:28 +01:00
Benjamin Bouvier
6e828df632 Remove unused SourceLoc in many Mach data structures (#4180)
* Remove unused srcloc in MachReloc

* Remove unused srcloc in MachTrap

* Use `into_iter` on array in bench code to suppress a warning

* Remove unused srcloc in MachCallSite
2022-05-23 09:27:28 -07:00
Benjamin Bouvier
71fc16bbeb Narrow allow(dead_code) declarations (#4116)
* Narrow `allow(dead_code)` declarations

Having module wide `allow(dead_code)` may hide some code that's really
dead. In this commit I just narrowed the declarations to the specific
enum variants that were not used (as it seems reasonable to keep them
and their handling in all the matches, for future use). And the compiler
found more dead code that I think we can remove safely in the short
term.

With this, the only files annotated with a module-wide
`allow(dead_code)` are isle-generated files.

* resurrect some functions as test helpers
2022-05-10 12:02:52 +02:00
Chris Fallin
61dc38c065 Implement Spectre mitigations for table accesses and br_tables. (#4092)
Currently, we have partial Spectre mitigation: we protect heap accesses
with dynamic bounds checks. Specifically, we guard against errant
accesses on the misspeculated path beyond the bounds-check conditional
branch by adding a conditional move that is also dependent on the
bounds-check condition. This data dependency on the condition is not
speculated and thus will always pick the "safe" value (in the heap case,
a NULL address) on the misspeculated path, until the pipeline flushes
and recovers onto the correct path.

This PR uses the same technique both for table accesses -- used to
implement Wasm tables -- and for jumptables, used to implement Wasm
`br_table` instructions.

In the case of Wasm tables, the cmove picks the table base address on
the misspeculated path. This is equivalent to reading the first table
entry. This prevents loads of arbitrary data addresses on the
misspeculated path.

In the case of `br_table`, the cmove picks index 0 on the misspeculated
path. This is safer than allowing a branch to an address loaded from an
index under misspeculation (i.e., it preserves control-flow integrity
even under misspeculation).

The table mitigation is controlled by a Cranelift setting, on by
default. The br_table mitigation is always on, because it is part of the
single lowering pseudoinstruction. In both cases, the impact should be
minimal: a single extra cmove in a (relatively) rarely-used operation.

The table mitigation is architecture-independent (happens during
legalization); the br_table mitigation has been implemented for both x64
and aarch64. (I don't know enough about s390x to implement this
confidently there, but would happily review a PR to do the same on that
platform.)
2022-05-02 11:19:16 -07:00
Sam Parker
12b4374cd5 [AArch64] Port atomic rmw to ISLE (#4021)
Also fix and extend the current implementation:
- AtomicRMWOp::Clr != AtomicRmwOp::And, as the input needs to be
  inverted first.
- Inputs to the cmp for the RMWLoop case are sign-extended when
  needed.
- Lower Xchg to Swp.
- Lower Sub to Add with a negated input.
- Added more runtests.

Copyright (c) 2022, Arm Limited.
2022-04-27 13:13:59 -07:00
Sam Parker
cf533a8041 [AArch64] Merge Fcmp32 and Fcmp64 (#4032)
Copyright (c) 2022, Arm Limited.
2022-04-14 15:39:43 -07:00
Sam Parker
682ef7b470 [AArch64] Refactor Mov instructions (#4033)
Merge Mov32 and Mov64 into a single instruction parameterized by a new
OperandSize field. Also combine the Mov[K,N,Z] into a single instruction
with a new opcode to select between the operations.

Copyright (c) 2022, Arm Limited.
2022-04-14 14:51:12 -07:00