Commit Graph

506 Commits

Author SHA1 Message Date
Nick Fitzgerald
f6ae67f3f0 Cranelift(aarch64): Use an existing extractor instead of a new pure constructor (#5273) 2022-11-15 20:40:44 +00:00
Nick Fitzgerald
9967782726 Cranelift(Aarch64): Optimize lowering of icmps with immediates (#5252)
We can encode more constants into 12-bit immediates if we do the following
rewrite for comparisons with odd constants:

        A >= B + 1
    ==> A - 1 >= B
    ==> A > B
2022-11-15 09:18:55 -08:00
Trevor Elliott
0367fbc2d4 cranelift: Rework pinned register lowering (#5249)
Rework pinned register lowering to avoid the use of pinned virtual registers, instead using the MovFromPReg and MovToPReg pseudo instructions.
2022-11-10 16:19:25 -08:00
Trevor Elliott
b077854b57 Generate SSA code from returns (#5172)
Modify return pseudo-instructions to have pairs of registers: virtual and real. This allows us to constrain the virtual registers to the real ones specified by the abi, instead of directly emitting moves to those real registers.
2022-11-08 16:00:49 -08:00
Ulrich Weigand
3e5938e65a Support big- and little-endian lane order with bitcast (#5196)
Add a MemFlags operand to the bitcast instruction, where only the
`big` and `little` flags are accepted.  These define the lane order
to be used when casting between types of different lane counts.

Update all users to pass an appropriate MemFlags argument.

Implement lane swaps where necessary in the s390x back-end.

This is the final part necessary to fix
https://github.com/bytecodealliance/wasmtime/issues/4566.
2022-11-07 14:41:10 -08:00
Afonso Bordado
3ef30b5b67 cranelift: Rename i{min,max} to s{min,max} (#5187)
This brings these instructions with our general naming convention
of signed instructions being prefixed with `s`.
2022-11-03 18:20:33 +00:00
Afonso Bordado
2c69b94744 cranelift: Add support for bswap.i128 (#5186)
* fuzzgen: Request only one variable for bswap

This was included by accident. Bswap only has one input, instead of two.

* cranelift: Add `bswap.i128` support

Adds support only for x86, AArch64, S390X.

RISCV does not yet have bswap.
2022-11-03 18:03:37 +00:00
Trevor Elliott
aeceea28e2 Remove trapif and trapff (#5162)
This branch removes the trapif and trapff instructions, in favor of using an explicit comparison and trapnz. This moves us closer to removing iflags and fflags, but introduces the need to implement instructions like iadd_cout in the x64 and aarch64 backends.
2022-11-03 09:25:11 -07:00
Ulrich Weigand
961107ec63 Merge raw_bitcast and bitcast (#5175)
- Allow bitcast for vectors with differing lane widths
- Remove raw_bitcast IR instruction
- Change all users of raw_bitcast to bitcast
- Implement support for no-op bitcast cases across backends

This implements the second step of the plan outlined here:
https://github.com/bytecodealliance/wasmtime/issues/4566#issuecomment-1234819394
2022-11-02 10:16:27 -07:00
Alex Crichton
cd53bed898 Implement AOT compilation for components (#5160)
* Pull `Module` out of `ModuleTextBuilder`

This commit is the first in what will likely be a number towards
preparing for serializing a compiled component to bytes, a precompiled
artifact. To that end my rough plan is to merge all of the compiled
artifacts for a component into one large object file instead of having
lots of separate object files and lots of separate mmaps to manage. To
that end I plan on eventually using `ModuleTextBuilder` to build one
large text section for all core wasm modules and trampolines, meaning
that `ModuleTextBuilder` is no longer specific to one module. I've
extracted out functionality such as function name calculation as well as
relocation resolving (now a closure passed in) in preparation for this.

For now this just keeps tests passing, and the trajectory for this
should become more clear over the following commits.

* Remove component-specific object emission

This commit removes the `ComponentCompiler::emit_obj` function in favor
of `Compiler::emit_obj`, now renamed `append_code`. This involved
significantly refactoring code emission to take a flat list of functions
into `append_code` and the caller is responsible for weaving together
various "families" of functions and un-weaving them afterwards.

* Consolidate ELF parsing in `CodeMemory`

This commit moves the ELF file parsing and section iteration from
`CompiledModule` into `CodeMemory` so one location keeps track of
section ranges and such. This is in preparation for sharing much of this
code with components which needs all the same sections to get tracked
but won't be using `CompiledModule`. A small side benefit from this is
that the section parsing done in `CodeMemory` and `CompiledModule` is no
longer duplicated.

* Remove separately tracked traps in components

Previously components would generate an "always trapping" function
and the metadata around which pc was allowed to trap was handled
manually for components. With recent refactorings the Wasmtime-standard
trap section in object files is now being generated for components as
well which means that can be reused instead of custom-tracking this
metadata. This commit removes the manual tracking for the `always_trap`
functions and plumbs the necessary bits around to make components look
more like modules.

* Remove a now-unnecessary `Arc` in `Module`

Not expected to have any measurable impact on performance, but
complexity-wise this should make it a bit easier to understand the
internals since there's no longer any need to store this somewhere else
than its owner's location.

* Merge compilation artifacts of components

This commit is a large refactoring of the component compilation process
to produce a single artifact instead of multiple binary artifacts. The
core wasm compilation process is refactored as well to share as much
code as necessary with the component compilation process.

This method of representing a compiled component necessitated a few
medium-sized changes internally within Wasmtime:

* A new data structure was created, `CodeObject`, which represents
  metadata about a single compiled artifact. This is then stored as an
  `Arc` within a component and a module. For `Module` this is always
  uniquely owned and represents a shuffling around of data from one
  owner to another. For a `Component`, however, this is shared amongst
  all loaded modules and the top-level component.

* The "module registry" which is used for symbolicating backtraces and
  for trap information has been updated to account for a single region
  of loaded code holding possibly multiple modules. This involved adding
  a second-level `BTreeMap` for now. This will likely slow down
  instantiation slightly but if it poses an issue in the future this
  should be able to be represented with a more clever data structure.

This commit additionally solves a number of longstanding issues with
components such as compiling only one host-to-wasm trampoline per
signature instead of possibly once-per-module. Additionally the
`SignatureCollection` registration now happens once-per-component
instead of once-per-module-within-a-component.

* Fix compile errors from prior commits

* Support AOT-compiling components

This commit adds support for AOT-compiled components in the same manner
as `Module`, specifically adding:

* `Engine::precompile_component`
* `Component::serialize`
* `Component::deserialize`
* `Component::deserialize_file`

Internally the support for components looks quite similar to `Module`.
All the prior commits to this made adding the support here
(unsurprisingly) easy. Components are represented as a single object
file as are modules, and the functions for each module are all piled
into the same object file next to each other (as are areas such as data
sections). Support was also added here to quickly differentiate compiled
components vs compiled modules via the `e_flags` field in the ELF
header.

* Prevent serializing exported modules on components

The current representation of a module within a component means that the
implementation of `Module::serialize` will not work if the module is
exported from a component. The reason for this is that `serialize`
doesn't actually do anything and simply returns the underlying mmap as a
list of bytes. The mmap, however, has `.wasmtime.info` describing
component metadata as opposed to this module's metadata. While rewriting
this section could be implemented it's not so easy to do so and is
otherwise seen as not super important of a feature right now anyway.

* Fix windows build

* Fix an unused function warning

* Update crates/environ/src/compilation.rs

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
2022-11-02 15:26:26 +00:00
Nick Fitzgerald
6d289723bd Cranelift: Use a single, shared vector allocation for all ABIArgs (#5127)
* Cranelift: Use a single, shared vector allocation for all `ABIArg`s

Instead of two `SmallVec`s per `SigData`.

* Remove `Deref` and `DerefMut` impls for `ArgsAccumulator`
2022-10-31 14:32:17 -07:00
Afonso Bordado
04fa8951ba aarch64: Deduplicate bmask lowering rule (#5154) 2022-10-31 12:44:35 -07:00
11evan
4ca9e82bd1 cranelift: Add Bswap instruction (#1092) (#5147)
Adds Bswap to the Cranelift IR. Implements the Bswap instruction
in the x64 and aarch64 codegen backends. Cranelift users can now:
```
builder.ins().bswap(value)
```
to get a native byteswap instruction.

* x64: implements the 32- and 64-bit bswap instruction, following
the pattern set by similar unary instrutions (Neg and Not) - it
only operates on a dst register, but is parameterized with both
a src and dst which are expected to be the same register.

As x64 bswap instruction is only for 32- or 64-bit registers,
the 16-bit swap is implemented as a rotate left by 8.

Updated x64 RexFlags type to support emitting for single-operand
instructions like bswap

* aarch64: Bswap gets emitted as aarch64 rev16, rev32,
or rev64 instruction as appropriate.

* s390x: Bswap was already supported in backend, just had to add
a bit of plumbing

* For completeness, added bswap to the interpreter as well.

* added filetests and runtests for each ISA

* added bswap to fuzzgen, thanks to afonso360 for the code there

* 128-bit swaps are not yet implemented, that can be done later
2022-10-31 19:30:00 +00:00
Afonso Bordado
879b52825f cranelift: Implement ineg.i128 for everyone (#5129)
* cranelift: Add `ineg` runtests

* aarch64: Implement `ineg.i128`

* x64: Implement `ineg.i128`

* riscv: Implement `ineg.i128`

* fuzzgen: Enable `ineg.i128`
2022-10-28 16:10:00 -07:00
Afonso Bordado
e8f3d03bbe cranelift: Mask high bits on bmask for types smaller than a register (#5118)
* aarch64: Fix incorrect masking for small types on bmask

`bmask` was accidentally relying on the uppermost bits of the register
for small types.

This was found by fuzzgen,  when it generated a shift left followed by
a bmask, the shift left shifted the bits out of the range of the input
type (i8), however these are not automatically cleared since they
remained inside the 32 bits of the register.

That caused issues when the bmask tried to compare the whole register
instead of just the bottom bits. The solution here is to mask the upper
bits for small types.

* aarch64: Emit 32bit cmp on bmask

This fixes an issue where bmask was accidentally comparing the
upper bits of the register by always using a 64bit cmp.

* riscv: Mask high bits in bmask

* riscv: Add compile tests for br{z,nz}

* riscv: Use shifts to mask 32bit values

This produces less code than the AND since that version needs to
load an immediate constant from memory.

* cranelift: Update test input to hexadecimal values

This makes it a bit more clear what is being tested.

* riscv: Use addiw for masking 32 bit values

Co-authored-by: Trevor Elliott <telliott@fastly.com>

* aarch64: Update bmask rule priority

Co-authored-by: Trevor Elliott <telliott@fastly.com>
2022-10-27 09:45:39 -07:00
Trevor Elliott
02620441c3 Add uadd_overflow_trap (#5123)
Add a new instruction uadd_overflow_trap, which is a fused version of iadd_ifcout and trapif. Adding this instruction removes a dependency on the iflags type, and would allow us to move closer to removing it entirely.

The instruction is defined for the i32 and i64 types only, and is currently only used in the legalization of heap_addr.
2022-10-27 09:43:15 -07:00
Afonso Bordado
4867813f77 cranelift: Remove copy instruction (#5125) 2022-10-25 17:27:33 -07:00
Trevor Elliott
ec12415b1f cranelift: Remove redundant branch and select instructions (#5097)
As discussed in the 2022/10/19 meeting, this PR removes many of the branch and select instructions that used iflags, in favor if using brz/brnz and select in their place. Additionally, it reworks selectif_spectre_guard to take an i8 input instead of an iflags input.

For reference, the removed instructions are: br_icmp, brif, brff, trueif, trueff, and selectif.
2022-10-24 16:14:35 -07:00
Nick Fitzgerald
442f9fa01b Cranelift: pass iterators to ABIMachineSpec::compute_arg_locs (#5095)
Instead of slices. This gives us more flexibility to pass custom sequences
without needing to allocate a `Vec` to hold them and pass in as a slice.
2022-10-21 16:08:09 -07:00
Chris Fallin
86e77953f8 Fix some egraph-related issues. (#5088)
This fixes #5086 by addressing two separate issues:

- The `ValueDataPacked::set_type()` helper had an embarrassing bitfield-manipulation bug that would mangle the rest of a `ValueDef` when setting its type. This is not normally used, only when the egraph elaboration fills in types after-the-fact on a multi-value node.
- The lowering rules for `isplit` on aarch64 and s390x were dispatching on the first output type, rather than the input type. When only the second output is used (as in the example in #5086), the first output type actually remains `INVALID` (and this is fine because it's never used).
2022-10-21 10:24:48 -07:00
Trevor Elliott
32a7593c94 cranelift: Remove booleans (#5031)
Remove the boolean types from cranelift, and the associated instructions breduce, bextend, bconst, and bint. Standardize on using 1/0 for the return value from instructions that produce scalar boolean results, and -1/0 for boolean vector elements.

Fixes #3205

Co-authored-by: Afonso Bordado <afonso360@users.noreply.github.com>
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
2022-10-17 16:00:27 -07:00
Chris Fallin
2be12a5167 egraph-based midend: draw the rest of the owl (productionized). (#4953)
* egraph-based midend: draw the rest of the owl.

* Rename `egg` submodule of cranelift-codegen to `egraph`.

* Apply some feedback from @jsharp during code walkthrough.

* Remove recursion from find_best_node by doing a single pass.

Rather than recursively computing the lowest-cost node for a given
eclass and memoizing the answer at each eclass node, we can do a single
forward pass; because every eclass node refers only to earlier nodes,
this is sufficient. The behavior may slightly differ from the earlier
behavior because we cannot short-circuit costs to zero once a node is
elaborated; but in practice this should not matter.

* Make elaboration non-recursive.

Use an explicit stack instead (with `ElabStackEntry` entries,
alongside a result stack).

* Make elaboration traversal of the domtree non-recursive/stack-safe.

* Work analysis logic in Cranelift-side egraph glue into a general analysis framework in cranelift-egraph.

* Apply static recursion limit to rule application.

* Fix aarch64 wrt dynamic-vector support -- broken rebase.

* Topo-sort cranelift-egraph before cranelift-codegen in publish script, like the comment instructs me to!

* Fix multi-result call testcase.

* Include `cranelift-egraph` in `PUBLISHED_CRATES`.

* Fix atomic_rmw: not really a load.

* Remove now-unnecessary PartialOrd/Ord derivations.

* Address some code-review comments.

* Review feedback.

* Review feedback.

* No overlap in mid-end rules, because we are defining a multi-constructor.

* rustfmt

* Review feedback.

* Review feedback.

* Review feedback.

* Review feedback.

* Remove redundant `mut`.

* Add comment noting what rules can do.

* Review feedback.

* Clarify comment wording.

* Update `has_memory_fence_semantics`.

* Apply @jameysharp's improved loop-level computation.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix suggestion commit.

* Fix off-by-one in new loop-nest analysis.

* Review feedback.

* Review feedback.

* Review feedback.

* Use `Default`, not `std::default::Default`, as per @fitzgen

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

* Apply @fitzgen's comment elaboration to a doc-comment.

Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>

* Add stat for hitting the rewrite-depth limit.

* Some code motion in split prelude to make the diff a little clearer wrt `main`.

* Take @jameysharp's suggested `try_into()` usage for blockparam indices.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Take @jameysharp's suggestion to avoid double-match on load op.

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix suggestion (add import).

* Review feedback.

* Fix stack_load handling.

* Remove redundant can_store case.

* Take @jameysharp's suggested improvement to FuncEGraph::build() logic

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Tweaks to FuncEGraph::build() on top of suggestion.

* Take @jameysharp's suggested clarified condition

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Clean up after suggestion (unused variable).

* Fix loop analysis.

* loop level asserts

* Revert constant-space loop analysis -- edge cases were incorrect, so let's go with the simple thing for now.

* Take @jameysharp's suggestion re: result_tys

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix up after suggestion

* Take @jameysharp's suggestion to use fold rather than reduce

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fixup after suggestion

* Take @jameysharp's suggestion to remove elaborate_eclass_use's return value.

* Clarifying comment in terminator insts.

Co-authored-by: Jamey Sharp <jamey@minilop.net>
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
2022-10-11 18:15:53 -07:00
Jamey Sharp
04b30acad9 Misc cleanups (#5014)
* Replace resize+copy_from_slice with extend_from_slice

Vec::resize initializes the new space, which is wasted effort if we're
just going to call `copy_from_slice` on it immediately afterward. Using
`extend_from_slice` is simpler, and very slightly faster.

If the new size were bigger than the buffer we're copying from, then it
would make sense to initialize the excess. But it isn't: it's always
exactly the same size.

* Move helpers from Context to CompiledCode

These methods only use information from Context::compiled_code, so they
should live on CompiledCode instead.

* Remove an unnecessary #[cfg_attr]

There are other uses of `#[allow(clippy::too_many_arguments)]` in this
file, so apparently it doesn't need to be guarded by the "cargo-clippy"
feature.

* Fix a few comments

Two of these were wrong/misleading:

- `FunctionBuilder::new` does not clear the provided func_ctx. It does
  debug-assert that the context is already clear, but I don't think
  that's worth a comment.

- `switch_to_block` does not "create values for the arguments." That's
  done by the combination of `append_block_params_for_function_params`
  and `declare_wasm_parameters`.

* wasmtime-cranelift: Misc cleanups

The main change is to use the `CompiledCode` reference we already had
instead of getting it out of `Context` repeatedly. This removes a bunch
of `unwrap()` calls.

* wasmtime-cranelift: Factor out uncached compile
2022-10-05 10:35:59 -07:00
Trevor Elliott
a209cb63f5 ISLE: Enable the overlap checker (#5011)
This PR turns the overlap checker on by default, requiring the use of priorities to resolve overlap between rules.
2022-10-04 21:56:49 +00:00
Trevor Elliott
ab4be2bdd1 ISLE: Resolve overlaps in the aarch64 backend (#4988) 2022-09-30 12:57:50 -07:00
bjorn3
af226d37c2 [AArch64] Fix incorrect regalloc constraints for atomic_cas (#4959)
* [AArch64] Fix incorrect regalloc constraints for atomic_cas

* Update test for latest Cranelift changes
2022-09-26 16:05:57 +00:00
Damian Heaton
3a2b32bf4d Port branches to ISLE (AArch64) (#4943)
* Port branches to ISLE (AArch64)

Ported the existing implementations of the following opcodes for AArch64
to ISLE:
- `Brz`
- `Brnz`
- `Brif`
- `Brff`
- `BrIcmp`
- `Jump`
- `BrTable`

Copyright (c) 2022 Arm Limited

* Remove dead code

Copyright (c) 2022 Arm Limited
2022-09-26 09:45:32 +01:00
Damian Heaton
3f8cccfb59 Port flag-based ops to ISLE (AArch64) (#4942)
Ported the existing implementations of the following opcodes for AArch64
to ISLE:
- `Trueif`
- `Trueff`
- `Trapif`
- `Trapff`
- `Select`
- `Selectif`
- `SelectifSpectreGuard`

Copyright (c) 2022 Arm Limited
2022-09-22 15:44:32 -07:00
Chris Fallin
b652ce2fb1 ISLE: add support for multi-extractors and multi-constructors. (#4908)
* ISLE: add support for multi-extractors and multi-constructors.

This support allows for rules that process multiple matching values per
extractor call on the left-hand side, and as a result, can produce
multiple values from the constructor whose body they define.

This is useful in situations where we are matching on an input data
structure that can have multiple "nodes" for a given value or ID, for
example in an e-graph.

* Review feedback: all multi-ctors and multi-etors return iterators; no `Vec` case.

* Add additional warning suppressions to generated-code toplevels to be consistent with new islec output.
2022-09-21 23:36:50 +00:00
Damian Heaton
352c7595c6 Improve fcvt_to_{u,s}int_sat lowering (AArch64) (#4913)
Improved the instruction lowering for the following opcodes on AArch64,
and introduced support for converting to integers less than 32-bits wide
as per the docs:
- `FcvtToSintSat`
- `FcvtToUintSat`

Copyright (c) 2022 Arm Limited
2022-09-21 10:16:09 -07:00
Damian Heaton
e786bda002 Vector bitcast support (AArch64 & Interpreter) (#4820)
* Vector bitcast support (AArch64 & Interpreter)

Implemented support for `bitcast` on vector values for AArch64 and the
interpreter.

Also corrected the verifier to ensure that the size, in bits, of the input and
output types match for a `bitcast`, per the docs.

Copyright (c) 2022 Arm Limited

* `I128` same-type bitcast support

Copyright (c) 2022 Arm Limited

* Directly return input for 64-bit GPR<=>GPR bitcast

Copyright (c) 2022 Arm Limited
2022-09-21 09:20:28 -07:00
Chris Fallin
05cbd667c7 Cranelift: use regalloc2 constraints on caller side of ABI code. (#4892)
* Cranelift: use regalloc2 constraints on caller side of ABI code.

This PR updates the shared ABI code and backends to use register-operand
constraints rather than explicit pinned-vreg moves for register
arguments and return values.

The s390x backend was not updated, because it has its own implementation
of ABI code. Ideally we could converge back to the code shared by x64
and aarch64 (which didn't exist when s390x ported calls to ISLE, so the
current situation is underestandable, to be clear!). I'll leave this for
future work.

This PR exposed several places where regalloc2 needed to be a bit more
flexible with constraints; it requires regalloc2#74 to be merged and
pulled in.

* Update to regalloc2 0.3.3.

In addition to version bump, this required removing two asserts as
`SpillSlot`s no longer carry their class (so we can't assert that they
have the correct class).

* Review comments.

* Filetest updates.

* Add cargo-vet audit for regalloc2 0.3.2 -> 0.3.3 upgrade.

* Update to regalloc2 0.4.0.
2022-09-21 01:17:04 +00:00
Damian Heaton
e9b08b856d Port icmp to ISLE (AArch64) (#4898)
* Port `icmp` to ISLE (AArch64)

Ported the existing implementation of `icmp` (and, by extension, the
`lower_icmp` function) to ISLE for AArch64.

Copyright (c) 2022 Arm Limited

* Allow 'producer chains', eliminating `Nop0`s

Copyright (c) 2022 Arm Limited
2022-09-13 08:56:50 -07:00
Chris Fallin
2986f6b0ff ABI: implement register arguments with constraints. (#4858)
* ABI: implement register arguments with constraints.

Currently, Cranelift's ABI code emits a sequence of moves from physical
registers into vregs at the top of the function body, one for every
register-carried argument.

For a number of reasons, we want to move to operand constraints instead,
and remove the use of explicitly-named "pinned vregs"; this allows for
better regalloc in theory, as it removes the need to "reverse-engineer"
the sequence of moves.

This PR alters the ABI code so that it generates a single "args"
pseudo-instruction as the first instruction in the function body. This
pseudo-inst defs all register arguments, and constrains them to the
appropriate registers at the def-point. Subsequently the regalloc can
move them wherever it needs to.

Some care was taken not to have this pseudo-inst show up in
post-regalloc disassemblies, but the change did cause a general regalloc
"shift" in many tests, so the precise-output updates are a bit noisy.
Sorry about that!

A subsequent PR will handle the other half of the ABI code, namely, the
callsite case, with a similar preg-to-constraint conversion.

* Update based on review feedback.

* Review feedback.
2022-09-08 18:03:14 -07:00
Anton Kirilov
d8b290898c Initial forward-edge CFI implementation (#3693)
* Initial forward-edge CFI implementation

Give the user the option to start all basic blocks that are targets
of indirect branches with the BTI instruction introduced by the
Branch Target Identification extension to the Arm instruction set
architecture.

Copyright (c) 2022, Arm Limited.

* Refactor `from_artifacts` to avoid second `make_executable` (#1)

This involves "parsing" twice but this is parsing just the header of an
ELF file so it's not a very intensive operation and should be ok to do
twice.

* Address the code review feedback

Copyright (c) 2022, Arm Limited.

Co-authored-by: Alex Crichton <alex@alexcrichton.com>
2022-09-08 09:35:58 -05:00
Afonso Bordado
f57b4412ec cranelift: Implement missing i128 rotates on AArch64 (#4866) 2022-09-07 11:11:47 -07:00
Anton Kirilov
dd07e354b4 Cranelift AArch64: Fix the get_return_address lowering (#4851)
The previous implementation assumed that nothing had clobbered the
LR register since the current function had started executing, so
it would be incorrect for a non-leaf function, for example, that
contains the `get_return_address` operation right after a call.
The operation is valid only if the `preserve_frame_pointers` flag
is enabled, which implies that the presence of a frame record on
the stack is guaranteed.

Copyright (c) 2022, Arm Limited.
2022-09-07 11:09:22 -07:00
Jamey Sharp
3d6d49daba cranelift: Remove of/nof overflow flags from icmp (#4879)
* cranelift: Remove of/nof overflow flags from icmp

Neither Wasmtime nor cg-clif use these flags under any circumstances.
From discussion on #3060 I see it's long been unclear what purpose these
flags served.

Fixes #3060, fixes #4406, and fixes #4875... by deleting all the code
that could have been buggy.

This changes the cranelift-fuzzgen input format by removing some IntCC
options, so I've gone ahead and enabled I128 icmp tests at the same
time. Since only the of/nof cases were failing before, I expect these to
work.

* Restore trapif tests

It's still useful to validate that iadd_ifcout's iflags result can be
forwarded correctly to trapif, and for that purpose it doesn't really
matter what condition code is checked.
2022-09-07 08:38:41 -07:00
Anton Kirilov
48bf078c83 Cranelift AArch64: Fix the atomic memory operations (#4831)
Previously the implementations of the various atomic memory IR operations
ignored the memory operation flags that were passed.

Copyright (c) 2022, Arm Limited.

Co-authored-by: Chris Fallin <chris@cfallin.org>
2022-09-02 09:35:21 -07:00
Anton Kirilov
d2e19b8d74 Cranelift AArch64: Migrate AMode to ISLE (#4832)
Copyright (c) 2022, Arm Limited.

Co-authored-by: Chris Fallin <chris@cfallin.org>
2022-09-02 00:24:46 +00:00
Afonso Bordado
08e7a7f1a0 cranelift: Add inline stack probing for x64 (#4747)
* cranelift: Add inline stack probe for x64

* cranelift: Cleanups comments

Thanks @jameysharp!
2022-09-01 22:32:54 +00:00
Jamey Sharp
84ac24c23d cranelift: Remove const_addr instruction (fixes #2398) (#4843) 2022-09-01 21:57:37 +00:00
Chris Fallin
ae5fe8a728 aarch64: fix up regalloc2 semantics. (#4830)
This PR removes all uses of modify-operands in the aarch64 backend,
replacing them with reused-input operands instead. This has the nice
effect of removing a bunch of move instructions and more clearly
representing inputs and outputs.

This PR also removes the explicit use of pinned vregs in the aarch64
backend, instead using fixed-register constraints on the operands when
insts or pseudo-inst sequences require certain registers.

This is the second PR in the regalloc-semantics cleanup series; after
the remaining backend (s390x) and the ABI code are cleaned up as well,
we'll be able to simplify the regalloc2 frontend.
2022-09-01 21:25:20 +00:00
Trevor Elliott
dde2c5a3b6 Align functions according to their ISA's requirements (#4826)
Add a function_alignment function to the TargetIsa trait, and use it to align functions when generating objects. Additionally, collect the maximum alignment required for pc-relative constants in functions and pass that value out. Use the max of these two values when padding functions for alignment.

This fixes a bug on x86_64 where rip-relative loads to sse registers could cause a segfault, as functions weren't always guaranteed to be aligned to 16-byte addresses.

Fixes #4812
2022-08-31 14:41:44 -07:00
Nick Fitzgerald
f18a1f1488 Cranelift: Deduplicate ABI signatures during lowering (#4829)
* Cranelift: Deduplicate ABI signatures during lowering

This commit creates the `SigSet` type which interns and deduplicates the ABI
signatures that we create from `ir::Signature`s. The ABI signatures are now
referred to indirectly via a `Sig` (which is a `cranelift_entity` ID), and we
pass around a `SigSet` to anything that needs to access the actual underlying
`SigData` (which is what `ABISig` used to be).

I had to change a couple methods to return a `SmallInstVec` instead of emitting
directly to work around what would otherwise be shared and exclusive borrows of
the lowering context overlapping. I don't expect any of these to heap allocate
in practice.

This does not remove the often-unnecessary allocations caused by
`ensure_struct_return_ptr_is_returned`. That is left for follow up work.

This also opens the door for further shuffling of signature data into more
efficient representations in the future, now that we have `SigSet` to store it
all in one place and it is threaded through all the code. We could potentially
move each signature's parameter and return vectors into one big vector shared
between all signatures, for example, which could cut down on allocations and
shrink the size of `SigData` since those `SmallVec`s have pretty large inline
capacity.

Overall, this refactoring gives a 1-7% speedup for compilation on
`pulldown-cmark`:

```
compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 8754213.66 ± 7526266.23 (confidence = 99%)

  dedupe.so is 1.01x to 1.07x faster than main.so!

  [191003295 234620642.20 280597986] dedupe.so
  [197626699 243374855.86 321816763] main.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [170406200 194299792.68 253001201] dedupe.so
  [172071888 193230743.11 223608329] main.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [3870997347 4437735062.59 5216007266] dedupe.so
  [4019924063 4424595349.24 4965088931] main.so
```

* Use full path instead of import to avoid warnings in some build configurations

Warnings will then cause CI to fail.

* Move `SigSet` into `VCode`
2022-08-31 20:39:32 +00:00
Chris Fallin
1a59b3e6c6 AArch64: port tls_value to ISLE. (#4821) 2022-08-30 16:51:15 +00:00
Damian Heaton
3d9d759380 Port fcmp to ISLE (AArch64) (#4819)
Ported the existing implementation of `fcmp` for AArch64 to ISLE.

This also ports the `lower_vector_comparison` method to ISLE.

Copyright (c) 2022 Arm Limited
2022-08-30 09:06:15 -07:00
Chris Fallin
955d4e4ba1 AArch64: port load and store operations to ISLE. (#4785)
This retains `lower_amode` in the handwritten code (@akirilov-arm
reports that there is an upcoming patch to port this), but tweaks it
slightly to take a `Value` rather than an `Inst`.
2022-08-29 17:45:55 -07:00
Nick Fitzgerald
5392d7cdd7 cranelift: Merge abi and abi_impl modules (#4805) 2022-08-29 23:20:36 +00:00
Chris Fallin
a6eb24bd4f AArch64: port misc ops to ISLE. (#4796)
* Add some precise-output compile tests for aarch64.

* AArch64: port misc ops to ISLE.

- get_pinned_reg / set_pinned_reg
- bitcast
- stack_addr
- extractlane
- insertlane
- vhigh_bits
- iadd_ifcout
- fcvt_low_from_sint
2022-08-29 12:56:39 -07:00