Commit Graph

2054 Commits

Author SHA1 Message Date
Chris Fallin
feaa7ca75f Alias analysis: refactor for use by other driver loops. (#5380)
* Alias analysis: refactor for use by other driver loops.

This PR pulls the core of the alias analysis infrastructure into a
`process_inst()` method that operates on a single instruction, and
allows another compiler pass to apply store-to-load forwarding and
redundant-load elimination interleaved with other work. The existing
behavior remains unchanged; the pass's toplevel loop calls this
extracted method.

This refactor is a prerequisite for using the alias analysis as part of
a refactored egraph-based optimization framework.

* Review feedback: remove unneeded mut.
2022-12-06 18:30:02 +00:00
Trevor Elliott
353a681671 Avoid reusing a register during constant loading (#5379)
Avoid reusing a register when loading a constant, allocating a temporary instead.
2022-12-05 18:37:53 -08:00
Trevor Elliott
7d28d586da riscv64: Don't reuse registers when loading constants (#5376)
Rework the constant loading functions in the riscv64 backend to generate fresh temporaries instead of reusing the destination register.
2022-12-05 16:51:52 -08:00
Saúl Cabrera
28cfa57533 cranelift: Small documentation fixes (#5377)
* `translate_operator` doesn't return a boolean.
* `from_base_offset` doesn't panic if offset is smaller than base.
2022-12-06 00:46:58 +00:00
Trevor Elliott
817c2b205c riscv64: Use a temporary when translating shift amount (#5375)
Use a temporary when translating the shift amount, instead of reusing the destination register.
2022-12-05 20:54:14 +00:00
Trevor Elliott
b475b9bd19 Terminate blocks with a single branch in riscv64 (#5374)
Ensure that we're terminating blocks with a single branch instruction, when testing I128 values against zero.
2022-12-05 20:13:28 +00:00
Anton Romanov
29d4d1063f [codegen] Fixed mutability of domtree reference (#5371) 2022-12-05 08:19:32 -08:00
Trevor Elliott
6aea8e0d7e Don't reuse destination registers when lowering splat on aarch64 (#5370) 2022-12-05 08:18:49 -08:00
wasmtime-publish
a28d4d3c89 Bump Wasmtime to 5.0.0 (#5372)
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
2022-12-05 08:38:57 -06:00
Trevor Elliott
2e9b0802ab aarch64: Rework amode compilation to produce SSA code (#5369)
Rework the compilation of amodes in the aarch64 backend to stop reusing registers and instead generate fresh virtual registers for intermediates. This resolves some SSA checker violations with the aarch64 backend, and as a nice side-effect removes some unnecessary movs in the generated code.
2022-12-02 01:23:15 +00:00
Trevor Elliott
d54a27d0ea Allocate temporary intermediates when loading constants on aarch64 (#5366)
As loading constants on aarch64 can take up to 4 instructions, we need to plumb through some additional registers. Rather than pass a fixed list of registers in, pass an allocation function.
2022-12-01 22:29:36 +00:00
Nick Fitzgerald
4510a4a805 Cranelift: mark post-legalization trapping blocks as cold (#5367)
Trapping is a rare event.
2022-12-01 12:46:26 -08:00
Trevor Elliott
37c3c5b1e0 Remove an unnecessary debug trace (#5359) 2022-11-30 20:37:20 -08:00
Trevor Elliott
c16f2956db Allocate a temporary for 64-bit constant loads in the s390x backend (#5357)
Avoid reusing a destination virtual register for 64-bit constants in the s390x backend. This change addresses a case identified by the regalloc2 ssa validator, as the destination register was written to twice when constants were generated via the MachInst::gen_constant function.
2022-11-30 17:01:14 -08:00
Trevor Elliott
d8dbabfe6b Don't reuse registers in the x64 div lowering (#5356)
Introduce a temporary for an intermediate value in the lowering of div in the x64 backend. Additionally, add a src argument to the shift_r smart constructor, which is why the diff got larger than just the div lowering.
2022-11-30 22:44:59 +00:00
Trevor Elliott
87b63174b1 Don't reuse registers in make_i64x2_from_lanes (#5355)
Avoid reusing output registers in make_i64x2_from_lanes by threading the output name instead, and using smart constructors for x64_pinsrd instead of constructing the instructions directly.
2022-11-30 14:37:01 -08:00
Nick Fitzgerald
79f7fa6079 Cranelift: implement heap_{load,store} instruction legalization (#5351)
* Cranelift: implement `heap_{load,store}` instruction legalization

This does not remove `heap_addr` yet, but it does factor out the common
bounds-check-and-compute-the-native-address functionality that is shared between
all of `heap_{addr,load,store}`.

Finally, this adds a missing optimization for when we can dedupe explicit bounds
checks for static memories and Spectre mitigations.

* Cranelift: Enable `heap_load_store_*` run tests on all targets
2022-11-30 19:12:49 +00:00
Alex Crichton
830885383f Implement inline stack probes for AArch64 (#5353)
* Turn off probestack by default in Cranelift

The probestack feature is not implemented for the aarch64 and s390x
backends and currently the on-by-default status requires the aarch64 and
s390x implementations to be a stub. Turning off probestack by default
allows the s390x and aarch64 backends to panic with an error message to
avoid providing a false sense of security. When the probestack option is
implemented for all backends, however, it may be reasonable to
re-enable.

* aarch64: Improve codegen for AMode fallback

Currently the final fallback for finalizing an `AMode` will generate
both a constant-loading instruction as well as an `add` instruction to
the base register into the same temporary. This commit improves the
codegen by removing the `add` instruction and folding the final add into
the finalized `AMode`. This changes the `extendop` used but both
registers are 64-bit so shouldn't be affected by the extending
operation.

* aarch64: Implement inline stack probes

This commit implements inline stack probes for the aarch64 backend in
Cranelift. The support here is modeled after the x64 support where
unrolled probes are used up to a particular threshold after which a loop
is generated. The instructions here are similar in spirit to x64 except
that unlike x64 the stack pointer isn't modified during the unrolled
loop to avoid needing to re-adjust it back up at the end of the loop.

* Enable inline probestack for AArch64 and Riscv64

This commit enables inline probestacks for the AArch64 and Riscv64
architectures in the same manner that x86_64 has it enabled now. Some
more testing was additionally added since on Unix platforms we should be
guaranteed that Rust's stack overflow message is now printed too.

* Enable probestack for aarch64 in cranelift-fuzzgen

* Address review comments

* Remove implicit stack overflow traps from x64 backend

This commit removes implicit `StackOverflow` traps inserted by the x64
backend for stack-based operations. This was historically required when
stack overflow was detected with page faults but Wasmtime no longer
requires that since it's not suitable for wasm modules which call host
functions. Additionally no other backend implements this form of
implicit trap-code additions so this is intended to synchronize the
behavior of all the backends.

This fixes a test added prior for aarch64 to properly abort the process
instead of accidentally being caught by Wasmtime.

* Fix a style issue
2022-11-30 12:30:00 -06:00
Timothy Chen
67fc5389b0 Remove sig data arg and ret fields to reduce size (#5319)
* Remove sig data arg and ret fields to reduce size

* Update cranelift/codegen/src/machinst/abi.rs

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Update cranelift/codegen/src/machinst/abi.rs

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix offsets

* Add comment

Co-authored-by: Jamey Sharp <jamey@minilop.net>
2022-11-30 07:19:41 -08:00
Nick Fitzgerald
913a2ec8c8 Cranelift: consider heap's guard pages when legalizing heap_addr (#5335)
* Cranelift: consider heap's guard pages when legalizing `heap_addr`

Fixes #5328

* Update comment to align more directly with implementation

* Add legalization tests for `heap_addr` and offset guard pages
2022-11-29 19:54:25 +00:00
Trevor Elliott
f138fc0ed3 Bump regalloc2 to 0.5.0 (#5345)
* Bump the regalloc2 dependency to 0.5.0
* Replace preg_set_from_machine_env with PRegSet::from
* Vet the regalloc2 update
2022-11-29 11:25:35 -08:00
Afonso Bordado
ec342c20e3 cranelift: Add iadd_cout lowerings for aarch64 (#5177)
* cranelift: Add `iadd_cout`/`isub_bout`  i128 tests

* aarch64: Add `iadd_cout` lowerings

* fuzzgen: Add `iadd_cout`
2022-11-29 10:58:44 -08:00
Jamey Sharp
ff5abfd993 cranelift-isle: Minor error-handling cleanups (#5338)
- Remove remaining references to Miette
- Borrow implementation of `line_starts` from codespan-reporting
- Clean up a use of `Result` that no longer conflicts with a local
  definition
- When printing plain errors, add a blank line between errors for
  readability
2022-11-29 03:07:05 +00:00
Trevor Elliott
a5a0645aff Don't allow reuse_def constraints in the s390x Loop instruction (#5336) 2022-11-28 17:52:11 -08:00
Trevor Elliott
368004428a Fix rule shadowing instances in x64 and aarch64 backends (#5334)
Fix shadowing identified in #5322 for imul and swiden_high/swiden_low/uwiden_high/uwiden_low combinations in the x64 backend, and remove some redundant rules from the aarch64 dynamic neon ruleset. Additionally, add tests to the x64 backend showing that the imul specializations are firing.
2022-11-28 15:48:34 -08:00
Nick Fitzgerald
58a5089e48 Cranelift: log number of CLIF insts/blocks to optimize/lower (#5333) 2022-11-28 19:35:29 +00:00
Nick Fitzgerald
6fe69d00ca Cranelift: add debug logs counting how many vcode instructions and blocks we lower to (#5332) 2022-11-28 18:57:02 +00:00
Trevor Elliott
54a6d2f79a Generate more fixed_nonallocatable constraints, and add debug assertions (#5132)
Add assertions to the OperandCollector that show we're not using pinned vregs, and use reg_fixed_nonallocatable constraints when a real register is used with other constraint generation functions like reg_use etc.
2022-11-28 10:31:56 -08:00
Jamey Sharp
044b57f334 cranelift-isle: Rewrite error reporting (#5318)
There were several issues with ISLE's existing error reporting
implementation.

- When using Miette for more readable error reports, it would panic if
  errors were reported from multiple files in the same run.
- Miette is pretty heavy-weight for what we're doing, with a lot of
  dependencies.
- The `Error::Errors` enum variant led to normalization steps in many
  places, to avoid using that variant to represent a single error.

This commit:
- replaces Miette with codespan-reporting
- gets rid of a bunch of cargo-vet exemptions
- replaces the `Error::Errors` variant with a new `Errors` type
- removes source info from `Error` variants so they're easy to construct
- adds source info only when formatting `Errors`
- formats `Errors` with a custom `Debug` impl
- shares common code between ISLE's callers, islec and cranelift-codegen
- includes a source snippet even with fancy-errors disabled

I tried to make this a series of smaller commits but I couldn't find any
good split points; everything was too entangled with everything else.
2022-11-23 14:20:48 -08:00
Timothy Chen
48ee42efc2 Refactor Sigdata methods with sigset (#5307)
* Refactor sigdata methods

* Update cranelift/codegen/src/machinst/abi.rs

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Address comments

Co-authored-by: Jamey Sharp <jamey@minilop.net>
2022-11-22 09:03:51 -08:00
Nick Fitzgerald
d0d3245a35 Cranelift: Add heap_load and heap_store instructions (#5300)
* Cranelift: Define `heap_load` and `heap_store` instructions

* Cranelift: Implement interpreter support for `heap_load` and `heap_store`

* Cranelift: Add a suite runtests for `heap_{load,store}`

There are so many knobs we can twist for heaps and I wanted to exhaustively test
all of them, so I wrote a script to generate the tests. I've checked in the
script in case we want to make any changes in the future, but I don't think it
is worth adding this to CI to check that scripts are up to date or anything like
that.

* Review feedback
2022-11-21 23:00:39 +00:00
Trevor Elliott
54cfa4df34 cranelift: Fix implicit pointer argument register use (#5301)
* Fix arg handling to write to VRegs instead of physical regs

* Make is_included_in_clobbers required, and handle Args on x64 and riscv64
2022-11-18 16:47:03 -08:00
Jun Ryung Ju
e5f93d9ec0 cranelift: Support bnot, band, bor, bxor for x86_64. (#5036)
* Support `bnot`, `band`, `bor`, `bxor` for x86_64.

* Fix-up to handle `B{8,16,32,64}` type on bitops

* Fix-up conflict.
2022-11-18 07:45:54 -08:00
Nick Fitzgerald
3b6544dc66 Fix warnings in cranelift-codegen docs builds (#5292) 2022-11-17 21:13:24 +00:00
Timothy Chen
de6e4a4e20 Shrink the size of SigData in Cranelift (#5261)
* Shrink the size of SigData in Cranelift

* Update cranelift/codegen/src/machinst/abi.rs

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Change ret arg length to u16

* Add test

Co-authored-by: Jamey Sharp <jamey@minilop.net>
2022-11-17 00:15:19 +00:00
Trevor Elliott
4780bd5902 Don't use %rcx directly with CoffTlsGetAddr (#5278)
Avoid naming %rcx as written by the CoffTlsGetAddr pseudo-instruction in the x64 backend, and instead emit a fixed-def constraint for a fresh VReg and %rcx.
2022-11-16 11:32:09 -08:00
Trevor Elliott
07bd8bf34a Remove unnecessary moves in x64 gen_memcpy (#5277)
Remove some unnecessary moves in the x64 gen_memcpy implementation -- the call instruction that's generated will already constrain the args to those registers.
2022-11-16 10:33:00 -08:00
Afonso Bordado
a793648eb2 cranelift: Fix fdemote on the interpreter (#5158)
* cranelift: Cleanup `fdemote`/`fpromote` tests

* cranelift: Fix `fdemote`/`fpromote` instruction docs

The verifier fails if the input and output types are the same
for these instructions

* cranelift: Fix `fdemote`/`fpromote` in the interpreter

* fuzzgen: Add `fdemote`/`fpromote`
2022-11-15 22:22:00 +00:00
Trevor Elliott
a007e02bd2 Add fixed_nonallocatable constraints when appropriate (#5253)
Plumb the set of allocatable registers through the OperandCollector and use it validate uses of fixed-nonallocatable registers, like %rsp on x86_64.
2022-11-15 12:49:17 -08:00
Nick Fitzgerald
f6ae67f3f0 Cranelift(aarch64): Use an existing extractor instead of a new pure constructor (#5273) 2022-11-15 20:40:44 +00:00
Nick Fitzgerald
d335dc8d5a Cranelift: Do not optimize heap bounds checking comparison in legalization (#5272)
That optimization is only for 12-bit immediates in Aarch64, which is now handled
in backend lowering, so we can simplify this code a bit now.
2022-11-15 19:54:52 +00:00
Nick Fitzgerald
9967782726 Cranelift(Aarch64): Optimize lowering of icmps with immediates (#5252)
We can encode more constants into 12-bit immediates if we do the following
rewrite for comparisons with odd constants:

        A >= B + 1
    ==> A - 1 >= B
    ==> A > B
2022-11-15 09:18:55 -08:00
Nick Fitzgerald
c2a7ea7e24 Cranelift: de-duplicate bounds checks in legalizations (#5190)
* Cranelift: Add the `DataFlowGraph::display_value_inst` convenience method

* Cranelift: Add some `trace!` logs to some parts of legalization

* Cranelift: de-duplicate bounds checks in legalizations

When both (1) "dynamic" memories that need explicit bounds checks and (2)
spectre mitigations that perform bounds checks are enabled, reuse the same
bounds checks between the two legalizations.

This reduces the overhead of explicit bounds checks and spectre mitigations over
using virtual memory guard pages with spectre mitigations from ~1.9-2.1x
overhead to ~1.6-1.8x overhead. That is about a 14-19% speed up for when dynamic
memories and spectre mitigations are enabled.

<details>

```
execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 3422455129.47 ± 120159.49 (confidence = 99%)

  virtual-memory-guards.so is 2.09x to 2.09x faster than bounds-checks.so!

  [6563931659 6564063496.07 6564301535] bounds-checks.so
  [3141492675 3141608366.60 3141895249] virtual-memory-guards.so

execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm

  Δ = 338716136.87 ± 1.38 (confidence = 99%)

  virtual-memory-guards.so is 2.08x to 2.08x faster than bounds-checks.so!

  [651961494 651961495.47 651961497] bounds-checks.so
  [313245357 313245358.60 313245362] virtual-memory-guards.so

execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 22742944.07 ± 331.73 (confidence = 99%)

  virtual-memory-guards.so is 1.87x to 1.87x faster than bounds-checks.so!

  [48841295 48841567.33 48842139] bounds-checks.so
  [26098439 26098623.27 26099479] virtual-memory-guards.so
```

</details>

<details>

```
execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 2465900207.27 ± 146476.61 (confidence = 99%)

  virtual-memory-guards.so is 1.78x to 1.78x faster than de-duped-bounds-checks.so!

  [5607275431 5607442989.13 5607838342] de-duped-bounds-checks.so
  [3141445345 3141542781.87 3141711213] virtual-memory-guards.so

execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm

  Δ = 234253620.20 ± 2.33 (confidence = 99%)

  virtual-memory-guards.so is 1.75x to 1.75x faster than de-duped-bounds-checks.so!

  [547498977 547498980.93 547498985] de-duped-bounds-checks.so
  [313245357 313245360.73 313245363] virtual-memory-guards.so

execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 16605659.13 ± 315.78 (confidence = 99%)

  virtual-memory-guards.so is 1.64x to 1.64x faster than de-duped-bounds-checks.so!

  [42703971 42704284.40 42704787] de-duped-bounds-checks.so
  [26098432 26098625.27 26099234] virtual-memory-guards.so
```

</details>

<details>

```
execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm

  Δ = 104462517.13 ± 7.32 (confidence = 99%)

  de-duped-bounds-checks.so is 1.19x to 1.19x faster than bounds-checks.so!

  [651961493 651961500.80 651961532] bounds-checks.so
  [547498981 547498983.67 547498989] de-duped-bounds-checks.so

execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 956556982.80 ± 103034.59 (confidence = 99%)

  de-duped-bounds-checks.so is 1.17x to 1.17x faster than bounds-checks.so!

  [6563930590 6564019842.40 6564243651] bounds-checks.so
  [5607307146 5607462859.60 5607677763] de-duped-bounds-checks.so

execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 6137307.87 ± 247.75 (confidence = 99%)

  de-duped-bounds-checks.so is 1.14x to 1.14x faster than bounds-checks.so!

  [48841303 48841472.93 48842000] bounds-checks.so
  [42703965 42704165.07 42704718] de-duped-bounds-checks.so
```

</details>

* Update test expectations

* Add a test for deduplicating bounds checks between dynamic memories and spectre mitigations

* Define a struct for the Spectre comparison instead of using a tuple

* More trace logging for heap legalization
2022-11-15 08:47:22 -08:00
Trevor Elliott
dece901d16 Use regalloc constraints for sse blend operations (#5251)
Instead of using xmm0 explicitly for the mask argument to instructions like blendvpd, use regalloc constraints to constrain it to xmm0 instead.
2022-11-14 16:44:34 -08:00
Denys Zadorozhnyi
d3692c2f2b fix typo in caller_conv arg name in ABIMachineSpec::gen_call; (#5259) 2022-11-14 09:02:07 -08:00
Trevor Elliott
0367fbc2d4 cranelift: Rework pinned register lowering (#5249)
Rework pinned register lowering to avoid the use of pinned virtual registers, instead using the MovFromPReg and MovToPReg pseudo instructions.
2022-11-10 16:19:25 -08:00
Nick Fitzgerald
fc62d4ad65 Cranelift: Make heap_addr return calculated base + index + offset (#5231)
* Cranelift: Make `heap_addr` return calculated `base + index + offset`

Rather than return just the `base + index`.

(Note: I've chosen to use the nomenclature "index" for the dynamic operand and
"offset" for the static immediate.)

This move the addition of the `offset` into `heap_addr`, instead of leaving it
for the subsequent memory operation, so that we can Spectre-guard the full
address, and not allow speculative execution to read the first 4GiB of memory.

Before this commit, we were effectively doing

    load(spectre_guard(base + index) + offset)

Now we are effectively doing

    load(spectre_guard(base + index + offset))

Finally, this also corrects `heap_addr`'s documented semantics to say that it
returns an address that will trap on access if `index + offset + access_size` is
out of bounds for the given heap, rather than saying that the `heap_addr` itself
will trap. This matches the implemented behavior for static memories, and after
https://github.com/bytecodealliance/wasmtime/pull/5190 lands (which is blocked
on this commit) will also match the implemented behavior for dynamic memories.

* Update heap_addr docs

* Factor out `offset + size` to a helper
2022-11-09 19:53:51 +00:00
Trevor Elliott
b077854b57 Generate SSA code from returns (#5172)
Modify return pseudo-instructions to have pairs of registers: virtual and real. This allows us to constrain the virtual registers to the real ones specified by the abi, instead of directly emitting moves to those real registers.
2022-11-08 16:00:49 -08:00
Nick Fitzgerald
fd7b903f33 Cranelift: Use a custom enum instead of boolean for the ISLE target (#5228)
Easier to read and doesn't require `/* is_lower = */`-style comments at call
sites.
2022-11-08 21:44:02 +00:00
Trevor Elliott
70bca801ab cranelift: Resize with types::INVALID isntead of types::I8 (#5227) 2022-11-08 20:42:20 +00:00