Commit Graph

515 Commits

Author SHA1 Message Date
Chris Fallin
b1fb4d7c35 Fix lowering issue in x64 vany_true: sinking and using original value. (#4815)
The x64 lowring of `vany_true` both sinks mergeable loads and uses the
original register. This PR fixes the lowering to force the value into a
register first. Ideally we should solve the issue by catching this in
the ISLE type system, as described in #4745, but this resolves the issue
for now.

Fixes #4807.
2022-08-29 22:22:12 -07:00
Nick Fitzgerald
5392d7cdd7 cranelift: Merge abi and abi_impl modules (#4805) 2022-08-29 23:20:36 +00:00
Trevor Elliott
25d960f9c4 x64: Lower tlsvalue, sqmul_round_sat, and uunarrow in ISLE (#4793)
Lower tlsvalue, sqmul_round_sat, and uunarrow in ISLE.
2022-08-26 16:33:48 -07:00
Chris Fallin
8e8dfdf5f9 AArch64: Migrate calls and returns to ISLE. (#4788) 2022-08-26 16:26:39 -07:00
Trevor Elliott
ca6d648e37 x64: Ensure that constants are always 16 bytes for XmmMem (#4790)
Ensure that constants generated for the memory case of XmmMem values are always 16 bytes, ensuring that we don't accidantally perform an unaligned load.

Fixes #4761
2022-08-26 20:04:38 +00:00
Trevor Elliott
c1f9736938 x64: Lower vany_true, vall_true, vhigh_bits, iconcat, and isplit in ISLE (#4787)
Lower vany_true, vall_true, vhigh_bits, iconcat, and isplit in ISLE.
2022-08-26 09:07:22 -07:00
Trevor Elliott
9386409607 x64: Lower extractlane, scalar_to_vector, and splat in ISLE (#4780)
Lower extractlane, scalar_to_vector and splat in ISLE.

This PR also makes some changes to the SinkableLoad api
* change the return type of sink_load to RegMem as there are more functions available for dealing with RegMem
* add reg_mem_to_reg_mem_imm and register it as an automatic conversion
2022-08-25 09:38:03 -07:00
Trevor Elliott
b8b6f2781e x64: Lower shuffle and swizzle in ISLE (#4772)
Lower `shuffle` and `swizzle` in ISLE.

This PR surfaced a bug with the lowering of `shuffle` when avx512vl and avx512vbmi are enabled: we use `vpermi2b` as the implementation, but panic if the immediate shuffle mask contains any out-of-bounds values. The behavior when the avx512 extensions are not present is that out-of-bounds values are turned into `0` in the result.

I've resolved this by detecting when the shuffle immediate has out-of-bounds indices in the avx512-enabled lowering, and generating an additional mask to zero out the lanes where those indices occur. This brings the avx512 case into line with the semantics of the `shuffle` op: 94bcbe8446/cranelift/codegen/meta/src/shared/instructions.rs (L1495-L1498)
2022-08-24 21:49:51 +00:00
Afonso Bordado
7e3c481f4e cranelift: Avoid lowering VEX instructions with memory encodings (#4768) 2022-08-24 10:35:06 -07:00
Afonso Bordado
d394edcefe x64: Mask shift amounts for small types (#4752)
* x64: Mask shift amounts for small types

* cranelift: Disable i128 shifts in fuzzer again

They are fixed. But we had a bunch of fuzzgen issues come in, and we don't want to accidentaly mark them as fixed

* cranelift: Avoid masking shifts for 32 and 64 bit cases

* cranelift: Add const shift tests and fix them

* cranelift: Remove const `rotl` cases

Now that `put_masked_in_imm8_gpr` works properly we can simplify rotl/rotr
2022-08-24 10:31:38 -07:00
Trevor Elliott
4bdfa76370 x64: Migrate get_pinned_reg, set_pinned_reg, vconst, and raw_bitcast to ISLE (#4763)
https://github.com/bytecodealliance/wasmtime/pull/4763
2022-08-23 16:32:00 -07:00
Trevor Elliott
b5f1ab7780 x64: Lower stack_addr, udiv, sdiv, urem, srem, umulhi, smulhi in ISLE (#4741)
Lower stack_addr, udiv, sdiv, urem, srem, umulhi, and smulhi in ISLE.

For udiv, sdiv, urem, and srem I opted to move the original lowering into an extern constructor, as the interactions with rax and rdx for the div instruction didn't seem meaningful to implement in ISLE. However, I'm happy to revisit this choice and move more of the embedding into ISLE.
2022-08-23 11:22:49 -07:00
Trevor Elliott
cee4b209f3 x64: Lower fcopysign, ceil, floor, nearest, and trunc in ISLE (#4730)
https://github.com/bytecodealliance/wasmtime/pull/4730
2022-08-22 13:57:36 -07:00
Trevor Elliott
754cf7156a x64: Fix load sinking bugs in new lowerings (#4740)
Fixes #4736

Fix lowerings that were using values as both a Reg and a RegMem, making it look like a load could be sunk while its value in a register was still being used. Also add an assert that checks that loads that are sunk are never used.
2022-08-19 14:21:06 -07:00
Trevor Elliott
80c77da334 x64: Lower bitcast, fabs, and fneg in ISLE (#4729)
* Add tests for bitcast

* Migrate bitcast to ISLE

* Add tests for fabs

* Lower fabs in ISLE

* Add tests for fneg

* Lower fneg in ISLE
2022-08-18 17:59:23 -07:00
Trevor Elliott
8b6019909b x64: Lower widening and narrowing operations in ISLE (#4722)
Lower uwiden_high, uwiden_low, swiden_high, swiden_low, snarrow, and unarrow in ISLE.
2022-08-18 11:53:24 -07:00
Trevor Elliott
0a71df6a37 x64: Refactor vector_all_ones, and remove buggy sse_cmp_op (#4728)
The sse_cmp_op rule had cases that would produce SseOperand values that aren't legal to use with MInst.XmmRmR, and was only used in vector_all_ones when constructing an XmmRmR value. Additionally, vector_all_ones always called sse_cmp_op with the same type, so the other cases were redundant.

The solution in this PR is to remove sse_cmp_op entirely and inline a call to x64_pcmpeqd directly in vector_all_ones, and remove the unused argument from vector_all_ones.
2022-08-17 21:30:52 +00:00
Jamey Sharp
c569e7bea5 Remove unreachable x64 lowerings for iadd_imm (#4726)
All of the `*_imm` instructions are rewritten during legalization to an
explicit `iconst` plus the general form of the operator, so backends
never see them. Therefore these ISLE rules in the x64 backend can never
match anything.
2022-08-16 22:54:48 +00:00
Trevor Elliott
fbfceaec98 x64: Migrate iadd_pairwise to ISLE (#4718)
* Add a test for iadd_pairwise with swiden input

* Implement iadd_pairwise for swiden_{low,high} input

* Add a test case for iadd_pairwise with uwiden input

* Implement iadd_pairwise with uwiden
2022-08-16 12:21:06 -07:00
Trevor Elliott
3c1490dd59 x64: Lower fcvt_to_{u,s}int{,_sat} in ISLE (#4704)
https://github.com/bytecodealliance/wasmtime/pull/4704
2022-08-16 09:03:50 -07:00
Trevor Elliott
498e7156b4 Remove the handling of cmpps in produces_const (#4714)
https://github.com/bytecodealliance/wasmtime/pull/4714
2022-08-15 15:48:01 -07:00
Nick Fitzgerald
e0d4934ef4 Cranelift: Remove the ABICaller trait (#4711)
* Cranelift: Remove the `ABICaller` trait

It has only one implementation: the `ABICallerImpl` struct. We can just use that
directly rather than having extra, unnecessary layers of generics and abstractions.

* Cranelift: Rename `ABICallerImpl` to `Caller`
2022-08-15 20:41:08 +00:00
Trevor Elliott
1d0f6fa4fb Fix a bug in produces_const (#4709)
https://github.com/bytecodealliance/wasmtime/pull/4709
2022-08-15 19:00:33 +00:00
Nick Fitzgerald
f0c60f46a8 Cranelift: Remove ABICallee trait (#4701)
* Cranelift: Remove `ABICallee` trait

It has only one implementation: the `ABICalleeImpl` struct. By using that
directly we can avoid unnecessary layers of generics and abstractions as well as
a couple `Box`es that were previously putting the single implementation into a
`Box<dyn>`.

* Cranelift: Rename `ABICalleeImpl` to `AbiCallee`

* Fix comments as per review

* Rename `AbiCallee` to `Callee`
2022-08-15 18:27:05 +00:00
Afonso Bordado
c6d2a3f94e cranelift: Add ireduce/iconcat/isplit to the clif fuzzer (#4703)
* cranelift: Add ireduce to fuzzer

* cranelift: Add iconcat/isplit to fuzzer
2022-08-15 09:18:08 -07:00
Benjamin Bouvier
8a9b1a9025 Implement an incremental compilation cache for Cranelift (#4551)
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime. 

After the suggestion of Chris, `Function` has been split into mostly two parts:

- on the one hand, `FunctionStencil` contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`.
- on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on.

Most changes are here to accomodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache:

- most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location, it's automatically detected instead the first time a non-default `SourceLoc` is set.
- user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`.
- some refactorings have been made for function names:
  - `ExternalName` was used as the type for a `Function`'s name; while it thus allowed `ExternalName::Libcall` in this place, this would have been quite confusing to use it there. Instead, a new enum `UserFuncName` is introduced for this name, that's either a user-defined function name (the above `UserExternalName`) or a test case name.
  - The future of `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with.

The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.

A basic fuzz target has been introduced that tries to do the bare minimum:

- check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function.
- check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache.
  - This last check is less efficient and less likely to happen, so probably should be rethought a bit.

Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.

Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement. 

Fixes #4155.
2022-08-12 16:47:43 +00:00
Nick Fitzgerald
532fb22af6 Cranelift: Remove the LowerCtx trait (#4697)
The trait had only one implementation: the `Lower` struct. It is easier to just
use that directly, and not introduce unnecessary layers of generics and
abstractions.

Once upon a time, there was hope that we would have other implementations of the
`LowerCtx` trait, that did things like lower CLIF to SMTLIB for
verification. However, this is not practical these days given the way that the
trait has evolved over time, and our verification efforts are focused on ISLE
now anyways, and we're actually making some progress on that front (much more
than anyone ever did on a second `LowerCtx` trait implementation!)
2022-08-11 16:54:17 -07:00
Afonso Bordado
3ea1813173 x64: Add native lowering for scalar fma (#4539)
Use `vfmadd213{ss,sd}` for these lowerings.
2022-08-11 22:48:16 +00:00
Trevor Elliott
0c2e0494bd x64: Lower fcvt_from_uint in ISLE (#4684)
* Add a test for the existing behavior of fcvt_from_unit

* Migrate the I8, I16, I32 cases of fcvt_from_uint

* Implement the I64 case of fcvt_from_uint

* Add a test for the existing behavior of fcvt_from_uint.f64x2

* Migrate fcvt_from_uint.f64x2 to ISLE

* Lower the last case of `fcvt_from_uint`

* Add a test for `fcvt_from_uint`

* Finish lowering fcmp_from_uint

* Format
2022-08-11 12:28:41 -07:00
Afonso Bordado
e4adc46e6d cranelift: Fix shifts and implement rotates in interpreter (#4519)
* cranelift: Fix shifts and implement rotates in interpreter

* x64: Implement `rotl`/`rotr` for some small type combinations
2022-08-11 12:15:52 -07:00
Afonso Bordado
c5bc368cfe cranelift: Add COFF TLS Support (#4546)
* cranelift: Implement COFF TLS Relocations

* cranelift: Emit SecRel relocations

* cranelift: Handle _tls_index symbol in backend
2022-08-11 09:33:40 -07:00
Trevor Elliott
a25d52046b x64: Migrate fcvt_from_sint and fcvt_low_from_sint to ISLE (#4650)
https://github.com/bytecodealliance/wasmtime/pull/4650
2022-08-10 10:49:02 -07:00
Trevor Elliott
63c2d1e0c3 x64: Remove unnecessary register use when comparing against constants (#4645)
https://github.com/bytecodealliance/wasmtime/pull/4645
2022-08-09 23:53:51 +00:00
bjorn3
a4aa7258de Remove some dead code from the abi code (#4653)
These were originally used by the old backend framework as part of
legalizing function signatures for the respective ABI.
2022-08-09 12:21:55 -07:00
Chris Fallin
de8d44d0e5 Cranelift: MachBuffer: apply branch peephole opts one last time at buffer tail. (#4652)
The `MachBuffer` applies a set of peephole-optimization rules to do
branch threading, leverage fallthrough paths, eliminate empty blocks,
and flip conditional branches where needed to make branches more
efficient starting from naive always-branch-at-end-of-BB code.

This works by applying the rules at every label-bind, which is
equivalent to applying them at the end of every basic block, where
branches are usually inserted.

However, this misses one case: the end of the buffer! Currently we
don't optimize any redundant or foldable branches at the very end of
the machine code.

This usually doesn't matter when the function ends in an epilogue with
`ret` as the last instruction. However, when cold blocks exist, it can
actually matter.

Thanks to @mchesser for pointing out this issue in #4636.
2022-08-09 10:38:48 -07:00
Trevor Elliott
ed7dfd3925 x64: Peephole optimization for x < 0 (#4625)
https://github.com/bytecodealliance/wasmtime/pull/4625

Fixes #4607
2022-08-09 09:45:53 -07:00
Michael Chesser
8aee85ebaa Propagate cold annotations to edge blocks (#4636)
Update the lowering stage to mark edge blocks as cold if either the
predecessor or successor block is cold.
2022-08-09 05:05:57 +00:00
Trevor Elliott
0c2a48f682 x64: Migrate selectif and selectif_spectre_guard to ISLE (#4619)
https://github.com/bytecodealliance/wasmtime/pull/4619
2022-08-05 09:36:11 -07:00
Trevor Elliott
cd847d071d x64: Migrate br_table to ISLE (#4615)
https://github.com/bytecodealliance/wasmtime/pull/4615
2022-08-04 22:12:37 +00:00
Trevor Elliott
dc8362ceec x64: Finish migrating brz and brnz to ISLE (#4614)
https://github.com/bytecodealliance/wasmtime/pull/4614
2022-08-04 12:58:43 -07:00
Trevor Elliott
1fc11bbe51 x64: Migrate brff and I128 branching instructions to ISLE (#4599)
https://github.com/bytecodealliance/wasmtime/pull/4599
2022-08-04 08:58:50 -07:00
Trevor Elliott
301be7438e x64: Begin migrating branch instructions to ISLE (#4587)
https://github.com/bytecodealliance/wasmtime/pull/4587
2022-08-03 20:28:52 +00:00
Ulrich Weigand
b9dd48e34b [s390x, abi_impl] Support struct args using explicit pointers (#4585)
This adds support for StructArgument on s390x.  The ABI for this
platform requires that the address of the buffer holding the copy
of the struct argument is passed from caller to callee as hidden
pointer, using a register or overflow stack slot.

To implement this, I've added an optional "pointer" filed to
ABIArg::StructArg, and code to handle the pointer both in common
abi_impl code and the s390x back-end.

One notable change necessary to make this work involved the
"copy_to_arg_order" mechanism.  Currently, for struct args
we only need to copy the data (and that need to happen before
setting up any other args), while for non-struct args we only
need to set up the appropriate registers or stack slots.
This order is ensured by sorting the arguments appropriately
into a "copy_to_arg_order" list.

However, for struct args with explicit pointers we need to *both*
copy the data (again, before everything else), *and* set up a
register or stack slot.  Since we now need to touch the argument
twice, we cannot solve the ordering problem by a simple sort.
Instead, the abi_impl common code now provided *two* callbacks,
emit_copy_regs_to_buffer and emit_copy_regs_to_arg, and expects
the back end to first call copy..to_buffer for all args, and
then call copy.._to_arg for all args.  This required updates
to all back ends.

In the s390x back end, in addition to the new ABI code, I'm now
adding code to actually copy the struct data, using the MVC
instruction (for small buffers) or a memcpy libcall (for larger
buffers).  This also requires a bit of new infrastructure:
- MVC is the first memory-to-memory instruction we use, which
  needed a bit of memory argument tweaking
- We also need to set up the infrastructure to emit libcalls.

(This implements the first half of issue #4565.)
2022-08-03 19:00:07 +00:00
Anton Kirilov
a897742593 Initial back-edge CFI implementation (#3606)
Give the user the option to sign and to authenticate function
return addresses with the operations introduced by the Pointer
Authentication extension to the Arm instruction set architecture.

Copyright (c) 2021, Arm Limited.
2022-08-03 11:08:29 -07:00
Afonso Bordado
709716bb8e cranelift: Implement scalar FMA on x86 (#4460)
x86 does not have dedicated instructions for scalar FMA, lower
to a libcall which seems to be what llvm does.
2022-08-03 10:29:10 -07:00
Nick Fitzgerald
55215bbd1e Use a SmallVec for ABIArgSlots (#4586)
These are always length 1 for Wasm benchmarks.

<h3>Sightglass Benchmark Results</h3>

```
compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 328624015.86 ± 40274677.93 (confidence = 99%)

  main.so is 0.88x to 0.91x faster than slots-smallvec.so!
  slots-smallvec.so is 1.10x to 1.13x faster than main.so!

  [3070752447 3203778792.55 3446269274] main.so
  [2503544039 2875154776.69 3197966713] slots-smallvec.so

compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 9685705.06 ± 3221286.87 (confidence = 99%)

  main.so is 0.91x to 0.96x faster than slots-smallvec.so!
  slots-smallvec.so is 1.05x to 1.09x faster than main.so!

  [129356493 145594942.79 165038803] main.so
  [118555011 135909237.73 188780619] slots-smallvec.so

compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [79080493 86757564.46 112649639] main.so
  [78083384 85934125.69 94992743] slots-smallvec.so
```
2022-08-02 17:40:36 -07:00
Nick Fitzgerald
ab1cf3df2d Use a SmallVec for ABIArgs (#4584)
Instead of a regular `Vec`.

These vectors are usually very small, for example here is the histogram of sizes
when running Sightglass's `pulldown-cmark` benchmark:

```
;; Number of samples = 10332
;; Min = 0
;; Max = 11
;;
;; Mean = 2.496128532713901
;; Standard deviation = 2.2859559855427243
;; Variance = 5.225594767838607
;;
;; Each ∎ is a count of 62
;;
 0 ..  1 [ 3134 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 1 ..  2 [ 2032 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 2 ..  3 [  159 ]: ∎∎
 3 ..  4 [  838 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎
 4 ..  5 [  970 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 5 ..  6 [ 2566 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 6 ..  7 [  303 ]: ∎∎∎∎
 7 ..  8 [  272 ]: ∎∎∎∎
 8 ..  9 [   40 ]:
 9 .. 10 [   18 ]:
```

By using a `SmallVec` with capacity of 6 we avoid the vast majority of heap
allocations and get some nice benchmark wins of up to ~1.11x faster compilation.

<h3>Sightglass Benchmark Results</h3>

```
compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 340361395.90 ± 63384608.15 (confidence = 99%)

  main.so is 0.88x to 0.92x faster than smallvec.so!
  smallvec.so is 1.09x to 1.13x faster than main.so!

  [3101467423 3425524333.41 4060621653] main.so
  [2820915877 3085162937.51 3375167352] smallvec.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 988446098.59 ± 184075718.89 (confidence = 99%)

  main.so is 0.88x to 0.92x faster than smallvec.so!
  smallvec.so is 1.09x to 1.13x faster than main.so!

  [9006994951 9948091070.66 11792481990] main.so
  [8192243090 8959644972.07 9801848982] smallvec.so

compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm

  Δ = 7854567.87 ± 2215491.16 (confidence = 99%)

  main.so is 0.89x to 0.94x faster than smallvec.so!
  smallvec.so is 1.07x to 1.12x faster than main.so!

  [80354527 93864666.76 119789198] main.so
  [77554917 86010098.89 94726994] smallvec.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

  Δ = 22810509.85 ± 6434024.63 (confidence = 99%)

  main.so is 0.89x to 0.94x faster than smallvec.so!
  smallvec.so is 1.07x to 1.12x faster than main.so!

  [233358190 272593088.57 347880715] main.so
  [225227821 249782578.72 275097380] smallvec.so

compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 10849521.41 ± 4324757.85 (confidence = 99%)

  main.so is 0.90x to 0.96x faster than smallvec.so!
  smallvec.so is 1.04x to 1.10x faster than main.so!

  [133875427 156859544.47 222455440] main.so
  [126073854 146010023.06 181611647] smallvec.so

compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 31508176.97 ± 12559561.91 (confidence = 99%)

  main.so is 0.90x to 0.96x faster than smallvec.so!
  smallvec.so is 1.04x to 1.10x faster than main.so!

  [388788638 455536988.31 646034523] main.so
  [366132033 424028811.34 527419755] smallvec.so
```
2022-08-02 15:53:44 -07:00
Nick Fitzgerald
42bba452a6 Cranelift: Add instructions for getting the current stack/frame/return pointers (#4573)
* Cranelift: Add instructions for getting the current stack/frame pointers and return address

This is the initial part of https://github.com/bytecodealliance/wasmtime/issues/4535

* x64: Remove `Amode::RbpOffset` and use `Amode::ImmReg` instead

We just special case getting operands from `Amode`s now.

* Fix s390x `get_return_address`; require `preserve_frame_pointers=true`

* Assert that `Amode::ImmRegRegShift` doesn't use rbp/rsp

* Handle non-allocatable registers in Amode::with_allocs

* Use "stack" instead of "r15" on s390x

* r14 is an allocatable register on s390x, so it shouldn't be used with `MovPReg`
2022-08-02 14:37:17 -07:00
Chris Fallin
8dddd6f1f7 Cranelift: Remove ifcmp_sp opcode. (#4578)
This was temporarily added back in #3502 due to a need from Lucet; now
that Lucet is EOL, the opcode is no longer needed and we can remove it.
2022-08-02 13:15:39 -07:00
Chris Fallin
43f1765272 Cranellift: remove Baldrdash support and related features. (#4571)
* Cranellift: remove Baldrdash support and related features.

As noted in Mozilla's bugzilla bug 1781425 [1], the SpiderMonkey team
has recently determined that their current form of integration with
Cranelift is too hard to maintain, and they have chosen to remove it
from their codebase. If and when they decide to build updated support
for Cranelift, they will adopt different approaches to several details
of the integration.

In the meantime, after discussion with the SpiderMonkey folks, they
agree that it makes sense to remove the bits of Cranelift that exist
to support the integration ("Baldrdash"), as they will not need
them. Many of these bits are difficult-to-maintain special cases that
are not actually tested in Cranelift proper: for example, the
Baldrdash integration required Cranelift to emit function bodies
without prologues/epilogues, and instead communicate very precise
information about the expected frame size and layout, then stitched
together something post-facto. This was brittle and caused a lot of
incidental complexity ("fallthrough returns", the resulting special
logic in block-ordering); this is just one example. As another
example, one particular Baldrdash ABI variant processed stack args in
reverse order, so our ABI code had to support both traversal
orders. We had a number of other Baldrdash-specific settings as well
that did various special things.

This PR removes Baldrdash ABI support, the `fallthrough_return`
instruction, and pulls some threads to remove now-unused bits as a
result of those two, with the  understanding that the SpiderMonkey folks
will build new functionality as needed in the future and we can perhaps
find cleaner abstractions to make it all work.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1781425

* Review feedback.

* Fix (?) DWARF debug tests: add `--disable-cache` to wasmtime invocations.

The debugger tests invoke `wasmtime` from within each test case under
the control of a debugger (gdb or lldb). Some of these tests started to
inexplicably fail in CI with unrelated changes, and the failures were
only inconsistently reproducible locally. It seems to be cache related:
if we disable cached compilation on the nested `wasmtime` invocations,
the tests consistently pass.

* Review feedback.
2022-08-02 19:37:56 +00:00