Commit Graph

551 Commits

Author SHA1 Message Date
Trevor Elliott
80c77da334 x64: Lower bitcast, fabs, and fneg in ISLE (#4729)
* Add tests for bitcast

* Migrate bitcast to ISLE

* Add tests for fabs

* Lower fabs in ISLE

* Add tests for fneg

* Lower fneg in ISLE
2022-08-18 17:59:23 -07:00
Trevor Elliott
8b6019909b x64: Lower widening and narrowing operations in ISLE (#4722)
Lower uwiden_high, uwiden_low, swiden_high, swiden_low, snarrow, and unarrow in ISLE.
2022-08-18 11:53:24 -07:00
Trevor Elliott
0a71df6a37 x64: Refactor vector_all_ones, and remove buggy sse_cmp_op (#4728)
The sse_cmp_op rule had cases that would produce SseOperand values that aren't legal to use with MInst.XmmRmR, and was only used in vector_all_ones when constructing an XmmRmR value. Additionally, vector_all_ones always called sse_cmp_op with the same type, so the other cases were redundant.

The solution in this PR is to remove sse_cmp_op entirely and inline a call to x64_pcmpeqd directly in vector_all_ones, and remove the unused argument from vector_all_ones.
2022-08-17 21:30:52 +00:00
Jamey Sharp
c569e7bea5 Remove unreachable x64 lowerings for iadd_imm (#4726)
All of the `*_imm` instructions are rewritten during legalization to an
explicit `iconst` plus the general form of the operator, so backends
never see them. Therefore these ISLE rules in the x64 backend can never
match anything.
2022-08-16 22:54:48 +00:00
Trevor Elliott
fbfceaec98 x64: Migrate iadd_pairwise to ISLE (#4718)
* Add a test for iadd_pairwise with swiden input

* Implement iadd_pairwise for swiden_{low,high} input

* Add a test case for iadd_pairwise with uwiden input

* Implement iadd_pairwise with uwiden
2022-08-16 12:21:06 -07:00
Trevor Elliott
3c1490dd59 x64: Lower fcvt_to_{u,s}int{,_sat} in ISLE (#4704)
https://github.com/bytecodealliance/wasmtime/pull/4704
2022-08-16 09:03:50 -07:00
Trevor Elliott
498e7156b4 Remove the handling of cmpps in produces_const (#4714)
https://github.com/bytecodealliance/wasmtime/pull/4714
2022-08-15 15:48:01 -07:00
Nick Fitzgerald
e0d4934ef4 Cranelift: Remove the ABICaller trait (#4711)
* Cranelift: Remove the `ABICaller` trait

It has only one implementation: the `ABICallerImpl` struct. We can just use that
directly rather than having extra, unnecessary layers of generics and abstractions.

* Cranelift: Rename `ABICallerImpl` to `Caller`
2022-08-15 20:41:08 +00:00
Trevor Elliott
1d0f6fa4fb Fix a bug in produces_const (#4709)
https://github.com/bytecodealliance/wasmtime/pull/4709
2022-08-15 19:00:33 +00:00
Nick Fitzgerald
f0c60f46a8 Cranelift: Remove ABICallee trait (#4701)
* Cranelift: Remove `ABICallee` trait

It has only one implementation: the `ABICalleeImpl` struct. By using that
directly we can avoid unnecessary layers of generics and abstractions as well as
a couple `Box`es that were previously putting the single implementation into a
`Box<dyn>`.

* Cranelift: Rename `ABICalleeImpl` to `AbiCallee`

* Fix comments as per review

* Rename `AbiCallee` to `Callee`
2022-08-15 18:27:05 +00:00
Afonso Bordado
c6d2a3f94e cranelift: Add ireduce/iconcat/isplit to the clif fuzzer (#4703)
* cranelift: Add ireduce to fuzzer

* cranelift: Add iconcat/isplit to fuzzer
2022-08-15 09:18:08 -07:00
Benjamin Bouvier
8a9b1a9025 Implement an incremental compilation cache for Cranelift (#4551)
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime. 

After the suggestion of Chris, `Function` has been split into mostly two parts:

- on the one hand, `FunctionStencil` contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`.
- on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on.

Most changes are here to accomodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache:

- most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location, it's automatically detected instead the first time a non-default `SourceLoc` is set.
- user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`.
- some refactorings have been made for function names:
  - `ExternalName` was used as the type for a `Function`'s name; while it thus allowed `ExternalName::Libcall` in this place, this would have been quite confusing to use it there. Instead, a new enum `UserFuncName` is introduced for this name, that's either a user-defined function name (the above `UserExternalName`) or a test case name.
  - The future of `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with.

The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.

A basic fuzz target has been introduced that tries to do the bare minimum:

- check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function.
- check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache.
  - This last check is less efficient and less likely to happen, so probably should be rethought a bit.

Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.

Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement. 

Fixes #4155.
2022-08-12 16:47:43 +00:00
Nick Fitzgerald
532fb22af6 Cranelift: Remove the LowerCtx trait (#4697)
The trait had only one implementation: the `Lower` struct. It is easier to just
use that directly, and not introduce unnecessary layers of generics and
abstractions.

Once upon a time, there was hope that we would have other implementations of the
`LowerCtx` trait, that did things like lower CLIF to SMTLIB for
verification. However, this is not practical these days given the way that the
trait has evolved over time, and our verification efforts are focused on ISLE
now anyways, and we're actually making some progress on that front (much more
than anyone ever did on a second `LowerCtx` trait implementation!)
2022-08-11 16:54:17 -07:00
Afonso Bordado
3ea1813173 x64: Add native lowering for scalar fma (#4539)
Use `vfmadd213{ss,sd}` for these lowerings.
2022-08-11 22:48:16 +00:00
Trevor Elliott
0c2e0494bd x64: Lower fcvt_from_uint in ISLE (#4684)
* Add a test for the existing behavior of fcvt_from_unit

* Migrate the I8, I16, I32 cases of fcvt_from_uint

* Implement the I64 case of fcvt_from_uint

* Add a test for the existing behavior of fcvt_from_uint.f64x2

* Migrate fcvt_from_uint.f64x2 to ISLE

* Lower the last case of `fcvt_from_uint`

* Add a test for `fcvt_from_uint`

* Finish lowering fcmp_from_uint

* Format
2022-08-11 12:28:41 -07:00
Afonso Bordado
e4adc46e6d cranelift: Fix shifts and implement rotates in interpreter (#4519)
* cranelift: Fix shifts and implement rotates in interpreter

* x64: Implement `rotl`/`rotr` for some small type combinations
2022-08-11 12:15:52 -07:00
Afonso Bordado
c5bc368cfe cranelift: Add COFF TLS Support (#4546)
* cranelift: Implement COFF TLS Relocations

* cranelift: Emit SecRel relocations

* cranelift: Handle _tls_index symbol in backend
2022-08-11 09:33:40 -07:00
Trevor Elliott
a25d52046b x64: Migrate fcvt_from_sint and fcvt_low_from_sint to ISLE (#4650)
https://github.com/bytecodealliance/wasmtime/pull/4650
2022-08-10 10:49:02 -07:00
Trevor Elliott
63c2d1e0c3 x64: Remove unnecessary register use when comparing against constants (#4645)
https://github.com/bytecodealliance/wasmtime/pull/4645
2022-08-09 23:53:51 +00:00
bjorn3
a4aa7258de Remove some dead code from the abi code (#4653)
These were originally used by the old backend framework as part of
legalizing function signatures for the respective ABI.
2022-08-09 12:21:55 -07:00
Chris Fallin
de8d44d0e5 Cranelift: MachBuffer: apply branch peephole opts one last time at buffer tail. (#4652)
The `MachBuffer` applies a set of peephole-optimization rules to do
branch threading, leverage fallthrough paths, eliminate empty blocks,
and flip conditional branches where needed to make branches more
efficient starting from naive always-branch-at-end-of-BB code.

This works by applying the rules at every label-bind, which is
equivalent to applying them at the end of every basic block, where
branches are usually inserted.

However, this misses one case: the end of the buffer! Currently we
don't optimize any redundant or foldable branches at the very end of
the machine code.

This usually doesn't matter when the function ends in an epilogue with
`ret` as the last instruction. However, when cold blocks exist, it can
actually matter.

Thanks to @mchesser for pointing out this issue in #4636.
2022-08-09 10:38:48 -07:00
Trevor Elliott
ed7dfd3925 x64: Peephole optimization for x < 0 (#4625)
https://github.com/bytecodealliance/wasmtime/pull/4625

Fixes #4607
2022-08-09 09:45:53 -07:00
Michael Chesser
8aee85ebaa Propagate cold annotations to edge blocks (#4636)
Update the lowering stage to mark edge blocks as cold if either the
predecessor or successor block is cold.
2022-08-09 05:05:57 +00:00
Trevor Elliott
0c2a48f682 x64: Migrate selectif and selectif_spectre_guard to ISLE (#4619)
https://github.com/bytecodealliance/wasmtime/pull/4619
2022-08-05 09:36:11 -07:00
Trevor Elliott
cd847d071d x64: Migrate br_table to ISLE (#4615)
https://github.com/bytecodealliance/wasmtime/pull/4615
2022-08-04 22:12:37 +00:00
Trevor Elliott
dc8362ceec x64: Finish migrating brz and brnz to ISLE (#4614)
https://github.com/bytecodealliance/wasmtime/pull/4614
2022-08-04 12:58:43 -07:00
Trevor Elliott
1fc11bbe51 x64: Migrate brff and I128 branching instructions to ISLE (#4599)
https://github.com/bytecodealliance/wasmtime/pull/4599
2022-08-04 08:58:50 -07:00
Trevor Elliott
301be7438e x64: Begin migrating branch instructions to ISLE (#4587)
https://github.com/bytecodealliance/wasmtime/pull/4587
2022-08-03 20:28:52 +00:00
Ulrich Weigand
b9dd48e34b [s390x, abi_impl] Support struct args using explicit pointers (#4585)
This adds support for StructArgument on s390x.  The ABI for this
platform requires that the address of the buffer holding the copy
of the struct argument is passed from caller to callee as hidden
pointer, using a register or overflow stack slot.

To implement this, I've added an optional "pointer" filed to
ABIArg::StructArg, and code to handle the pointer both in common
abi_impl code and the s390x back-end.

One notable change necessary to make this work involved the
"copy_to_arg_order" mechanism.  Currently, for struct args
we only need to copy the data (and that need to happen before
setting up any other args), while for non-struct args we only
need to set up the appropriate registers or stack slots.
This order is ensured by sorting the arguments appropriately
into a "copy_to_arg_order" list.

However, for struct args with explicit pointers we need to *both*
copy the data (again, before everything else), *and* set up a
register or stack slot.  Since we now need to touch the argument
twice, we cannot solve the ordering problem by a simple sort.
Instead, the abi_impl common code now provided *two* callbacks,
emit_copy_regs_to_buffer and emit_copy_regs_to_arg, and expects
the back end to first call copy..to_buffer for all args, and
then call copy.._to_arg for all args.  This required updates
to all back ends.

In the s390x back end, in addition to the new ABI code, I'm now
adding code to actually copy the struct data, using the MVC
instruction (for small buffers) or a memcpy libcall (for larger
buffers).  This also requires a bit of new infrastructure:
- MVC is the first memory-to-memory instruction we use, which
  needed a bit of memory argument tweaking
- We also need to set up the infrastructure to emit libcalls.

(This implements the first half of issue #4565.)
2022-08-03 19:00:07 +00:00
Anton Kirilov
a897742593 Initial back-edge CFI implementation (#3606)
Give the user the option to sign and to authenticate function
return addresses with the operations introduced by the Pointer
Authentication extension to the Arm instruction set architecture.

Copyright (c) 2021, Arm Limited.
2022-08-03 11:08:29 -07:00
Afonso Bordado
709716bb8e cranelift: Implement scalar FMA on x86 (#4460)
x86 does not have dedicated instructions for scalar FMA, lower
to a libcall which seems to be what llvm does.
2022-08-03 10:29:10 -07:00
Nick Fitzgerald
55215bbd1e Use a SmallVec for ABIArgSlots (#4586)
These are always length 1 for Wasm benchmarks.

<h3>Sightglass Benchmark Results</h3>

```
compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 328624015.86 ± 40274677.93 (confidence = 99%)

  main.so is 0.88x to 0.91x faster than slots-smallvec.so!
  slots-smallvec.so is 1.10x to 1.13x faster than main.so!

  [3070752447 3203778792.55 3446269274] main.so
  [2503544039 2875154776.69 3197966713] slots-smallvec.so

compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 9685705.06 ± 3221286.87 (confidence = 99%)

  main.so is 0.91x to 0.96x faster than slots-smallvec.so!
  slots-smallvec.so is 1.05x to 1.09x faster than main.so!

  [129356493 145594942.79 165038803] main.so
  [118555011 135909237.73 188780619] slots-smallvec.so

compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [79080493 86757564.46 112649639] main.so
  [78083384 85934125.69 94992743] slots-smallvec.so
```
2022-08-02 17:40:36 -07:00
Nick Fitzgerald
ab1cf3df2d Use a SmallVec for ABIArgs (#4584)
Instead of a regular `Vec`.

These vectors are usually very small, for example here is the histogram of sizes
when running Sightglass's `pulldown-cmark` benchmark:

```
;; Number of samples = 10332
;; Min = 0
;; Max = 11
;;
;; Mean = 2.496128532713901
;; Standard deviation = 2.2859559855427243
;; Variance = 5.225594767838607
;;
;; Each ∎ is a count of 62
;;
 0 ..  1 [ 3134 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 1 ..  2 [ 2032 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 2 ..  3 [  159 ]: ∎∎
 3 ..  4 [  838 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎
 4 ..  5 [  970 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 5 ..  6 [ 2566 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 6 ..  7 [  303 ]: ∎∎∎∎
 7 ..  8 [  272 ]: ∎∎∎∎
 8 ..  9 [   40 ]:
 9 .. 10 [   18 ]:
```

By using a `SmallVec` with capacity of 6 we avoid the vast majority of heap
allocations and get some nice benchmark wins of up to ~1.11x faster compilation.

<h3>Sightglass Benchmark Results</h3>

```
compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 340361395.90 ± 63384608.15 (confidence = 99%)

  main.so is 0.88x to 0.92x faster than smallvec.so!
  smallvec.so is 1.09x to 1.13x faster than main.so!

  [3101467423 3425524333.41 4060621653] main.so
  [2820915877 3085162937.51 3375167352] smallvec.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 988446098.59 ± 184075718.89 (confidence = 99%)

  main.so is 0.88x to 0.92x faster than smallvec.so!
  smallvec.so is 1.09x to 1.13x faster than main.so!

  [9006994951 9948091070.66 11792481990] main.so
  [8192243090 8959644972.07 9801848982] smallvec.so

compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm

  Δ = 7854567.87 ± 2215491.16 (confidence = 99%)

  main.so is 0.89x to 0.94x faster than smallvec.so!
  smallvec.so is 1.07x to 1.12x faster than main.so!

  [80354527 93864666.76 119789198] main.so
  [77554917 86010098.89 94726994] smallvec.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

  Δ = 22810509.85 ± 6434024.63 (confidence = 99%)

  main.so is 0.89x to 0.94x faster than smallvec.so!
  smallvec.so is 1.07x to 1.12x faster than main.so!

  [233358190 272593088.57 347880715] main.so
  [225227821 249782578.72 275097380] smallvec.so

compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 10849521.41 ± 4324757.85 (confidence = 99%)

  main.so is 0.90x to 0.96x faster than smallvec.so!
  smallvec.so is 1.04x to 1.10x faster than main.so!

  [133875427 156859544.47 222455440] main.so
  [126073854 146010023.06 181611647] smallvec.so

compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 31508176.97 ± 12559561.91 (confidence = 99%)

  main.so is 0.90x to 0.96x faster than smallvec.so!
  smallvec.so is 1.04x to 1.10x faster than main.so!

  [388788638 455536988.31 646034523] main.so
  [366132033 424028811.34 527419755] smallvec.so
```
2022-08-02 15:53:44 -07:00
Nick Fitzgerald
42bba452a6 Cranelift: Add instructions for getting the current stack/frame/return pointers (#4573)
* Cranelift: Add instructions for getting the current stack/frame pointers and return address

This is the initial part of https://github.com/bytecodealliance/wasmtime/issues/4535

* x64: Remove `Amode::RbpOffset` and use `Amode::ImmReg` instead

We just special case getting operands from `Amode`s now.

* Fix s390x `get_return_address`; require `preserve_frame_pointers=true`

* Assert that `Amode::ImmRegRegShift` doesn't use rbp/rsp

* Handle non-allocatable registers in Amode::with_allocs

* Use "stack" instead of "r15" on s390x

* r14 is an allocatable register on s390x, so it shouldn't be used with `MovPReg`
2022-08-02 14:37:17 -07:00
Chris Fallin
8dddd6f1f7 Cranelift: Remove ifcmp_sp opcode. (#4578)
This was temporarily added back in #3502 due to a need from Lucet; now
that Lucet is EOL, the opcode is no longer needed and we can remove it.
2022-08-02 13:15:39 -07:00
Chris Fallin
43f1765272 Cranellift: remove Baldrdash support and related features. (#4571)
* Cranellift: remove Baldrdash support and related features.

As noted in Mozilla's bugzilla bug 1781425 [1], the SpiderMonkey team
has recently determined that their current form of integration with
Cranelift is too hard to maintain, and they have chosen to remove it
from their codebase. If and when they decide to build updated support
for Cranelift, they will adopt different approaches to several details
of the integration.

In the meantime, after discussion with the SpiderMonkey folks, they
agree that it makes sense to remove the bits of Cranelift that exist
to support the integration ("Baldrdash"), as they will not need
them. Many of these bits are difficult-to-maintain special cases that
are not actually tested in Cranelift proper: for example, the
Baldrdash integration required Cranelift to emit function bodies
without prologues/epilogues, and instead communicate very precise
information about the expected frame size and layout, then stitched
together something post-facto. This was brittle and caused a lot of
incidental complexity ("fallthrough returns", the resulting special
logic in block-ordering); this is just one example. As another
example, one particular Baldrdash ABI variant processed stack args in
reverse order, so our ABI code had to support both traversal
orders. We had a number of other Baldrdash-specific settings as well
that did various special things.

This PR removes Baldrdash ABI support, the `fallthrough_return`
instruction, and pulls some threads to remove now-unused bits as a
result of those two, with the  understanding that the SpiderMonkey folks
will build new functionality as needed in the future and we can perhaps
find cleaner abstractions to make it all work.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1781425

* Review feedback.

* Fix (?) DWARF debug tests: add `--disable-cache` to wasmtime invocations.

The debugger tests invoke `wasmtime` from within each test case under
the control of a debugger (gdb or lldb). Some of these tests started to
inexplicably fail in CI with unrelated changes, and the failures were
only inconsistently reproducible locally. It seems to be cache related:
if we disable cached compilation on the nested `wasmtime` invocations,
the tests consistently pass.

* Review feedback.
2022-08-02 19:37:56 +00:00
Benjamin Bouvier
ff37c9d8a4 [cranelift] Rejigger the compile API (#4540)
* Move `emit_to_memory` to `MachCompileResult`

This small refactoring makes it clearer to me that emitting to memory
doesn't require anything else from the compilation `Context`. While it's
a trivial change, it's a small public API change that shouldn't cause
too much trouble, and doesn't seem RFC-worthy. Happy to hear different
opinions about this, though!

* hide the MachCompileResult behind a method

* Add a `CompileError` wrapper type that references a `Function`

* Rename MachCompileResult to CompiledCode

* Additionally remove the last unsafe API in cranelift-codegen
2022-08-02 12:05:40 -07:00
Trevor Elliott
586ec95c11 ISLE: Allow shadowing in let expressions (#4562)
* Support shadowing in isle

* Re-run the isle build.rs if the examples change

* Print error messages when isle tests fail

* Move run tests

* Refactor `let` uses that don't need to introduce unique names
2022-08-01 21:10:28 +00:00
Trevor Elliott
25782b527e x64: Migrate trapif and trapff to ISLE (#4545)
https://github.com/bytecodealliance/wasmtime/pull/4545
2022-08-01 11:24:11 -07:00
Benjamin Bouvier
8d0224341c cranelift: Introduce a feature to enable trace logs (#4484)
* Don't use `log::trace` directly but a feature-enabled `trace` macro
* Don't emit disassembly based on the log level
2022-08-01 11:19:15 +02:00
Trevor Elliott
29d4edc76b x64: Migrate call and call_indirect to ISLE (#4542)
https://github.com/bytecodealliance/wasmtime/pull/4542
2022-07-28 13:10:03 -07:00
Afonso Bordado
0508932174 cranelift: Align Scalar and SIMD shift semantics (#4520)
* cranelift: Reorganize test suite

Group some SIMD operations by instruction.

* cranelift: Deduplicate some shift tests

Also, new tests with the mod behaviour

* aarch64: Lower shifts with mod behaviour

* x64: Lower shifts with mod behaviour

* wasmtime: Don't mask SIMD shifts
2022-07-27 17:54:00 +00:00
Trevor Elliott
7ac6134894 x64: Shrink Inst from 72 to 48 bytes (#4514)
https://github.com/bytecodealliance/wasmtime/pull/4514
2022-07-27 10:39:22 -07:00
Afonso Bordado
02c3b47db2 x64: Implement SIMD fma (#4474)
* x64: Add VEX Instruction Encoder

This uses a similar builder pattern to the EVEX Encoder.
Does not yet support memory accesses.

* x64: Add FMA Flag

* x64: Implement SIMD `fma`

* x64: Use 4 register Vex Inst

* x64: Reorder VEX pretty print args
2022-07-25 22:01:02 +00:00
Trevor Elliott
9e9e043174 x64: Migrate the return and fallthrough_return lowerings to ISLE (#4518)
https://github.com/bytecodealliance/wasmtime/pull/4518
2022-07-25 21:28:52 +00:00
Trevor Elliott
ee7e4f4c6b x64: Port func_addr and symbol_value to ISLE (#4485)
https://github.com/bytecodealliance/wasmtime/pull/4485
2022-07-25 11:11:16 -07:00
Afonso Bordado
af62037f62 cranelift: Restrict br_table to i32 indices (#4510)
* cranelift: Restrict `br_table` to `i32` indices

In #4498 it was proposed that we should only accept `i32` indices
to `br_table`. The rationale for this is that larger types lead the
users to a false sense of flexibility (since we don't support jump
tables larger than u32's), and narrower types are not well tested
paths that would be safer if we removed them.

* cranelift: Reduce directly from i128 to i32 in Switch
2022-07-22 23:32:40 +00:00
Nick Fitzgerald
b24c561ceb cranelift: Don't log CLIF and assembly at debug level (#4503)
Too verbose. Only log them at trace level.
2022-07-21 15:31:05 -07:00
Trevor Elliott
06407dd337 Add a test to prevent x64 Inst size slipping further (#4489)
* Add a test to prevent x64 Inst size slipping further

* Enable the test for 64-bit pointers only
2022-07-21 00:01:33 +00:00
Trevor Elliott
b519c975cb x64: Port fdemote and fvdemote to ISLE (#4449)
https://github.com/bytecodealliance/wasmtime/pull/4449
2022-07-18 14:26:23 -07:00