wasmtime

Author	SHA1	Message	Date
Afonso Bordado	e4adc46e6d	cranelift: Fix shifts and implement rotates in interpreter (#4519 ) * cranelift: Fix shifts and implement rotates in interpreter * x64: Implement `rotl`/`rotr` for some small type combinations	2022-08-11 12:15:52 -07:00
Afonso Bordado	c5bc368cfe	cranelift: Add COFF TLS Support (#4546 ) * cranelift: Implement COFF TLS Relocations * cranelift: Emit SecRel relocations * cranelift: Handle _tls_index symbol in backend	2022-08-11 09:33:40 -07:00
Trevor Elliott	a25d52046b	x64: Migrate fcvt_from_sint and fcvt_low_from_sint to ISLE (#4650 ) https://github.com/bytecodealliance/wasmtime/pull/4650	2022-08-10 10:49:02 -07:00
Trevor Elliott	63c2d1e0c3	x64: Remove unnecessary register use when comparing against constants (#4645 ) https://github.com/bytecodealliance/wasmtime/pull/4645	2022-08-09 23:53:51 +00:00
bjorn3	a4aa7258de	Remove some dead code from the abi code (#4653 ) These were originally used by the old backend framework as part of legalizing function signatures for the respective ABI.	2022-08-09 12:21:55 -07:00
Chris Fallin	de8d44d0e5	Cranelift: MachBuffer: apply branch peephole opts one last time at buffer tail. (#4652 ) The `MachBuffer` applies a set of peephole-optimization rules to do branch threading, leverage fallthrough paths, eliminate empty blocks, and flip conditional branches where needed to make branches more efficient starting from naive always-branch-at-end-of-BB code. This works by applying the rules at every label-bind, which is equivalent to applying them at the end of every basic block, where branches are usually inserted. However, this misses one case: the end of the buffer! Currently we don't optimize any redundant or foldable branches at the very end of the machine code. This usually doesn't matter when the function ends in an epilogue with `ret` as the last instruction. However, when cold blocks exist, it can actually matter. Thanks to @mchesser for pointing out this issue in #4636.	2022-08-09 10:38:48 -07:00
Trevor Elliott	ed7dfd3925	x64: Peephole optimization for `x < 0` (#4625 ) https://github.com/bytecodealliance/wasmtime/pull/4625 Fixes #4607	2022-08-09 09:45:53 -07:00
Michael Chesser	8aee85ebaa	Propagate cold annotations to edge blocks (#4636 ) Update the lowering stage to mark edge blocks as cold if either the predecessor or successor block is cold.	2022-08-09 05:05:57 +00:00
Trevor Elliott	0c2a48f682	x64: Migrate selectif and selectif_spectre_guard to ISLE (#4619 ) https://github.com/bytecodealliance/wasmtime/pull/4619	2022-08-05 09:36:11 -07:00
Trevor Elliott	cd847d071d	x64: Migrate br_table to ISLE (#4615 ) https://github.com/bytecodealliance/wasmtime/pull/4615	2022-08-04 22:12:37 +00:00
Trevor Elliott	dc8362ceec	x64: Finish migrating brz and brnz to ISLE (#4614 ) https://github.com/bytecodealliance/wasmtime/pull/4614	2022-08-04 12:58:43 -07:00
Trevor Elliott	1fc11bbe51	x64: Migrate brff and I128 branching instructions to ISLE (#4599 ) https://github.com/bytecodealliance/wasmtime/pull/4599	2022-08-04 08:58:50 -07:00
Trevor Elliott	301be7438e	x64: Begin migrating branch instructions to ISLE (#4587 ) https://github.com/bytecodealliance/wasmtime/pull/4587	2022-08-03 20:28:52 +00:00
Ulrich Weigand	b9dd48e34b	[s390x, abi_impl] Support struct args using explicit pointers (#4585 ) This adds support for StructArgument on s390x. The ABI for this platform requires that the address of the buffer holding the copy of the struct argument is passed from caller to callee as hidden pointer, using a register or overflow stack slot. To implement this, I've added an optional "pointer" filed to ABIArg::StructArg, and code to handle the pointer both in common abi_impl code and the s390x back-end. One notable change necessary to make this work involved the "copy_to_arg_order" mechanism. Currently, for struct args we only need to copy the data (and that need to happen before setting up any other args), while for non-struct args we only need to set up the appropriate registers or stack slots. This order is ensured by sorting the arguments appropriately into a "copy_to_arg_order" list. However, for struct args with explicit pointers we need to both copy the data (again, before everything else), and set up a register or stack slot. Since we now need to touch the argument twice, we cannot solve the ordering problem by a simple sort. Instead, the abi_impl common code now provided two callbacks, emit_copy_regs_to_buffer and emit_copy_regs_to_arg, and expects the back end to first call copy..to_buffer for all args, and then call copy.._to_arg for all args. This required updates to all back ends. In the s390x back end, in addition to the new ABI code, I'm now adding code to actually copy the struct data, using the MVC instruction (for small buffers) or a memcpy libcall (for larger buffers). This also requires a bit of new infrastructure: - MVC is the first memory-to-memory instruction we use, which needed a bit of memory argument tweaking - We also need to set up the infrastructure to emit libcalls. (This implements the first half of issue #4565.)	2022-08-03 19:00:07 +00:00
Anton Kirilov	a897742593	Initial back-edge CFI implementation (#3606 ) Give the user the option to sign and to authenticate function return addresses with the operations introduced by the Pointer Authentication extension to the Arm instruction set architecture. Copyright (c) 2021, Arm Limited.	2022-08-03 11:08:29 -07:00
Afonso Bordado	709716bb8e	cranelift: Implement scalar FMA on x86 (#4460 ) x86 does not have dedicated instructions for scalar FMA, lower to a libcall which seems to be what llvm does.	2022-08-03 10:29:10 -07:00
Nick Fitzgerald	55215bbd1e	Use a `SmallVec` for `ABIArgSlot`s (#4586 ) These are always length 1 for Wasm benchmarks. <h3>Sightglass Benchmark Results</h3> ``` compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm Δ = 328624015.86 ± 40274677.93 (confidence = 99%) main.so is 0.88x to 0.91x faster than slots-smallvec.so! slots-smallvec.so is 1.10x to 1.13x faster than main.so! [3070752447 3203778792.55 3446269274] main.so [2503544039 2875154776.69 3197966713] slots-smallvec.so compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 9685705.06 ± 3221286.87 (confidence = 99%) main.so is 0.91x to 0.96x faster than slots-smallvec.so! slots-smallvec.so is 1.05x to 1.09x faster than main.so! [129356493 145594942.79 165038803] main.so [118555011 135909237.73 188780619] slots-smallvec.so compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm No difference in performance. [79080493 86757564.46 112649639] main.so [78083384 85934125.69 94992743] slots-smallvec.so ```	2022-08-02 17:40:36 -07:00
Nick Fitzgerald	ab1cf3df2d	Use a `SmallVec` for `ABIArg`s (#4584 ) Instead of a regular `Vec`. These vectors are usually very small, for example here is the histogram of sizes when running Sightglass's `pulldown-cmark` benchmark: ``` ;; Number of samples = 10332 ;; Min = 0 ;; Max = 11 ;; ;; Mean = 2.496128532713901 ;; Standard deviation = 2.2859559855427243 ;; Variance = 5.225594767838607 ;; ;; Each ∎ is a count of 62 ;; 0 .. 1 [ 3134 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 1 .. 2 [ 2032 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 2 .. 3 [ 159 ]: ∎∎ 3 .. 4 [ 838 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎ 4 .. 5 [ 970 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 5 .. 6 [ 2566 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 6 .. 7 [ 303 ]: ∎∎∎∎ 7 .. 8 [ 272 ]: ∎∎∎∎ 8 .. 9 [ 40 ]: 9 .. 10 [ 18 ]: ``` By using a `SmallVec` with capacity of 6 we avoid the vast majority of heap allocations and get some nice benchmark wins of up to ~1.11x faster compilation. <h3>Sightglass Benchmark Results</h3> ``` compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm Δ = 340361395.90 ± 63384608.15 (confidence = 99%) main.so is 0.88x to 0.92x faster than smallvec.so! smallvec.so is 1.09x to 1.13x faster than main.so! [3101467423 3425524333.41 4060621653] main.so [2820915877 3085162937.51 3375167352] smallvec.so compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm Δ = 988446098.59 ± 184075718.89 (confidence = 99%) main.so is 0.88x to 0.92x faster than smallvec.so! smallvec.so is 1.09x to 1.13x faster than main.so! [9006994951 9948091070.66 11792481990] main.so [8192243090 8959644972.07 9801848982] smallvec.so compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm Δ = 7854567.87 ± 2215491.16 (confidence = 99%) main.so is 0.89x to 0.94x faster than smallvec.so! smallvec.so is 1.07x to 1.12x faster than main.so! [80354527 93864666.76 119789198] main.so [77554917 86010098.89 94726994] smallvec.so compilation :: cycles :: benchmarks/bz2/benchmark.wasm Δ = 22810509.85 ± 6434024.63 (confidence = 99%) main.so is 0.89x to 0.94x faster than smallvec.so! smallvec.so is 1.07x to 1.12x faster than main.so! [233358190 272593088.57 347880715] main.so [225227821 249782578.72 275097380] smallvec.so compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 10849521.41 ± 4324757.85 (confidence = 99%) main.so is 0.90x to 0.96x faster than smallvec.so! smallvec.so is 1.04x to 1.10x faster than main.so! [133875427 156859544.47 222455440] main.so [126073854 146010023.06 181611647] smallvec.so compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 31508176.97 ± 12559561.91 (confidence = 99%) main.so is 0.90x to 0.96x faster than smallvec.so! smallvec.so is 1.04x to 1.10x faster than main.so! [388788638 455536988.31 646034523] main.so [366132033 424028811.34 527419755] smallvec.so ```	2022-08-02 15:53:44 -07:00
Nick Fitzgerald	42bba452a6	Cranelift: Add instructions for getting the current stack/frame/return pointers (#4573 ) * Cranelift: Add instructions for getting the current stack/frame pointers and return address This is the initial part of https://github.com/bytecodealliance/wasmtime/issues/4535 * x64: Remove `Amode::RbpOffset` and use `Amode::ImmReg` instead We just special case getting operands from `Amode`s now. * Fix s390x `get_return_address`; require `preserve_frame_pointers=true` * Assert that `Amode::ImmRegRegShift` doesn't use rbp/rsp * Handle non-allocatable registers in Amode::with_allocs * Use "stack" instead of "r15" on s390x * r14 is an allocatable register on s390x, so it shouldn't be used with `MovPReg`	2022-08-02 14:37:17 -07:00
Chris Fallin	8dddd6f1f7	Cranelift: Remove `ifcmp_sp` opcode. (#4578 ) This was temporarily added back in #3502 due to a need from Lucet; now that Lucet is EOL, the opcode is no longer needed and we can remove it.	2022-08-02 13:15:39 -07:00
Chris Fallin	43f1765272	Cranellift: remove Baldrdash support and related features. (#4571 ) * Cranellift: remove Baldrdash support and related features. As noted in Mozilla's bugzilla bug 1781425 [1], the SpiderMonkey team has recently determined that their current form of integration with Cranelift is too hard to maintain, and they have chosen to remove it from their codebase. If and when they decide to build updated support for Cranelift, they will adopt different approaches to several details of the integration. In the meantime, after discussion with the SpiderMonkey folks, they agree that it makes sense to remove the bits of Cranelift that exist to support the integration ("Baldrdash"), as they will not need them. Many of these bits are difficult-to-maintain special cases that are not actually tested in Cranelift proper: for example, the Baldrdash integration required Cranelift to emit function bodies without prologues/epilogues, and instead communicate very precise information about the expected frame size and layout, then stitched together something post-facto. This was brittle and caused a lot of incidental complexity ("fallthrough returns", the resulting special logic in block-ordering); this is just one example. As another example, one particular Baldrdash ABI variant processed stack args in reverse order, so our ABI code had to support both traversal orders. We had a number of other Baldrdash-specific settings as well that did various special things. This PR removes Baldrdash ABI support, the `fallthrough_return` instruction, and pulls some threads to remove now-unused bits as a result of those two, with the understanding that the SpiderMonkey folks will build new functionality as needed in the future and we can perhaps find cleaner abstractions to make it all work. [1] https://bugzilla.mozilla.org/show_bug.cgi?id=1781425 * Review feedback. * Fix (?) DWARF debug tests: add `--disable-cache` to wasmtime invocations. The debugger tests invoke `wasmtime` from within each test case under the control of a debugger (gdb or lldb). Some of these tests started to inexplicably fail in CI with unrelated changes, and the failures were only inconsistently reproducible locally. It seems to be cache related: if we disable cached compilation on the nested `wasmtime` invocations, the tests consistently pass. * Review feedback.	2022-08-02 19:37:56 +00:00
Benjamin Bouvier	ff37c9d8a4	[cranelift] Rejigger the `compile` API (#4540 ) * Move `emit_to_memory` to `MachCompileResult` This small refactoring makes it clearer to me that emitting to memory doesn't require anything else from the compilation `Context`. While it's a trivial change, it's a small public API change that shouldn't cause too much trouble, and doesn't seem RFC-worthy. Happy to hear different opinions about this, though! * hide the MachCompileResult behind a method * Add a `CompileError` wrapper type that references a `Function` * Rename MachCompileResult to CompiledCode * Additionally remove the last unsafe API in cranelift-codegen	2022-08-02 12:05:40 -07:00
Trevor Elliott	586ec95c11	ISLE: Allow shadowing in let expressions (#4562 ) * Support shadowing in isle * Re-run the isle build.rs if the examples change * Print error messages when isle tests fail * Move run tests * Refactor `let` uses that don't need to introduce unique names	2022-08-01 21:10:28 +00:00
Trevor Elliott	25782b527e	x64: Migrate trapif and trapff to ISLE (#4545 ) https://github.com/bytecodealliance/wasmtime/pull/4545	2022-08-01 11:24:11 -07:00
Benjamin Bouvier	8d0224341c	cranelift: Introduce a feature to enable `trace` logs (#4484 ) * Don't use `log::trace` directly but a feature-enabled `trace` macro * Don't emit disassembly based on the log level	2022-08-01 11:19:15 +02:00
Trevor Elliott	29d4edc76b	x64: Migrate call and call_indirect to ISLE (#4542 ) https://github.com/bytecodealliance/wasmtime/pull/4542	2022-07-28 13:10:03 -07:00
Afonso Bordado	0508932174	cranelift: Align Scalar and SIMD shift semantics (#4520 ) * cranelift: Reorganize test suite Group some SIMD operations by instruction. * cranelift: Deduplicate some shift tests Also, new tests with the mod behaviour * aarch64: Lower shifts with mod behaviour * x64: Lower shifts with mod behaviour * wasmtime: Don't mask SIMD shifts	2022-07-27 17:54:00 +00:00
Trevor Elliott	7ac6134894	x64: Shrink Inst from 72 to 48 bytes (#4514 ) https://github.com/bytecodealliance/wasmtime/pull/4514	2022-07-27 10:39:22 -07:00
Afonso Bordado	02c3b47db2	x64: Implement SIMD `fma` (#4474 ) * x64: Add VEX Instruction Encoder This uses a similar builder pattern to the EVEX Encoder. Does not yet support memory accesses. * x64: Add FMA Flag * x64: Implement SIMD `fma` * x64: Use 4 register Vex Inst * x64: Reorder VEX pretty print args	2022-07-25 22:01:02 +00:00
Trevor Elliott	9e9e043174	x64: Migrate the return and fallthrough_return lowerings to ISLE (#4518 ) https://github.com/bytecodealliance/wasmtime/pull/4518	2022-07-25 21:28:52 +00:00
Trevor Elliott	ee7e4f4c6b	x64: Port func_addr and symbol_value to ISLE (#4485 ) https://github.com/bytecodealliance/wasmtime/pull/4485	2022-07-25 11:11:16 -07:00
Afonso Bordado	af62037f62	cranelift: Restrict `br_table` to `i32` indices (#4510 ) * cranelift: Restrict `br_table` to `i32` indices In #4498 it was proposed that we should only accept `i32` indices to `br_table`. The rationale for this is that larger types lead the users to a false sense of flexibility (since we don't support jump tables larger than u32's), and narrower types are not well tested paths that would be safer if we removed them. * cranelift: Reduce directly from i128 to i32 in Switch	2022-07-22 23:32:40 +00:00
Nick Fitzgerald	b24c561ceb	cranelift: Don't log CLIF and assembly at debug level (#4503 ) Too verbose. Only log them at trace level.	2022-07-21 15:31:05 -07:00
Trevor Elliott	06407dd337	Add a test to prevent x64 Inst size slipping further (#4489 ) * Add a test to prevent x64 Inst size slipping further * Enable the test for 64-bit pointers only	2022-07-21 00:01:33 +00:00
Trevor Elliott	b519c975cb	x64: Port fdemote and fvdemote to ISLE (#4449 ) https://github.com/bytecodealliance/wasmtime/pull/4449	2022-07-18 14:26:23 -07:00
Sam Parker	9c43749dfe	[RFC] Dynamic Vector Support (#4200 ) Introduce a new concept in the IR that allows a producer to create dynamic vector types. An IR function can now contain global value(s) that represent a dynamic scaling factor, for a given fixed-width vector type. A dynamic type is then created by 'multiplying' the corresponding global value with a fixed-width type. These new types can be used just like the existing types and the type system has a set of hard-coded dynamic types, such as I32X4XN, which the user defined types map onto. The dynamic types are also used explicitly to create dynamic stack slots, which have no set size like their existing counterparts. New IR instructions are added to access these new stack entities. Currently, during codegen, the dynamic scaling factor has to be lowered to a constant so the dynamic slots do eventually have a compile-time known size, as do spill slots. The current lowering for aarch64 just targets Neon, using a dynamic scale of 1. Copyright (c) 2022, Arm Limited.	2022-07-07 12:54:39 -07:00
Andrew Brown	8629cbc6a4	x64: port `atomic_rmw` to ISLE (#4389 ) * x64: port `atomic_rmw` to ISLE This change ports `atomic_rmw` to ISLE for the x64 backend. It does not change the lowering in any way, though it seems possible that the fixed regs need not be as fixed and that there are opportunities for single instruction lowerings. It does rename `inst_common::AtomicRmwOp` to `MachAtomicRmwOp` to disambiguate with the IR enum with the same name. * x64: remove remaining hardcoded register constraints for `atomic_rmw` * x64: use `SyntheticAmode` in `AtomicRmwSeq` * review: add missing reg collector for amode * review: collect memory registers in the 'late' phase	2022-07-06 23:58:59 +00:00
Sam Parker	a2d49ebf27	Use u32 in Type API (#4280 ) Move from passing and returning u8 and u16 values to u32 in many of the functions. This removes a number of type conversions and gives a small compilation time speedup, around ~0.7% on my aarch64 machine. Copyright (c) 2022, Arm Limited.	2022-06-30 12:43:36 -07:00
bjorn3	d1446f767d	Mark return value as define instead of clobber for TLS pseudoinstructions (#4357 )	2022-06-30 10:44:51 -07:00
Chris Fallin	b2e28b917a	Cranelift: update to latest regalloc2: (#4324 ) - Handle call instructions' clobbers with the clobbers API, using RA2's clobbers bitmask (bytecodealliance/regalloc2#58) rather than clobbers list; - Pull in changes from bytecodealliance/regalloc2#59 for much more sane edge-case behavior w.r.t. liverange splitting.	2022-06-28 09:01:59 -07:00
Chris Fallin	5c2c285dd7	Cranelift/x64: fix register allocator metadata for 8-bit divides. (#4332 ) `idiv` on x86-64 only reads `rdx`/`edx`/`dx`/`dl` for divides with width greater than 8 bits; for an 8-bit divide, it reads the whole 16-bit divisor from `ax`, as our CISC ancestors intended. This PR fixes the metadata to avoid a regalloc panic (due to undefined `rdx`) in this case. Does not affect Wasmtime or other Wasm-frontend embedders.	2022-06-27 12:31:06 -07:00
Alex Crichton	dc2fe0ac67	x64: Fix codegen for the `i8x16.swizzle` instruction (#4318 ) This commit fixes a mistake in the `Swizzle` opcode implementation in the x64 backend of Cranelift. Previously an input register was casted to a writable register and then modified, which I believe instructions are not supposed to do. This was discovered as part of my investigation into #4315.	2022-06-27 13:20:31 -05:00
Alex Crichton	8bb07523e2	x64: Fix codegen for the `select` instruction with v128 (#4317 ) This commit fixes a bug in the previous codegen for the `select` instruction when the operations of the `select` were of the `v128` type. Previously teh `XmmCmove` instruction only stored an `OperandSize` of 32 or 64 for a 64 or 32-bit move, but this was also used for these 128-bit types which meant that when used the wrong move instruction was generated. The fix applied here is to store the whole `Type` being moved so the 128-bit variant can be selected as well.	2022-06-27 11:02:40 -07:00
Trevor Elliott	7e0bb465d0	X64: port the rest of icmp to ISLE (#4254 ) Finish migrating icmp to ISLE for x64	2022-06-13 16:34:11 -07:00
Chris Fallin	ed9db962de	x64 backend: fix cmpxchg (don't return RealReg as result). (#4243 ) The current lowering helper for `cmpxchg` returns the literal RealReg `rax` as its result. However, this breaks a number of invariants, and eventually causes a regalloc panic if used as a blockparam arg (pinned vregs cannot be used in this way). In general we have to return regular vregs, not a RealReg, as results of instructions during lowering. However #4223 added a helper for `x64_cmpxchg` that returns a literal `rax`. Fortunately we can do the right thing here by just giving a fresh vreg to the instruction; the regalloc constraints mean that this vreg is constrained to `rax` at the instruction (at its def/late point), so the generator of the instruction need not worry about `rax` here.	2022-06-08 06:13:31 -07:00
Trevor Elliott	bc3c4fa206	X64: port fvpromote to ISLE (#4242 )	2022-06-07 17:18:23 -07:00
Chris Fallin	54acd8b3e2	x64 backend: fix `to_amode` with constant address (no registers). (#4239 ) If an address expression is given to `to_amode` that is completely constant (no registers at all), then it will produce an `Amode` that has the resulting constant as an offset, and `(invalid_reg)` as the base. This is a side-effect of the way we build up the amode step-by-step -- we're waiting to see a register and plug it into the base field. If we never get a reg though, we need to generate a constant zero into a register and use that as the base. This PR adds a `finalize_amode` helper to do just that. Fixes #4234.	2022-06-07 11:40:10 -07:00
Johnnie Birch	3f152273d3	X64: Port fpromote to ISLE (#4230 )	2022-06-06 14:47:44 -07:00
Andrew Brown	6df56e6aa6	x64: port `atomic_cas` to ISLE (#4223 )	2022-06-06 13:20:33 -07:00
Andrew Brown	816aae6aca	x64: port some atomics to ISLE (#4212 ) * x64: port `fence` to ISLE * x64: port `atomic_load` to ISLE * x64: port `atomic_store` to ISLE	2022-06-02 14:13:10 -07:00

1 2 3 4 5 ...

586 Commits