wasmtime

Author	SHA1	Message	Date
Trevor Elliott	0c2e0494bd	x64: Lower fcvt_from_uint in ISLE (#4684 ) * Add a test for the existing behavior of fcvt_from_uint * Migrate the I8, I16, I32 cases of fcvt_from_uint * Implement the I64 case of fcvt_from_uint * Add a test for the existing behavior of fcvt_from_uint.f64x2 * Migrate fcvt_from_uint.f64x2 to ISLE * Lower the last case of `fcvt_from_uint` * Add a test for `fcvt_from_uint` * Finish lowering fcvt_from_uint * Format	2022-08-11 12:28:41 -07:00
Afonso Bordado	e4adc46e6d	cranelift: Fix shifts and implement rotates in interpreter (#4519 ) * cranelift: Fix shifts and implement rotates in interpreter * x64: Implement `rotl`/`rotr` for some small type combinations	2022-08-11 12:15:52 -07:00
Ulrich Weigand	67870d1518	s390x: Support both big- and little-endian vector lane order (#4682 ) This implements the s390x back-end portion of the solution for https://github.com/bytecodealliance/wasmtime/issues/4566 We now support both big- and little-endian vector lane order in code generation. The order used for a function is determined by the function's ABI: if it uses a Wasmtime ABI, it will use little-endian lane order, and big-endian lane order otherwise. (This ensures that all raw_bitcast instructions generated by both wasmtime and other cranelift frontends can always be implemented as a no-op.) Lane order affects the implementation of a number of operations: - Vector immediates - Vector memory load / store (in big- and little-endian variants) - Operations explicitly using lane numbers (insertlane, extractlane, shuffle, swizzle) - Operations implicitly using lane numbers (iadd_pairwise, narrow/widen, promote/demote, fcvt_low, vhigh_bits) In addition, when calling a function using a different lane order, we need to lane-swap all vector values passed or returned in registers. A small number of changes to common code were also needed: - Ensure we always select a Wasmtime calling convention on s390x in crates/cranelift (func_signature). - Fix vector immediates for filetests/runtests. In PR #4427, I attempted to fix this by byte-swapping the V128 value, but with the new scheme, we'd instead need to perform a per-lane byte swap. Since we do not know the actual type in write_to_slice and read_from_slice, this isn't easily possible. Revert this part of PR #4427 again, and instead just mark the memory buffer as little-endian when emitting the trampoline; the back-end will then emit correct code to load the constant. - Change a runtest in simd-bitselect-to-vselect.clif to no longer make little-endian lane order assumptions. - Remove runtests in simd-swizzle.clif that make little-endian lane order assumptions by relying on implicit type conversion when using a non-i16x8 swizzle result type (this feature should probably be removed anyway). Tested with both wasmtime and cg_clif.	2022-08-11 12:10:46 -07:00
Afonso Bordado	c5bc368cfe	cranelift: Add COFF TLS Support (#4546 ) * cranelift: Implement COFF TLS Relocations * cranelift: Emit SecRel relocations * cranelift: Handle _tls_index symbol in backend	2022-08-11 09:33:40 -07:00
Afonso Bordado	268ddf2f6c	cranelift: Implement pinned reg in interpreter (#4375 )	2022-08-10 21:33:45 +00:00
Trevor Elliott	a25d52046b	x64: Migrate fcvt_from_sint and fcvt_low_from_sint to ISLE (#4650 ) https://github.com/bytecodealliance/wasmtime/pull/4650	2022-08-10 10:49:02 -07:00
bjorn3	f8c0a88299	Fix sret for AArch64 (#4634 ) * Fix sret for AArch64 AArch64 requires the struct return address argument to be stored in the x8 register. This register is never used for regular arguments. * Add extra sret tests for x86_64	2022-08-10 10:34:51 -07:00
Ulrich Weigand	50fcab2984	s390x: Implement tls_value (#4616 ) Implement the tls_value for s390 in the ELF general-dynamic mode. Notable differences to the x86_64 implementation are: - We use a __tls_get_offset libcall instead of __tls_get_addr. - The current thread pointer (stored in a pair of access registers) needs to be added to the result of __tls_get_offset. - __tls_get_offset has a variant ABI that requires the address of the GOT (global offset table) to be passed in %r12. This means we need a new libcall entry for __tls_get_offset. In addition, we also need a way to access _GLOBAL_OFFSET_TABLE_. The latter is a "magic" symbol with a well-known name defined by the ABI and recognized by the linker. This patch introduces a new ExternalName::KnownSymbol variant to support such names (originally due to @afonso360). We also need to emit a relocation on a symbol placed in a constant pool, as well as an extra relocation on the call to __tls_get_offset required for TLS linker optimization. Needed by the cg_clif frontend.	2022-08-10 10:02:07 -07:00
Afonso Bordado	30e2a9bd29	cranelift: Upgrade libm to 0.2.4 (#4670 ) * cranelift: Upgrade libm to 0.2.4 This resolves an issue with incorrect fmaf on the x86_64-pc-windows-gnu target under some inputs. See: #4517 * supply-chain: Vet `libm` 0.2.4	2022-08-10 16:08:39 +00:00
Trevor Elliott	63c2d1e0c3	x64: Remove unnecessary register use when comparing against constants (#4645 ) https://github.com/bytecodealliance/wasmtime/pull/4645	2022-08-09 23:53:51 +00:00
Afonso Bordado	4d2a2cfae6	cranelift: Use `cranelift-jit` in runtests (#4453 ) * cranelift: Use JIT in runtests Using `cranelift-jit` in run tests allows us to perform relocations and libcalls. This is important since some instruction lowerings fall back to libcalls when an extension is missing, or when it's too complicated to implement manually. This is also a first step to being able to test `call`s between functions in the runtest suite. It should also make it easier to eventually test TLS relocations, symbol resolution and ABIs. Another benefit of this is that we also get to test the JIT more, since it now runs the runtests, and gets some fuzzing via `fuzzgen` (which uses the `SingleFunctionCompiler`). This change causes regressions in terms of runtime for the filetests. I haven't done any serious benchmarking but what I've been seeing is that it now takes about ~3 seconds to run the testsuite while it previously took around 2 seconds. * Add FMA tests for X86	2022-08-09 14:54:25 -07:00
bjorn3	a4aa7258de	Remove some dead code from the abi code (#4653 ) These were originally used by the old backend framework as part of legalizing function signatures for the respective ABI.	2022-08-09 12:21:55 -07:00
Chris Fallin	953f83e6ac	Cranelift: disallow marking entry block 'cold'. (#4659 ) This is a nonsensical constraint: the entry block must come first in the compiled code's layout, so it cannot also be sunk to the end of the function. This PR modifies the CLIF verifier to disallow this situation entirely. It also adds an assert during final block-order computation to catch the problem (and avoid a silent miscompile) even if the verifier is disabled. Fixes #4656.	2022-08-09 11:52:30 -07:00
Trevor Elliott	ed7dfd3925	x64: Peephole optimization for `x < 0` (#4625 ) https://github.com/bytecodealliance/wasmtime/pull/4625 Fixes #4607	2022-08-09 09:45:53 -07:00
Damian Heaton	e463890f26	Port `AvgRound` & `SqmulRoundSat` to ISLE (AArch64) (#4639 ) Ported the existing implementations of the following opcodes on AArch64 to ISLE: - `AvgRound` - Also introduced support for `i64x2` vectors, as per the docs. - `SqmulRoundSat` Copyright (c) 2022 Arm Limited	2022-08-08 11:35:43 -07:00
Nick Fitzgerald	95e72db458	Some little Cranelift logging things (#4624 ) * Cranelift: Don't print "skipped TEST can't run aarch64" on x64, etc It's way too noisy. Move it to the logs. * Cranelift: Enable Cranelift trace logs in `clif-util` by default * cranelift-filetest: use `log::warn!` for warnings Instead of `println!` * rustfmt	2022-08-05 13:25:24 -07:00
Damian Heaton	eb332b8369	Convert `fma`, `valltrue` & `vanytrue` to ISLE (AArch64) (#4608 ) * Convert `fma`, `valltrue` & `vanytrue` to ISLE (AArch64) Ported the existing implementations of the following opcodes to ISLE on AArch64: - `fma` - Introduced missing support for `fma` on vector values, as per the docs. - `valltrue` - `vanytrue` Also fixed `fcmp` on scalar values in the interpreter, and enabled interpreter tests in `simd-fma.clif`. This introduces the `FMLA` machine instruction. Copyright (c) 2022 Arm Limited * Add comments for `Fmla` and `Bsl` Copyright (c) 2022 Arm Limited	2022-08-05 09:47:56 -07:00
wasmtime-publish	412fa04911	Bump Wasmtime to 0.41.0 (#4620 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-08-04 20:02:19 -05:00
Ulrich Weigand	f552a53654	s390x: Implement bitrev (#4617 ) Since we do not have an instruction for this, this is a simple open-coded implementation. Needed by the cg_clif frontend.	2022-08-04 16:24:55 -07:00
Trevor Elliott	cd847d071d	x64: Migrate br_table to ISLE (#4615 ) https://github.com/bytecodealliance/wasmtime/pull/4615	2022-08-04 22:12:37 +00:00
Ulrich Weigand	b17b1eb25d	[s390x, abi_impl] Add i128 support (#4598 ) This adds full i128 support to the s390x target, including new filetests and enabling the existing i128 runtest on s390x. The ABI requires that i128 is passed and returned via implicit pointer, but the front end still generates direct i128 types in call. This means we have to implement ABI support to implicitly convert i128 types to pointers when passing arguments. To do so, we add a new variant ABIArg::ImplicitArg. This acts like StructArg, except that the value type is the actual target type, not a pointer type. The required conversions have to be inserted in the prologue and at function call sites. Note that when dereferencing the implicit pointer in the prologue, we may require a temp register: the pointer may be passed on the stack so it needs to be loaded first, but the value register may be in the wrong class for pointer values. In this case, we use the "stack limit" register, which should be available at this point in the prologue. For return values, we use a mechanism similar to the one used for supporting multiple return values in the Wasmtime ABI. The only difference is that the hidden pointer to the return buffer must be the first, not last, argument in this case. (This implements the second half of issue #4565.)	2022-08-04 20:41:26 +00:00
Trevor Elliott	1fc11bbe51	x64: Migrate brff and I128 branching instructions to ISLE (#4599 ) https://github.com/bytecodealliance/wasmtime/pull/4599	2022-08-04 08:58:50 -07:00
Ulrich Weigand	b9dd48e34b	[s390x, abi_impl] Support struct args using explicit pointers (#4585 ) This adds support for StructArgument on s390x. The ABI for this platform requires that the address of the buffer holding the copy of the struct argument is passed from caller to callee as hidden pointer, using a register or overflow stack slot. To implement this, I've added an optional "pointer" field to ABIArg::StructArg, and code to handle the pointer both in common abi_impl code and the s390x back-end. One notable change necessary to make this work involved the "copy_to_arg_order" mechanism. Currently, for struct args we only need to copy the data (and that needs to happen before setting up any other args), while for non-struct args we only need to set up the appropriate registers or stack slots. This order is ensured by sorting the arguments appropriately into a "copy_to_arg_order" list. However, for struct args with explicit pointers we need to both copy the data (again, before everything else), and set up a register or stack slot. Since we now need to touch the argument twice, we cannot solve the ordering problem by a simple sort. Instead, the abi_impl common code now provides two callbacks, emit_copy_regs_to_buffer and emit_copy_regs_to_arg, and expects the back end to first call emit_copy_regs_to_buffer for all args, and then call emit_copy_regs_to_arg for all args. This required updates to all back ends. In the s390x back end, in addition to the new ABI code, I'm now adding code to actually copy the struct data, using the MVC instruction (for small buffers) or a memcpy libcall (for larger buffers). This also requires a bit of new infrastructure: - MVC is the first memory-to-memory instruction we use, which needed a bit of memory argument tweaking - We also need to set up the infrastructure to emit libcalls. (This implements the first half of issue #4565.)	2022-08-03 19:00:07 +00:00
Anton Kirilov	a897742593	Initial back-edge CFI implementation (#3606 ) Give the user the option to sign and to authenticate function return addresses with the operations introduced by the Pointer Authentication extension to the Arm instruction set architecture. Copyright (c) 2021, Arm Limited.	2022-08-03 11:08:29 -07:00
Nick Fitzgerald	42bba452a6	Cranelift: Add instructions for getting the current stack/frame/return pointers (#4573 ) * Cranelift: Add instructions for getting the current stack/frame pointers and return address This is the initial part of https://github.com/bytecodealliance/wasmtime/issues/4535 * x64: Remove `Amode::RbpOffset` and use `Amode::ImmReg` instead We just special case getting operands from `Amode`s now. * Fix s390x `get_return_address`; require `preserve_frame_pointers=true` * Assert that `Amode::ImmRegRegShift` doesn't use rbp/rsp * Handle non-allocatable registers in Amode::with_allocs * Use "stack" instead of "r15" on s390x * r14 is an allocatable register on s390x, so it shouldn't be used with `MovPReg`	2022-08-02 14:37:17 -07:00
Ulrich Weigand	6b4e6523f7	[abi_impl] Respect extension for incoming stack arguments (#4576 ) The gen_copy_arg_to_regs routine currently ignores argument extension flags when loading incoming arguments. This causes a problem with stack arguments on big-endian systems, since the argument address points to the word on the stack as extended by the caller, but the generated code only loads the inner type from the address, causing it to receive an incorrect value. (This happens to work on little- endian systems.) Fixed by loading extended arguments as full words.	2022-08-02 13:54:13 -07:00
Chris Fallin	8dddd6f1f7	Cranelift: Remove `ifcmp_sp` opcode. (#4578 ) This was temporarily added back in #3502 due to a need from Lucet; now that Lucet is EOL, the opcode is no longer needed and we can remove it.	2022-08-02 13:15:39 -07:00
Chris Fallin	43f1765272	Cranellift: remove Baldrdash support and related features. (#4571 ) * Cranellift: remove Baldrdash support and related features. As noted in Mozilla's bugzilla bug 1781425 [1], the SpiderMonkey team has recently determined that their current form of integration with Cranelift is too hard to maintain, and they have chosen to remove it from their codebase. If and when they decide to build updated support for Cranelift, they will adopt different approaches to several details of the integration. In the meantime, after discussion with the SpiderMonkey folks, they agree that it makes sense to remove the bits of Cranelift that exist to support the integration ("Baldrdash"), as they will not need them. Many of these bits are difficult-to-maintain special cases that are not actually tested in Cranelift proper: for example, the Baldrdash integration required Cranelift to emit function bodies without prologues/epilogues, and instead communicate very precise information about the expected frame size and layout, then stitched together something post-facto. This was brittle and caused a lot of incidental complexity ("fallthrough returns", the resulting special logic in block-ordering); this is just one example. As another example, one particular Baldrdash ABI variant processed stack args in reverse order, so our ABI code had to support both traversal orders. We had a number of other Baldrdash-specific settings as well that did various special things. This PR removes Baldrdash ABI support, the `fallthrough_return` instruction, and pulls some threads to remove now-unused bits as a result of those two, with the understanding that the SpiderMonkey folks will build new functionality as needed in the future and we can perhaps find cleaner abstractions to make it all work. [1] https://bugzilla.mozilla.org/show_bug.cgi?id=1781425 * Review feedback. * Fix (?) DWARF debug tests: add `--disable-cache` to wasmtime invocations. The debugger tests invoke `wasmtime` from within each test case under the control of a debugger (gdb or lldb). Some of these tests started to inexplicably fail in CI with unrelated changes, and the failures were only inconsistently reproducible locally. It seems to be cache related: if we disable cached compilation on the nested `wasmtime` invocations, the tests consistently pass. * Review feedback.	2022-08-02 19:37:56 +00:00
Benjamin Bouvier	ff37c9d8a4	[cranelift] Rejigger the `compile` API (#4540 ) * Move `emit_to_memory` to `MachCompileResult` This small refactoring makes it clearer to me that emitting to memory doesn't require anything else from the compilation `Context`. While it's a trivial change, it's a small public API change that shouldn't cause too much trouble, and doesn't seem RFC-worthy. Happy to hear different opinions about this, though! * hide the MachCompileResult behind a method * Add a `CompileError` wrapper type that references a `Function` * Rename MachCompileResult to CompiledCode * Additionally remove the last unsafe API in cranelift-codegen	2022-08-02 12:05:40 -07:00
Sam Parker	37cd96beff	[AArch64] i64x2 support for min/max (#4575 ) Also added interpreter support for vector min/max. Copyright (c) 2022, Arm Limited.	2022-08-02 11:42:05 -07:00
Trevor Elliott	25782b527e	x64: Migrate trapif and trapff to ISLE (#4545 ) https://github.com/bytecodealliance/wasmtime/pull/4545	2022-08-01 11:24:11 -07:00
Anton Kirilov	a47a82d2e5	Cranelift AArch64: Harden the Spectre mitigations (#4555 ) Use the `CSDB` instruction following Arm's recommendation. Copyright (c) 2022, Arm Limited.	2022-08-01 10:20:48 -07:00
Afonso Bordado	1f058a02c0	cranelift: Add MinGW `fma` regression tests (#4517 ) * cranelift: Add MinGW `fma` regression tests * cranelift: Fix FMA in interpreter * cranelift: Add separate `fma` test suite for the interpreter The interpreter can run `fma.clif` on most platforms, however on `x86_64-pc-windows-gnu` we use libm which has issues with some inputs. We should delete `fma-interpreter.clif` and enable the interpreter on the main `fma.clif` file once those are fixed.	2022-07-29 09:09:37 -05:00
Damian Heaton	5e3bb588a8	Port `Fence`, `IsNull`/`IsInvalid` & `Debugtrap` to ISLE (AArch64) (#4548 ) Ported the existing implementation of the following Opcodes for AArch64 to ISLE: - `Fence` - `IsNull` - `IsInvalid` - `Debugtrap` Copyright (c) 2022 Arm Limited	2022-07-28 15:36:13 -07:00
Trevor Elliott	29d4edc76b	x64: Migrate call and call_indirect to ISLE (#4542 ) https://github.com/bytecodealliance/wasmtime/pull/4542	2022-07-28 13:10:03 -07:00
Afonso Bordado	0508932174	cranelift: Align Scalar and SIMD shift semantics (#4520 ) * cranelift: Reorganize test suite Group some SIMD operations by instruction. * cranelift: Deduplicate some shift tests Also, new tests with the mod behaviour * aarch64: Lower shifts with mod behaviour * x64: Lower shifts with mod behaviour * wasmtime: Don't mask SIMD shifts	2022-07-27 17:54:00 +00:00
Afonso Bordado	e121c209fc	cranelift: Fix `urem`/`srem` in interpreter (#4532 )	2022-07-27 10:47:08 -07:00
Anton Kirilov	ead6edb0c5	Cranelift AArch64: Migrate Splat to ISLE (#4521 ) Copyright (c) 2022, Arm Limited.	2022-07-26 17:57:15 +00:00
Anton Kirilov	d041c4b376	Cranelift AArch64: Further integral constant fixes (#4530 ) Copyright (c) 2022, Arm Limited.	2022-07-26 09:35:06 -07:00
Afonso Bordado	02c3b47db2	x64: Implement SIMD `fma` (#4474 ) * x64: Add VEX Instruction Encoder This uses a similar builder pattern to the EVEX Encoder. Does not yet support memory accesses. * x64: Add FMA Flag * x64: Implement SIMD `fma` * x64: Use 4 register Vex Inst * x64: Reorder VEX pretty print args	2022-07-25 22:01:02 +00:00
Damian Heaton	3ef89b7787	Allow 64-bit vectors and implement for interpreter (#4509 ) * Allow 64-bit vectors and implement for interpreter The AArch64 backend already supports 64-bit vectors; this simply allows instructions to make use of that. Implemented support for 64-bit vectors within the interpreter to allow interpret runtests to use them. Copyright (c) 2022 Arm Limited * Disable 64-bit SIMD `iaddpairwise` tests on s390x Copyright (c) 2022 Arm Limited	2022-07-25 13:00:43 -07:00
Sam Parker	c5ddb4b803	[AArch64] Port SIMD narrowing to ISLE (#4478 ) * [AArch64] Port SIMD narrowing to ISLE Fvdemote, snarrow, unarrow and uunarrow. Also refactor the aarch64 instructions descriptions to parameterize on ScalarSize instead of using different opcodes. The zero_value pure constructor has been introduced and used by the integer narrow operations and it replaces, and extends, the compare zero patterns. Copyright (c) 2022, Arm Limited. * use short 'if' patterns	2022-07-25 12:40:36 -07:00
Ulrich Weigand	dd40bf075a	s390x: Enable more runtests, and fix a few bugs (#4516 ) This enables more runtests to be executed on s390x. Doing so uncovered two back-end bugs, which are fixed as well: - The result of cls was always off by one. - The result of popcnt.i16 had uninitialized high bits. In addition, I found a bug in the load-op-store.clif test case: v3 = heap_addr.i64 heap0, v1, 4 v4 = iconst.i64 42 store.i32 v4, v3 This was clearly intended to perform a 32-bit store, but actually performs a 64-bit store (it seems the type annotation of the store opcode is ignored, and the type of the operand is used instead). That bug did not show any noticeable symptoms on little-endian architectures, but broke on big-endian.	2022-07-25 12:37:06 -07:00
Trevor Elliott	ee7e4f4c6b	x64: Port func_addr and symbol_value to ISLE (#4485 ) https://github.com/bytecodealliance/wasmtime/pull/4485	2022-07-25 11:11:16 -07:00
Afonso Bordado	446efd3e11	cranelift: Fix `icmp_imm` for small types in interpreter (#4506 )	2022-07-23 00:26:56 +00:00
Afonso Bordado	af62037f62	cranelift: Restrict `br_table` to `i32` indices (#4510 ) * cranelift: Restrict `br_table` to `i32` indices In #4498 it was proposed that we should only accept `i32` indices to `br_table`. The rationale for this is that larger types lead the users to a false sense of flexibility (since we don't support jump tables larger than u32's), and narrower types are not well tested paths that would be safer if we removed them. * cranelift: Reduce directly from i128 to i32 in Switch	2022-07-22 23:32:40 +00:00
Damian Heaton	f1a0c40a53	Convert `sqrt`..`nearest` to ISLE (AArch64) (#4508 ) Converted the existing implementations for the following opcodes to ISLE on AArch64: - `sqrt` - `fneg` - `fabs` - `fpromote` - `fdemote` - `ceil` - `floor` - `trunc` - `nearest` Copyright (c) 2022 Arm Limited	2022-07-22 14:48:07 -07:00
Afonso Bordado	d89c262657	cranelift: Implement `{u,s}extend.i128` in interpreter (#4505 )	2022-07-22 10:47:10 -07:00
Ulrich Weigand	fd639dd044	s390x: Support preserve_frame_pointers flag (#4477 ) On s390x, we do not have a frame pointer that can be used to chain stack frames for easy unwinding. Instead, our ABI defines a stack "backchain" mechanism that can be used to the same effect. This PR uses that backchain mechanism to implement the new preserve_frame_pointers flag introduced here: https://github.com/bytecodealliance/wasmtime/pull/4469	2022-07-21 10:09:16 -07:00
Anton Kirilov	2ba4bce5cc	Merge pull request from GHSA-7f6x-jwh5-m9r4 Copyright (c) 2022, Arm Limited.	2022-07-20 11:53:56 -05:00
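Several entries above change Cranelift's shift and rotate semantics (#4519, #4520). As a minimal sketch, assuming the "mod behaviour" from #4520 means that the shift or rotate amount is reduced modulo the bit width of the (lane) type, the scalar and SIMD semantics can be written in Rust as follows. The function names are hypothetical helpers for illustration, not Cranelift or interpreter APIs.

```rust
// Sketch of shift/rotate semantics where the amount is taken modulo the
// bit width of the value (or lane) type. Illustrative helpers only.

/// `ishl.i8`: shift left, amount modulo 8.
pub fn ishl_i8(x: u8, amt: u64) -> u8 {
    x << (amt % 8) as u32
}

/// `ushr.i8`: logical right shift (zeros shifted in), amount modulo 8.
pub fn ushr_i8(x: u8, amt: u64) -> u8 {
    x >> (amt % 8) as u32
}

/// `sshr.i8`: arithmetic right shift (sign bits shifted in), amount modulo 8.
pub fn sshr_i8(x: i8, amt: u64) -> i8 {
    x >> (amt % 8) as u32
}

/// `rotl.i8`: rotate left, amount modulo 8.
pub fn rotl_i8(x: u8, amt: u64) -> u8 {
    x.rotate_left((amt % 8) as u32)
}

/// `rotr.i8`: rotate right, amount modulo 8.
pub fn rotr_i8(x: u8, amt: u64) -> u8 {
    x.rotate_right((amt % 8) as u32)
}

/// `ishl.i32x4`: one scalar amount applied to every lane, modulo the lane width 32.
pub fn ishl_i32x4(v: [u32; 4], amt: u64) -> [u32; 4] {
    v.map(|lane| lane << (amt % 32) as u32)
}
```

Reducing the amount modulo the width (rather than saturating to zero, as some SIMD instructions such as x86 `psllw` do for oversized counts) is what lets the scalar and vector forms agree; Rust's own `wrapping_shl` applies the same masking.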

1401 Commits
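The first commit listed above (#4684) implements the `I64` case of `fcvt_from_uint` on x64. SSE2 only provides a signed 64-bit integer-to-float conversion (`cvtsi2sd`; unsigned variants arrived with AVX-512), so a common approach is to branch on the top bit and, when it is set, halve the input before converting. The Rust sketch below shows that well-known technique for the scalar `i64 -> f64` case. It is an illustration under that assumption, not a transcription of the ISLE rules in the PR, and the function name is hypothetical.

```rust
// Sketch: unsigned 64-bit integer -> f64 using only a *signed* conversion,
// the classic trick for targets whose scalar convert instruction
// (e.g. SSE2 `cvtsi2sd`) treats its 64-bit input as signed.
pub fn u64_to_f64_via_signed(x: u64) -> f64 {
    if (x as i64) >= 0 {
        // Top bit clear: the value fits in i64, so the signed conversion
        // yields the same correctly rounded result.
        (x as i64) as f64
    } else {
        // Top bit set: halve the value, OR-ing the dropped low bit back in
        // as a "sticky" bit so the final rounding is still correct.
        let halved = (x >> 1) | (x & 1);
        let f = (halved as i64) as f64;
        // Doubling a finite f64 is exact, restoring the original magnitude.
        f + f
    }
}
```

The OR with the low bit matters: for `x = 2^63 + 0x401`, plain halving would produce an exact tie that rounds down to `2^63`, whereas the sticky bit makes the halved value round up, matching `x as f64` (`2^63 + 0x800`).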