wasmtime

Author	SHA1	Message	Date
Anton Kirilov	a1b39276e1	Enable more CLIF tests on AArch64 The tests for the SIMD floating-point maximum and minimum operations require particular care because the handling of the NaN values is non-deterministic and may vary between platforms. There is no way to match several NaN values in a test, so the solution is to extract the non-deterministic test cases into a separate file that is subsequently replicated for every backend under test, with adjustments made to the expected results. Copyright (c) 2021, Arm Limited.	2021-08-17 13:27:58 +01:00
Alex Crichton	9e142f8792	Fix some warnings on nightly Rust (#3148 ) Looks like these trailing-semicolons-in-macros are likely to become a hard error in the future, so this updates to remove them as necessary.	2021-08-05 13:02:44 -05:00
Johnnie Birch	e519fca61c	Refactor and turn on lowering for extend-add-pairwise	2021-07-31 10:52:39 -07:00
Johnnie Birch	e373ddfe1b	Add extend-add-pairwise instructions x64	2021-07-30 15:06:58 -07:00
Andrew Brown	26c78c06ef	refactor: remove unused field PR #3131 fixed the failing builds by allowing this field to be dead. After looking at it further the field is not being used and can be removedi completely.	2021-07-30 10:58:37 -07:00
Alex Crichton	4632b6a816	Fix warning on new-stable (#3131 ) One of the fields of `TargetIsa` isn't used in the cranelift-codegen-meta crate, but instead of refactoring to try to remove it this just adds `#[allow(dead_code)]` for now in the assumption that when the old backends go away this will probably go away as well.	2021-07-30 11:13:21 -05:00
Johnnie Birch	d8e813204e	Fold fcvt_low_from_uinit into previously existing clif instructions	2021-07-09 10:39:05 -07:00
Johnnie Birch	6dd2df4fb3	Update comment on fcvt_low_from_sint instruction	2021-07-09 10:39:05 -07:00
Johnnie Birch	2d676d838f	Implements f64x2.convert_low_i32x4_u for x64	2021-07-09 10:39:05 -07:00
Anton Kirilov	330f02aa09	Enable the simd_i32x4_trunc_sat_f64x2 test for AArch64 Also, reorganize the AArch64-specific VCode instructions for unary narrowing and widening vector operations, so that they are more straightforward to use. Copyright (c) 2021, Arm Limited.	2021-06-30 12:17:53 +01:00
Anton Kirilov	98f1ac789e	Enable the simd_i16x8_q15mulr_sat_s test on AArch64 Copyright (c) 2021, Arm Limited.	2021-06-28 12:24:31 +01:00
bjorn3	1415dd824a	Remove all dsl legalizations for arm32, arm64 and s390x	2021-06-20 18:43:45 +02:00
Ulrich Weigand	def54fb1fa	s390x: Add z14 support * Add support for processor features (including auto-detection). * Move base architecture set requirement back to z14. * Add z15 feature sets and re-enable z15-specific code generation when required features are available.	2021-06-17 10:23:15 +02:00
Chris Fallin	ffb92d9109	Merge pull request #2966 from akirilov-arm/simd_int_to_int_extend Enable the simd_int_to_int_extend test for AArch64	2021-06-06 23:34:52 -07:00
Johnnie Birch	1770880e19	x64: add support for packed promote and demote (#2783 ) * Add support for x64 packed promote low * Add support for x64 packed floating point demote * Update vector promote low and demote by adding constraints Also does some renaming and minor refactoring	2021-06-04 15:59:20 -07:00
Anton Kirilov	5e8a8fe5a0	Enable the simd_int_to_int_extend test for AArch64 Copyright (c) 2021, Arm Limited.	2021-06-04 16:10:02 +01:00
Andrew Brown	2a9f458ea3	x64: lower i8x16.shuffle to VPERMI2B when possible When shuffling values from two different registers, the x64 lowering for `i8x16.shuffle` must first shuffle each register separately and then OR the results with SSE instructions. With `VPERMI2B`, available in AVX512VL + AVX512VBMI, this can be done in a single instruction after the shuffle mask has been moved into the destination register. This change uses `VPERMI2B` for that case when the CPU supports it.	2021-06-01 11:40:53 -07:00
Andrew Brown	459fce3467	x64: lower i8x16.popcnt to VPOPCNTB when possible When AVX512VL or AVX512BITALG are available, Wasm SIMD's `popcnt` instruction can be lowered to a single x64 instruction, `VPOPCNTB`, instead of 8+ instructions.	2021-05-25 12:16:25 -07:00
Chris Fallin	800cf25bb5	Make the CFG metadata computation conditional on a flag.	2021-05-24 13:01:15 -07:00
Johnnie Birch	9a5c9607e1	Vpopcnt for x64	2021-05-21 19:23:26 -07:00
Andrew Brown	1fe7676831	cranelift: only allow vector types with saturating arithmetic This fixes #2883 by restricting which types are available to the `uadd_sat`, `sadd_sat`, `usub_sat`, and `ssub_sat` IR operations.	2021-05-17 06:54:45 -07:00
Ulrich Weigand	89b5fc776d	Support IBM z/Architecture This adds support for the IBM z/Architecture (s390x-ibm-linux). The status of the s390x backend in its current form is: - Wasmtime is fully functional and passes all tests on s390x. - All back-end features supported, with the exception of SIMD. - There is still a lot of potential for performance improvements. - Currently the only supported processor type is z15.	2021-05-10 16:01:16 +02:00
Peter Huene	0ddfe97a09	Change how flags are stored in serialized modules. This commit changes how both the shared flags and ISA flags are stored in the serialized module to detect incompatibilities when a serialized module is instantiated. It improves the error reporting when a compiled module has mismatched shared flags.	2021-04-01 21:39:57 -07:00
Peter Huene	abf3bf29f9	Add a `wasmtime settings` command to print Cranelift settings. This commit adds the `wasmtime settings` command to print out available Cranelift settings for a target (defaults to the host). The compile command has been updated to remove the Cranelift ISA options in favor of encouraging users to use `wasmtime settings` to discover what settings are available. This will reduce the maintenance cost for syncing the compile command with Cranelift ISA flags.	2021-04-01 19:38:19 -07:00
Peter Huene	29d366db7b	Add a compile command to Wasmtime. This commit adds a `compile` command to the Wasmtime CLI. The command can be used to Ahead-Of-Time (AOT) compile WebAssembly modules. With the `all-arch` feature enabled, AOT compilation can be performed for non-native architectures (i.e. cross-compilation). The `Module::compile` method has been added to perform AOT compilation. A few of the CLI flags relating to "on by default" Wasm features have been changed to be "--disable-XYZ" flags. A simple example of using the `wasmtime compile` command: ```text $ wasmtime compile input.wasm $ wasmtime input.cwasm ```	2021-04-01 19:38:18 -07:00
Johnnie Birch	31d3db1ec2	Implements convert low signed integer to float for x64 simd	2021-03-26 12:13:29 -07:00
Alex Crichton	30d9164b6e	Fix a number of warnings cropping up on nightly Rust (#2767 ) Various small issues here and there, nothing major	2021-03-25 13:19:37 -05:00
Benjamin Bouvier	6e6713ae0b	cranelift: add support for the Mac aarch64 calling convention This bumps target-lexicon and adds support for the AppleAarch64 calling convention. Specifically for WebAssembly support, we only have to worry about the new stack slots convention. Stack slots don't need to be at least 8-bytes, they can be as small as the data type's size. For instance, if we need stack slots for (i32, i32), they can be located at offsets (+0, +4). Note that they still need to be properly aligned on the data type they're containing, though, so if we need stack slots for (i32, i64), we can't start the i64 slot at the +4 offset (it must start at the +8 offset). Added one test that was failing on the Mac M1, as well as other tests stressing different yet similar situations.	2021-03-22 10:06:13 +01:00
Anton Kirilov	07c27039b1	Cranelift AArch64: Add initial support for the Armv8.1 atomics This commit enables Cranelift's AArch64 backend to generate code for instruction set extensions (previously only the base Armv8-A architecture was supported); also, it makes it possible to detect the extensions supported by the host when JIT compiling. The new functionality is applied to the IR instruction `AtomicCas`. Copyright (c) 2021, Arm Limited.	2021-03-13 02:31:51 +00:00
Chris Fallin	2d5db92a9e	Rework/simplify unwind infrastructure and implement Windows unwind. Our previous implementation of unwind infrastructure was somewhat complex and brittle: it parsed generated instructions in order to reverse-engineer unwind info from prologues. It also relied on some fragile linkage to communicate instruction-layout information that VCode was not designed to provide. A much simpler, more reliable, and easier-to-reason-about approach is to embed unwind directives as pseudo-instructions in the prologue as we generate it. That way, we can say what we mean and just emit it directly. The usual reasoning that leads to the reverse-engineering approach is that metadata is hard to keep in sync across optimization passes; but here, (i) prologues are generated at the very end of the pipeline, and (ii) if we ever do a post-prologue-gen optimization, we can treat unwind directives as black boxes with unknown side-effects, just as we do for some other pseudo-instructions today. It turns out that it was easier to just build this for both x64 and aarch64 (since they share a factored-out ABI implementation), and wire up the platform-specific unwind-info generation for Windows and SystemV. Now we have simpler unwind on all platforms and we can delete the old unwind infra as soon as we remove the old backend. There were a few consequences to supporting Fastcall unwind in particular that led to a refactor of the common ABI. Windows only supports naming clobbered-register save locations within 240 bytes of the frame-pointer register, whatever one chooses that to be (RSP or RBP). We had previously saved clobbers below the fixed frame (and below nominal-SP). The 240-byte range has to include the old RBP too, so we're forced to place clobbers at the top of the frame, just below saved RBP/RIP. This is fine; we always keep a frame pointer anyway because we use it to refer to stack args. It does mean that offsets of fixed-frame slots (spillslots, stackslots) from RBP are no longer known before we do regalloc, so if we ever want to index these off of RBP rather than nominal-SP because we add support for `alloca` (dynamic frame growth), then we'll need a "nominal-BP" mode that is resolved after regalloc and clobber-save code is generated. I added a comment to this effect in `abi_impl.rs`. The above refactor touched both x64 and aarch64 because of shared code. This had a further effect in that the old aarch64 prologue generation subtracted from `sp` once to allocate space, then used stores to `[sp, offset]` to save clobbers. Unfortunately the offset only has 7-bit range, so if there are enough clobbered registers (and there can be -- aarch64 has 384 bytes of registers; at least one unit test hits this) the stores/loads will be out-of-range. I really don't want to synthesize large-offset sequences here; better to go back to the simpler pre-index/post-index `stp r1, r2, [sp, #-16]` form that works just like a "push". It's likely not much worse microarchitecturally (dependence chain on SP, but oh well) and it actually saves an instruction if there's no other frame to allocate. As a further advantage, it's much simpler to understand; simpler is usually better. This PR adds the new backend on Windows to CI as well.	2021-03-11 20:03:52 -08:00
Chris Fallin	6c94eb82aa	x86-64 Windows fastcall ABI support. This adds support for the "fastcall" ABI, which is the native C/C++ ABI on Windows platforms on x86-64. It is similar to but not exactly like System V; primarily, its argument register assignments are different, and it requires stack shadow space. Note that this also adjusts the handling of multi-register values in the shared ABI implementation, and with this change, adjusts handling of `i128`s on both Fastcall/x64 and SysV/x64 platforms. This was done to align with actual behavior by the "rustc ABI" on both platforms, as mapped out experimentally (Compiler Explorer link in comments). This behavior is gated under the `enable_llvm_abi_extensions` flag. Note also that this does not add x64 unwind info on Windows. That will come in a future PR (but is planned!).	2021-03-03 19:53:18 -08:00
Chris Fallin	c07ec4c525	Merge pull request #2653 from bjorn3/more_atomic_ops More atomic ops	2021-02-18 08:34:58 -08:00
bjorn3	ff22842da5	More atomic ops	2021-02-18 14:16:15 +01:00
bjorn3	2fc964ea35	Add serde serialization support for the full clif ir	2021-02-18 11:27:02 +01:00
Alex Crichton	09b976e1d5	Fix a number of warnings on nightly Rust (#2652 ) This fixes some issues that are cropping up where some syntax will get phased out in 2021	2021-02-11 12:42:45 -06:00
Alex Crichton	503129ad91	Add a method to share `Config` across machines (#2608 ) With `Module::{serialize,deserialize}` it should be possible to share wasmtime modules across machines or CPUs. Serialization, however, embeds a hash of all configuration values, including cranelift compilation settings. By default wasmtime's selection of the native ISA would enable ISA flags according to CPU features available on the host, but the same CPU features may not be available across two machines. This commit adds a `Config::cranelift_clear_cpu_flags` method which allows clearing the target-specific ISA flags that are automatically inferred by default for the native CPU. Options can then be incrementally built back up as-desired with teh `cranelift_other_flag` method.	2021-01-26 15:59:12 -06:00
Yury Delendik	2964023a77	[SIMD][x86_64] Add encoding for PMADDWD (#2530 ) * [SIMD][x86_64] Add encoding for PMADDWD * also for "experimental_x64"	2020-12-24 07:52:50 -06:00
bjorn3	8f7f8ee0b4	Fix iconst.i8 0 miscompilation	2020-12-12 09:44:05 +01:00
Chris Fallin	39b5736727	Remove LoadSplat opcode, in preparation for pattern-matching Load+Splat. This was added as an incremental step to improve AArch64 code quality in PR #2278. At the time, we did not have a way to pattern-match the load + splat opcode sequence that the relevant Wasm opcodes lowered to. However, now with PR #2366, we can merge effectful instructions such as loads into other ops, and so we can do this pattern matching directly. The pattern-matching update will come in a subsequent commit.	2020-11-16 15:31:56 -08:00
Julian Seward	dd9bfcefaa	CL/aarch64: implement the wasm SIMD `v128.load{32,64}_zero` instructions. This patch implements, for aarch64, the following wasm SIMD extensions. v128.load32_zero and v128.load64_zero instructions https://github.com/WebAssembly/simd/pull/237 The changes are straightforward: * no new CLIF instructions. They are translated into an existing CLIF scalar load followed by a CLIF `scalar_to_vector`. * the comment/specification for CLIF `scalar_to_vector` has been changed to match the actual intended semantics, per consulation with Andrew Brown. * translation from `scalar_to_vector` to aarch64 `fmov` instruction. This has been generalised slightly so as to allow both 32- and 64-bit transfers. * special-case zero in `lower_constant_f128` in order to avoid a potentially slow call to `Inst::load_fp_constant128`. * Once "Allow loads to merge into other operations during instruction selection in MachInst backends" (https://github.com/bytecodealliance/wasmtime/issues/2340) lands, we can use that functionality to pattern match the two-CLIF pair and emit a single AArch64 instruction. * A simple filetest has been added. There is no comprehensive testcase in this commit, because that is a separate repo. The implementation has been tested, nevertheless.	2020-11-04 20:00:04 +01:00
Julian Seward	5a5fb11979	CL/aarch64: implement the wasm SIMD `i32x4.dot_i16x8_s` instruction This patch implements, for aarch64, the following wasm SIMD extensions i32x4.dot_i16x8_s instruction https://github.com/WebAssembly/simd/pull/127 It also updates dependencies as follows, in order that the new instruction can be parsed, decoded, etc: wat to 1.0.27 wast to 26.0.1 wasmparser to 0.65.0 wasmprinter to 0.2.12 The changes are straightforward: * new CLIF instruction `widening_pairwise_dot_product_s` * translation from wasm into `widening_pairwise_dot_product_s` * new AArch64 instructions `smull`, `smull2` (part of the `VecRRR` group) * translation from `widening_pairwise_dot_product_s` to `smull ; smull2 ; addv` There is no testcase in this commit, because that is a separate repo. The implementation has been tested, nevertheless.	2020-11-03 14:25:04 +01:00
Chris Fallin	c35904a8bf	Merge pull request #2278 from akirilov-arm/load_splat Introduce the Cranelift IR instruction `LoadSplat`	2020-10-28 12:54:03 -07:00
Julian Seward	c15d9bd61b	CL/aarch64: implement the wasm SIMD pseudo-max/min and FP-rounding instructions This patch implements, for aarch64, the following wasm SIMD extensions Floating-point rounding instructions https://github.com/WebAssembly/simd/pull/232 Pseudo-Minimum and Pseudo-Maximum instructions https://github.com/WebAssembly/simd/pull/122 The changes are straightforward: * `build.rs`: the relevant tests have been enabled * `cranelift/codegen/meta/src/shared/instructions.rs`: new CLIF instructions `fmin_pseudo` and `fmax_pseudo`. The wasm rounding instructions do not need any new CLIF instructions. * `cranelift/wasm/src/code_translator.rs`: translation into CLIF; this is pretty much the same as any other unary or binary vector instruction (for the rounding and the pmin/max respectively) * `cranelift/codegen/src/isa/aarch64/lower_inst.rs`: - `fmin_pseudo` and `fmax_pseudo` are converted into a two instruction sequence, `fcmpgt` followed by `bsl` - the CLIF rounding instructions are converted to a suitable vector `frint{n,z,p,m}` instruction. * `cranelift/codegen/src/isa/aarch64/inst/mod.rs`: minor extension of `pub enum VecMisc2` to handle the rounding operations. And corresponding `emit` cases.	2020-10-26 10:37:07 +01:00
Julian Seward	2702942050	CL/aarch64 back end: implement the wasm SIMD `bitmask` instructions The `bitmask.{8x16,16x8,32x4}` instructions do not map neatly to any single AArch64 SIMD instruction, and instead need a sequence of around ten instructions. Because of this, this patch is somewhat longer and more complex than it would be for (eg) x64. Main changes are: * the relevant testsuite test (`simd_boolean.wast`) has been enabled on aarch64. * at the CLIF level, add a new instruction `vhigh_bits`, into which these wasm instructions are to be translated. * in the wasm->CLIF translation (code_translator.rs), translate into `vhigh_bits`. This is straightforward. * in the CLIF->AArch64 translation (lower_inst.rs), translate `vhigh_bits` into equivalent sequences of AArch64 instructions. There is a different sequence for each of the `{8x16, 16x8, 32x4}` variants. All other changes are AArch64-specific, and add instruction definitions needed by the previous step: * Add two new families of AArch64 instructions: `VecShiftImm` (vector shift by immediate) and `VecExtract` (effectively a double-length vector shift) * To the existing AArch64 family `VecRRR`, add a `zip1` variant. To the `VecLanesOp` family add an `addv` variant. * Add supporting code for the above changes to AArch64 instructions: - getting the register uses (`aarch64_get_regs`) - mapping the registers (`aarch64_map_regs`) - printing instructions - emitting instructions (`impl MachInstEmit for Inst`). The handling of `VecShiftImm` is a bit complex. - emission tests for new instructions and variants.	2020-10-23 05:26:25 +02:00
Anton Kirilov	e0b911a4df	Introduce the Cranelift IR instruction `LoadSplat` It corresponds to WebAssembly's `load*_splat` operations, which were previously represented as a combination of `Load` and `Splat` instructions. However, there are architectures such as Armv8-A that have a single machine instruction equivalent to the Wasm operations. In order to generate it, it is necessary to merge the `Load` and the `Splat` in the backend, which is not possible because the load may have side effects. The new IR instruction works around this limitation. The AArch64 backend leverages the new instruction to improve code generation. Copyright (c) 2020, Arm Limited.	2020-10-14 13:07:13 +01:00
Chris Fallin	835db11bea	Support for SpiderMonkey's "Wasm ABI 2020". As part of a Wasm JIT update, SpiderMonkey is changing its internal WebAssembly function ABI. The new ABI's frame format includes "caller TLS" and "callee TLS" slots. The details of where these come from are not important; from Cranelift's point of view, the only relevant requirement is that we have two on-stack args that are always present (offsetting other on-stack args), and that we define special argument purposes so that we can supply values for these slots. Note that this adds a new ABI (a variant of the Baldrdash ABI) because we do not want to tightly couple the landing of this PR to the landing of the changes in SpiderMonkey; it's better if both the old and new behavior remain available in Cranelift, so SpiderMonkey can continue to vendor Cranelift even if it does not land (or backs out) the ABI change. Furthermore, note that this needs to be a Cranelift-level change (i.e. cannot be done purely from the translator environment implementation) because the special TLS arguments must always go on the stack, which would not otherwise happen with the usual argument-placement logic; and there is no primitive to push a value directly in CLIF code (the notion of a stack frame is a lower-level concept).	2020-09-30 14:55:56 -07:00
bjorn3	45ccc6940e	Fix Switch for 128bit integers	2020-09-21 14:50:59 +02:00
Nick Fitzgerald	89f1e02f1f	Remove executable bits from a few Rust source files	2020-09-14 16:27:47 -07:00
Nick Fitzgerald	05bf9ea3f3	Rename "Stackmap" to "StackMap" And "stackmap" to "stack_map". This commit is purely mechanical.	2020-08-07 10:08:44 -07:00
Julian Seward	25e31739a6	Implement Wasm Atomics for Cranelift/newBE/aarch64. The implementation is pretty straightforward. Wasm atomic instructions fall into 5 groups * atomic read-modify-write * atomic compare-and-swap * atomic loads * atomic stores * fences and the implementation mirrors that structure, at both the CLIF and AArch64 levels. At the CLIF level, there are five new instructions, one for each group. Some comments about these: * for those that take addresses (all except fences), the address is contained entirely in a single `Value`; there is no offset field as there is with normal loads and stores. Wasm atomics require alignment checks, and removing the offset makes implementation of those checks a bit simpler. * atomic loads and stores get their own instructions, rather than reusing the existing load and store instructions, for two reasons: - per above comment, makes alignment checking simpler - reuse of existing loads and stores would require extension of `MemFlags` to indicate atomicity, which sounds semantically unclean. For example, then any instruction carrying `MemFlags` could be marked as atomic, even in cases where it is meaningless or ambiguous. * I tried to specify, in comments, the behaviour of these instructions as tightly as I could. Unfortunately there is no way (per my limited CLIF knowledge) to enforce the constraint that they may only be used on I8, I16, I32 and I64 types, and in particular not on floating point or vector types. The translation from Wasm to CLIF, in `code_translator.rs` is unremarkable. At the AArch64 level, there are also five new instructions, one for each group. All of them except `::Fence` contain multiple real machine instructions. Atomic r-m-w and atomic c-a-s are emitted as the usual load-linked store-conditional loops, guarded at both ends by memory fences. Atomic loads and stores are emitted as a load preceded by a fence, and a store followed by a fence, respectively. The amount of fencing may be overkill, but it reflects exactly what the SM Wasm baseline compiler for AArch64 does. One reason to implement r-m-w and c-a-s as a single insn which is expanded only at emission time is that we must be very careful what instructions we allow in between the load-linked and store-conditional. In particular, we cannot allow any extra memory transactions in there, since -- particularly on low-end hardware -- that might cause the transaction to fail, hence deadlocking the generated code. That implies that we can't present the LL/SC loop to the register allocator as its constituent instructions, since it might insert spills anywhere. Hence we must present it as a single indivisible unit, as we do here. It also has the benefit of reducing the total amount of work the RA has to do. The only other notable feature of the r-m-w and c-a-s translations into AArch64 code, is that they both need a scratch register internally. Rather than faking one up by claiming, in `get_regs` that it modifies an extra scratch register, and having to have a dummy initialisation of it, these new instructions (`::LLSC` and `::CAS`) simply use fixed registers in the range x24-x28. We rely on the RA's ability to coalesce V<-->R copies to make the cost of the resulting extra copies zero or almost zero. x24-x28 are chosen so as to be call-clobbered, hence their use is less likely to interfere with long live ranges that span calls. One subtlety regarding the use of completely fixed input and output registers is that we must be careful how the surrounding copy from/to of the arg/result registers is done. In particular, it is not safe to simply emit copies in some arbitrary order if one of the arg registers is a real reg. For that reason, the arguments are first moved into virtual regs if they are not already there, using a new method `<LowerCtx for Lower>::ensure_in_vreg`. Again, we rely on coalescing to turn them into no-ops in the common case. There is also a ridealong fix for the AArch64 lowering case for `Opcode::Trapif \| Opcode::Trapff`, which removes a bug in which two trap insns in a row were generated. In the patch as submitted there are 6 "FIXME JRS" comments, which mark things which I believe to be correct, but for which I would appreciate a second opinion. Unless otherwise directed, I will remove them for the final commit but leave the associated code/comments unchanged.	2020-08-04 09:35:50 +02:00

1 2 3 4 5 ...

398 Commits