Commit Graph

456 Commits

Julian Seward
dd9bfcefaa CL/aarch64: implement the wasm SIMD v128.load{32,64}_zero instructions.
This patch implements, for aarch64, the following wasm SIMD extensions.

  v128.load32_zero and v128.load64_zero instructions
  https://github.com/WebAssembly/simd/pull/237

The changes are straightforward:

* no new CLIF instructions.  They are translated into an existing CLIF scalar
  load followed by a CLIF `scalar_to_vector`.

* the comment/specification for CLIF `scalar_to_vector` has been changed to
  match the actual intended semantics, per consultation with Andrew Brown.

* translation from `scalar_to_vector` to aarch64 `fmov` instruction.  This
  has been generalised slightly so as to allow both 32- and 64-bit transfers.

* special-case zero in `lower_constant_f128` in order to avoid a
  potentially slow call to `Inst::load_fp_constant128`.

* Once "Allow loads to merge into other operations during instruction
  selection in MachInst backends"
  (https://github.com/bytecodealliance/wasmtime/issues/2340) lands,
  we can use that functionality to pattern match the two-CLIF pair and
  emit a single AArch64 instruction.

* A simple filetest has been added.

There is no comprehensive testcase in this commit, because those tests live in
a separate repo.  The implementation has nevertheless been tested.
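
As a reference for the intended semantics (an illustrative scalar sketch, not
code from this patch; the function name and little-endian layout are assumed):

    use std::convert::TryInto;

    // Load 32 bits and place them in lane 0 of an otherwise-zero 128-bit
    // vector, which is what the CLIF scalar load followed by
    // `scalar_to_vector` expresses.
    fn v128_load32_zero(mem: &[u8], addr: usize) -> [u32; 4] {
        let bytes: [u8; 4] = mem[addr..addr + 4]
            .try_into()
            .expect("in-bounds 32-bit load");
        [u32::from_le_bytes(bytes), 0, 0, 0]
    }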
2020-11-04 20:00:04 +01:00
Julian Seward
5a5fb11979 CL/aarch64: implement the wasm SIMD i32x4.dot_i16x8_s instruction
This patch implements, for aarch64, the following wasm SIMD extension:

  i32x4.dot_i16x8_s instruction
  https://github.com/WebAssembly/simd/pull/127

It also updates dependencies as follows, in order that the new instruction can
be parsed, decoded, etc:

  wat          to  1.0.27
  wast         to  26.0.1
  wasmparser   to  0.65.0
  wasmprinter  to  0.2.12

The changes are straightforward:

* new CLIF instruction `widening_pairwise_dot_product_s`

* translation from wasm into `widening_pairwise_dot_product_s`

* new AArch64 instructions `smull`, `smull2` (part of the `VecRRR` group)

* translation from `widening_pairwise_dot_product_s` to `smull ; smull2 ; addv`

There is no testcase in this commit, because those tests live in a separate
repo.  The implementation has nevertheless been tested.
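
As a reference for the semantics (a scalar sketch following the wasm proposal
linked above; names are assumed, this is not code from the patch):

    // Each i32 lane is the sum of the products of two adjacent i16 lanes,
    // with signed widening before the multiply.  The one possible overflow
    // (all four inputs i16::MIN) is handled with a wrapping add so the
    // sketch stays panic-free.
    fn i32x4_dot_i16x8_s(a: [i16; 8], b: [i16; 8]) -> [i32; 4] {
        let mut out = [0i32; 4];
        for i in 0..4 {
            out[i] = (a[2 * i] as i32 * b[2 * i] as i32)
                .wrapping_add(a[2 * i + 1] as i32 * b[2 * i + 1] as i32);
        }
        out
    }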
2020-11-03 14:25:04 +01:00
Chris Fallin
c35904a8bf Merge pull request #2278 from akirilov-arm/load_splat
Introduce the Cranelift IR instruction `LoadSplat`
2020-10-28 12:54:03 -07:00
Julian Seward
c15d9bd61b CL/aarch64: implement the wasm SIMD pseudo-max/min and FP-rounding instructions
This patch implements, for aarch64, the following wasm SIMD extensions:

  Floating-point rounding instructions
  https://github.com/WebAssembly/simd/pull/232

  Pseudo-Minimum and Pseudo-Maximum instructions
  https://github.com/WebAssembly/simd/pull/122

The changes are straightforward:

* `build.rs`: the relevant tests have been enabled

* `cranelift/codegen/meta/src/shared/instructions.rs`: new CLIF instructions
  `fmin_pseudo` and `fmax_pseudo`.  The wasm rounding instructions do not need
  any new CLIF instructions.

* `cranelift/wasm/src/code_translator.rs`: translation into CLIF; this is
  pretty much the same as any other unary or binary vector instruction (for
  the rounding and the pmin/max respectively)

* `cranelift/codegen/src/isa/aarch64/lower_inst.rs`:
  - `fmin_pseudo` and `fmax_pseudo` are converted into a two instruction
    sequence, `fcmpgt` followed by `bsl`
  - the CLIF rounding instructions are converted to a suitable vector
    `frint{n,z,p,m}` instruction.

* `cranelift/codegen/src/isa/aarch64/inst/mod.rs`: minor extension of `pub
  enum VecMisc2` to handle the rounding operations.  And corresponding `emit`
  cases.
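
For reference, the pseudo-min/max semantics of `fmin_pseudo`/`fmax_pseudo`
reduce to a plain ordered `<` comparison (a scalar sketch per the wasm
proposal above, not code from this patch):

    // Unlike IEEE min/max, pmin/pmax are defined purely by an ordered `<`
    // comparison, so NaN and signed-zero handling follow whichever operand
    // the comparison selects; this is why an `fcmpgt` + `bsl` pair is
    // enough on AArch64.
    fn fmin_pseudo(a: f32, b: f32) -> f32 { if b < a { b } else { a } }
    fn fmax_pseudo(a: f32, b: f32) -> f32 { if a < b { b } else { a } }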
2020-10-26 10:37:07 +01:00
Julian Seward
2702942050 CL/aarch64 back end: implement the wasm SIMD bitmask instructions
The `bitmask.{8x16,16x8,32x4}` instructions do not map neatly to any single
AArch64 SIMD instruction, and instead need a sequence of around ten
instructions.  Because of this, this patch is somewhat longer and more complex
than it would be for (eg) x64.

Main changes are:

* the relevant testsuite test (`simd_boolean.wast`) has been enabled on aarch64.

* at the CLIF level, add a new instruction `vhigh_bits`, into which these wasm
  instructions are to be translated.

* in the wasm->CLIF translation (code_translator.rs), translate into
  `vhigh_bits`.  This is straightforward.

* in the CLIF->AArch64 translation (lower_inst.rs), translate `vhigh_bits`
  into equivalent sequences of AArch64 instructions.  There is a different
  sequence for each of the `{8x16, 16x8, 32x4}` variants.

All other changes are AArch64-specific, and add instruction definitions needed
by the previous step:

* Add two new families of AArch64 instructions: `VecShiftImm` (vector shift by
  immediate) and `VecExtract` (effectively a double-length vector shift)

* To the existing AArch64 family `VecRRR`, add a `zip1` variant.  To the
  `VecLanesOp` family add an `addv` variant.

* Add supporting code for the above changes to AArch64 instructions:
  - getting the register uses (`aarch64_get_regs`)
  - mapping the registers (`aarch64_map_regs`)
  - printing instructions
  - emitting instructions (`impl MachInstEmit for Inst`).  The handling of
    `VecShiftImm` is a bit complex.
  - emission tests for new instructions and variants.
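
For reference, a scalar sketch of what `vhigh_bits` computes in the 8x16 case
(illustration only, names assumed):

    // Collect the sign (top) bit of each lane into the low bits of a scalar.
    fn vhigh_bits_8x16(v: [i8; 16]) -> u16 {
        let mut mask = 0u16;
        for (i, lane) in v.iter().enumerate() {
            if *lane < 0 {
                mask |= 1 << i;
            }
        }
        mask
    }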
2020-10-23 05:26:25 +02:00
Anton Kirilov
e0b911a4df Introduce the Cranelift IR instruction LoadSplat
It corresponds to WebAssembly's `load*_splat` operations, which
were previously represented as a combination of `Load` and `Splat`
instructions. However, there are architectures such as Armv8-A
that have a single machine instruction equivalent to the Wasm
operations. In order to generate it, it is necessary to merge the
`Load` and the `Splat` in the backend, which is not possible
because the load may have side effects. The new IR instruction
works around this limitation.
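
A scalar sketch of the new instruction's semantics for the 32-bit case
(illustration only; names assumed): one load, replicated into every lane,
which is what a single AArch64 `LD1R` can do.

    use std::convert::TryInto;

    // Perform one 32-bit load and splat the loaded value to all four lanes.
    fn load32_splat(mem: &[u8], addr: usize) -> [u32; 4] {
        let bytes: [u8; 4] = mem[addr..addr + 4]
            .try_into()
            .expect("in-bounds 32-bit load");
        [u32::from_le_bytes(bytes); 4]
    }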

The AArch64 backend leverages the new instruction to improve code
generation.

Copyright (c) 2020, Arm Limited.
2020-10-14 13:07:13 +01:00
Chris Fallin
835db11bea Support for SpiderMonkey's "Wasm ABI 2020".
As part of a Wasm JIT update, SpiderMonkey is changing its internal
WebAssembly function ABI. The new ABI's frame format includes "caller
TLS" and "callee TLS" slots. The details of where these come from are
not important; from Cranelift's point of view, the only relevant
requirement is that we have two on-stack args that are always present
(offsetting other on-stack args), and that we define special argument
purposes so that we can supply values for these slots.

Note that this adds a *new* ABI (a variant of the Baldrdash ABI) because
we do not want to tightly couple the landing of this PR to the landing
of the changes in SpiderMonkey; it's better if both the old and new
behavior remain available in Cranelift, so SpiderMonkey can continue to
vendor Cranelift even if it does not land (or backs out) the ABI change.

Furthermore, note that this needs to be a Cranelift-level change (i.e.
cannot be done purely from the translator environment implementation)
because the special TLS arguments must always go on the stack, which
would not otherwise happen with the usual argument-placement logic; and
there is no primitive to push a value directly in CLIF code (the notion
of a stack frame is a lower-level concept).
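
A sketch of what this looks like from the embedder's side, assuming the
argument-purpose variants and calling-convention name introduced here
(`ArgumentPurpose::CallerTLS`/`CalleeTLS`, `CallConv::Baldrdash2020`); the
exact names should be checked against the source:

    use cranelift_codegen::ir::{types, AbiParam, ArgumentPurpose, Signature};
    use cranelift_codegen::isa::CallConv;

    // Build a signature carrying the two always-present TLS stack slots.
    fn wasm_abi_2020_signature() -> Signature {
        let mut sig = Signature::new(CallConv::Baldrdash2020);
        sig.params.push(AbiParam::special(types::I64, ArgumentPurpose::CallerTLS));
        sig.params.push(AbiParam::special(types::I64, ArgumentPurpose::CalleeTLS));
        sig
    }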
2020-09-30 14:55:56 -07:00
Alex Crichton
5e08eb3b83 Bump wasmtime to 0.20.0 (#2222)
At the same time bump cranelift crates to 0.67.0
2020-09-23 13:54:02 -05:00
bjorn3
45ccc6940e Fix Switch for 128bit integers 2020-09-21 14:50:59 +02:00
Nick Fitzgerald
89f1e02f1f Remove executable bits from a few Rust source files 2020-09-14 16:27:47 -07:00
Nick Fitzgerald
05bf9ea3f3 Rename "Stackmap" to "StackMap"
And "stackmap" to "stack_map".

This commit is purely mechanical.
2020-08-07 10:08:44 -07:00
Julian Seward
25e31739a6 Implement Wasm Atomics for Cranelift/newBE/aarch64.
The implementation is pretty straightforward.  Wasm atomic instructions fall
into 5 groups

* atomic read-modify-write
* atomic compare-and-swap
* atomic loads
* atomic stores
* fences

and the implementation mirrors that structure, at both the CLIF and AArch64
levels.

At the CLIF level, there are five new instructions, one for each group.  Some
comments about these:

* for those that take addresses (all except fences), the address is contained
  entirely in a single `Value`; there is no offset field as there is with
  normal loads and stores.  Wasm atomics require alignment checks, and
  removing the offset makes implementation of those checks a bit simpler.

* atomic loads and stores get their own instructions, rather than reusing the
  existing load and store instructions, for two reasons:

  - per above comment, makes alignment checking simpler

  - reuse of existing loads and stores would require extension of `MemFlags`
    to indicate atomicity, which sounds semantically unclean.  For example,
    *any* instruction carrying `MemFlags` could then be marked as atomic, even
    in cases where it is meaningless or ambiguous.

* I tried to specify, in comments, the behaviour of these instructions as
  tightly as I could.  Unfortunately there is no way (per my limited CLIF
  knowledge) to enforce the constraint that they may only be used on I8, I16,
  I32 and I64 types, and in particular not on floating point or vector types.
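
For reference, the intended semantics of the new compare-and-swap instruction,
sketched with Rust atomics (sequentially consistent, mirroring the conservative
fencing described below; this is not the lowering itself):

    use std::sync::atomic::{AtomicU32, Ordering};

    // Returns the value that was in memory before the operation; the store
    // happens only if that value equals `expected`.
    fn atomic_cas(mem: &AtomicU32, expected: u32, replacement: u32) -> u32 {
        match mem.compare_exchange(expected, replacement,
                                   Ordering::SeqCst, Ordering::SeqCst) {
            Ok(previous) | Err(previous) => previous,
        }
    }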

The translation from Wasm to CLIF, in `code_translator.rs` is unremarkable.

At the AArch64 level, there are also five new instructions, one for each
group.  All of them except `::Fence` contain multiple real machine
instructions.  Atomic r-m-w and atomic c-a-s are emitted as the usual
load-linked store-conditional loops, guarded at both ends by memory fences.
Atomic loads and stores are emitted as a load preceded by a fence, and a store
followed by a fence, respectively.  The amount of fencing may be overkill, but
it reflects exactly what the SpiderMonkey Wasm baseline compiler for AArch64 does.

One reason to implement r-m-w and c-a-s as a single insn which is expanded
only at emission time is that we must be very careful what instructions we
allow in between the load-linked and store-conditional.  In particular, we
cannot allow *any* extra memory transactions in there, since -- particularly
on low-end hardware -- that might cause the transaction to fail, hence
deadlocking the generated code.  That implies that we can't present the LL/SC
loop to the register allocator as its constituent instructions, since it might
insert spills anywhere.  Hence we must present it as a single indivisible
unit, as we do here.  It also has the benefit of reducing the total amount of
work the RA has to do.

The only other notable feature of the r-m-w and c-a-s translations into
AArch64 code, is that they both need a scratch register internally.  Rather
than faking one up by claiming, in `get_regs`, that it modifies an extra
scratch register, and having to have a dummy initialisation of it, these new
instructions (`::LLSC` and `::CAS`) simply use fixed registers in the range
x24-x28.  We rely on the RA's ability to coalesce V<-->R copies to make the
cost of the resulting extra copies zero or almost zero.  x24-x28 are chosen so
as to be call-clobbered, hence their use is less likely to interfere with long
live ranges that span calls.

One subtlety regarding the use of completely fixed input and output registers
is that we must be careful how the surrounding copy from/to of the arg/result
registers is done.  In particular, it is not safe to simply emit copies in
some arbitrary order if one of the arg registers is a real reg.  For that
reason, the arguments are first moved into virtual regs if they are not
already there, using a new method `<LowerCtx for Lower>::ensure_in_vreg`.
Again, we rely on coalescing to turn them into no-ops in the common case.

There is also a ridealong fix for the AArch64 lowering case for
`Opcode::Trapif | Opcode::Trapff`, which removes a bug in which two trap insns
in a row were generated.

In the patch as submitted there are 6 "FIXME JRS" comments, which mark things
which I believe to be correct, but for which I would appreciate a second
opinion.  Unless otherwise directed, I will remove them for the final commit
but leave the associated code/comments unchanged.
2020-08-04 09:35:50 +02:00
Benjamin Bouvier
694af3aec2 machinst x64: implement float Floor/Ceil/Trunc/Nearest as VM calls; 2020-07-24 19:29:12 +02:00
Nick Fitzgerald
ee5982fd16 peepmatic: Be generic over the operator type
This lets us avoid the overhead of converting `cranelift_codegen::ir::Opcode`
to `peepmatic_runtime::Operator`, and paves the way for allowing Peepmatic to
support non-clif optimizations (e.g. vcode optimizations).

Rather than defining our own `peepmatic::Operator` type like we used to, now the
whole `peepmatic` crate is effectively generic over a `TOperator` type
parameter. For the Cranelift integration, we use `cranelift_codegen::ir::Opcode`
as the concrete type for our `TOperator` type parameter. For testing, we also
define a `TestOperator` type, so that we can test Peepmatic code without
building all of Cranelift, and we can keep them somewhat isolated from each
other.

The methods that `peepmatic::Operator` had are now translated into trait bounds
on the `TOperator` type. These traits need to be shared between all of
`peepmatic`, `peepmatic-runtime`, and `cranelift-codegen`'s Peepmatic
integration. Therefore, these new traits live in a new crate:
`peepmatic-traits`. This crate acts as a header file of sorts for shared
trait/type/macro definitions.
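
The rough shape of the pattern looks like this (hypothetical trait and type
names, not Peepmatic's actual API):

    use std::fmt::Debug;
    use std::hash::Hash;

    // The crates are generic over the operator type; concrete instantiations
    // are `cranelift_codegen::ir::Opcode` for the Cranelift integration and a
    // `TestOperator` type for testing.
    trait OperatorBound: Copy + Eq + Hash + Debug {
        fn arity(&self) -> u8;
    }

    struct PeepholeOptimizer<TOperator: OperatorBound> {
        // Optimizations are keyed directly by `TOperator` values, so no
        // Opcode -> Operator conversion is needed when matching.
        operators: Vec<TOperator>,
    }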

Additionally, the `peepmatic-runtime` crate no longer depends on the
`peepmatic-macro` procedural macro crate, which should lead to faster build
times for Cranelift when it is using pre-built peephole optimizers.
2020-07-17 16:16:49 -07:00
bjorn3
7b7b1f4997 Rename sarg__ to sarg_t 2020-07-17 12:03:17 +02:00
bjorn3
0d4fa6d32a Fix review comments 2020-07-17 12:03:17 +02:00
bjorn3
4431ac1108 Implement SystemV struct argument passing 2020-07-17 12:03:17 +02:00
Alex Crichton
63d5b91930 Wasmtime 0.19.0 and Cranelift 0.66.0 (#2027)
This commit updates Wasmtime's version to 0.19.0, Cranelift's version to
0.66.0, and updates the release notes as well.
2020-07-16 12:46:21 -05:00
Andrew Brown
f0b083c6ad Legalize [u|s]widen_high for x86
Use `x86_palignr` and `[u|s]widen_low` for legalizing this instruction.
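
A scalar sketch of the `swiden_high` semantics being legalized, for the
i16x8 -> i32x4 case (illustration only):

    // Take the high half of the input lanes and sign-extend each one; the
    // x86 legalization gets the high lanes into position with `x86_palignr`
    // and then reuses the `widen_low` lowering.
    fn swiden_high_i16x8(v: [i16; 8]) -> [i32; 4] {
        [v[4] as i32, v[5] as i32, v[6] as i32, v[7] as i32]
    }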
2020-07-15 11:32:08 -07:00
Andrew Brown
c8ddf8a34c Encode [u|s]widen_low for x86 2020-07-15 11:32:08 -07:00
Andrew Brown
fafef7db77 Add x86_palignr instructions
This instruction is necessary for implementing `[s|u]widen_high`.
2020-07-15 11:32:08 -07:00
Andrew Brown
0e5e8a62c8 Add DerivedFunction for doubling lane widths and halving the number of lanes (i.e. merging)
Certain operations (e.g. widening) will have operands with types like `NxM` but will return results with types like `(N*2)x(M/2)` (double the lane width, halve the number of lanes; maintain the same number of vector bits). This is equivalent to applying two `DerivedFunction`s to the type: `DerivedFunction::DoubleWidth` then `DerivedFunction::HalfVector`. Since there is no easy way to apply multiple `DerivedFunction`s (e.g. most of the logic is one-level deep, 1d5a678124/cranelift/codegen/meta/src/gen_inst.rs (L618-L621)), I added `DerivedFunction::MergeLanes` to do the necessary type conversion.
2020-07-15 11:32:08 -07:00
Benjamin Bouvier
abf157bd69 machinst x64: Only use the feature flag to enable the x64 new backend;
Before this patch, running the x64 new backend would require both
compiling with --features experimental_x64 and running with
`use_new_backend`.

This patch changes this behavior so that the runtime flag is not
needed anymore: using the feature flag will enforce usage of the new
backend everywhere, making using and testing it much simpler:

    cargo run --features experimental_x64 ;; other CLI options/flags

This also gives a hint at what the meta language generation would look
like after switching to the new backend.

Compiling only with the x64 codegen flag gives a nice compile time speedup.
2020-07-15 13:11:28 +02:00
Andrew Brown
c5a69cee9f Add x86 legalization for fcvt_to_uint_sat.i32x4
This converts an `f32x4` into an `i32x4` (unsigned) with rounding by using a long sequence of SSE4.1 compatible instructions.
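
Per-lane, the saturating conversion behaves as in this scalar sketch (assumed
from the CLIF instruction's definition, not the SSE4.1 sequence itself):

    // NaN converts to 0; out-of-range values saturate to the u32 bounds;
    // in-range values are truncated toward zero.
    fn fcvt_to_uint_sat_lane(x: f32) -> u32 {
        if x.is_nan() {
            0
        } else if x <= 0.0 {
            0
        } else if x >= u32::MAX as f32 {
            u32::MAX
        } else {
            x as u32
        }
    }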
2020-07-08 10:20:01 -07:00
Andrew Brown
057c93b64e Add unarrow instruction with x86 implementation
Adds a shared `unarrow` instruction in order to lower the Wasm SIMD specification's unsigned narrowing (see https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#integer-to-integer-narrowing). Additionally, this commit implements the instruction for x86 using PACKUSWB and PACKUSDW for the applicable encodings.
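
Per-lane, the unsigned narrowing is a clamp into the smaller unsigned range,
e.g. for the i16x8 -> u8x16 case handled by PACKUSWB (scalar sketch, names
assumed):

    // Clamp a signed 16-bit lane into the unsigned 8-bit output range.
    fn unarrow_i16_to_u8(lane: i16) -> u8 {
        lane.clamp(0, u8::MAX as i16) as u8
    }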
2020-07-02 09:35:45 -07:00
Andrew Brown
65e6de2344 Replace x86_packss with snarrow
Since the Wasm specification contains narrowing instructions (see https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#integer-to-integer-narrowing) that lower to PACKSS*, the x86-specific instruction is not necessary in the CLIF IR.
2020-07-02 09:35:45 -07:00
Andrew Brown
152d7fcee7 Fix typo and wording of CDSL error messages 2020-07-02 09:35:45 -07:00
Chris Fallin
a351fa52b5 Merge pull request #1930 from cfallin/spectre-heap
Spectre mitigation on heap access overflow checks.
2020-07-01 09:23:04 -07:00
Chris Fallin
e694fb1312 Spectre mitigation on heap access overflow checks.
This PR adds a conditional move following a heap bounds check through
which the address to be accessed flows. This conditional move ensures
that even if the branch is mispredicted (access is actually out of
bounds, but speculation goes down the in-bounds path), the actually accessed
address is zero (a NULL pointer) rather than the out-of-bounds address.
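
The shape of the mitigation, sketched as scalar code (illustration only, not
the CLIF the backend emits):

    // After the bounds check, select the address branchlessly so that a
    // mispredicted in-bounds path dereferences address 0 rather than the
    // attacker-controlled out-of-bounds address.
    fn guarded_heap_addr(heap_base: u64, offset: u64, bound: u64) -> u64 {
        let addr = heap_base.wrapping_add(offset);
        // All-ones mask when in bounds, all-zeros when out of bounds.
        let in_bounds_mask = ((offset < bound) as u64).wrapping_neg();
        addr & in_bounds_mask
    }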

The mitigation is controlled by a flag that is off by default, but can
be set by the embedding. Note that in order to turn it on by default,
we would need to add conditional-move support to the current x86
backend; this does not appear to be present. Once the deprecated
backend is removed in favor of the new backend, IMHO we should turn
this flag on by default.

Note that the mitigation is unnecessary when we use the "huge heap"
technique on 64-bit systems, in which we allocate a range of virtual
address space such that no 32-bit offset can reach other data. Hence,
this only affects small-heap configurations.
2020-07-01 08:36:09 -07:00
Andrew Brown
737cf1d605 Implement iabs for x86 SIMD
This only covers the types necessary for implementing the Wasm SIMD spec--`i8x16`, `i16x8`, `i32x4`.
2020-06-30 14:00:17 -07:00
Andrew Brown
26bdf9c333 Add iabs to instructions.rs
This is necessary for implementing SIMD [abs](https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#lane-wise-integer-absolute-value).
2020-06-30 14:00:17 -07:00
Nick Fitzgerald
8c5f59c0cf wasmtime: Implement table.get and table.set
These instructions have fast, inline JIT paths for the common cases, and only
call out to host VM functions for the slow paths. This required some changes to
`cranelift-wasm`'s `FuncEnvironment`: instead of taking a `FuncCursor` to insert
an instruction sequence within the current basic block,
`FuncEnvironment::translate_table_{get,set}` now take a `&mut FunctionBuilder`
so that they can create whole new basic blocks. This is necessary for
implementing GC read/write barriers that involve branching (e.g. checking for
null, or whether a store buffer is at capacity).

Furthermore, it required that the `load`, `load_complex`, and `store`
instructions handle loading and storing through an `r{32,64}` rather than just
`i{32,64}` addresses. This involved making `r{32,64}` types acceptable
instantiations of the `iAddr` type variable, plus a few new instruction
encodings.

Part of #929
2020-06-30 12:00:57 -07:00
Andrew Brown
c9d573d841 Provide spec-compliant legalization for SIMD floating point min/max 2020-06-25 14:48:16 -07:00
Benjamin Bouvier
c9a3f05afd machinst x64: implement calls and int cmp/store/loads;
This makes it possible to run a simple recursive fibonacci function in
wasmtime.
2020-06-25 16:20:33 +02:00
Andrew Brown
3675f95bb2 Legalize fcvt_to_sint_sat.i32x4 on x86
Use a lengthy sequence involving CVTTPS2DQ to quiet NaNs and saturate overflow.
2020-06-18 11:39:38 -07:00
Andrew Brown
3740772176 Add encoding for x86 CVTTPS2DQ
This reuses the `x86_cvtt2si` instruction since the packed and scalar versions seem to group together well.
2020-06-18 11:39:38 -07:00
Benjamin Bouvier
dad56a2488 cranelift: add a new resumable_trapnz instruction;
This makes it possible for a resumable_trap to occur in loop headers,
for instance. This is the correct way to implement interrupt checks in
SpiderMonkey, which are effectively resumable traps. The previous
implementation used plain traps, which is wrong, since traps
semantically can't be resumed.
2020-06-15 12:04:28 +02:00
Andrew Brown
01d34e71b9 Add x86 legalization for fcvt_from_uint.f32x4
This converts an `i32x4` into an `f32x4` with some rounding either by using an AVX512VL/F instruction--VCVTUDQ2PS--or a long sequence of SSE4.1 compatible instructions.
2020-06-12 15:06:22 -07:00
Andrew Brown
23ed48f269 Add AVX512F flag 2020-06-12 15:06:22 -07:00
Andrew Brown
772ce73f7f Add x86_pblendw instruction
This instruction is necessary for lowering `fcvt_from_uint`.
2020-06-12 15:06:22 -07:00
Andrew Brown
546fc9ddf1 Add x86_vcvtudq2ps instruction
This instruction converts i32x4 to f32x4 in several AVX512 feature sets.
2020-06-12 15:06:22 -07:00
Dan Gohman
caa87048ab Wasmtime 0.18.0 and Cranelift 0.65.0. 2020-06-11 17:49:56 -07:00
whitequark
3796164642 x86_32: legalize br{z,nz}.i64. 2020-06-08 12:52:13 -07:00
Andrew Brown
5db384cd76 Rename opcode: PMULLQ to VPMULLQ 2020-06-03 16:27:57 -07:00
Andrew Brown
1ea09088be Add x86 legalization for imul.i64x2 for non-AVX CPUs
The `convert_i64x2_imul` custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizes `imul.i64x2` to an `x86_pmullq` in this case; if not, it uses a lengthy SSE2-compatible instruction sequence.
2020-06-03 16:27:57 -07:00
Andrew Brown
b3a6985cd5 Re-organize transform groups for x86 legalization 2020-06-03 16:27:57 -07:00
Andrew Brown
5a32500518 Remove non-existent x86 encoding for sshr_imm.i64x2
This instruction does not exist in the SSE2 feature set; it can be added later with a VEX/EVEX encoding.
2020-06-03 16:27:57 -07:00
Andrew Brown
df171f01b5 Add x86_pmuludq
This instruction multiplies the lower 32 bits of two 64x2 unsigned integers into an i64x2; this is necessary for lowering Wasm's i64x2.mul.
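
Per-lane semantics of `x86_pmuludq` (PMULUDQ), as a scalar sketch:

    // Multiply the low 32 bits of each 64-bit lane as unsigned integers,
    // producing the full 64-bit product; the i64x2.mul legalization combines
    // several such partial products.
    fn pmuludq_lane(a: u64, b: u64) -> u64 {
        (a as u32 as u64) * (b as u32 as u64)
    }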
2020-06-03 16:27:57 -07:00
Andrew Brown
9ba9fd0f64 Add x86-specific instruction for i64x2 multiplication
Without this special instruction, legalizing to the AVX512 instruction AND the SSE instruction sequence is impossible. This extra instruction would be rendered unnecessary by the x64 backend.
2020-06-03 16:27:57 -07:00
Dan Gohman
a76639c6fb Wasmtime 0.17.0 and Cranelift 0.64.0. (#1805) 2020-06-02 18:51:59 -07:00