wasmtime

Author	SHA1	Message	Date
Afonso Bordado	f9ada24bcf	cranelift: Fix br_table for i64 inputs We still only support a maximum of u32::MAX entries, however we no longer crash when compiling 64 bit indexes. Fixes #3100	2021-09-02 15:31:48 +01:00
bjorn3	690ea640b3	Implement fmin_pseudo and fmax_pseudo for scalars	2021-08-27 16:59:47 +02:00
Johnnie Birch	e3aae9e498	Refactor to avoid too strict assertion. Fix for 3160 and 3161. Assertion was intended for SIMD lowering of F64x2ConvertLowI32x4U	2021-08-24 19:39:47 -07:00
Johnnie Birch	e519fca61c	Refactor and turn on lowering for extend-add-pairwise	2021-07-31 10:52:39 -07:00
Johnnie Birch	e373ddfe1b	Add extend-add-pairwise instructions x64	2021-07-30 15:06:58 -07:00
Johnnie Birch	4f601edc36	Add x64 support for remaining int-to-int extend simd instructions Adds remaming support for int to int extend simd instructions. Specifically adds support for remaining I32x4->I64x2 instructions	2021-07-28 23:33:42 -07:00
Johnnie Birch	500f530322	Add support for i32x4_trunc_sat_f64x2_s for x64	2021-07-26 22:24:30 -07:00
Johnnie Birch	23290f0450	Add support for i32x4_trunc_sat_f64x2_u for x64	2021-07-26 22:24:30 -07:00
Johnnie Birch	5deda27977	Add support for Saturating Rounding Q-format Multiplication for x64	2021-07-26 20:32:46 -07:00
Johnnie Birch	ffec1f9b41	Fix for 3089 X64 ext_mul_i8x16 has incorrect lowering Also factors out unnecessary temp register	2021-07-26 20:06:43 -07:00
Andrew Brown	6b86984c41	x64: avoid load-coalescing SIMD operations with non-aligned loads Fixes #2943, though not as optimally as may be desired. With x64 SIMD instructions, the memory operand must be aligned--this change adds that check. There are cases, however, where we can do better--see #3106.	2021-07-26 13:39:16 -07:00
Johnnie Birch	2452a4cd74	Refactor lowering structure for ext_mul on x64 and add comments	2021-07-15 01:07:52 -07:00
Johnnie Birch	6fbe0b72bd	Add simd_extmul_* support for x64	2021-07-15 01:07:52 -07:00
Johnnie Birch	d8e813204e	Fold fcvt_low_from_uinit into previously existing clif instructions	2021-07-09 10:39:05 -07:00
Johnnie Birch	2d676d838f	Implements f64x2.convert_low_i32x4_u for x64	2021-07-09 10:39:05 -07:00
Anton Kirilov	330f02aa09	Enable the simd_i32x4_trunc_sat_f64x2 test for AArch64 Also, reorganize the AArch64-specific VCode instructions for unary narrowing and widening vector operations, so that they are more straightforward to use. Copyright (c) 2021, Arm Limited.	2021-06-30 12:17:53 +01:00
Anton Kirilov	98f1ac789e	Enable the simd_i16x8_q15mulr_sat_s test on AArch64 Copyright (c) 2021, Arm Limited.	2021-06-28 12:24:31 +01:00
Chris Fallin	d42c8692bc	x64 backend: be explicit about unimplemented opcodes. As discussed in #3035, most backends have explicit `unimplemented!(...)` match-arms for opcode lowering cases that are not yet implemented; this allows the backend maintainer to easily see what is not yet implemented, and avoiding a catch-all wildcard arm is less error-prone as opcodes are added in the future. However, the x64 backend was the exception: as @akirilov-arm pointed out, it had a wildcard match arm. This fixes the issue by explicitly listing all opcodes the x64 backend does not yet implement. As per our tests, these opcodes are not used or need by Wasm lowering; but, it is good to know that they exist, so that we can eventually either support or remove them. This was a good exercise for me as I wasn't aware of a few of these in particular: e.g., aarch64 supports `bmask` while x64 does not, and there isn't a good reason why x64 shouldn't, especially if others hope to use Cranelift as a SIMD-capable general codegen in the future. The `unimplemented!()` cases are separate from `panic!()` ones: my convention here was to split out those that are logically just missing from those that should be impossible, mostly due to expected removal by legalization before we reach the lowering step.	2021-06-26 15:29:14 -07:00
Chris Fallin	efe3930215	Fix `bint` on x64, and make `bextend` consistent with bool representation. There has been occasional confusion with the representation that we use for bool-typed values in registers, at least when these are wider than one bit. Does a `b8` store `true` as 1, or as all-ones (`0xff`)? We've settled on the latter because of some use-cases where the wide bool becomes a mask -- see #2058 for more on this. This is fine, and transparent, to most operations within CLIF, because the bool-typed value still has only two semantically-visible states, namely `true` and `false`. However, we have to be careful with bool-to-int conversions. `bint` on aarch64 correctly masked the all-ones value down to 0 or 1, as required by the instruction specification, but on x64 it did not. This PR fixes that bug and makes x64 consistent with aarch64. While staring at this code I realized that `bextend` was also not consistent with the all-ones invariant: it should do a sign-extend, not a zero-extend as it previously did. This is also rectified and tested. (Aarch64 also already had this case implemented correctly.) Fixes #3003.	2021-06-22 10:56:56 -07:00
Johnnie Birch	1770880e19	x64: add support for packed promote and demote (#2783 ) * Add support for x64 packed promote low * Add support for x64 packed floating point demote * Update vector promote low and demote by adding constraints Also does some renaming and minor refactoring	2021-06-04 15:59:20 -07:00
Andrew Brown	8dc4cc9fe3	x64: fix AVX512 flag checks Previously, the multiple flags for certain AVX512 instructions were checked using `OR`: e.g., if the CPU has AVX512VL `OR` AVX512DQ, emit `VPMULLQ`. This is incorrect--the logic should be `AND`. The Intel Software Developer Manual, vol. 1, sec. 15.4, has more information on this (notable there is the suggestion to check with `XGETBV` that the OS is allowing the use of the XMM registers--but that is a separate issue). This change switches to `AND` logic in the new backend.	2021-06-01 11:41:16 -07:00
Andrew Brown	2a9f458ea3	x64: lower i8x16.shuffle to VPERMI2B when possible When shuffling values from two different registers, the x64 lowering for `i8x16.shuffle` must first shuffle each register separately and then OR the results with SSE instructions. With `VPERMI2B`, available in AVX512VL + AVX512VBMI, this can be done in a single instruction after the shuffle mask has been moved into the destination register. This change uses `VPERMI2B` for that case when the CPU supports it.	2021-06-01 11:40:53 -07:00
Andrew Brown	459fce3467	x64: lower i8x16.popcnt to VPOPCNTB when possible When AVX512VL or AVX512BITALG are available, Wasm SIMD's `popcnt` instruction can be lowered to a single x64 instruction, `VPOPCNTB`, instead of 8+ instructions.	2021-05-25 12:16:25 -07:00
Johnnie Birch	9a5c9607e1	Vpopcnt for x64	2021-05-21 19:23:26 -07:00
Andrew Brown	54b45d28a3	x64: lower fcvt_from_uint to VCVTUDQ2PS when possible When AVX512VL and AVX512F are available, use a single instruction (`VCVTUDQ2PS`) instead of a length 9-instruction sequence. This optimization is a port from the legacy x86 backend.	2021-05-19 12:20:11 -07:00
Andrew Brown	7ef3ae2903	x64: implement vselect with variable blend instructions This change implements `vselect` using SSE4.1's `BLENDVPS`, `BLENDVPD`, and `PBLENDVB`. `vselect` is a lane-selecting instruction that is used by [simple_preopt.rs](`fa1faf5d22/cranelift/codegen/src/simple_preopt.rs (L947-L999)`) to lower `bitselect` to a single x86 instruction when the condition mask is known to be boolean (all 1s or 0s, e.g., from a conversion). This is better than `bitselect` in general, which lowers to 4-5 instructions. The old backend had the `vselect` lowering; this simply introduces it to the new backend.	2021-05-17 11:23:33 -07:00
Andrew Brown	e676589b0c	x64: lower i64x2.imul to VPMULLQ when possible This adds the machinery to encode the VPMULLQ instruction which is available in AVX512VL and AVX512DQ. When these feature sets are available, we use this instruction instead of a lengthy 12-instruction sequence.	2021-05-13 20:14:05 -07:00
Andrew Brown	c982d2be65	x64: move multiplication lowering Since the lowering of `imul` complicated the other ALU operations it was matched with and since future commits will alter the multiplication lowering further, this change moves the `imul` lowering to its own match block.	2021-05-13 20:14:05 -07:00
Afonso Bordado	e021995323	Allow i128 amount operands on shift instructions in the x64 backend Fixes #2727.	2021-05-10 18:32:20 +01:00
Andrew Brown	0acc1451ea	x64: lower iabs.i64x2 using a single AVX512 instruction when possible (#2819 ) * x64: add EVEX encoding mechanism Also, includes an empty stub module for the VEX encoding. * x64: lower abs.i64x2 to VPABSQ when available * x64: refactor EVEX encodings to use `EvexInstruction` This change replaces the `encode_evex` function with a builder-style struct, `EvexInstruction`. This approach clarifies the code, adds documentation, and results in slight speedups when benchmarked. * x64: rename encoding CodeSink to ByteSink	2021-04-15 11:53:58 -07:00
Andrew Brown	6bdef48473	x64: refactor to use Inst::store during lowering This re-factoring replaces uses of `Inst::mov_r_m` with `Inst::store` to ensure there is only one code location to troubleshoot when generating store instructions for a specific type.	2021-04-13 13:09:07 -07:00
Johnnie Birch	31d3db1ec2	Implements convert low signed integer to float for x64 simd	2021-03-26 12:13:29 -07:00
Alex Crichton	3f694ae319	Use stable Rust on CI to test the x64 backend (#2766 ) * Use stable Rust on CI to test the x64 backend This commit leverages the newly-released 1.51.0 compiler to test the new backend on Windows and Linux with a stable compiler instead of a nightly compiler. This isolates the nightly build to just the nightly documentation generation and fuzzing, both of which rely on nightly for the best results right now. * Use updated stable in book build job * Run rustfmt for new stable * Silence new warnings for wasi-nn * Allow some dead code in the x64 backend Looks like new rustc is better about emitting some dead-code warnings * Update rust in peepmatic job * Fix a test in the pooling allocator * Remove `package.metdata.docs.rs` temporarily Needs resolution of https://github.com/rust-lang/cargo/pull/9300 first * Fix a warning in a wasi-nn example	2021-03-25 13:18:59 -05:00
Chris Fallin	b429f77ee9	Handle `srem` properly when `avoid_div_traps` is false. The codegen for div/rem ops has two modes, depending on the `avoid_div_traps` flag: it can either do all checks for trapping conditions explicitly, and use explicit trap instructions, then use a hardware divide instruction that will not trap (`avoid_div_traps == true`); or it can run in a mode where a hardware FP fault on the divide instruction implies a Wasm trap (`avoid_div_traps == false`). Wasmtime uses the former while Lucet (for example) uses the latter. It turns out that because we run all our spec tests run under Wasmtime, we missed a spec corner case that fails in the latter: INT_MIN % -1 == 0 per the spec, but causes a trap with the x86 signed divide/remainder instruction. Hence, in Lucet, this specific remainder computation would incorrectly result in a Wasm trap. This PR fixes the issue by just forcing use of the explicit-checks implementation for `srem` even when `avoid_div_traps` is false.	2021-03-24 22:30:07 -07:00
Andrew Brown	508f8fa5a9	[x64] Add i64x2.abs This instruction has a single instruction lowering in AVX512F/VL and a three instruction lowering in AVX but neither is currently supported in the x64 backend. To implement this, we instead subtract the vector from 0 and use a blending instruction to pick the lanes containing the absolute value.	2021-03-02 12:30:02 -08:00
Andrew Brown	d730f18a78	[x64] Add the new i64x2 comparisons	2021-03-01 17:52:34 -08:00
Chris Fallin	cdb60ec5a9	Merge pull request #2682 from cfallin/shift-bugs Fix some `i128` shift-related bugs in x64 backend.	2021-02-26 15:13:08 -08:00
Chris Fallin	6dcb31abb7	Fix 128-bit left shift: null out tmp3, not tmp2, on zero-shift case. Add a bunch of test vectors that actually expose this (previously the shift-by-zero test had equal lower and upper halves and hid the bug), including the most basic of all, 1 << 0 == 1 (thanks @bjorn3 for finding this).	2021-02-25 09:46:57 -08:00
Chris Fallin	40db4de44a	Fix incomplete trap metadata due to multiple traps at one address. If an instruction has more than one trap record associated with it (for example: a divide instruction that has participated in load-op fusion, so we have both a heap-out-of-bounds trap record due to its load and a divide-by-zero trap record due to its divide op), the current MachBuffer code would emit only one of the trap records to the sink. Separately, divide instructions probably shouldn't merge loads, because the two separate possible traps at one location might be confusing for some embedders (certainly in Lucet). Divide seems to be the only case in our current codegen where such merging might occur. This PR changes the lowering to always force the divisor into a register. Finally, while working out why trap records were not appearing, I had noticed that `isa::x64::emit_std_enc_mem()` was only emitting heap-OOB trap metadata for loads/stores when it had a srcloc. This PR ensures that the metadata is emitted even when the srcloc is empty. Note that none of the above presents a security or correctness problem; trap metadata only affects the status that we return to the embedder when a Wasm program terminates with a trap.	2021-02-24 15:13:45 -08:00
Chris Fallin	0f3e00b25e	Fix some `i128` shift-related bugs in x64 backend. This fixes #2672 and #2679, and also fixes an incorrect instruction emission (`test` with small immediate) that we had missed earlier. The shift-related fixes have to do with (i) shifts by 0 bits, as a special case that must be handled; and (ii) shifts by a 128-bit amount, which we can handle by just dropping the upper half (we only use 3--7 bits of shift amount). This adjusts the lowerings appropriately, and also adds run-tests to ensure that the lowerings actually execute correctly (previously we only had compile-tests with golden lowerings; I'd like to correct this for more ops eventually, adding run-tests beyond what the Wasm spec and frontend covers).	2021-02-23 14:22:04 -08:00
Kasey Carrothers	7bd96c8e2f	Refactor x64::Insts that use an is_64 bool to use OperandSize.	2021-02-03 10:40:11 -08:00
Kasey Carrothers	3306408100	Refactor x64::Inst to use OperandSize instead of u8s. TODO: some types take a 'is_64_bit' bool. Those are left unchanged for now.	2021-02-03 10:40:11 -08:00
Benjamin Bouvier	2275519cb1	cranelift x64: use the POPCNT instruction for Popcount when it's available;	2021-01-29 19:41:01 +01:00
Benjamin Bouvier	6bf6612d96	cranelift x64: use the TZCNT instruction for Ctz when it's available;	2021-01-29 19:41:01 +01:00
Benjamin Bouvier	d3acd9a283	cranelift x64: use the LZCNT instruction for Clz when it's available;	2021-01-29 19:41:01 +01:00
Johnnie Birch	cbd7a6a80e	Add sse41 lowering for rounding x64	2021-01-28 17:37:17 -08:00
bjorn3	81d248c057	Implement Mach-O TLS access for x64 newBE	2021-01-21 18:25:56 +01:00
Chris Fallin	c7de8f5efb	Merge pull request #2541 from cfallin/struct-arg-ret x64 and aarch64: allow StructArgument and StructReturn args.	2021-01-17 23:50:19 -08:00
Chris Fallin	456561f431	x64 and aarch64: allow StructArgument and StructReturn args. The StructReturn ABI is fairly simple at the codegen/isel level: we only need to take care to return the sret pointer as one of the return values if that wasn't specified in the initial function signature. Struct arguments are a little more complex. A struct argument is stored as a chunk of memory in the stack-args space. However, the CLIF semantics are slightly special: on the caller side, the parameter passed in is a pointer to an arbitrary memory block, and we must memcpy this data to the on-stack struct-argument; and on the callee side, we provide a pointer to the passed-in struct-argument as the CLIF block param value. This is necessary to support various ABIs other than Wasm, such as that of Rust (with the cg_clif codegen backend).	2021-01-17 23:11:45 -08:00
Chris Fallin	0f563f786a	Add ELF TLS support in new x64 backend. This follows the implementation in the legacy x86 backend, including hardcoded sequence that is compatible with what the linker expects. We could potentially do better here, but it is likely not necessary. Thanks to @bjorn3 for a bugfix to an earlier version of this.	2021-01-17 22:48:51 -08:00

1 2 3 4 5

201 Commits