This commit enables Cranelift's AArch64 backend to generate code
for instruction set extensions (previously only the base Armv8-A
architecture was supported). It also makes it possible to detect
the extensions supported by the host when JIT compiling. The new
functionality is applied to the IR instruction `AtomicCas`.
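For JIT embedders, host detection is the interesting piece. A minimal
sketch of building an ISA that reflects the host's extensions, assuming
the `cranelift-native` and `cranelift-codegen` crates (exact signatures
may differ across versions):

```rust
use cranelift_codegen::isa::TargetIsa;
use cranelift_codegen::settings;

// A sketch, not the exact API: `cranelift_native::builder()` inspects
// the host CPU and pre-sets the corresponding target-specific ISA
// flags, so JIT-compiled code (e.g. `AtomicCas` lowerings) can use
// whatever extensions the host actually supports.
fn host_isa() -> Box<dyn TargetIsa> {
    let isa_builder = cranelift_native::builder().expect("host not supported");
    isa_builder.finish(settings::Flags::new(settings::builder()))
}
```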
Copyright (c) 2021, Arm Limited.
Our previous implementation of unwind infrastructure was somewhat
complex and brittle: it parsed generated instructions in order to
reverse-engineer unwind info from prologues. It also relied on some
fragile linkage to communicate instruction-layout information that VCode
was not designed to provide.
A much simpler, more reliable, and easier-to-reason-about approach is to
embed unwind directives as pseudo-instructions in the prologue as we
generate it. That way, we can say what we mean and just emit it
directly.
The usual reasoning that leads to the reverse-engineering approach is
that metadata is hard to keep in sync across optimization passes; but
here, (i) prologues are generated at the very end of the pipeline, and
(ii) if we ever do a post-prologue-gen optimization, we can treat unwind
directives as black boxes with unknown side-effects, just as we do for
some other pseudo-instructions today.
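Concretely, the directives look something like this (a sketch; variant
and field names are illustrative, not the exact Cranelift definitions):

```rust
/// Placeholder for the backend's physical-register type.
type RealReg = u16;

// The prologue emits one of these after each step it performs, and the
// platform-specific unwind-info generation consumes them directly.
// They emit no machine code themselves.
enum UnwindInst {
    /// The frame pointer and return address have been pushed.
    PushFrameRegs { offset_upward_to_caller_sp: u32 },
    /// A clobbered callee-saved register was stored in the clobber area.
    SaveReg { clobber_offset: u32, reg: RealReg },
    /// The fixed frame (spill slots, stack slots) has been allocated.
    StackAlloc { size: u32 },
}
```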
It turns out that it was easier to just build this for both x64 and
aarch64 (since they share a factored-out ABI implementation), and wire
up the platform-specific unwind-info generation for Windows and SystemV.
Now we have simpler unwind on all platforms and we can delete the old
unwind infra as soon as we remove the old backend.
There were a few consequences to supporting Fastcall unwind in
particular that led to a refactor of the common ABI. Windows only
supports naming clobbered-register save locations within 240 bytes of
the frame-pointer register, whatever one chooses that to be (RSP or
RBP). We had previously saved clobbers below the fixed frame (and below
nominal-SP). The 240-byte range has to include the old RBP too, so we're
forced to place clobbers at the top of the frame, just below saved
RBP/RIP. This is fine; we always keep a frame pointer anyway because we
use it to refer to stack args. It does mean that offsets of fixed-frame
slots (spillslots, stackslots) from RBP are no longer known before we do
regalloc, so if we ever want to index these off of RBP rather than
nominal-SP because we add support for `alloca` (dynamic frame growth),
then we'll need a "nominal-BP" mode that is resolved after regalloc and
clobber-save code is generated. I added a comment to this effect in
`abi_impl.rs`.
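The resulting frame layout, roughly (a sketch of the scheme described
above, not drawn from the actual code):

```rust
// Higher addresses at the top; offsets are illustrative.
//
//   [ stack args          ]  <- referenced via RBP
//   [ return address      ]
//   [ saved RBP           ]  <- RBP points here after the prologue
//   [ clobber saves       ]  <- within 240 bytes of RBP, per Windows
//   [ spillslots and      ]
//   [ stackslots          ]  <- offsets from RBP known only post-regalloc
//                             <- nominal-SP at the bottom
```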
The above refactor touched both x64 and aarch64 because of shared code.
This had a further effect in that the old aarch64 prologue generation
subtracted from `sp` once to allocate space, then used stores to `[sp,
offset]` to save clobbers. Unfortunately that offset has only a 7-bit
(scaled) range, so if there are enough clobbered registers (and there
can be -- aarch64 has 384 bytes of registers; at least one unit test
hits this) the stores/loads will be out of range. I really don't want
to synthesize large-offset sequences here; better to go back to the
simpler pre-index/post-index `stp x1, x2, [sp, #-16]!` form that works
just like a "push". It's likely not much worse microarchitecturally
(dependence
chain on SP, but oh well) and it actually saves an instruction if
there's no other frame to allocate. As a further advantage, it's much
simpler to understand; simpler is usually better.
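A rough sketch of the shape of the new clobber-save code (types and
names are placeholders, not the actual backend code):

```rust
/// Placeholder for a hardware register number.
type Reg = u8;

// Each pair of clobbered registers is saved with one pre-indexed
// store-pair, which allocates 16 bytes and stores in a single
// instruction -- no offset-range limit to worry about.
fn save_clobbers(pairs: &[(Reg, Reg)], emit: &mut impl FnMut(String)) {
    for &(a, b) in pairs {
        // Pre-index: decrement SP by 16, then store the pair at new SP.
        emit(format!("stp x{}, x{}, [sp, #-16]!", a, b));
    }
}
```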
This PR adds the new backend on Windows to CI as well.
This adds support for the "fastcall" ABI, which is the native C/C++ ABI
on Windows platforms on x86-64. It is similar to but not exactly like
System V; primarily, its argument register assignments are different,
and it requires stack shadow space.
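For reference, the key integer-argument differences (well-known facts
about the two ABIs, not Cranelift-specific code):

```rust
// Integer-argument registers, in order.
const FASTCALL_INT_ARGS: &[&str] = &["rcx", "rdx", "r8", "r9"];
const SYSV_INT_ARGS: &[&str] = &["rdi", "rsi", "rdx", "rcx", "r8", "r9"];

// Fastcall additionally requires the caller to reserve 32 bytes of
// "shadow space" on the stack for the callee, even when all arguments
// are passed in registers.
const FASTCALL_SHADOW_SPACE_BYTES: u32 = 32;
```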
Note that this also adjusts the handling of multi-register values in
the shared ABI implementation, which in turn changes the handling of
`i128`s on *both* Fastcall/x64 *and* SysV/x64 platforms. This was done
to align with actual behavior by the "rustc ABI" on both platforms, as
mapped out experimentally (Compiler Explorer link in comments). This
behavior is gated under the `enable_llvm_abi_extensions` flag.
Note also that this does *not* add x64 unwind info on Windows. That will
come in a future PR (but is planned!).
This instruction has a single-instruction lowering in AVX512F/VL and a three-instruction lowering in AVX, but neither is currently supported in the x64 backend. To implement it, we instead subtract the vector from zero and use a blending instruction to pick the lanes containing the absolute value.
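A scalar model of the per-lane computation (the real lowering does the
negation with a vector subtract-from-zero and the selection with a
blend keyed on the sign bit):

```rust
// abs(x) per lane: negate via 0 - x, then "blend" by sign.
fn lane_abs(x: i64) -> i64 {
    let neg = 0i64.wrapping_sub(x); // the subtract-from-zero step
    if x < 0 { neg } else { x }     // the blend picks the non-negative value
}
```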
* Update wasm-tools crates
* Update Wasm SIMD spec tests
* Invert 'experimental_x64_should_panic' logic
By doing this, it is easier to see which spec tests currently panic. The new tests correspond to recently-added instructions.
* Fix: ignore new spec tests for all backends
Add a bunch of test vectors that actually expose this (previously the
shift-by-zero test had equal lower and upper halves and hid the bug),
including the most basic of all, 1 << 0 == 1 (thanks @bjorn3 for finding
this).
If an instruction has more than one trap record associated with it (for
example: a divide instruction that has participated in load-op fusion,
so we have both a heap-out-of-bounds trap record due to its load and a
divide-by-zero trap record due to its divide op), the current MachBuffer
code would emit only one of the trap records to the sink.
Separately, divide instructions probably shouldn't merge loads, because
the two separate possible traps at one location might be confusing for
some embedders (certainly in Lucet). Divide seems to be the only case in
our current codegen where such merging might occur. This PR changes the
lowering to always force the divisor into a register.
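A model of that lowering change (purely illustrative; the real code
uses the x64 backend's lowering helpers):

```rust
// Divides no longer merge a load into the instruction: the divisor is
// always materialized into a register first, so the load's heap-OOB
// trap and the divide's divide-by-zero trap occur at distinct
// instructions.
enum Operand {
    Mem(u64), // fusing this load would put two trap records on one inst
    Reg(u8),
}

fn force_divisor_to_reg(op: Operand, emit_load: &mut impl FnMut(u64) -> u8) -> u8 {
    match op {
        Operand::Mem(addr) => emit_load(addr), // separate load instruction
        Operand::Reg(r) => r,
    }
}
```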
Finally, while working out why trap records were not appearing, I
noticed that `isa::x64::emit_std_enc_mem()` was only emitting heap-OOB
trap metadata for loads/stores when it had a srcloc. This PR ensures
that the metadata is emitted even when the srcloc is empty.
Note that none of the above presents a security or correctness problem;
trap metadata only affects the status that we return to the embedder
when a Wasm program terminates with a trap.
When a block is unreachable, the `unreachable_code` pass will remove it,
which is perfectly sensible. Jump tables factor into unreachability in
an expected way: even if a block is listed in a jump table, the block
might be unreachable if the jump table itself is unused (or used in an
unreachable block). Unfortunately, the verifier still expects all
block refs in all jump tables to be valid, even after DCE, which will
not always be the case.
This makes a simple change to the pass: after removing blocks, it scans
jump tables. Any jump table that refers to an unreachable block must
itself be unused, and so we just clear its entries. We do not bother
removing it (and renumbering all later jump tables), and we do not
bother computing full unused-ness of all jump tables, as that would be
more expensive; it is sufficient to clear out the ones that refer to
unreachable blocks, which are a subset of all unused jump tables.
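A sketch of the pass change, with simplified stand-ins for the real
cranelift-codegen structures:

```rust
use std::collections::HashSet;

type Block = u32;
struct JumpTableData(Vec<Block>);

// After unreachable blocks are removed: any table still naming a
// removed block cannot be reached from live code, so clearing its
// entries is safe -- no renumbering, no full use-site analysis.
fn clear_dead_jump_tables(tables: &mut [JumpTableData], reachable: &HashSet<Block>) {
    for jt in tables.iter_mut() {
        if jt.0.iter().any(|b| !reachable.contains(b)) {
            jt.0.clear();
        }
    }
}
```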
Fixes #2670.
This fixes #2672 and #2679, and also fixes an incorrect instruction
emission (`test` with small immediate) that we had missed earlier.
The shift-related fixes have to do with (i) shifts by 0 bits, as a
special case that must be handled; and (ii) shifts with a 128-bit
shift amount, which we can handle by just dropping its upper half (only
the low 3--7 bits of the shift amount are used, depending on the type
width).
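A scalar model of the 128-bit left-shift cases (illustrative; the
lowering computes the same pieces with machine shifts):

```rust
// The upper half combines bits from both halves; the `64 - amt` shift
// is what makes zero a special case (a shift by 64 is out of range
// for a 64-bit value).
fn shl128(lo: u64, hi: u64, amt: u32) -> (u64, u64) {
    let amt = amt & 127; // only the low 7 bits of the amount matter
    if amt == 0 {
        (lo, hi) // 1 << 0 == 1: pass through unchanged
    } else if amt < 64 {
        (lo << amt, (hi << amt) | (lo >> (64 - amt)))
    } else {
        (0, lo << (amt - 64))
    }
}
```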
This adjusts the lowerings appropriately, and also adds run-tests to
ensure that the lowerings actually execute correctly (previously we only
had compile-tests with golden lowerings; I'd like to correct this for
more ops eventually, adding run-tests beyond what the Wasm spec and
frontend tests cover).
This unifies the logic around REX prefix emission and hopefully makes REX prefix errors less likely.
There are still several instructions that use other sources to determine the flags, so `set_w` and `clear_w` are left as-is.
Additional cleanups:
* Change `always_emit_if_8bit_needed` to take a `Reg` instead of a `u8` for type safety.
* Deduplicate emission code in `MovRM`.
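For context on the `Reg` change: the 8-bit forms of spl/bpl/sil/dil are
only addressable with a REX prefix, which is what this helper guards. A
sketch (types and signature illustrative):

```rust
#[derive(Clone, Copy)]
struct Reg { hw_enc: u8 } // wraps the hardware encoding

struct RexFlags { always_emit: bool }

impl RexFlags {
    fn always_emit_if_8bit_needed(&mut self, reg: Reg) {
        // Encodings 4..=7 are spl/bpl/sil/dil in 8-bit mode; without a
        // REX prefix those encodings would mean ah/ch/dh/bh instead.
        if (4..=7).contains(&reg.hw_enc) {
            self.always_emit = true;
        }
    }
}
```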
- Panic messages must now be string literals (we used `format!()` in
many places; `panic!()` can take format strings directly; see the
example after this list).
- Some dead enum variants related to EVEX encodings in the old x86
backend. These will go away soon and/or be moved to the new backend
anyway, so let's silence the warning for now.
- A few other misc warnings.
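The panic-message fix in miniature (illustrative):

```rust
fn check(x: i32) {
    if x < 0 {
        // Before: panic!(format!("bad value: {}", x));
        panic!("bad value: {}", x); // after: no intermediate String
    }
}
```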
This is in preparation for refactoring all x64::Inst arms to use OperandSize.
Current uses of OperandSize fall into two categories:
1. XMM operations which require 32/64 bit operands
2. Immediates which only care about 64-bit or not.
Adds assertions to existing `Inst` constructors to check that they are passed valid sizes.
This change also removes the implicit widening of 1- and 2-byte values to 4 bytes. `from_bytes()` is only used by category 2, so removing this behavior will not change any visible behavior.
Overall this change should be a no-op.
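A sketch of the shape this leaves `from_bytes()` in, assuming the
refactor introduces the narrower variants (illustrative; details may
differ from the actual code):

```rust
enum OperandSize { Size8, Size16, Size32, Size64 }

impl OperandSize {
    // No implicit widening: 1- and 2-byte values map to their own
    // sizes. Category-2 callers only test `is_64()`, so this changes
    // no visible behavior.
    fn from_bytes(bytes: u32) -> Self {
        match bytes {
            1 => OperandSize::Size8,
            2 => OperandSize::Size16,
            4 => OperandSize::Size32,
            8 => OperandSize::Size64,
            _ => panic!("invalid operand size: {} bytes", bytes),
        }
    }

    fn is_64(&self) -> bool {
        matches!(self, OperandSize::Size64)
    }
}
```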
* Remove some uses of riscv in tests
* Fix typo
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Benjamin Bouvier <public@benj.me>
1. Restricts the maximum nop size to 15 bytes instead of 16.
2. Fixes an edge case where `gen_nop()` would return a zero-sized instruction for multiples of 16.
3. Clarifies the documentation of the `gen_nop` interface to state that returning zero is allowed when `preferred_size` is zero.
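The corrected size computation amounts to a clamp (a sketch; the real
`gen_nop` returns an instruction, not a size):

```rust
// x86-64 instructions encode to at most 15 bytes, so a single nop can
// never be 16 bytes long, and a zero-sized result occurs only when
// zero bytes were requested.
fn nop_size(preferred_size: usize) -> usize {
    preferred_size.min(15)
}
// e.g. nop_size(16) == 15 (the old code produced size 0 here),
//      nop_size(0)  == 0  (allowed: the caller asked for nothing).
```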
With `Module::{serialize,deserialize}` it should be possible to share
wasmtime modules across machines or CPUs. Serialization, however, embeds
a hash of all configuration values, including cranelift compilation
settings. By default wasmtime's selection of the native ISA would enable
ISA flags according to CPU features available on the host, but the same
CPU features may not be available across two machines.
This commit adds a `Config::cranelift_clear_cpu_flags` method which
allows clearing the target-specific ISA flags that are automatically
inferred by default for the native CPU. Options can then be
incrementally built back up as desired with the `cranelift_other_flag`
method.
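A usage sketch with the methods named above (error handling and exact
signatures are approximate, and the flag name is illustrative):

```rust
use wasmtime::Config;

fn portable_config() -> anyhow::Result<Config> {
    let mut config = Config::new();
    // Drop host-inferred CPU feature flags so a serialized module does
    // not bake in features another machine might lack...
    config.cranelift_clear_cpu_flags();
    // ...then opt back in to a baseline known to be common to all
    // machines involved ("has_sse3" is an illustrative flag name).
    config.cranelift_other_flag("has_sse3", "true")?;
    Ok(config)
}
```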
This PR propagates "value labels" all the way from CLIF to DWARF
metadata on the emitted machine code. The key idea is as follows:
- Translate value-label metadata on the input into "value_label"
pseudo-instructions when lowering into VCode. These
pseudo-instructions take a register as input, denote a value label,
and semantically are like a "move into value label" -- i.e., they
update the current value (as seen by debugging tools) of the given
local. These pseudo-instructions emit no machine code.
- Perform a dataflow analysis *at the machine-code level*, tracking
value-labels that propagate into registers and into [SP+constant]
stack storage. This is a forward dataflow fixpoint analysis where each
storage location can contain a *set* of value labels, and each value
label can reside in a *set* of storage locations. (The meet function
is pairwise intersection by storage location; a sketch appears after
this list.)
This analysis traces value labels symbolically through loads and
stores and reg-to-reg moves, so it will naturally handle spills and
reloads without knowing anything special about them.
- When this analysis converges, we have, at each machine-code offset, a
mapping from value labels to some number of storage locations; for
each offset for each label, we choose the best location (prefer
registers). Note that we can choose any location, as the symbolic
dataflow analysis is sound and guarantees that the value at the
value_label instruction propagates to all of the named locations.
- Then we can convert this mapping into a format that the DWARF
generation code (wasmtime's debug crate) can use.
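To make the meet step concrete, a minimal sketch with simplified types
(the real analysis uses the backend's own storage-location
representation):

```rust
use std::collections::{HashMap, HashSet};

type ValueLabel = u32;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum Loc {
    Reg(u8),       // machine register number
    SpOffset(i64), // [SP + constant] stack storage
}

// Analysis state: for each storage location, the set of value labels
// known to reside there.
type State = HashMap<Loc, HashSet<ValueLabel>>;

// Meet of two predecessor states: pairwise intersection by location.
// A label survives only if both paths agree it is in that location.
fn meet(a: &State, b: &State) -> State {
    let mut out = State::new();
    for (loc, labels_a) in a {
        if let Some(labels_b) = b.get(loc) {
            let both: HashSet<ValueLabel> =
                labels_a.intersection(labels_b).copied().collect();
            if !both.is_empty() {
                out.insert(*loc, both);
            }
        }
    }
    out
}
```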
This PR also adds the new-backend variant to the gdb tests on CI.
This commit goes through the dependencies that wasmtime has and updates
versions where possible. This notably brings in a wasmparser/wast update
which has some simd spec changes with new instructions. Otherwise most
of these are just routine updates.
The StructReturn ABI is fairly simple at the codegen/isel level: we only
need to take care to return the sret pointer as one of the return values
if that wasn't specified in the initial function signature.
Struct arguments are a little more complex. A struct argument is stored
as a chunk of memory in the stack-args space. However, the CLIF
semantics are slightly special: on the caller side, the parameter passed
in is a pointer to an arbitrary memory block, and we must memcpy this
data to the on-stack struct-argument; and on the callee side, we provide
a pointer to the passed-in struct-argument as the CLIF block param
value.
This is necessary to support various ABIs other than Wasm, such as that
of Rust (with the cg_clif codegen backend).
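A rough model of the two sides' obligations (plain Rust, purely
illustrative of the CLIF semantics, not of the generated code):

```rust
// Caller side: the CLIF-level argument is a pointer to an arbitrary
// memory block; the ABI code memcpys it into the stack-argument area.
unsafe fn pass_struct_arg(src: *const u8, size: usize, stack_slot: *mut u8) {
    std::ptr::copy_nonoverlapping(src, stack_slot, size);
}

// Callee side: the CLIF block param is bound to a pointer into the
// incoming stack-argument area (the copy made by the caller above).
unsafe fn read_struct_arg(struct_arg: *const u8) -> u8 {
    *struct_arg
}
```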
This follows the implementation in the legacy x86 backend, including a
hardcoded sequence that is compatible with what the linker expects. We
could potentially do better here, but it is likely not necessary.
Thanks to @bjorn3 for a bugfix to an earlier version of this.