Commit Graph

1724 Commits

Author SHA1 Message Date
Alex Crichton
503129ad91 Add a method to share Config across machines (#2608)
With `Module::{serialize,deserialize}` it should be possible to share
wasmtime modules across machines or CPUs. Serialization, however, embeds
a hash of all configuration values, including cranelift compilation
settings. By default wasmtime's selection of the native ISA would enable
ISA flags according to CPU features available on the host, but the same
CPU features may not be available across two machines.

This commit adds a `Config::cranelift_clear_cpu_flags` method which
allows clearing the target-specific ISA flags that are automatically
inferred by default for the native CPU. Options can then be
incrementally built back up as desired with the `cranelift_other_flag`
method.
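
A minimal usage sketch under the method names above (exact signatures and the example flag name are assumptions, not taken from a specific wasmtime version):

```
use wasmtime::Config;

// Build a Config whose compilation settings do not depend on the host CPU,
// so a module serialized with Module::serialize on one machine can be
// deserialized on another.
fn portable_config() -> Config {
    let mut config = Config::new();
    // Drop the target ISA flags inferred from the host CPU.
    config.cranelift_clear_cpu_flags();
    // Opt back in only to features known to be present on every machine
    // that will load the serialized module ("has_sse41" is illustrative).
    let _ = config.cranelift_other_flag("has_sse41", "true");
    config
}
```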
2021-01-26 15:59:12 -06:00
Chris Fallin
f54d0d05c7 Address review comments. 2021-01-22 16:02:29 -08:00
Chris Fallin
7e12abce71 Fix a few comment typos and add a clarifying comment. 2021-01-21 16:01:46 -08:00
Chris Fallin
997fab55d5 Skip value-label analysis if no value labels are present. 2021-01-21 15:59:52 -08:00
Chris Fallin
c84d6be6f4 Detailed debug-info (DWARF) support in new backends (initially x64).
This PR propagates "value labels" all the way from CLIF to DWARF
metadata on the emitted machine code. The key idea is as follows:

- Translate value-label metadata on the input into "value_label"
  pseudo-instructions when lowering into VCode. These
  pseudo-instructions take a register as input, denote a value label,
  and semantically are like a "move into value label" -- i.e., they
  update the current value (as seen by debugging tools) of the given
  local. These pseudo-instructions emit no machine code.

- Perform a dataflow analysis *at the machine-code level*, tracking
  value-labels that propagate into registers and into [SP+constant]
  stack storage. This is a forward dataflow fixpoint analysis where each
  storage location can contain a *set* of value labels, and each value
  label can reside in a *set* of storage locations. (Meet function is
  pairwise intersection by storage location.)

  This analysis traces value labels symbolically through loads and
  stores and reg-to-reg moves, so it will naturally handle spills and
  reloads without knowing anything special about them.

- When this analysis converges, we have, at each machine-code offset, a
  mapping from value labels to some number of storage locations; for
  each offset for each label, we choose the best location (prefer
  registers). Note that we can choose any location, as the symbolic
  dataflow analysis is sound and guarantees that the value at the
  value_label instruction propagates to all of the named locations.

- Then we can convert this mapping into a format that the DWARF
  generation code (wasmtime's debug crate) can use.

This PR also adds the new-backend variant to the gdb tests on CI.
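
A small sketch of the meet operation described above (pairwise intersection by storage location), with hypothetical types standing in for the backend's actual representations:

```
use std::collections::{HashMap, HashSet};

type Loc = u32;   // register number or SP+offset slot (illustrative)
type Label = u32; // CLIF value label (illustrative)

// A label survives at a location only if both incoming states agree it
// lives there.
fn meet(
    a: &HashMap<Loc, HashSet<Label>>,
    b: &HashMap<Loc, HashSet<Label>>,
) -> HashMap<Loc, HashSet<Label>> {
    let mut out = HashMap::new();
    for (loc, labels_a) in a {
        if let Some(labels_b) = b.get(loc) {
            let both: HashSet<Label> = labels_a.intersection(labels_b).copied().collect();
            if !both.is_empty() {
                out.insert(*loc, both);
            }
        }
    }
    out
}
```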
2021-01-21 15:59:49 -08:00
bjorn3
81d248c057 Implement Mach-O TLS access for x64 newBE 2021-01-21 18:25:56 +01:00
Anton Kirilov
043a8434d2 Cranelift AArch64: Improve the Popcnt implementation
Now the backend uses the CNT instruction, which results in a major
simplification.

Copyright (c) 2021, Arm Limited.
2021-01-19 16:49:47 +00:00
Chris Fallin
c7de8f5efb Merge pull request #2541 from cfallin/struct-arg-ret
x64 and aarch64: allow StructArgument and StructReturn args.
2021-01-17 23:50:19 -08:00
Chris Fallin
456561f431 x64 and aarch64: allow StructArgument and StructReturn args.
The StructReturn ABI is fairly simple at the codegen/isel level: we only
need to take care to return the sret pointer as one of the return values
if that wasn't specified in the initial function signature.

Struct arguments are a little more complex. A struct argument is stored
as a chunk of memory in the stack-args space. However, the CLIF
semantics are slightly special: on the caller side, the parameter passed
in is a pointer to an arbitrary memory block, and we must memcpy this
data to the on-stack struct-argument; and on the callee side, we provide
a pointer to the passed-in struct-argument as the CLIF block param
value.

This is necessary to support various ABIs other than Wasm, such as that
of Rust (with the cg_clif codegen backend).
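
A rough sketch of what such a signature looks like when built through the `cranelift-codegen` IR types (the struct size and calling convention are placeholders):

```
use cranelift_codegen::ir::{types, AbiParam, ArgumentPurpose, Signature};
use cranelift_codegen::isa::CallConv;

fn example_signature() -> Signature {
    let mut sig = Signature::new(CallConv::SystemV);
    // A 16-byte struct passed by value in the stack-args area; the callee
    // receives a pointer to the on-stack copy as its block param value.
    sig.params
        .push(AbiParam::special(types::I64, ArgumentPurpose::StructArgument(16)));
    // The sret pointer, returned as one of the return values as well.
    sig.params
        .push(AbiParam::special(types::I64, ArgumentPurpose::StructReturn));
    sig.returns.push(AbiParam::new(types::I64));
    sig
}
```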
2021-01-17 23:11:45 -08:00
Chris Fallin
0f563f786a Add ELF TLS support in new x64 backend.
This follows the implementation in the legacy x86 backend, including
a hardcoded sequence that is compatible with what the linker expects. We
could potentially do better here, but it is likely not necessary.

Thanks to @bjorn3 for a bugfix to an earlier version of this.
2021-01-17 22:48:51 -08:00
Chris Fallin
71ead6e31d x64 backend: implement 128-bit ops and misc fixes.
This implements all of the ops on I128 that are implemented by the
legacy x86 backend, and includes all that are required by at least one
major use-case (cg_clif rustc backend).

The sequences are open-coded where necessary; e.g., for the bit
operations this can be somewhat complex, but these sequences have been
tested carefully. This PR also includes a drive-by fix of clz/ctz for 8-
and 16-bit cases where they were incorrect previously.

Also includes ridealong fixes developed while bringing up cg_clif
support, because they are difficult to completely separate due to
other refactors that occurred in this PR:

- fix REX prefix logic for some 8-bit instructions.

  When using an 8-bit register in 64-bit mode on x86-64, the REX prefix
  semantics are somewhat subtle: without the REX prefix, register numbers
  4--7 correspond to the second-to-lowest byte of the first four registers
  (AH, CH, DH, BH), whereas with the REX prefix, these register numbers
  correspond to the usual encoding (SPL, BPL, SIL, DIL). We could always
  emit a REX byte for instructions with 8-bit cases (this is harmless even
  if unneeded), but this would unnecessarily inflate code size; instead,
  the usual approach is to emit it only for these registers (a minimal
  sketch of this check appears after this list).

  This logic was present in some cases but missing for some other
  instructions: divide, not, negate, shifts.

  Fixes #2508.

- avoid unaligned SSE loads on some f64 ops.

  The implementations of several FP ops, such as fabs/fneg, used SSE
  instructions. This is not a problem per se, except that load-op merging
  did not take *alignment* into account. Specifically, if an op on an f64
  loaded from memory happened to merge that load, and the instruction into
  which it was merged was an SSE instruction, then the SSE instruction
  imposes stricter (128-bit) alignment requirements than the load.f64 did.

  This PR simply forces any instruction lowerings that could use SSE
  instructions to implement non-SIMD operations to take inputs in
  registers only, avoiding load-op merging.

  Fixes #2507.

- two bugfixes exposed by cg_clif: urem/srem.i8, select.b1.

  - urem/srem.i8: the 8-bit form of the DIV instruction on x86-64 places
    the remainder in AH, not RDX, different from all the other width-forms
    of this instruction.

  - select.b1: we were not recognizing selects of boolean values as
    integer-typed operations, so we were generating XMM moves instead (!).
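
A minimal sketch of the REX decision for 8-bit register operands mentioned in the first item above (hypothetical helper; the backend's emission code is more involved):

```
// Without REX, hardware encodings 4..=7 name AH/CH/DH/BH; with REX they
// name SPL/BPL/SIL/DIL, and encodings 8..=15 (r8b..r15b) need REX anyway.
fn rex_required_for_8bit_reg(hw_enc: u8) -> bool {
    hw_enc >= 4
}

fn main() {
    assert!(!rex_required_for_8bit_reg(0)); // AL: no REX needed
    assert!(rex_required_for_8bit_reg(4));  // SPL is only reachable with REX
    assert!(rex_required_for_8bit_reg(12)); // R12B always needs REX
}
```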
2021-01-14 13:45:50 -08:00
Johnnie Birch
d17815a239 Zero newly allocated registers whose immediate use depends on content not being NaN
An intermittent failure during SIMD spectests is described in #2432. This patch
corrects code written on the assumption that a floating-point equality comparison of a
register with itself always returns true. That does not hold when the register value is
NaN: as with all ordered comparisons involving NaN, the comparison returns false.
This patch corrects that assumption for SIMD Fabs and Fneg, which appear to be the only
instructions generating the failure in #2432.
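
A tiny illustration of the assumption being removed (plain Rust, not the backend's code):

```
fn main() {
    let x = f32::NAN;
    assert!(x != x);    // NaN compares unequal even to itself,
    assert!(!(x == x)); // so an equality test of a register against itself
                        // cannot be relied on to return true
}
```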
2021-01-13 19:44:00 -08:00
Chris Fallin
4638de673c x64 bugfix: prevent load-op fusion of cmp because it could be emitted multiple times.
On x64, the new backend generates `cmp` instructions at their use-sites
when possible (when the icmp that generates a boolean is known) so that
the condition flows directly through flags rather than a materialized
boolean. E.g., both `bint` (boolean to int) and `select` (conditional
select) instruction lowerings invoke `emit_cmp()` to do so.

Load-op fusion in `emit_cmp()` nominally allowed `cmp` to use its `cmp
reg, mem` form.

However, the mergeable-load condition (load has only single use) was not
adequately checked. Consider the sequence:

```
    v2 = load.i64 v1
    v3 = icmp eq v0, v2
    v4 = bint.i64 v3
    v5 = select.i64 v3, v0, v1
```

The load `v2` is only used in the `icmp` at `v3`. However, the cmp will
be separately codegen'd twice, once for the `bint` and once for the
`select`.

Prior to this fix, the above example would result in the load at `v2`
sinking to the `cmp` just above the `select`; we then emit another `cmp`
for the `bint`, but the load has already been used once so we do not
allow merging. We thus (i) expect the register for `v2` to contain the
loaded value, but (ii) skip the codegen for the load because it has been
sunk. This results in a regalloc error (unexpected livein) as the
unfilled register is upward-exposed to the entry point.

Because of this, we need to accept only the reg, reg form in
`emit_cmp()` (and the FP equivalent). We could get marginally better
code by tracking whether the `cmp` we are emitting comes from an
`icmp`/`fcmp` with only one use; but IMHO simplicity is a better rule
here when subtle interactions occur.
2021-01-13 09:48:51 -08:00
Chris Fallin
b4426be072 machinst lowering: update inst color when scanning across branch to allow more load-op merging.
A branch is considered side-effecting and so updates the instruction
color (which is our way of computing how far instructions can sink).
However, in the lowering loop, we did not update current instruction
color when scanning backward across branches, which are side-effecting.
As a result, the color was stale and fewer load-op merges were permitted
than are actually possible.

Note that this would not have resulted in any correctness issues, as the
stale color is too high (so no merges that should have been disallowed
are permitted).

Fixes #2562.
2021-01-11 11:20:44 -08:00
Andrew Brown
b25a3c387e fix: dst should be Writable, not ValueRegs 2021-01-08 16:49:28 -08:00
Andrew Brown
09a5b91b9d x64: make several structures debuggable 2021-01-08 16:21:57 -08:00
Andrew Brown
bb2dd5b68b [machinst x64]: implement load*_zero for x64 2021-01-08 16:21:57 -08:00
Chris Fallin
81bc811236 Merge pull request #2558 from cfallin/pic-symbol-refs
x64: support PC-rel symbol references using the GOT when in PIC mode.
2021-01-08 10:03:10 -08:00
Chris Fallin
3ee898cb2c x64: support PC-rel symbol references using the GOT when in PIC mode. 2021-01-07 22:46:56 -08:00
Chris Fallin
6eea015d6c Multi-register value support: framework for Values wider than machine regs.
This will allow for support for `I128` values everywhere, and `I64`
values on 32-bit targets (e.g., ARM32 and x86-32). It does not alter the
machine backends to build such support; it just adds the framework for
the MachInst backends to *reason* about a `Value` residing in more than
one register.
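
A hypothetical sketch of the idea (the in-tree abstraction is more general): a single CLIF `Value` may now occupy one or two machine registers, e.g. an `I128` on x86-64 or an `I64` on a 32-bit target.

```
#[derive(Clone, Copy, Debug)]
enum ValueRegs<R> {
    One(R),
    Two { lo: R, hi: R },
}

impl<R: Copy> ValueRegs<R> {
    fn regs(&self) -> Vec<R> {
        match *self {
            ValueRegs::One(r) => vec![r],
            ValueRegs::Two { lo, hi } => vec![lo, hi],
        }
    }
}
```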
2021-01-05 17:45:02 -08:00
Nick Fitzgerald
6317290a1d Merge pull request #2548 from cfallin/fix-aarch64-sp
aarch64: fix reg/imm `sub` insts that read `SP`, not the zero register.
2021-01-05 16:38:25 -08:00
Chris Fallin
aac3751025 aarch64: fix reg/imm sub insts that read SP, not the zero register.
On AArch64, the zero register (xzr) and the stack pointer (xsp) are
alternately named by the same index `31` in machine code depending on
context. In particular, in the reg-reg-immediate ALU instruction form,
add/subtract will use the stack pointer, not the zero register, if index
31 is given for the first (register) source arg.

In a few places, we were emitting subtract instructions with the zero
register as an argument and a reg/immediate as the second argument. When
an immediate could be incorporated directly (we have the `iconst`
definition visible), this would result in incorrect code being
generated.

This issue was found in `ineg` and in the sequence for vector
right-shifts.

Reported by Ian Cullinan; thanks!
2021-01-05 15:48:07 -08:00
Chris Fallin
dbd2241b60 x64: handle tests of b1 values correctly (only LSB is defined).
Previously, `select` and `brz`/`brnz` instructions, when given a `b1`
boolean argument, would test whether that boolean argument was nonzero,
rather than whether its LSB was nonzero. Since our invariant for mapping
CLIF state to machine state is that bits beyond the width of a value are
undefined, the proper lowering is to test only the LSB.

(aarch64 does not have the same issue because its `Extend` pseudoinst
already properly handles masking of b1 values when a zero-extend is
requested, as it is for select/brz/brnz.)
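
A tiny illustration of the invariant (plain Rust): bits above a value's width are undefined, so a `b1` must be tested by its LSB only.

```
fn main() {
    let machine_word: u64 = 0xffff_fffe; // a b1 "false" with junk in the upper bits
    let wrong = machine_word != 0;       // tests the whole register: true (!)
    let right = (machine_word & 1) != 0; // tests only the LSB: false
    assert!(wrong);
    assert!(!right);
}
```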

Found by Nathan Ringo on Zulip [1] (thanks!).

[1]
https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/bnot.20on.20b1s
2021-01-05 14:45:46 -08:00
Yury Delendik
2964023a77 [SIMD][x86_64] Add encoding for PMADDWD (#2530)
* [SIMD][x86_64] Add encoding for PMADDWD

* also for "experimental_x64"
2020-12-24 07:52:50 -06:00
Johnnie Birch
f705a72aeb Refactor packed moves to use xmm_mov instead of xmm_rm_r
Refactors the previous packed move implementation to use xmm_mov
instead of xmm_rm_r, which should simplify register accounting
during lowering.
2020-12-16 17:13:27 -08:00
Johnnie Birch
51973aefbb Implements x64 SIMD loads for the new backend. 2020-12-16 17:13:27 -08:00
Chris Fallin
e91987c43c Allow both x86 backends to be included, selected with a "variant" flag. (#2514)
This PR adds a new `isa::lookup_variant()` that takes a `BackendVariant`
(`Legacy`, `MachInst` or `Any`), and exposes both x86 backends as
separate variants if both are compiled into the build.

This will allow some new use-cases that require both backends in the
same process: for example, differential fuzzing between old and new
backends, or perhaps allowing for dynamic feature-flag selection between
the backends.
2020-12-16 09:56:04 -06:00
Chris Fallin
743529b4eb Merge pull request #2492 from uweigand/endian-memory-v5
Support explicit endianness in Cranelift IR MemFlags
2020-12-14 13:59:08 -08:00
Ulrich Weigand
467a1af83a Support explicit endianness in Cranelift IR MemFlags
WebAssembly memory operations are by definition little-endian even on
big-endian target platforms.  However, other memory accesses will require
native target endianness (e.g. to access parts of the VMContext that is
also accessed by VM native code).  This means on big-endian targets,
the code generator will have to handle both little- and big-endian
memory accesses.  However, there is currently no way to encode that
distinction into the Cranelift IR that describes memory accesses.

This patch provides such a way by adding an (optional) explicit
endianness marker to an instance of MemFlags.  Since each Cranelift IR
instruction that describes memory accesses already has an instance of
MemFlags attached, this can now be used to provide endianness
information.

Note that by default, memory accesses will continue to use the native
target ISA endianness.  To override this to specify an explicit
endianness, a MemFlags value that was built using the set_endianness
routine must be used.  This patch does so for accesses that implement
WebAssembly memory operations.

This patch addresses issue #2124.
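
A minimal sketch of the frontend side (the `set_endianness` name comes from this patch; the exact signature is assumed):

```
use cranelift_codegen::ir::{Endianness, MemFlags};

// Flags for a WebAssembly memory access: explicitly little-endian,
// regardless of the target's native byte order.
fn wasm_memflags() -> MemFlags {
    let mut flags = MemFlags::new();
    flags.set_endianness(Endianness::Little);
    flags // flags built without set_endianness keep native target endianness
}
```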
2020-12-14 20:15:37 +01:00
Chris Fallin
267d4a8bdb Merge pull request #2490 from cfallin/fix-popcnt-load-width
x64 lowering fix: i32.popcnt should not merge load and make it 64-bit.
2020-12-08 22:28:41 -08:00
Y-Nak
855a6374dd Fix missing modification of jump table in licm 2020-12-09 11:13:33 +09:00
Chris Fallin
6632c45c01 x64 lowering fix: i32.popcnt should not merge load and make it 64-bit.
As a subtle consequence of the recent load-op fusion, popcnt of a
value that came from a load.i32 was compiling into a 64-bit load. This
is a result of the way in which x86 infers the width of loads: it is a
consequence of the instruction containing the memory reference, not the
memory reference itself. So the `input_to_reg_mem()` helper (convert an
instruction input into a register or memory reference) was providing the
appropriate memory reference for the result of a load.i32, but never
encoded the assumption that it would only be used in a 32-bit
instruction. It turns out that popcnt.i32 uses a 64-bit instruction to
load this RM op, hence widening a 32-bit load into a 64-bit one (which is
problematic when the offset is (memory_length - 4)).

Separately, popcnt was using the RM operand twice, resulting in two
loads if we merged a load. This isn't a correctness bug in practice
because only a racy sequence (store interleaving between the loads)
would produce incorrect results, but we decided earlier to treat loads
as effectful for now, neither reordering nor duplicating them, to
deliberately reduce complexity.

Because of the second issue, the fix is just to force the operand into a
register always, so any source load will not be merged.

Discovered via fuzzing with oss-fuzz.
2020-12-08 12:24:34 -08:00
Chris Fallin
2cec20aa57 Merge pull request #2486 from cfallin/fix-probestack
Two Lucet-related fixes to stack overflow handling.
2020-12-07 16:47:37 -08:00
Chris Fallin
3a01d14712 Two Lucet-related fixes to stack overflow handling.
Lucet uses stack probes rather than explicit stack limit checks as
Wasmtime does. In bytecodealliance/lucet#616, I discovered that I had
previously not been running some Lucet runtime tests with the new
backend, and so was missing some test failures due to missing pieces in
the new backend.

This PR adds (i) calls to probestack, when enabled, in the prologue of
every function with a stack frame larger than one page (configurable via
flags); and (ii) trap metadata for every instruction on x86-64 that can
access the stack and hence can be the first point at which a stack overflow is
detected when the stack pointer is decremented.
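
A sketch of enabling the relevant shared Cranelift settings (the `enable_probestack` and `probestack_size_log2` flag names are the usual shared settings; availability in a given build is assumed):

```
use cranelift_codegen::settings::{self, Configurable};

fn flags_with_probestack() -> settings::Flags {
    let mut builder = settings::builder();
    // Emit a probestack call in the prologue of any function whose frame
    // is larger than one page (2^12 bytes here).
    builder.enable("enable_probestack").unwrap();
    builder.set("probestack_size_log2", "12").unwrap();
    settings::Flags::new(builder)
}
```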
2020-12-07 16:08:53 -08:00
Chris Fallin
1dddba649a x64 regalloc register order: put caller-saves (volatiles) first.
The x64 backend currently builds the `RealRegUniverse` in a way that
is generating somewhat suboptimal code. In many blocks, we see uses of
callee-save (non-volatile) registers (r12, r13, r14, rbx) first, even in
very short leaf functions where there are plenty of volatiles to use.
This is leading to unnecessary spills/reloads.

On one (local) test program, a medium-sized C benchmark compiled to Wasm
and run on Wasmtime, I am seeing a ~10% performance improvement with
this change; it will be less pronounced in programs with high register
pressure (there we are likely to use all registers regardless, so the
prologue/epilogue will save/restore all callee-saves), or in programs
with fewer calls, but this is a clear win for small functions and in
many cases removes prologue/epilogue clobber-saves altogether.

Separately, I think the RA's coalescing is tripping up a bit in some
cases; see e.g. the filetest touched by this commit that loads a value
into %rsi then moves to %rax and returns immediately. This is an
orthogonal issue, though, and should be addressed (if worthwhile) in
regalloc.rs.
2020-12-06 22:37:43 -08:00
Nick Fitzgerald
502b39606f Merge pull request #2364 from fitzgen/peepmatic-rebuild-test
peepmatic: Make the test-we-can-get-and-rebuild peephole optimizers test work on arm64
2020-12-04 10:32:39 -08:00
Julian Seward
8f34d2dc59 aarch64 isel: collect_address_addends: correctly handle ExtendOp::UXTW(negative immediate).
The current code doesn't correctly handle the case where `ExtendOp::UXTW` has,
as its source, a constant-producing insn that produces a negative (32-bit) value.
The value is then incorrectly sign-extended to 64 bits (in fact, this has
already been done by `ctx.get_constant(insn)`), whereas it needs to be
zero-extended. The obvious fix, done here, is just to force bits 63:32 of the
extension to zero, hence zero-extending it.
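
A tiny numeric illustration of the difference (plain Rust, not the backend's code):

```
fn main() {
    let c: i32 = -8;
    let sign_extended = c as i64 as u64;                  // 0xffff_ffff_ffff_fff8: wrong for UXTW
    let zero_extended = (c as i64 as u64) & 0xffff_ffff;  // force bits 63:32 to zero
    assert_eq!(zero_extended, 0xffff_fff8);
    assert_ne!(sign_extended, zero_extended);
}
```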
2020-12-04 19:21:40 +01:00
Chris Fallin
bfd10512c1 Merge pull request #2473 from cfallin/fix-lowering
Fix lowering instruction-sinking (load-merging) bug.
2020-12-03 15:56:50 -08:00
Nick Fitzgerald
76d3fb61c3 Merge pull request #2293 from fitzgen/souper-no-assign-constant
souper-harvest: Do not generate assignments of constants
2020-12-03 15:20:41 -08:00
Chris Fallin
3e516e784b Fix lowering instruction-sinking (load-merging) bug.
This fixes a subtle corner case exposed during fuzzing. If we have a bit
of CLIF like:

```
    v0 = load.i64 ...
    v1 = iadd.i64 v0, ...
    v2 = do_other_thing v1
    v3 = load.i64 v1
```

and if this is lowered using a machine backend that can merge loads into
ALU ops, *and* that has an addressing mode that can look through add
ops, then the following can happen:

1. We lower the load at `v3`. This looks backward at the address
   operand tree and finds that `v1` is `v0` plus other things; it has an
   addressing mode that can add `v0`'s register and the other things
   directly; so it calls `put_value_in_reg(v0)` and uses its register in
   the amode. At this point, the add producing `v1` has no references,
   so it will not (yet) be codegen'd.
2. We lower `do_other_thing`, which puts `v1` in a register and uses it.
   The `iadd` now has a reference.
3. We reach the `iadd` and, because it has a reference, lower it. Our
   machine has the ability to merge a load into an ALU operation.
   Crucially, *we think the load at `v0` is mergeable* because it has
   only one user, the add at `v1` (!). So we merge it.
4. We reach the `load` at `v0` and because it has been merged into the
   `iadd`, we do not separately codegen it. The register that holds `v0`
   is thus never written, and the use of this register by the final load
   (Step 1) will see an undefined value.

The logic error here is that in the presence of pattern matching that
looks through pure ops, we can end up with multiple uses of a value that
originally had a single use (because we allow lookthrough of pure ops in
all cases). In other words, the multiple-use-ness of `v1` "passes
through" in some sense to `v0`. However, the load sinking logic is not
aware of this.

The fix, I think, is pretty simple: we disallow an effectful instruction
from sinking/merging if it already has some other use when we look back
at it.

If we disallowed lookthrough of *any* op that had multiple uses, even
pure ones, then we would avoid this scenario; but earlier experiments
showed that to have a non-negligible performance impact, so (given that
we've worked out the logic above) I think this complexity is worth it.
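
A hypothetical sketch of the tightened check (names invented for illustration; the real lowering code tracks uses and colors in more detail):

```
// An effectful instruction such as a load may be sunk and merged into its
// user only if no other use of it has already been generated.
fn can_sink_effectful_inst(is_effectful: bool, other_uses_seen: u32) -> bool {
    if is_effectful {
        // In the scenario above, step 1 already requested v0's register, so
        // by step 3 the load has another use and must not be merged.
        other_uses_seen == 0
    } else {
        // Pure ops may still be looked through even with multiple uses.
        true
    }
}
```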
2020-12-03 14:59:12 -08:00
Nick Fitzgerald
1efdf10ca7 souper-harvest: ensure that select conditions are of type i1 2020-12-03 14:43:28 -08:00
Nick Fitzgerald
f7cf771ee6 souper-harvest: Do not generate assignments of constants
It turns out that Souper does not allow a constant to be assigned to a variable;
constants may only be used as operands. The 2.0.0 version of the `souper-ir` crate
correctly reflects this. In the `cranelift_codegen::souper_harvest` module, we
need to modify our Souper IR harvester so that it delays converting `iconst` and
`bconst` into Souper IR until their values are used as operands. Finally, some
unit tests in the `peepmatic-souper` crate need some small updates as well.
2020-12-03 14:43:14 -08:00
Pat Hickey
0f1dc9a735 Merge pull request #2403 from bjorn3/simplejit_hot_swapping
SimpleJIT hot code swapping
2020-12-03 13:36:32 -08:00
Anton Kirilov
f59b274d22 Cranelift AArch64: Further vector constant improvements
Introduce support for MOVI/MVNI with 16-, 32-, and 64-bit elements,
and the vector variant of FMOV.

Copyright (c) 2020, Arm Limited.
2020-12-03 15:30:24 +00:00
Johnnie Birch
a548516f97 Enable SIMD spec tests for f32x4_rounding and f64x4_rounding.
Also address some review comments pointing out minor issues.
2020-12-02 13:44:51 -08:00
Johnnie Birch
a33e755cb2 Adds x86 SIMD support for Ceil, Floor, Trunc, and Nearest 2020-12-02 13:44:51 -08:00
Chris Fallin
dcbc4768c9 Merge pull request #2463 from cfallin/fix-heap-bounds-check-x64
x64 backend: fix condition-code used for part of explicit heap check.
2020-12-02 11:24:54 -08:00
Chris Fallin
60d7f7de0a Debug info: two fixes in x64 backend.
- Sort by generated-code offset to maintain the invariant and avoid a
  gimli panic.
- Fix srcloc interaction with branch peephole optimization in
  MachBuffer: if a srcloc range overlaps with a branch that is
  truncated, remove that srcloc range.

These issues were found while fuzzing the new backend (#2453); I suspect
that they arise with the new backend because we can sink instructions
(e.g. loads or extends) in more interesting ways than before, but I'm
not entirely sure.

Test coverage will be via the fuzz corpus once #2453 lands.
2020-12-02 10:41:14 -08:00
Chris Fallin
c9a81f008d x64 backend: fix condition-code used for part of explicit heap check.
A dynamic heap address computation may create up to two conditional
branches: the usual bounds-check, but also (in some cases) an
offset-addition overflow check.

The x64 backend had reversed the condition code for this check,
resulting in an always-trapping execution for a valid offset. I'm
somewhat surprised this has existed so long, but I suppose the
particular conditions (large offset, small offset guard, dynamic heap)
have been somewhat rare in our testing so far.

Found via fuzzing in #2453.
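
For illustration, the two conditions such a heap access must satisfy (plain Rust; the reversed condition code fixed here belonged to the overflow branch):

```
fn heap_access_in_bounds(index: u64, offset: u64, access_size: u64, bound: u64) -> bool {
    match index.checked_add(offset).and_then(|sum| sum.checked_add(access_size)) {
        Some(end) => end <= bound, // usual bounds check
        None => false,             // offset-addition overflow check
    }
}
```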
2020-12-02 10:40:53 -08:00
Chris Fallin
4bf2c15014 Merge pull request #2440 from jlb6740/remaining_simd_conversions
Adds support for i32x4.trunc_sat_f32x4_u
2020-11-30 22:53:59 -08:00