wasmtime

Author	SHA1	Message	Date
Chris Fallin	1dddba649a	x64 regalloc register order: put caller-saves (volatiles) first. The x64 backend currently builds the `RealRegUniverse` in a way that is generating somewhat suboptimal code. In many blocks, we see uses of callee-save (non-volatile) registers (r12, r13, r14, rbx) first, even in very short leaf functions where there are plenty of volatiles to use. This is leading to unnecessary spills/reloads. On one (local) test program, a medium-sized C benchmark compiled to Wasm and run on Wasmtime, I am seeing a ~10% performance improvement with this change; it will be less pronounced in programs with high register pressure (there we are likely to use all registers regardless, so the prologue/epilogue will save/restore all callee-saves), or in programs with fewer calls, but this is a clear win for small functions and in many cases removes prologue/epilogue clobber-saves altogether. Separately, I think the RA's coalescing is tripping up a bit in some cases; see e.g. the filetest touched by this commit that loads a value into %rsi then moves to %rax and returns immediately. This is an orthogonal issue, though, and should be addressed (if worthwhile) in regalloc.rs.	2020-12-06 22:37:43 -08:00
bjorn3	411ec3a857	Rename SimpleJIT to JIT as it isn't simple anymore	2020-12-04 13:21:13 -08:00
Nick Fitzgerald	502b39606f	Merge pull request #2364 from fitzgen/peepmatic-rebuild-test peepmatic: Make the test-we-can-get-and-rebuild peephole optimizers test work on arm64	2020-12-04 10:32:39 -08:00
Julian Seward	8f34d2dc59	aarch64 isel: collect_address_addends: correctly handle `ExtendOp::UXTW(negative immediate)`. The current code doesn't correctly handle the case where `ExtendOp::UXTW` has as source, a constant-producing insn that produces a negative (32-bit) value. Then the value is incorrectly sign-extended to 64 bits (in fact, this has already been done by `ctx.get_constant(insn)`), whereas it needs to be zero extended. The obvious fix, done here, is just to force bits 63:32 of the extension to zero, hence zero-extending it.	2020-12-04 19:21:40 +01:00
Chris Fallin	bfd10512c1	Merge pull request #2473 from cfallin/fix-lowering Fix lowering instruction-sinking (load-merging) bug.	2020-12-03 15:56:50 -08:00
Nick Fitzgerald	76d3fb61c3	Merge pull request #2293 from fitzgen/souper-no-assign-constant souper-harvest: Do not generate assignments of constants	2020-12-03 15:20:41 -08:00
Chris Fallin	3e516e784b	Fix lowering instruction-sinking (load-merging) bug. This fixes a subtle corner case exposed during fuzzing. If we have a bit of CLIF like: ``` v0 = load.i64 ... v1 = iadd.i64 v0, ... v2 = do_other_thing v1 v3 = load.i64 v1 ``` and if this is lowered using a machine backend that can merge loads into ALU ops, and that has an addressing mode that can look through add ops, then the following can happen: 1. We lower the load at `v3`. This looks backward at the address operand tree and finds that `v1` is `v0` plus other things; it has an addressing mode that can add `v0`'s register and the other things directly; so it calls `put_value_in_reg(v0)` and uses its register in the amode. At this point, the add producing `v1` has no references, so it will not (yet) be codegen'd. 2. We lower `do_other_thing`, which puts `v1` in a register and uses it. the `iadd` now has a reference. 3. We reach the `iadd` and, because it has a reference, lower it. Our machine has the ability to merge a load into an ALU operation. Crucially, we think the load at `v0` is mergeable because it has only one user, the add at `v1` (!). So we merge it. 4. We reach the `load` at `v0` and because it has been merged into the `iadd`, we do not separately codegen it. The register that holds `v0` is thus never written, and the use of this register by the final load (Step 1) will see an undefined value. The logic error here is that in the presence of pattern matching that looks through pure ops, we can end up with multiple uses of a value that originally had a single use (because we allow lookthrough of pure ops in all cases). In other words, the multiple-use-ness of `v1` "passes through" in some sense to `v0`. However, the load sinking logic is not aware of this. The fix, I think, is pretty simple: we disallow an effectful instruction from sinking/merging if it already has some other use when we look back at it. If we disallowed lookthrough of any op that had multiple uses, even pure ones, then we would avoid this scenario; but earlier experiments showed that to have a non-negligible performance impact, so (given that we've worked out the logic above) I think this complexity is worth it.	2020-12-03 14:59:12 -08:00
Nick Fitzgerald	1efdf10ca7	souper-harvest: ensure that `select` conditions are of type `i1`	2020-12-03 14:43:28 -08:00
Nick Fitzgerald	f7cf771ee6	souper-harvest: Do not generate assignments of constants It turns out that Souper does not allow a constant to be assigned to a variable, they may only be used as operands. The 2.0.0 version of the `souper-ir` crate correctly reflects this. In the `cranelift_codegen::souper_harvest` module, we need to modify our Souper IR harvester so that it delays converting `iconst` and `bconst` into Souper IR until their values are used as operands. Finally, some unit tests in the `peepmatic-souper` crate need some small updates as well.	2020-12-03 14:43:14 -08:00
Pat Hickey	0f1dc9a735	Merge pull request #2403 from bjorn3/simplejit_hot_swapping SimpleJIT hot code swapping	2020-12-03 13:36:32 -08:00
bjorn3	937a3fde40	Fix simplejit tests for x64 by disabling is_pic	2020-12-03 19:24:59 +01:00
bjorn3	69d041faf1	Restore support for non-pic code in SimpleJIT	2020-12-03 18:49:18 +01:00
Chris Fallin	8e0e44b802	Merge pull request #2413 from akirilov-arm/fmov_vector Cranelift AArch64: Further vector constant improvements	2020-12-03 09:43:24 -08:00
Alex Crichton	f003388ec7	Implement imported/exported modules/instances (#2461 ) * Implement imported/exported modules/instances This commit implements the final piece of the module linking proposal which is to flesh out the support for importing/exporting instances and modules. This ended up having a few changes: * Two more `PrimaryMap` instances are now stored in an `Instance`. The value for instances is `InstanceHandle` (pretty easy) and for modules it's `Box<dyn Any>` (less easy). * The custom host state for `InstanceHandle` for `wasmtime` is now `Arc<TypeTables` to be able to fully reconstruct an instance's types just from its instance. * Type matching for imports now has been updated to take instances/modules into account. One of the main downsides of this implementation is that type matching of imports is duplicated between wasmparser and wasmtime, leading to posssible bugs especially in the subtelties of module linking. I'm not sure how best to unify these two pieces of validation, however, and it may be more trouble than it's worth. cc #2094 * Update wat/wast/wasmparser * Review comments * Fix a bug in publish script to vendor the right witx Currently there's two witx binaries in our repository given the two wasi spec submodules, so this updates the publication script to vendor the right one.	2020-12-03 10:15:42 -06:00
Anton Kirilov	f59b274d22	Cranelift AArch64: Further vector constant improvements Introduce support for MOVI/MVNI with 16-, 32-, and 64-bit elements, and the vector variant of FMOV. Copyright (c) 2020, Arm Limited.	2020-12-03 15:30:24 +00:00
Alex Crichton	9ac7d01288	Implement the module linking alias section (#2451 ) This commit is intended to do almost everything necessary for processing the alias section of module linking. Most of this is internal refactoring, the highlights being: * Type contents are now stored separately from a `wasmtime_env::Module`. Given that modules can freely alias types and have them used all over the place, it seemed best to have one canonical location to type storage which everywhere else points to (with indices). A new `TypeTables` structure is produced during compilation which is shared amongst all member modules in a wasm blob. * Instantiation is heavily refactored to account for module linking. The main gotcha here is that imports are now listed as "initializers". We have a sort of pseudo-bytecode-interpreter which interprets the initialization of a module. This is more complicated than just matching imports at this point because in the module linking proposal the module, alias, import, and instance sections may all be interleaved. This means that imports aren't guaranteed to show up at the beginning of the address space for modules/instances. Otherwise most of the changes here largely fell out from these two design points. Aliases are recorded as initializers in this scheme. Copying around type information and/or just knowing type information during compilation is also pretty easy since everything is just a pointer into a `TypeTables` and we don't have to actually copy any types themselves. Lots of various refactorings were necessary to accomodate these changes. Tests are hoped to cover a breadth of functionality here, but not necessarily a depth. There's still one more piece of the module linking proposal missing which is exporting instances/modules, which will come in a future PR. It's also worth nothing that there's one large TODO which isn't implemented in this change that I plan on opening an issue for. With module linking when a set of modules comes back from compilation each modules has all the trampolines for the entire set of modules. This is quite a lot of duplicate trampolines across module-linking modules. We'll want to refactor this at some point to instead have only one set of trampolines per set of module linking modules and have them shared from there. I figured it was best to separate out this change, however, since it's purely related to resource usage, and doesn't impact non-module-linking modules at all. cc #2094	2020-12-02 17:24:06 -06:00
Johnnie Birch	a548516f97	Enable SIMD spec tests for f32x4_rounding and f64x4_rounding. Also address some review comments pointing out minor issues.	2020-12-02 13:44:51 -08:00
Johnnie Birch	a33e755cb2	Adds x86 SIMD support for Ceil, Floor, Trunc, and Nearest	2020-12-02 13:44:51 -08:00
Chris Fallin	dcbc4768c9	Merge pull request #2463 from cfallin/fix-heap-bounds-check-x64 x64 backend: fix condition-code used for part of explicit heap check.	2020-12-02 11:24:54 -08:00
Chris Fallin	60d7f7de0a	Debug info: two fixes in x64 backend. - Sort by generated-code offset to maintain invariant and avoid gimli panic. - Fix srcloc interaction with branch peephole optimization in MachBuffer: if a srcloc range overlaps with a branch that is truncated, remove that srcloc range. These issues were found while fuzzing the new backend (#2453); I suspect that they arise with the new backend because we can sink instructions (e.g. loads or extends) in more interesting ways than before, but I'm not entirely sure. Test coverage will be via the fuzz corpus once #2453 lands.	2020-12-02 10:41:14 -08:00
Chris Fallin	c9a81f008d	x64 backend: fix condition-code used for part of explicit heap check. A dynamic heap address computation may create up to two conditional branches: the usual bounds-check, but also (in some cases) an offset-addition overflow check. The x64 backend had reversed the condition code for this check, resulting in an always-trapping execution for a valid offset. I'm somewhat surprised this has existed so long, but I suppose the particular conditions (large offset, small offset guard, dynamic heap) have been somewhat rare in our testing so far. Found via fuzzing in #2453.	2020-12-02 10:40:53 -08:00
Alex Crichton	88a8a8993a	Instantiate nested modules for module linking (#2447 ) This commit implements the interpretation necessary of the instance section of the module linking proposal. Instantiating a module which itself has nested instantiated instances will now instantiate the nested instances properly. This isn't all that useful without the ability to alias exports off the result, but we can at least observe the side effects of instantiation through the `start` function. cc #2094	2020-12-01 14:01:31 -06:00
Nick Fitzgerald	40ad39fee9	Merge pull request #2460 from abrown/fix-function-store interpreter: fix confusion caused by overloading of FuncRef	2020-12-01 09:36:21 -08:00
Chris Fallin	4bf2c15014	Merge pull request #2440 from jlb6740/remaining_simd_conversions Adds support for i32x4.trunc_sat_f32x4_u	2020-11-30 22:53:59 -08:00
Andrew Brown	87b1a85cc6	Fix confusion caused by overloading of FuncRef Prior to this change, the interpreter would use an incorrect `FuncRef` for accessing functions from the function store. This is now clarified and fixed by a new type--`FuncIndex`.	2020-11-30 17:28:30 -08:00
Andrew Brown	26509cb080	Optimize access to interpreter frame slots Previously, getting or setting a value in a frame of the Cranelift interpreter involved a hash table lookup. Since the interpreter statically knows the number of slots necessary for each called frame, we can use a vector instead and save time on the hash lookup. This also has the advantage that we have a more stable ABI for switching between interpreted and code.	2020-11-30 15:41:28 -08:00
Johnnie Birch	09f3d4e331	Refactor convert from float to unsigned int and add comments	2020-11-29 00:04:24 -08:00
Chris Fallin	d413b907b4	Merge pull request #2414 from jgouly/extend-refactor arm64: Refactor Inst::Extend handling	2020-11-25 17:22:07 -08:00
Chris Fallin	34d9931ed8	Fix Wasm translator bug: end of toplevel frame is branched-to only for fallthrough returns. This makes the value of `state.reachable()` inaccurate when observing at the tail of functions (in the post-function hook) after an ordinary return instruction.	2020-11-25 10:55:38 -08:00
Nick Fitzgerald	93c199363f	Merge pull request #2449 from bytecodealliance/cfallin/add-pre-host-hooks Add FuncEnvironment hooks to generate prologue and epilogue code.	2020-11-24 17:48:29 -08:00
Chris Fallin	4300c2c075	Add FuncEnvironment hooks to generate prologue and epilogue code. In some cases, it is useful to do some work at entry to or exit from a Cranelift function translated from WebAssembly. This PR adds two optional methods to the `FuncEnvironment` trait to do just this, analogous to the pre/post-hooks on operators that already exist. This PR also includes a drive-by compilation fix due to the latest nightly wherein `.is_empty()` on a `Range` ambiguously refers to either the `Range` impl or the `ExactSizeIterator` impl and can't resolve.	2020-11-24 16:36:15 -08:00
Alex Crichton	62be6841e4	Propagate optional import names to the wasmtime/C API With the module linking proposal the field name on imports is now optional, and only the module is required to be specified. This commit propagates this API change to the boundary of wasmtime's API, ensuring consumers are aware of what's optional with module linking and what isn't. Note that it's expected that all existing users will either update accordingly or unwrap the result since module linking is presumably disabled.	2020-11-23 15:26:26 -08:00
Johnnie Birch	ade9f12c72	Add support for X86_64 SIMD narrow instructions for vcode backend Adds lowering support for: i8x16.narrow_i16x8_s i8x16.narrow_i16x8_u i16x8.narrow_i32x4_s i16x8.narrow_i32x4_u	2020-11-23 09:58:39 -08:00
Nick Fitzgerald	1dd20b4371	Merge pull request #2400 from MattX/improve-finalize-msg Specify unsealed / unfilled blocks	2020-11-23 08:45:08 -08:00
Johnnie Birch	2cc501427e	Add remaining X86_64 support for pack w/ signed/unsigned saturation Adds lowering for packssdw, packusdw, packuswb	2020-11-22 23:14:29 -08:00
Johnnie Birch	258013cff1	Add support for SWidenHigh and UWidenHigh X86_64 for vcode backend Support is based on SSE4.1	2020-11-22 22:14:19 -08:00
Johnnie Birch	124096735b	Add support for palignr for X86_64 vcode backend	2020-11-22 22:14:02 -08:00
Johnnie Birch	f9937575d6	Add support for SwidenLow and UwidenLow for the X86_64 vcode backend Adds support using lowerings compatible with SSE4.1	2020-11-22 21:38:53 -08:00
Johnnie Birch	615a575da1	Add support for x86_64 packed move lowering for the vcode backend	2020-11-22 20:23:00 -08:00
Johnnie Birch	b6d783a120	Adds support for i32x4.trunc_sat_f32x4_u	2020-11-22 12:00:54 -08:00
Matt	27f3307f24	Replace if + panic! with assert! Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>	2020-11-21 00:03:41 -05:00
Yury Delendik	e34b410381	Update wasmparser for exception handling (#2431 )	2020-11-19 14:08:10 -06:00
Alex Crichton	4d64c68b05	Run rustfmt 1.48 Run rustfmt over wasmtime with the new stable release which looks like it wants to reformat a few lines.	2020-11-19 11:12:30 -08:00
Chris Fallin	073c727a74	x64 and aarch64: carry MemFlags on loads/stores; don't emit trap info unless an op can trap. This end result was previously enacted by carrying a `SourceLoc` on every load/store, which was somewhat cumbersome, and only indirectly encoded metadata about a memory reference (can it trap) by its presence or absence. We have a type for this -- `MemFlags` -- that tells us everything we might want to know about a load or store, and we should plumb it through to code emission instead. This PR attaches a `MemFlags` to an `Amode` on x64, and puts it on load and store `Inst` variants on aarch64. These two choices seem to factor things out in the nicest way: there are relatively few load/store insts on aarch64 but many addressing modes, while the opposite is true on x64.	2020-11-17 11:43:06 -08:00
Chris Fallin	b97f07b405	x64 backend: merge loads into ALU ops when appropriate. This PR makes use of the support in #2366 for sinking effectful instructions and merging them with consumers. In particular, on x86, we want to make use of the ability of many instructions to load one operand directly from memory. That is, instead of this: ``` movq 0(%rdi), %rax addq %rax, %rbx ``` we want to generate this: ``` addq 0(%rdi), %rax ``` As described in more detail in #2366, sinking and merging the load is only possible under certain conditions. In particular, we need to ensure that the use is the only use (otherwise the load happens more than once), and we need to ensure that it does not move across other effectful ops (see #2366 for how we ensure this). This change is actually fairly simple, given that all the framework is in place: we simply pattern-match a load on one operand of an ALU instruction that takes an RMI (reg, mem, or immediate) operand, and generate the mem form when we match. Also makes a drive-by improvement in the x64 backend to use statically-monomorphized `LowerCtx` types rather than a `&mut dyn LowerCtx`. On `bz2.wasm`, this results in ~1% instruction-count reduction. More is likely possible by following up with other instructions that can merge memory loads as well.	2020-11-17 11:06:46 -08:00
Chris Fallin	712ff22492	AArch64 SIMD: pattern-match load+splat into `LD1R` instruction.	2020-11-16 15:59:28 -08:00
Chris Fallin	39b5736727	Remove LoadSplat opcode, in preparation for pattern-matching Load+Splat. This was added as an incremental step to improve AArch64 code quality in PR #2278. At the time, we did not have a way to pattern-match the load + splat opcode sequence that the relevant Wasm opcodes lowered to. However, now with PR #2366, we can merge effectful instructions such as loads into other ops, and so we can do this pattern matching directly. The pattern-matching update will come in a subsequent commit.	2020-11-16 15:31:56 -08:00
Chris Fallin	3c8cb7b908	MachInst lowering logic: allow effectful instructions to merge. This PR updates the "coloring" scheme that accounts for side-effects in the MachInst lowering logic. As a result, the new backends will now be able to merge effectful operations (such as memory loads) into other operations; previously, only the other way (pure ops merged into effectful ops) was possible. This will allow, for example, a load+ALU-op combination, as is common on x86. It should even allow a load + ALU-op + store sequence to merge into one lowered instruction. The scheme arose from many fruitful discussions with @julian-seward1 (thanks!); significant credit is due to him for the insights here. The first insight is that given the right basic conditions, i.e. that the root instruction is the only use of an effectful instruction's result, all we need is that the "color" of the effectful instruction is one less than the color of the current instruction. It's easier to think about colors on the program points between instructions: if the color coming out of the first (effectful def) instruction and in to the second (effectful or effect-free use) instruction are the same, then they can merge. Basically the color denotes a version of global state; if the same, then no other effectful ops happened in the meantime. The second insight is that we can keep state as we scan, tracking the "current color", and update this when we sink (merge) an op. Hence when we sink a load into another op, we effectively re-color every instruction it moved over; this may allow further sinks. Consider the example (and assume that we consider loads effectful in order to conservatively ensure a strong memory model; otherwise, replace with other effectful value-producing insts): ``` v0 = load x v1 = load y v2 = add v0, 1 v3 = add v1, 1 ``` Scanning from bottom to top, we first see the add producing `v3` and we can sink the load producing `v1` into it, producing a load + ALU-op machine instruction. This is legal because `v1` moves over only `v2`, which is a pure instruction. Consider, though, `v2`: under a simple scheme that has no other context, `v0` could not sink to `v2` because it would move over `v1`, another load. But because we already sunk `v1` down to `v3`, we are free to sink `v0` to `v2`; the update of the "current color" during the scan allows this. This PR also cleans up the `LowerCtx` interface a bit at the same time: whereas previously it always gave some subset of (constant, mergeable inst, register) directly from `LowerCtx::get_input()`, it now returns zero or more of (constant, mergable inst) from `LowerCtx::maybe_get_input_as_source_or_const()`, and returns the register only from `LowerCtx::put_input_in_reg()`. This removes the need to explicitly denote uses of the register, so it's a little safer. Note that this PR does not actually make use of the new ability to merge loads into other ops; that will come in future PRs, especially to optimize the `x64` backend by using direct-memory operands.	2020-11-16 14:53:45 -08:00
Chris Fallin	7b9d870030	Merge pull request #2410 from cfallin/x64-gc Fix and enable GC on new x64 backend.	2020-11-13 09:40:05 -08:00
Chris Fallin	88fce766b0	Merge pull request #2411 from cfallin/x86-backend-cfg Don't run old x86 backend-specific tests with new x64 backend.	2020-11-13 09:29:16 -08:00

... 7 8 9 10 11 ...

3154 Commits