wasmtime

Author	SHA1	Message	Date
Benjamin Bouvier	c2692ecb8a	Wasmtime: allow using the experimental Cranelift x64 backend in cli; This introduces two changes: - first, a Cargo feature is added to make it possible to use the Cranelift x64 backend directly from wasmtime's CLI. - second, when passing a `cranelift-flags` parameter, and the given parameter's name doesn't exist at the target-independent flag level, try to set it as a target-dependent setting. These two changes make it possible to try out the new x64 backend with: cargo run --features experimental_x64 -- run --cranelift-flags use_new_backend=true -- /path/to/a.wasm Right now, this will fail because most opcodes required by the trampolines are actually not implemented yet.	2020-06-17 17:18:46 +02:00
Joey Gouly	0f462330e0	arm64: Implement AllTrue and AnyTrue This enables the simd_boolean WASM SIMD spec test. Copyright (c) 2020, Arm Limited.	2020-06-17 15:40:51 +01:00
Nick Fitzgerald	f30ce1fe97	externref: implement stack map-based garbage collection For host VM code, we use plain reference counting, where cloning increments the reference count, and dropping decrements it. We can avoid many of the on-stack increment/decrement operations that typically plague the performance of reference counting via Rust's ownership and borrowing system. Moving a `VMExternRef` avoids mutating its reference count, and borrowing it either avoids the reference count increment or delays it until if/when the `VMExternRef` is cloned. When passing a `VMExternRef` into compiled Wasm code, we don't want to do reference count mutations for every compiled `local.{get,set}`, nor for every function call. Therefore, we use a variation of deferred reference counting, where we only mutate reference counts when storing `VMExternRef`s somewhere that outlives the activation: into a global or table. Simultaneously, we over-approximate the set of `VMExternRef`s that are inside Wasm function activations. Periodically, we walk the stack at GC safe points, and use stack map information to precisely identify the set of `VMExternRef`s inside Wasm activations. Then we take the difference between this precise set and our over-approximation, and decrement the reference count for each of the `VMExternRef`s that are in our over-approximation but not in the precise set. Finally, the over-approximation is replaced with the precise set. The `VMExternRefActivationsTable` implements the over-approximated set of `VMExternRef`s referenced by Wasm activations. Calling a Wasm function and passing it a `VMExternRef` moves the `VMExternRef` into the table, and the compiled Wasm function logically "borrows" the `VMExternRef` from the table. Similarly, `global.get` and `table.get` operations clone the gotten `VMExternRef` into the `VMExternRefActivationsTable` and then "borrow" the reference out of the table. When a `VMExternRef` is returned to host code from a Wasm function, the host increments the reference count (because the reference is logically "borrowed" from the `VMExternRefActivationsTable` and the reference count from the table will be dropped at the next GC). For more general information on deferred reference counting, see An Examination of Deferred Reference Counting and Cycle Detection by Quinane: https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf cc #929 Fixes #1804	2020-06-15 09:39:37 -07:00
Benjamin Bouvier	357fb11f46	Review comments;	2020-06-15 16:39:08 +02:00
Benjamin Bouvier	28c40ba0f7	machinst x64: refactor lowering too;	2020-06-15 16:39:08 +02:00
Benjamin Bouvier	48fb9291bc	machinst x64: refactor REX prefix emission;	2020-06-15 16:39:08 +02:00
Benjamin Bouvier	be4102b205	machinst x64: create a Rex wrapper to avoid flags for the REX prefix;	2020-06-15 16:39:08 +02:00
Benjamin Bouvier	d9ca974133	machinst x64: renamings in the emit functions; This gets closer to Rust naming standards, and shortens a few names.	2020-06-15 16:39:08 +02:00
Benjamin Bouvier	b2a0718404	machinst x64: expand encoding names a bit; This avoids one, two, and three letter structure names, which makes the code easier to read (while a bit more verbose).	2020-06-15 16:39:08 +02:00
Benjamin Bouvier	ef5de04d32	machinst/x64: teach regalloc what FP instructions are moves; and cosmetic changes after #1665 landed.	2020-06-15 16:39:08 +02:00
Benjamin Bouvier	238ae3bf21	cranelift: tweak condition in safepoint detection to check for resumable traps;	2020-06-15 12:04:28 +02:00
Benjamin Bouvier	dad56a2488	cranelift: add a new resumable_trapnz instruction; This is useful to have to allow resumable_trap to happen in loop headers, for instance. This is the correct way to implement interrupt checks in Spidermonkey, which are effectively resumable traps. Previous implementation was using traps, which is wrong, since traps semantically can't be resumed after.	2020-06-15 12:04:28 +02:00
Andrew Brown	01d34e71b9	Add x86 legalization for fcvt_from_uint.f32x4 This converts an `i32x4` into an `f32x4` with some rounding either by using an AVX512VL/F instruction--VCVTUDQ2PS--or a long sequence of SSE4.1 compatible instructions.	2020-06-12 15:06:22 -07:00
Andrew Brown	23ed48f269	Add AVX512F flag	2020-06-12 15:06:22 -07:00
Andrew Brown	772ce73f7f	Add x86_pblendw instruction This instruction is necessary for lowering `fcvt_from_uint`.	2020-06-12 15:06:22 -07:00
Andrew Brown	546fc9ddf1	Add x86_vcvtudq2ps instruction This instruction converts i32x4 to f32x4 in several AVX512 feature sets.	2020-06-12 15:06:22 -07:00
Chris Fallin	6286ca7310	AArch64: make use of reg-reg-extend amode. When a load/store instruction needs an address of the form `v0 + uextend(v1)` or `v0 + sextend(v1)` (or the commuted forms thereof), we currently generate a separate zero/sign-extend operation and then use a plain `[rA, rB]` addressing mode. This patch extends `lower_address()` to look at both addends of an address if it has two addends and a zero offset, recognize extension operations, and incorporate them directly into a `[rA, rB, UXTW]` or `[rA, rB, SXTW]` form. This should improve our performance on WebAssembly workloads, at least, because we often see a 64-bit linear memory base indexed by a 32-bit (Wasm) pointer value.	2020-06-12 10:40:54 -07:00
Dan Gohman	caa87048ab	Wasmtime 0.18.0 and Cranelift 0.65.0.	2020-06-11 17:49:56 -07:00
Chris Fallin	cdbe76a1d4	Remove uses of `matches!()` macro, incompatible with Firefox build. When we vendor Cranelift into Firefox, we need to be able to build with the Firefox CI setup (unless we carry patches on top of upstream). Unfortunately, the Firefox CI currently appears to build with a slightly older version of Rust: I can't work out which version exactly, but one without stable support for `matches!()`. A recent attempt to version-bump Cranelift failed with build errors at the two locations in this patch: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305994046&repo=autoland&lineNumber=24829 I also see a bunch of uses of `matches!()` in Peepmatic, but those crates are not built by Firefox, so we can leave them be for now, I think.	2020-06-11 15:11:10 -07:00
Chris Fallin	b0cccf1d87	Merge pull request #1864 from jgouly/bitwise arm64: Implement SIMD bitwise operations	2020-06-11 11:38:39 -07:00
Chris Fallin	6ba165be01	Merge pull request #1858 from cfallin/fix-scale-b1 Bugfix: scaled addressing mode: round B1 up to one byte.	2020-06-11 11:16:07 -07:00
Joey Gouly	544c5dece5	arm64: Implement SIMD bitwise operations Copyright (c) 2020, Arm Limited.	2020-06-11 10:58:23 -07:00
Chris Fallin	ed7e410111	Bugfix: scaled addressing mode: round B1 up to one byte. Issue uncovered by Ben Bouvier during regalloc work.	2020-06-11 10:27:32 -07:00
Chris Fallin	a84c1931a0	Merge pull request #1854 from akirilov-arm/simd_load_splat Enable the wast::Cranelift::spec::simd::simd_load_splat test for AArch64	2020-06-10 12:11:29 -07:00
Johnnie Birch	48f0b10c7a	Add initial scalar FP operations (addss, subss, etc) to x64 backend. Adds support for addss and subss. This is the first lowering for SSE floating point ALU and some move operations. The changes here do some renaming of data structures and add a couple of new ones to support SSE-specific operations. The work done here will likely evolve as needed to support an efficient, intuitive, and consistent framework.	2020-06-10 18:36:57 +02:00
Benjamin Bouvier	5d01603390	mach backend: allow snapshotting IR graphs with the SNAPSHOT_REGALLOC env variable; This also requires the serde feature, which isn't enabled by default, thus it must be passed as a command-line argument to cargo.	2020-06-10 18:23:04 +02:00
Benjamin Bouvier	46093f6119	Bump regalloc.rs to 0.0.26; And adapt to regalloc.rs API change to provide the exact number of vregs.	2020-06-10 18:23:04 +02:00
Anton Kirilov	d941034c2e	Enable the wast::Cranelift::spec::simd::simd_load_splat test for AArch64 Copyright (c) 2020, Arm Limited.	2020-06-10 15:01:37 +01:00
Chris Fallin	ac87ed12bd	Merge pull request #1847 from akirilov-arm/simd_load_extend Enable the wast::Cranelift::spec::simd::simd_load_extend test for AArch64	2020-06-09 12:29:06 -07:00
Joey Gouly	df2b031b6a	arm64: Implement Icmp for I16X8 and I32X4 Copyright (c) 2020, Arm Limited.	2020-06-09 11:07:43 -07:00
Anton Kirilov	7ac19af498	Enable the wast::Cranelift::spec::simd::simd_load_extend test for AArch64 Copyright (c) 2020, Arm Limited.	2020-06-09 18:05:38 +01:00
Chris Fallin	8da71a145c	Merge pull request #1802 from akirilov-arm/simd_align Enable the wast::Cranelift::spec::simd::simd_align test for AArch64	2020-06-09 09:58:26 -07:00
Chris Fallin	02ae1b4464	Merge pull request #1846 from julian-seward1/better-phis Rewrite `lower_edge` to produce better phi-translations:	2020-06-09 09:56:52 -07:00
Anton Kirilov	51a551fb39	Implement vector element extensions for AArch64 This commit also includes load and extend operations. Both are prerequisites for enabling further SIMD spec tests. Copyright (c) 2020, Arm Limited.	2020-06-09 12:28:49 +01:00
Julian Seward	6d25759c8e	Rewrite `lower_edge` to produce better phi-translations: * ensure that all const assignments are placed at the end of the sequence. This minimises live ranges. * for the non-const assignments, ignore self-assignments. This can dramatically reduce the total number of moves generated, because any self-assignments trigger the overlap-case handling, hence invoking the double-copy behaviour in cases where it's not necessary. It's worth pointing out that self-assignments are common, and are not due to deficiencies in CLIR optimisation. Rather, they occur whenever a loop back edge doesn't modify all loop-carried values. This can easily happen if the loop has multiple "early" back-edges -- "continues" in C parlance. Eg: loop_header(a, b, c, d, e, f): ... a_new = ... b_new = ... if (..) goto loop_header(a_new, b_new, c, d, e, f) ... c_new = ... d_new = ... if (..) goto loop_header(a_new, b_new, c_new, d_new, e, f) etc For functions with many live values, this can dramatically reduce the number of spill moves we throw into the register allocator. In terms of compilation costs, this ranges from neutral for functions which spill not at all, or minimally (joey_small, joey_med) to a 7.1% reduction in insn count. In terms of run costs, for one spill-heavy test (bz2 w/ custom timing harness), instruction counts are reduced by 4.3%, data reads by 12.3% and data writes by 18.5%. Note those last two figures include all reads and writes made by the generated code, not just spills/reloads, so the proportional reduction in spill/reload traffic must be greater.	2020-06-09 10:36:32 +02:00
Nick Fitzgerald	fb9f39ce17	Merge pull request #1824 from fitzgen/test-stack-maps cranelift: Better document and test stack maps	2020-06-08 15:58:20 -07:00
Nick Fitzgerald	6aac4c891e	cranelift: Better document and test stack maps	2020-06-08 15:05:20 -07:00
Chris Fallin	e3d89c8a92	Merge pull request #1825 from cfallin/spidermonkey-fixes Three fixes to various SpiderMonkey-related issues	2020-06-08 13:54:13 -07:00
Chris Fallin	fc2a6f273b	Three fixes to various SpiderMonkey-related issues: - Properly mask constant values down to appropriate width when generating a constant value directly in aarch64 backend. This was a miscompilation introduced in the new-isel refactor. In combination with failure to respect NarrowValueMode, this resulted in a very subtle bug when an `i32` constant was used in bit-twiddling logic. - Add support for `iadd_ifcout` in aarch64 backend as used in explicit heap-check mode. With this change, we no longer fail heap-related tests with the huge-heap-region mode disabled. - Remove a panic that was occurring in some tests that are currently ignored on aarch64, by simply returning empty/default information in `value_label` functionality rather than touching unimplemented APIs. This is not a bugfix per-se, but removes confusing panic messages from `cargo test` output that might otherwise mislead.	2020-06-08 13:02:00 -07:00
whitequark	3796164642	x86_32: legalize br{z,nz}.i64.	2020-06-08 12:52:13 -07:00
whitequark	bc555468a7	cranelift: add i64.{ishl,ushr,ashr} libcalls. These libcalls are useful for 32-bit platforms. On x86_32 in particular, commit `4ec16fa0` added support for legalizing 64-bit shifts through SIMD operations. However, that legalization requires SIMD to be enabled and SSE 4.1 to be supported, which is not acceptable as a hard requirement.	2020-06-05 12:13:49 -07:00
Chris Fallin	00abfcd943	Merge pull request #1817 from cfallin/issue-1809 Avoid touching encodings in `EncCursor` if using a MachInst backend.	2020-06-04 12:50:39 -07:00
Yury Delendik	6f37204f82	Upgrade gimli to 0.21 (#1819) * Use gimli 0.21 * rm CFI w Expression * Don't write .debug_frame twice	2020-06-04 14:34:05 -05:00
Chris Fallin	63a335b7d4	Avoid touching encodings in `EncCursor` if using a MachInst backend. `EncCursor` is a variant of `Cursor` that allows updating CLIF while keeping its encodings up to date, given a particular ISA. However, new (MachInst) backends don't use the encodings, and the `TargetIsaAdapter` shim will panic if any encoding-related method is called. This PR avoids those panics. Fixes #1809.	2020-06-04 10:53:45 -07:00
Andrew Brown	5db384cd76	Rename opcode: PMULLQ to VPMULLQ	2020-06-03 16:27:57 -07:00
Andrew Brown	1ea09088be	Add x86 legalization for imul.i64x2 for non-AVX CPUs The `convert_i64x2_imul` custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizes `imul.i64x2` to an `x86_pmullq` in this case; if not, it uses a lengthy SSE2-compatible instruction sequence.	2020-06-03 16:27:57 -07:00
Andrew Brown	b3a6985cd5	Re-organize transform groups for x86 legalization	2020-06-03 16:27:57 -07:00
Andrew Brown	5a32500518	Remove non-existent x86 encoding for sshr_imm.i64x2 This instruction does not exist in the SSE2 feature set; it can be added later with an VEX/EVEX encoding.	2020-06-03 16:27:57 -07:00
Andrew Brown	df171f01b5	Add x86_pmuludq This instruction multiplies the lower 32 bits of two 64x2 unsigned integers into an i64x2; this is necessary for lowering Wasm's i64x2.mul.	2020-06-03 16:27:57 -07:00
Andrew Brown	40f31375a5	Add TargetIsa::as_any for downcasting to specific ISA implementations This is necessary when we would like to check specific ISA flags, e.g.	2020-06-03 16:27:57 -07:00
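
The GC step described in commit f30ce1fe97 (take the difference between the over-approximated set and the precise set, decrement those reference counts, then replace the over-approximation with the precise set) can be sketched as a toy model. This is only an illustration under stated assumptions, not Wasmtime's code: `RefId`, `ActivationsTable`, `insert`, and `gc` are hypothetical names, a plain integer stands in for a `VMExternRef`, the precise set is passed in rather than computed from stack maps, and it is assumed to be a subset of the over-approximation.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a `VMExternRef`: just an integer id.
type RefId = u32;

/// Toy model of the over-approximated set of references held by
/// Wasm activations, plus a reference count per tracked id.
struct ActivationsTable {
    counts: HashMap<RefId, usize>,
    over_approx: HashSet<RefId>,
}

impl ActivationsTable {
    fn new() -> Self {
        ActivationsTable {
            counts: HashMap::new(),
            over_approx: HashSet::new(),
        }
    }

    /// Model of passing a reference into Wasm: the table holds one
    /// count on it until a later GC proves it is no longer on the stack.
    fn insert(&mut self, id: RefId) {
        if self.over_approx.insert(id) {
            *self.counts.entry(id).or_insert(0) += 1;
        }
    }

    /// Model of a GC at a safe point. `precise` is the set a stack walk
    /// would report (assumed to be a subset of the over-approximation).
    /// Returns the ids whose count dropped to zero, in ascending order.
    fn gc(&mut self, precise: HashSet<RefId>) -> Vec<RefId> {
        // Ids in the over-approximation but not in the precise set.
        let dead: Vec<RefId> = self.over_approx.difference(&precise).copied().collect();

        let mut freed = Vec::new();
        for id in dead {
            let count = self.counts.get_mut(&id).expect("tracked id has a count");
            *count -= 1;
            if *count == 0 {
                self.counts.remove(&id);
                freed.push(id);
            }
        }
        freed.sort();

        // The over-approximation is replaced with the precise set.
        self.over_approx = precise;
        freed
    }
}
```

In this model, inserting ids 1, 2, and 3 and then calling `gc` with a precise set containing only 2 releases 1 and 3; a later `gc` with an empty set releases 2. Reference-count changes happen only at `insert` and `gc`, which mirrors the point of deferring them away from every `local.get` or call.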

985 Commits