* This PR is against a branch called `main`
* Internally all docs/CI/etc. are updated
* The default branch of the repo is now `main`
* All active PRs have been updated to retarget `main`
Closes #1914
Removes an unneeded data structure that was holding instructions for
xmm-based move instructions. These instructions should be categorized
as rm, not just r. This change is intended to simplify organization and
reduce the number of cases to handle when lowering.
This patch implements the x64 instructions required for copysign that were
not already available, as well as the actual lowering sequence
and tests for the newly implemented x64 instructions.
Those instructions include:
andps,
andnps,
movaps,
movd,
orps.
The lowering sequence is based on the lowering for f32.copysign
in the current cranelift backend. movd does not have a test yet
due to some logic needed to express a 32-bit register as a source
for xmm_rm_r instructions. This code also begins some
rethinking/refactoring of how the sse move instructions
are written, and so also includes new emit cases that will replace
the current ones, which match a different enum used to describe sse moves.
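For context, here is a minimal scalar sketch (purely illustrative Rust, not
the vcode the backend emits) of the bitwise idea behind the
andps/andnps/orps sequence: take the magnitude bits from one operand and
the sign bit from the other.

```rust
// Illustrative scalar version of the bitwise copysign idea; std already
// provides f32::copysign, so this exists only to mirror the SSE sequence.
fn copysign_bits(x: f32, y: f32) -> f32 {
    const SIGN_MASK: u32 = 0x8000_0000;
    let magnitude = x.to_bits() & !SIGN_MASK; // andnps: clear the sign bit of x
    let sign = y.to_bits() & SIGN_MASK;       // andps:  isolate the sign bit of y
    f32::from_bits(magnitude | sign)          // orps:   combine the two
}
```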
Following discussion with Julian and Ben, this PR makes a few documentation-
and naming-level changes (no functionality change):
- Document that the `LowerCtx`-provided output register can be used as a
scratch register during the lowered instruction sequence before
placing the final result in it.
- Rename `input_to_*` helpers in the AArch64 backend to
`put_input_in_*`, emphasizing that these are side-effecting helpers
that potentially generate code (e.g., sign/zero-extensions) to ensure
an input value is in a register.
Adds support for lowering the clif instructions Fdiv and Fmul
for the new vcode backend. Also adds a lowering and test for
sqrtss and removes a redundant to_string() func from the
SseOpcode struct.
This introduces two changes:
- first, a Cargo feature is added to make it possible to use the
Cranelift x64 backend directly from wasmtime's CLI.
- second, when a `cranelift-flags` parameter is passed whose name doesn't
exist at the target-independent flag level, try to set it as a
target-dependent setting.
These two changes make it possible to try out the new x64 backend with:
    cargo run --features experimental_x64 -- run --cranelift-flags use_new_backend=true -- /path/to/a.wasm
Right now, this will fail because most opcodes required by the
trampolines are actually not implemented yet.
For host VM code, we use plain reference counting, where cloning increments
the reference count, and dropping decrements it. We can avoid many of the
on-stack increment/decrement operations that typically plague the
performance of reference counting via Rust's ownership and borrowing system.
Moving a `VMExternRef` avoids mutating its reference count, and borrowing it
either avoids the reference count increment or delays it until the
`VMExternRef` is actually cloned, if that ever happens.
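As a rough illustration (a hypothetical, heavily simplified stand-in rather
than the actual wasmtime type), the handle can be thought of as a pointer
to a counted header:

```rust
use std::cell::Cell;
use std::ptr::NonNull;

// Simplified stand-in for `VMExternRef`: a pointer to a heap-allocated
// header holding the reference count. Cloning increments the count,
// dropping decrements it, and *moving* the handle (passing it by value,
// returning it, storing it) never touches the count at all.
struct Header {
    ref_count: Cell<usize>,
    // ... the host-provided data would live after the header
}

struct ExternRef(NonNull<Header>);

impl Clone for ExternRef {
    fn clone(&self) -> Self {
        let header = unsafe { self.0.as_ref() };
        header.ref_count.set(header.ref_count.get() + 1);
        ExternRef(self.0)
    }
}

impl Drop for ExternRef {
    fn drop(&mut self) {
        let header = unsafe { self.0.as_ref() };
        header.ref_count.set(header.ref_count.get() - 1);
        // A real implementation would deallocate when the count hits zero.
    }
}
```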
When passing a `VMExternRef` into compiled Wasm code, we don't want to do
reference count mutations for every compiled `local.{get,set}`, nor for
every function call. Therefore, we use a variation of **deferred reference
counting**, where we only mutate reference counts when storing
`VMExternRef`s somewhere that outlives the activation: into a global or
table. Simultaneously, we over-approximate the set of `VMExternRef`s that
are inside Wasm function activations. Periodically, we walk the stack at GC
safe points, and use stack map information to precisely identify the set of
`VMExternRef`s inside Wasm activations. Then we take the difference between
this precise set and our over-approximation, and decrement the reference
count for each of the `VMExternRef`s that are in our over-approximation but
not in the precise set. Finally, the over-approximation is replaced with the
precise set.
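A hypothetical sketch of that GC step, with `usize` standing in for a
reference handle and `dec_ref` for the real decrement logic:

```rust
use std::collections::HashSet;

// `over_approximation` is the table's conservative guess at which references
// Wasm activations hold; `precise` is the set recovered by walking the stack
// with stack maps. References in the over-approximation but not in the
// precise set get their count decremented, and the precise set then becomes
// the new over-approximation.
fn gc(
    over_approximation: &mut HashSet<usize>,
    precise: HashSet<usize>,
    dec_ref: impl Fn(usize),
) {
    for &stale in over_approximation.difference(&precise) {
        dec_ref(stale);
    }
    *over_approximation = precise;
}
```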
The `VMExternRefActivationsTable` implements the over-approximated set of
`VMExternRef`s referenced by Wasm activations. Calling a Wasm function and
passing it a `VMExternRef` moves the `VMExternRef` into the table, and the
compiled Wasm function logically "borrows" the `VMExternRef` from the
table. Similarly, `global.get` and `table.get` operations clone the gotten
`VMExternRef` into the `VMExternRefActivationsTable` and then "borrow" the
reference out of the table.
When a `VMExternRef` is returned to host code from a Wasm function, the host
increments the reference count (because the reference is logically
"borrowed" from the `VMExternRefActivationsTable` and the reference count
from the table will be dropped at the next GC).
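A hypothetical sketch of this insert-then-borrow hand-off, using plain `Rc`
in place of `VMExternRef` and a `Vec` in place of the real table (all names
below are made up for illustration):

```rust
use std::rc::Rc;

struct ActivationsTable {
    entries: Vec<Rc<u32>>, // the over-approximated set of live references
}

impl ActivationsTable {
    // Host passes a reference into compiled Wasm: move it into the table and
    // hand the activation a raw pointer it logically "borrows".
    fn insert(&mut self, r: Rc<u32>) -> *const u32 {
        let raw = Rc::as_ptr(&r);
        self.entries.push(r); // the table now owns one reference count
        raw
    }

    // Compiled Wasm returns a reference to the host: clone to take a fresh
    // count, since the table's own count is dropped at the next GC.
    fn return_to_host(&self, raw: *const u32) -> Option<Rc<u32>> {
        for entry in &self.entries {
            if Rc::as_ptr(entry) == raw {
                return Some(entry.clone());
            }
        }
        None
    }
}
```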
For more general information on deferred reference counting, see *An
Examination of Deferred Reference Counting and Cycle Detection* by Quinane:
https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf
cc #929
Fixes #1804
This is useful to allow resumable_trap to happen in loop
headers, for instance. This is the correct way to implement interrupt
checks in SpiderMonkey, which are effectively resumable traps. The previous
implementation used traps, which is wrong, since traps semantically
cannot be resumed afterwards.
This converts an `i32x4` into an `f32x4`, with some rounding, either by using an AVX512VL/F instruction (VCVTUDQ2PS) or a long sequence of SSE4.1-compatible instructions.
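Since VCVTUDQ2PS converts packed *unsigned* doubleword integers, the lanes
are treated as unsigned; a minimal scalar sketch of the per-lane semantics
(the helper name is made up for illustration):

```rust
// Each 32-bit lane, interpreted as unsigned, becomes an f32, rounding
// whenever the value does not fit in f32's 24-bit significand.
fn fcvt_from_uint_lanes(lanes: [u32; 4]) -> [f32; 4] {
    lanes.map(|lane| lane as f32)
}
```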
When a load/store instruction needs an address of the form `v0 +
uextend(v1)` or `v0 + sextend(v1)` (or the commuted forms thereof), we
currently generate a separate zero/sign-extend operation and then use a
plain `[rA, rB]` addressing mode. This patch extends `lower_address()`
to look at both addends of an address if it has two addends and a zero
offset, recognize extension operations, and incorporate them directly
into a `[rA, rB, UXTW]` or `[rA, rB, SXTW]` form. This should improve
our performance on WebAssembly workloads, at least, because we often see
a 64-bit linear memory base indexed by a 32-bit (Wasm) pointer value.
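A hypothetical, heavily simplified sketch of the recognition logic (the
types and the `choose_addr_mode` helper are made up for illustration; they
are not Cranelift's actual data structures):

```rust
#[derive(Clone, Copy)]
enum Extend {
    Uxtw, // value came from a uextend of a 32-bit input
    Sxtw, // value came from a sextend of a 32-bit input
}

#[derive(Clone, Copy)]
enum Addend {
    Reg(u32),              // plain 64-bit register
    Extended(u32, Extend), // 32-bit register widened by an extend op
}

#[derive(Clone, Copy)]
enum AddrMode {
    RegReg(u32, u32),              // [rA, rB]
    RegExtended(u32, u32, Extend), // [rA, rB, UXTW] or [rA, rB, SXTW]
}

// With exactly two addends and a zero offset, fold a recognized extend
// directly into the addressing mode instead of emitting a separate
// zero/sign-extend instruction first.
fn choose_addr_mode(a: Addend, b: Addend, offset: i64) -> Option<AddrMode> {
    if offset != 0 {
        return None; // fall back to the general lowering (not shown)
    }
    match (a, b) {
        // Commuted forms are both accepted.
        (Addend::Reg(base), Addend::Extended(index, ext))
        | (Addend::Extended(index, ext), Addend::Reg(base)) => {
            Some(AddrMode::RegExtended(base, index, ext))
        }
        (Addend::Reg(ra), Addend::Reg(rb)) => Some(AddrMode::RegReg(ra, rb)),
        _ => None,
    }
}
```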
When we vendor Cranelift into Firefox, we need to be able to build with
the Firefox CI setup (unless we carry patches on top of upstream).
Unfortunately, the Firefox CI currently appears to build with a slightly
older version of Rust: I can't work out which version exactly, but one
without stable support for `matches!()`.
A recent attempt to version-bump Cranelift failed with build errors at
the two locations in this patch:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305994046&repo=autoland&lineNumber=24829
I also see a bunch of uses of `matches!()` in Peepmatic, but those
crates are not built by Firefox, so we can leave them be for now, I
think.
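For illustration only, this is the kind of rewrite such a build error
forces: replacing a `matches!()` invocation with an explicit `match` (the
helper below is made up for this example):

```rust
// Equivalent of `matches!(x, Some(n) if n <= 9)`, spelled out for toolchains
// without a stable `matches!` macro. Purely illustrative.
fn is_small(x: Option<u32>) -> bool {
    match x {
        Some(n) if n <= 9 => true,
        _ => false,
    }
}
```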
Adds support for addss and subss. This is the first lowering for the
sse floating-point ALU and some move operations. The changes here do
some renaming of data structures and add a couple of new ones
to support sse-specific operations. The work done here will likely
evolve as needed to support an efficient, intuitive, and consistent
framework.
* ensure that all const assignments are placed at the end of the sequence.
This minimises live ranges.
* for the non-const assignments, ignore self-assignments. This can
dramatically reduce the total number of moves generated, because any
self-assignments trigger the overlap-case handling, hence invoking the
double-copy behaviour in cases where it's not necessary (a sketch of both
tweaks follows this list).
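A hypothetical sketch of the two tweaks (the `Move`/`Src` types and
`schedule_moves` are made up for illustration; real cycle/overlap resolution
for the remaining register moves is elided):

```rust
#[derive(Clone)]
enum Src {
    Reg(u32),
    Const(i64),
}

#[derive(Clone)]
struct Move {
    dst: u32,
    src: Src,
}

fn schedule_moves(moves: Vec<Move>) -> Vec<Move> {
    let mut reg_moves = Vec::new();
    let mut const_moves = Vec::new();
    for m in moves {
        match m.src {
            // A self-assignment is a no-op; dropping it here avoids the
            // overlap-case handling and its double-copy behaviour entirely.
            Src::Reg(r) if r == m.dst => continue,
            Src::Reg(_) => reg_moves.push(m),
            // Constant loads have no source register, so they can always be
            // deferred to the end of the sequence, minimising live ranges.
            Src::Const(_) => const_moves.push(m),
        }
    }
    // Register-to-register moves would still need cycle/overlap resolution
    // here (not shown); constants go last.
    reg_moves.extend(const_moves);
    reg_moves
}
```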
It's worth pointing out that self-assignments are common, and are not due to
deficiencies in CLIR optimisation. Rather, they occur whenever a loop back
edge doesn't modify *all* loop-carried values. This can easily happen if
the loop has multiple "early" back-edges -- "continues" in C parlance. Eg:
    loop_header(a, b, c, d, e, f):
        ...
        a_new = ...
        b_new = ...
        if (..) goto loop_header(a_new, b_new, c, d, e, f)
        ...
        c_new = ...
        d_new = ...
        if (..) goto loop_header(a_new, b_new, c_new, d_new, e, f)
        etc
For functions with many live values, this can dramatically reduce the number
of spill moves we throw into the register allocator.
In terms of compilation costs, this ranges from neutral for functions that
spill minimally or not at all (joey_small, joey_med) to a 7.1% reduction in
insn count.
In terms of run costs, for one spill-heavy test (bz2 w/ custom timing harness),
instruction counts are reduced by 4.3%, data reads by 12.3% and data writes
by 18.5%. Note those last two figures include all reads and writes made by the
generated code, not just spills/reloads, so the proportional reduction in
spill/reload traffic must be greater.