wasmtime

Author	SHA1	Message	Date
Andrew Brown	01d34e71b9	Add x86 legalization for fcvt_from_uint.f32x4 This converts an `i32x4` into an `f32x4` with some rounding either by using an AVX512VL/F instruction--VCVTUDQ2PS--or a long sequence of SSE4.1 compatible instructions.	2020-06-12 15:06:22 -07:00
Andrew Brown	772ce73f7f	Add x86_pblendw instruction This instruction is necessary for lowering `fcvt_from_uint`.	2020-06-12 15:06:22 -07:00
Andrew Brown	546fc9ddf1	Add x86_vcvtudq2ps instruction This instruction converts i32x4 to f32x4 in several AVX512 feature sets.	2020-06-12 15:06:22 -07:00
Chris Fallin	6286ca7310	AArch64: make use of reg-reg-extend amode. When a load/store instruction needs an address of the form `v0 + uextend(v1)` or `v0 + sextend(v1)` (or the commuted forms thereof), we currently generate a separate zero/sign-extend operation and then use a plain `[rA, rB]` addressing mode. This patch extends `lower_address()` to look at both addends of an address if it has two addends and a zero offset, recognize extension operations, and incorporate them directly into a `[rA, rB, UXTW]` or `[rA, rB, SXTW]` form. This should improve our performance on WebAssembly workloads, at least, because we often see a 64-bit linear memory base indexed by a 32-bit (Wasm) pointer value.	2020-06-12 10:40:54 -07:00
Dan Gohman	caa87048ab	Wasmtime 0.18.0 and Cranelift 0.65.0.	2020-06-11 17:49:56 -07:00
Chris Fallin	6ba165be01	Merge pull request #1858 from cfallin/fix-scale-b1 Bugfix: scaled addressing mode: round B1 up to one byte.	2020-06-11 11:16:07 -07:00
Chris Fallin	47402316e0	Add test case: b1-typed spillslot access using UImm12 addressing mode.	2020-06-11 10:27:39 -07:00
Benjamin Bouvier	46093f6119	Bump regalloc.rs to 0.0.26; And adapt to regalloc.rs API change to provide the exact number of vregs.	2020-06-10 18:23:04 +02:00
Nick Fitzgerald	fb9f39ce17	Merge pull request #1824 from fitzgen/test-stack-maps cranelift: Better document and test stack maps	2020-06-08 15:58:20 -07:00
Nick Fitzgerald	6aac4c891e	cranelift: Better document and test stack maps	2020-06-08 15:05:20 -07:00
Chris Fallin	e3d89c8a92	Merge pull request #1825 from cfallin/spidermonkey-fixes Three fixes to various SpiderMonkey-related issues	2020-06-08 13:54:13 -07:00
Chris Fallin	fc2a6f273b	Three fixes to various SpiderMonkey-related issues: - Properly mask constant values down to appropriate width when generating a constant value directly in aarch64 backend. This was a miscompilation introduced in the new-isel refactor. In combination with failure to respect NarrowValueMode, this resulted in a very subtle bug when an `i32` constant was used in bit-twiddling logic. - Add support for `iadd_ifcout` in aarch64 backend as used in explicit heap-check mode. With this change, we no longer fail heap-related tests with the huge-heap-region mode disabled. - Remove a panic that was occurring in some tests that are currently ignored on aarch64, by simply returning empty/default information in `value_label` functionality rather than touching unimplemented APIs. This is not a bugfix per-se, but removes confusing panic messages from `cargo test` output that might otherwise mislead.	2020-06-08 13:02:00 -07:00
whitequark	3796164642	x86_32: legalize br{z,nz}.i64.	2020-06-08 12:52:13 -07:00
whitequark	bc555468a7	cranelift: add i64.{ishl,ushr,ashr} libcalls. These libcalls are useful for 32-bit platforms. On x86_32 in particular, commit `4ec16fa0` added support for legalizing 64-bit shifts through SIMD operations. However, that legalization requires SIMD to be enabled and SSE 4.1 to be supported, which is not acceptable as a hard requirement.	2020-06-05 12:13:49 -07:00
Yury Delendik	6f37204f82	Upgrade gimli to 0.21 (#1819) * Use gimli 0.21 * rm CFI w Expression * Don't write .debug_frame twice	2020-06-04 14:34:05 -05:00
Andrew Brown	1ea09088be	Add x86 legalization for imul.i64x2 for non-AVX CPUs The `convert_i64x2_imul` custom legalization checks the ISA flags for AVX512DQ or AVX512VL support and legalizes `imul.i64x2` to an `x86_pmullq` in this case; if not, it uses a lengthy SSE2-compatible instruction sequence.	2020-06-03 16:27:57 -07:00
Andrew Brown	5a32500518	Remove non-existent x86 encoding for sshr_imm.i64x2 This instruction does not exist in the SSE2 feature set; it can be added later with a VEX/EVEX encoding.	2020-06-03 16:27:57 -07:00
Andrew Brown	df171f01b5	Add x86_pmuludq This instruction multiplies the lower 32 bits of two 64x2 unsigned integers into an i64x2; this is necessary for lowering Wasm's i64x2.mul.	2020-06-03 16:27:57 -07:00
Andrew Brown	9ba9fd0f64	Add x86-specific instruction for i64x2 multiplication Without this special instruction, legalizing to the AVX512 instruction AND the SSE instruction sequence is impossible. This extra instruction would be rendered unnecessary by the x64 backend.	2020-06-03 16:27:57 -07:00
Chris Fallin	fe97659813	Address review comments.	2020-06-03 13:31:34 -07:00
Chris Fallin	615362068f	Multi-value return support.	2020-06-03 13:31:34 -07:00
Dan Gohman	a76639c6fb	Wasmtime 0.17.0 and Cranelift 0.64.0. (#1805)	2020-06-02 18:51:59 -07:00
Nick Fitzgerald	7c68a10ed6	Merge pull request #1670 from teapotd/win64-pass-by-ref Implement passing arguments by ref for win64 ABI	2020-06-01 11:13:30 -07:00
Andrew Brown	7d6e94b952	Replace InsertLane format with TernaryImm8 The InsertLane format has an ordering (`value().imm().value()`) and immediate name (`"lane"`) that make it awkward to use for other instructions. This changes the ordering (`value().value().imm()`) and uses the default name (`"imm"`) throughout the codebase.	2020-05-29 19:56:27 -07:00
teapotd	e430984ac4	Improve bitselect codegen with knowledge of operand origin (#1783) * Encode vselect using BLEND instructions on x86 * Legalize vselect to bitselect * Optimize bitselect to vselect for some operands * Add run tests for bitselect-vselect optimization * Address review feedback	2020-05-29 19:53:11 -07:00
teapotd	759cc3e751	Implement passing arguments by ref for win64 ABI	2020-05-29 20:12:41 +02:00
Nick Fitzgerald	94380bf2b7	Merge pull request #1510 from teapotd/abi-i128-fix Always check if struct-return parameter is needed	2020-05-29 10:02:16 -07:00
whitequark	a180b5b393	x86_32: fix stack_addr encoding. Consider this testcase: target i686 function u0:0() -> i32 system_v { ss0 = explicit_slot 0 block0: v2 = stack_addr.i32 ss0 return v2 } Before this commit, in 32-bit mode the x86 backend would generate incorrect code for stack addresses: 0: 55 push ebp 1: 89 e5 mov ebp, esp 3: 83 ec 08 sub esp, 8 6: 8d 44 24 00 lea eax, [esp] a: 00 00 add byte ptr [eax], al c: 00 83 c4 08 5d c3 add byte ptr [ebx - 0x3ca2f73c], al This happened because the ModRM byte indicated a disp8 encoding, but the instruction actually used a disp32 encoding. After this commit, correct code is generated: 0: 55 push ebp 1: 89 e5 mov ebp, esp 3: 83 ec 08 sub esp, 8 6: 8d 84 24 00 00 00 00 lea eax, [esp] d: 83 c4 08 add esp, 8 10: 5d pop ebp 11: c3 ret	2020-05-29 09:17:36 -07:00
whitequark	880e692fd4	x86: add encoding for bnot.b1. Fixes #1743. Co-authored-by: iximeow <git@iximeow.net>	2020-05-28 08:43:25 -07:00
Andrew Brown	b017844bef	Fix interpreter semantics of 'irsub_imm' Previously it used `arg - imm` but the functionality should be a wrapping `imm - arg` (see `cranelift/codegen/meta/src/shared/instructions.rs`).	2020-05-28 16:28:27 +02:00
teapotd	fbac2e53f9	Make vconst BxN match specification	2020-05-27 09:37:13 -07:00
Andrew Brown	628a9f0eaa	Print more detailed `test run` failures (#1764)	2020-05-27 09:04:46 -05:00
teapotd	9e70a64728	Legalize sret call arguments	2020-05-25 20:03:24 +02:00
teapotd	b18846057f	Add system_v legalizer tests for i128 args	2020-05-25 20:03:24 +02:00
teapotd	6465003899	Run popcnt.i128 legalization test on x86_64	2020-05-25 20:03:24 +02:00
teapotd	0f55bb4b8d	Always check if struct-return parameter is needed	2020-05-25 20:03:24 +02:00
Chris Fallin	51f9ac2150	Merge pull request #1741 from cfallin/filetest-vcode-compile Merge `vcode` filetest mode into `compile`.	2020-05-22 18:57:21 -07:00
Chris Fallin	48573b52b2	Merge `vcode` filetest mode into `compile`. I hadn't realized before that the filetest backend for `test vcode` is doing essentially what `compile` is doing, but for new (`MachInst`) backends: it is just getting a disassembly and running it through filecheck. There's no reason not to reuse `test compile` for the AArch64 tests as well. This was motivated by the desire to have "this IR compiles successfully" tests work on both x86 and AArch64. It seems this should work fine by adding multiple `target` directives when a test case should be compile-tested on multiple architectures.	2020-05-22 17:28:48 -07:00
Chris Fallin	73537e72c0	Merge pull request #1732 from jgouly/copysign-fpu arm64: Use FPU instructions for Fcopysign	2020-05-22 17:25:33 -07:00
Peter Huene	f36539130b	Merge pull request #1734 from peterhuene/fix-saved-fprs Cranelift: Fix FPR saving and shadow space allocation for Windows x64.	2020-05-22 12:06:37 -07:00
whitequark	b2e8ed4dc9	cranelift: add i64.[us]{div,rem} libcalls. These libcalls are useful for 32-bit platforms.	2020-05-22 11:41:56 +00:00
Joey Gouly	02c3f238f8	arm64: Use FPU instructions for Fcopysign Copyright (c) 2020, Arm Limited.	2020-05-21 18:14:12 +01:00
Peter Huene	78c3091e84	Fix FPR saving and shadow space allocation for Windows x64. This commit fixes both how FPR callee-saved registers are saved and how the shadow space allocation occurs when laying out the stack for Windows x64 calling convention. Importantly, this commit removes the compiler limitation of stack size for Windows x64 that was imposed because FPR saves previously couldn't always be represented in the unwind information. The FPR saves are now performed without using stack slots, much like how the callee-saved GPRs are saved. The total CSR space is given to `layout_stack` so that it is included in the frame size and to offset the layout of spills and explicit slots. The FPR saves are now done via an RSP offset (post adjustment) and they always follow the GPR saves on the stack. A simpler calculation can now be made to determine the proper offsets of the FPR saves for representing the unwind information. Additionally, the shadow space is no longer treated as an incoming argument, but an explicit stack slot that gets laid out at the lowest address possible in the local frame. This prevents `layout_stack` from putting a spill or explicit slot in this reserved space. In the future, `layout_stack` should take advantage of the caller-provided shadow space for spills, but this commit does not attempt to address that. The shadow space is now omitted from the local frame for leaf functions. Fixes #1728. Fixes #1587. Fixes #1475.	2020-05-20 15:37:30 -07:00
Chris Fallin	72e6be9342	Rework of MachInst isel, branch fixups and lowering, and block ordering. This patch includes: - A complete rework of the way that CLIF blocks and edge blocks are lowered into VCode blocks. The new mechanism in `BlockLoweringOrder` computes RPO over the CFG, but with a twist: it merges edge blocks into heads or tails of original CLIF blocks wherever possible, and it does this without ever actually materializing the full nodes-plus-edges graph first. The backend driver lowers blocks in final order so there's no need to reshuffle later. - A new `MachBuffer` that replaces the `MachSection`. This is a special version of a code-sink that is far more than a humble `Vec<u8>`. In particular, it keeps a record of label definitions and label uses, with a machine-pluggable `LabelUse` trait that defines various types of fixups (basically internal relocations). Importantly, it implements some simple peephole-style branch rewrites inline in the emission pass, without any separate traversals over the code to use fallthroughs, swap taken/not-taken arms, etc. It tracks branches at the tail of the buffer and can (i) remove blocks that are just unconditional branches (by redirecting the label), (ii) understand a conditional/unconditional pair and swap the conditional polarity when it's helpful; and (iii) remove branches that branch to the fallthrough PC. The `MachBuffer` also implements branch-island support. On architectures like AArch64, this is needed to allow conditional branches within plausibly-attainable ranges (+/- 1MB on AArch64 specifically). It also does this inline while streaming through the emission, without any sort of fixpoint algorithm or later moving of code, by simply tracking outstanding references and "deadlines" and emitting an island just-in-time when we're in danger of going out of range. - A rework of the instruction selector driver. This is largely following the same algorithm as before, but is cleaned up significantly, in particular in the API: the machine backend can ask for an input arg and get any of three forms (constant, register, producing instruction), indicating it needs the register or can merge the constant or producing instruction as appropriate. This new driver takes special care to emit constants right at use-sites (and at phi inputs), minimizing their live-ranges, and also special-cases the "pinned register" to avoid superfluous moves. Overall, on `bz2.wasm`, the results are: wasmtime full run (compile + runtime) of bz2: baseline: 9774M insns, 9742M cycles, 3.918s w/ changes: 7012M insns, 6888M cycles, 2.958s (24.5% faster, 28.3% fewer insns) clif-util wasm compile bz2: baseline: 2633M insns, 3278M cycles, 1.034s w/ changes: 2366M insns, 2920M cycles, 0.923s (10.7% faster, 10.1% fewer insns) All numbers are averages of two runs on an Ampere eMAG.	2020-05-16 23:08:22 -07:00
Nick Fitzgerald	fb7a690efc	Merge pull request #1687 from fitzgen/sign-extend-immediates cranelift: Sign extend `Imm64` immediates	2020-05-14 10:09:53 -07:00
Nick Fitzgerald	52c6ece5f3	peepmatic: Make peepmatic optional to enable Rather than outright replacing parts of our existing peephole optimizations passes, this makes peepmatic an optional cargo feature that can be enabled. This allows us to take a conservative approach with enabling peepmatic everywhere, while also allowing us to get it in-tree and make it easier to collaborate on improving it quickly.	2020-05-14 07:52:23 -07:00
Nick Fitzgerald	6e135b3aea	peepmatic: Fix a failed assertion due to extra iterations after fixed point After replacing an instruction with an alias to an earlier value, trying to further optimize that value is unnecessary, since we've already processed it, and also was triggering an assertion.	2020-05-14 07:52:23 -07:00
Nick Fitzgerald	9a1f8038b7	peepmatic: Do not transplant instructions whose results are potentially used elsewhere	2020-05-14 07:52:23 -07:00
Nick Fitzgerald	090d1c2d32	cranelift: Port most of `simple_preopt.rs` over to the `peepmatic` DSL This ports all of the identity, no-op, simplification, and canonicalization related optimizations over from being hand-coded to the `peepmatic` DSL. This does not handle the branch-to-branch optimizations or most of the divide-by-constant optimizations.	2020-05-14 07:52:23 -07:00
Dan Gohman	fb0b9e3ae6	Change `proc_exit` to unwind the stack rather than exiting the host process. (#1646) * Remove Cranelift's OutOfBounds trap, which is no longer used. * Change proc_exit to unwind instead of exit the host process. This implements the semantics in https://github.com/WebAssembly/WASI/pull/235. Fixes #783. Fixes #993. * Fix exit-status tests on Windows. * Revert the wiggle changes and re-introduce the wasi-common implementations. * Move `wasi_proc_exit` into the wasmtime-wasi crate. * Revert the spec_testsuite change. * Remove the old proc_exit implementations. * Make `TrapReason` an implementation detail. * Allow exit status 2 on Windows too. * Fix a documentation link. * Really fix a documentation link.	2020-05-13 15:59:43 -07:00
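
Commit b017844bef in the list above describes the `irsub_imm` fix as a change from `arg - imm` to a wrapping `imm - arg`. The sketch below illustrates only that difference in operand order. The function names are hypothetical and are not part of the Cranelift interpreter, and the 64-bit operand width is an assumption chosen for the example.

```rust
/// Hypothetical helper, not Cranelift API: reverse wrapping subtraction,
/// `imm - arg`, which the commit message gives as the intended semantics.
fn irsub_imm(arg: i64, imm: i64) -> i64 {
    imm.wrapping_sub(arg)
}

/// Hypothetical helper showing the operand order the commit message
/// describes as the previous, incorrect behaviour: `arg - imm`.
fn irsub_imm_old(arg: i64, imm: i64) -> i64 {
    arg.wrapping_sub(imm)
}

fn main() {
    // The two operand orders differ in sign for ordinary inputs.
    assert_eq!(irsub_imm(3, 10), 7);
    assert_eq!(irsub_imm_old(3, 10), -7);

    // "Wrapping" means overflow wraps around instead of panicking.
    assert_eq!(irsub_imm(1, i64::MIN), i64::MAX);
}
```

The use of `wrapping_sub` rather than `-` matters in debug builds, where plain subtraction would panic on overflow.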

705 Commits
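
Commit 6286ca7310 in the list above folds `v0 + uextend(v1)` and `v0 + sextend(v1)` into the AArch64 `[rA, rB, UXTW]` and `[rA, rB, SXTW]` addressing forms. As a rough illustration of the address arithmetic those forms perform (with no shift applied), the following sketch models them in plain Rust. The function names are hypothetical, and the code does not reflect how `lower_address()` is implemented.

```rust
/// Hypothetical model of `[rA, rB, UXTW]`: the 32-bit index is
/// zero-extended to 64 bits before being added to the base.
fn addr_uxtw(base: u64, index: u32) -> u64 {
    base.wrapping_add(index as u64)
}

/// Hypothetical model of `[rA, rB, SXTW]`: the 32-bit index is
/// sign-extended to 64 bits before being added to the base.
fn addr_sxtw(base: u64, index: i32) -> u64 {
    base.wrapping_add(index as i64 as u64)
}

fn main() {
    let heap_base: u64 = 0x1_0000_0000;

    // A 32-bit Wasm pointer indexing a 64-bit linear memory base.
    assert_eq!(addr_uxtw(heap_base, 0x10), 0x1_0000_0010);

    // The same bit pattern gives different addresses depending on
    // whether it is zero-extended or sign-extended.
    assert_eq!(addr_uxtw(heap_base, 0xFFFF_FFFF), 0x1_FFFF_FFFF);
    assert_eq!(addr_sxtw(heap_base, -1), 0xFFFF_FFFF);
}
```

Because the extension is part of the addressing mode, the separate zero/sign-extend instruction mentioned in the commit message is no longer needed.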