wasmtime

Author	SHA1	Message	Date
teapotd	759cc3e751	Implement passing arguments by ref for win64 ABI	2020-05-29 20:12:41 +02:00
Nick Fitzgerald	94380bf2b7	Merge pull request #1510 from teapotd/abi-i128-fix Always check if struct-return parameter is needed	2020-05-29 10:02:16 -07:00
whitequark	a180b5b393	x86_32: fix stack_addr encoding. Consider this testcase: target i686 function u0:0() -> i32 system_v { ss0 = explicit_slot 0 block0: v2 = stack_addr.i32 ss0 return v2 } Before this commit, in 32-bit mode the x86 backend would generate incorrect code for stack addresses: 0: 55 push ebp 1: 89 e5 mov ebp, esp 3: 83 ec 08 sub esp, 8 6: 8d 44 24 00 lea eax, [esp] a: 00 00 add byte ptr [eax], al c: 00 83 c4 08 5d c3 add byte ptr [ebx - 0x3ca2f73c], al This happened because the ModRM byte indicated a disp8 encoding, but the instruction actually used a disp32 encoding. After this commit, correct code is generated: 0: 55 push ebp 1: 89 e5 mov ebp, esp 3: 83 ec 08 sub esp, 8 6: 8d 84 24 00 00 00 00 lea eax, [esp] d: 83 c4 08 add esp, 8 10: 5d pop ebp 11: c3 ret	2020-05-29 09:17:36 -07:00
whitequark	880e692fd4	x86: add encoding for bnot.b1. Fixes #1743. Co-authored-by: iximeow <git@iximeow.net>	2020-05-28 08:43:25 -07:00
Andrew Brown	b017844bef	Fix interpreter semantics of 'irsub_imm' Previously it used `arg - imm` but the functionality should be a wrapping `imm - arg` (see `cranelift/codegen/meta/src/shared/instructions.rs`).	2020-05-28 16:28:27 +02:00
teapotd	fbac2e53f9	Make vconst BxN match specification	2020-05-27 09:37:13 -07:00
Andrew Brown	628a9f0eaa	Print more detailed `test run` failures (#1764 )	2020-05-27 09:04:46 -05:00
teapotd	9e70a64728	Legalize sret call arguments	2020-05-25 20:03:24 +02:00
teapotd	b18846057f	Add system_v legalizer tests for i128 args	2020-05-25 20:03:24 +02:00
teapotd	6465003899	Run popcnt.i128 legalization test on x86_64	2020-05-25 20:03:24 +02:00
teapotd	0f55bb4b8d	Always check if struct-return parameter is needed	2020-05-25 20:03:24 +02:00
Chris Fallin	51f9ac2150	Merge pull request #1741 from cfallin/filetest-vcode-compile Merge `vcode` filetest mode into `compile`.	2020-05-22 18:57:21 -07:00
Chris Fallin	48573b52b2	Merge `vcode` filetest mode into `compile`. I hadn't realized before that the filetest backend for `test vcode` is doing essentially what `compile` is doing, but for new (`MachInst`) backends: it is just getting a disassembly and running it through filecheck. There's no reason not to reuse `test compile` for the AArch64 tests as well. This was motivated by the desire to have "this IR compiles successfully" tests work on both x86 and AArch64. It seems this should work fine by adding multiple `target` directives when a test case should be compile-tested on multiple architectures.	2020-05-22 17:28:48 -07:00
Chris Fallin	73537e72c0	Merge pull request #1732 from jgouly/copysign-fpu arm64: Use FPU instrctions for Fcopysign	2020-05-22 17:25:33 -07:00
Peter Huene	f36539130b	Merge pull request #1734 from peterhuene/fix-saved-fprs Cranelift: Fix FPR saving and shadow space allocation for Windows x64.	2020-05-22 12:06:37 -07:00
whitequark	b2e8ed4dc9	cranelift: add i64.[us]{div,rem} libcalls. These libcalls are useful for 32-bit platforms.	2020-05-22 11:41:56 +00:00
Joey Gouly	02c3f238f8	arm64: Use FPU instrctions for Fcopysign Copyright (c) 2020, Arm Limited.	2020-05-21 18:14:12 +01:00
Peter Huene	78c3091e84	Fix FPR saving and shadow space allocation for Windows x64. This commit fixes both how FPR callee-saved registers are saved and how the shadow space allocation occurs when laying out the stack for Windows x64 calling convention. Importantly, this commit removes the compiler limitation of stack size for Windows x64 that was imposed because FPR saves previously couldn't always be represented in the unwind information. The FPR saves are now performed without using stack slots, much like how the callee-saved GPRs are saved. The total CSR space is given to `layout_stack` so that it is included in the frame size and to offset the layout of spills and explicit slots. The FPR saves are now done via an RSP offset (post adjustment) and they always follow the GPR saves on the stack. A simpler calculation can now be made to determine the proper offsets of the FPR saves for representing the unwind information. Additionally, the shadow space is no longer treated as an incoming argument, but an explicit stack slot that gets laid out at the lowest address possible in the local frame. This prevents `layout_stack` from putting a spill or explicit slot in this reserved space. In the future, `layout_stack` should take advantage of the caller-provided shadow space for spills, but this commit does not attempt to address that. The shadow space is now omitted from the local frame for leaf functions. Fixes #1728. Fixes #1587. Fixes #1475.	2020-05-20 15:37:30 -07:00
Chris Fallin	72e6be9342	Rework of MachInst isel, branch fixups and lowering, and block ordering. This patch includes: - A complete rework of the way that CLIF blocks and edge blocks are lowered into VCode blocks. The new mechanism in `BlockLoweringOrder` computes RPO over the CFG, but with a twist: it merges edge blocks intto heads or tails of original CLIF blocks wherever possible, and it does this without ever actually materializing the full nodes-plus-edges graph first. The backend driver lowers blocks in final order so there's no need to reshuffle later. - A new `MachBuffer` that replaces the `MachSection`. This is a special version of a code-sink that is far more than a humble `Vec<u8>`. In particular, it keeps a record of label definitions and label uses, with a machine-pluggable `LabelUse` trait that defines various types of fixups (basically internal relocations). Importantly, it implements some simple peephole-style branch rewrites inline in the emission pass, without any separate traversals over the code to use fallthroughs, swap taken/not-taken arms, etc. It tracks branches at the tail of the buffer and can (i) remove blocks that are just unconditional branches (by redirecting the label), (ii) understand a conditional/unconditional pair and swap the conditional polarity when it's helpful; and (iii) remove branches that branch to the fallthrough PC. The `MachBuffer` also implements branch-island support. On architectures like AArch64, this is needed to allow conditional branches within plausibly-attainable ranges (+/- 1MB on AArch64 specifically). It also does this inline while streaming through the emission, without any sort of fixpoint algorithm or later moving of code, by simply tracking outstanding references and "deadlines" and emitting an island just-in-time when we're in danger of going out of range. - A rework of the instruction selector driver. This is largely following the same algorithm as before, but is cleaned up significantly, in particular in the API: the machine backend can ask for an input arg and get any of three forms (constant, register, producing instruction), indicating it needs the register or can merge the constant or producing instruction as appropriate. This new driver takes special care to emit constants right at use-sites (and at phi inputs), minimizing their live-ranges, and also special-cases the "pinned register" to avoid superfluous moves. Overall, on `bz2.wasm`, the results are: wasmtime full run (compile + runtime) of bz2: baseline: 9774M insns, 9742M cycles, 3.918s w/ changes: 7012M insns, 6888M cycles, 2.958s (24.5% faster, 28.3% fewer insns) clif-util wasm compile bz2: baseline: 2633M insns, 3278M cycles, 1.034s w/ changes: 2366M insns, 2920M cycles, 0.923s (10.7% faster, 10.1% fewer insns) All numbers are averages of two runs on an Ampere eMAG.	2020-05-16 23:08:22 -07:00
Nick Fitzgerald	fb7a690efc	Merge pull request #1687 from fitzgen/sign-extend-immediates cranelift: Sign extend `Imm64` immediates	2020-05-14 10:09:53 -07:00
Nick Fitzgerald	52c6ece5f3	peepmatic: Make peepmatic optional to enable Rather than outright replacing parts of our existing peephole optimizations passes, this makes peepmatic an optional cargo feature that can be enabled. This allows us to take a conservative approach with enabling peepmatic everywhere, while also allowing us to get it in-tree and make it easier to collaborate on improving it quickly.	2020-05-14 07:52:23 -07:00
Nick Fitzgerald	6e135b3aea	peepmatic: Fix a failed assertion due to extra iterations after fixed point After replacing an instruction with an alias to an earlier value, trying to further optimize that value is unnecessary, since we've already processed it, and also was triggering an assertion.	2020-05-14 07:52:23 -07:00
Nick Fitzgerald	9a1f8038b7	peepmatic: Do not transplant instructions whose results are potentially used elsewhere	2020-05-14 07:52:23 -07:00
Nick Fitzgerald	090d1c2d32	cranelift: Port most of `simple_preopt.rs` over to the `peepmatic` DSL This ports all of the identity, no-op, simplification, and canonicalization related optimizations over from being hand-coded to the `peepmatic` DSL. This does not handle the branch-to-branch optimizations or most of the divide-by-constant optimizations.	2020-05-14 07:52:23 -07:00
Dan Gohman	fb0b9e3ae6	Change `proc_exit` to unwind the stack rather than exiting the host process. (#1646 ) * Remove Cranelift's OutOfBounds trap, which is no longer used. * Change proc_exit to unwind instead of exit the host process. This implements the semantics in https://github.com/WebAssembly/WASI/pull/235. Fixes #783. Fixes #993. * Fix exit-status tests on Windows. * Revert the wiggle changes and re-introduce the wasi-common implementations. * Move `wasi_proc_exit` into the wasmtime-wasi crate. * Revert the spec_testsuite change. * Remove the old proc_exit implementations. * Make `TrapReason` an implementation detail. * Allow exit status 2 on Windows too. * Fix a documentation link. * Really fix a documentation link.	2020-05-13 15:59:43 -07:00
Nick Fitzgerald	9b867b09c7	cranelift: Sign extend `Imm64` immediates When an instruction has an `Imm64` immediate, but operates on values of a narrower width, we need to sign extend the value. Fixes #1095	2020-05-12 15:44:48 -07:00
whitequark	4ec16fa057	Legalize 64 bit shifts on x86_32 using PSLLQ/PSRLQ. Co-authored-by: iximeow <git@iximeow.net>	2020-05-09 03:28:19 -07:00
Andrew Brown	b26ca3cbdd	Add `test interpret` support to filetests	2020-05-07 16:51:09 -07:00
Andrew Brown	b4238229c2	Cast DataValues to and from native types Also, returns a `Result` in the `RunCommand::run` helper.	2020-05-07 16:51:09 -07:00
Chris Fallin	6d73fdb70a	Merge pull request #1607 from cfallin/aarch64-stack-frame Rework aarch64 stack frame implementation to use positive offsets.	2020-05-06 10:29:30 -07:00
Chris Fallin	a66724aafd	Rework aarch64 stack frame implementation. This PR changes the aarch64 ABI implementation to use positive offsets from SP, rather than negative offsets from FP, to refer to spill slots and stack-local storage. This allows for better addressing-mode options, and hence slightly better code: e.g., the unsigned scaled 12-bit offset mode can be used to reach anywhere in a 32KB frame without extra address-construction instructions, whereas negative offsets are limited to a signed 9-bit unscaled mode (-256 bytes). To enable this, the PR introduces a notion of "nominal SP offsets" as a virtual addressing mode, lowered during the emission pass. The offsets are relative to "SP after adjusting downward to allocate stack/spill slots", but before pushing clobbers. This allows the addressing-mode expressions to be generated before register allocation (or during it, for spill/reload sequences). To convert these offsets into true offsets from SP, we need to track how much further SP is moved downward, and compensate for this. We do so with "virtual SP offset adjustment" pseudo-instructions: these are seen by the emission pass, and result in no instruction (0 byte output), but update state that is now threaded through each instruction emission in turn. In this way, we can push e.g. stack args for a call and adjust the virtual SP offset, allowing reloads from nominal-SP-relative spillslots while we do the argument setup with "real SP offsets" at the same time.	2020-05-06 09:23:55 -07:00
whitequark	162fcd3d75	Legalize [su]extend.i64 to iconst/sshr_imm + iconcat. This was already done for [su]extend.i128, and is necessary for codegen for 32-bit x86.	2020-05-05 16:08:58 -07:00
whitequark	14bdaf3ce3	Legalize ireduce.iN.i2N to isplit.	2020-05-05 14:13:30 -07:00
Andrew Brown	cd49ed9582	Add x86 legalization for sshr.i64x2	2020-05-05 12:01:46 -07:00
Chris Fallin	e39b4aba1c	Fix long-range (non-colocated) aarch64 calls to not use Arm64Call reloc, and fix simplejit to use it. Previously, every call was lowered on AArch64 to a `call` instruction, which takes a signed 26-bit PC-relative offset. Including the 2-bit left shift, this gives a range of +/- 128 MB. Longer-distance offsets would cause an impossible relocation record to be emitted (or rather, a record that a more sophisticated linker would fix up by inserting a shim/veneer). This commit adds a notion of "relocation distance" in the MachInst backends, and provides this information for every call target and symbol reference. The intent is that backends on architectures like AArch64, where there are different offset sizes / addressing strategies to choose from, can either emit a regular call or a load-64-bit-constant / call-indirect sequence, as necessary. This avoids the need to implement complex linking behavior. The MachInst driver code provides this information based on the "colocated" bit in the CLIF symbol references, which appears to have been designed for this purpose, or at least a similar one. Combined with the `use_colocated_libcalls` setting, this allows client code to ensure that library calls can link to library code at any location in the address space. Separately, the `simplejit` example did not handle `Arm64Call`; rather than doing so, it appears all that is necessary to get its tests to pass is to set the `use_colocated_libcalls` flag to false, to make use of the above change. This fixes the `libcall_function` unit-test in this crate.	2020-05-05 09:55:12 -07:00
Andrew Brown	d6796d0d23	Improve documentation of the filetest `run` command (#1645 ) * Improve output display of RunCommand The previous use of Debug for displaying `print` and `run` results was less than clear. * Avoid checking the types of vectors during trampoline construction Because DataValue only understands `V128` vectors, we avoid type-checking vector values when constructing the trampoline arguments. * Improve the documentation of the filetest `run` command Adds an up-to-date example of how to use the `run` and `print` directives and includes an actual use of the new directives in a SIMD arithmetic filetest.	2020-05-04 14:08:27 -05:00
Nick Fitzgerald	4471a82b0c	Merge pull request #1635 from fitzgen/filetests-threads Allow setting the number of filetest threads via the CRANELIFT_FILETESTS_THREADS env var	2020-05-01 10:06:26 -07:00
Andrew Brown	38dff29179	Add ability to call CLIF functions with arbitrary arguments in filetests This resolves the work started in https://github.com/bytecodealliance/cranelift/pull/1231 and https://github.com/bytecodealliance/wasmtime/pull/1436. Cranelift filetests currently have the ability to run CLIF functions with a signature like `() -> b*` and check that the result is true under the `test run` directive. This PR adds the ability to call functions with arbitrary arguments and non-boolean returns and either print the result or check against a list of expected results: - `run` commands look like `; run: %add(2, 2) == 4` or `; run: %add(2, 2) != 5` and verify that the executed CLIF function returns the expected value - `print` commands look like `; print: %add(2, 2)` and print the result of the function to stdout To make this work, this PR compiles a single Cranelift `Function` into a `CompiledFunction` using a `SingleFunctionCompiler`. Because we will not know the signature of the function until runtime, we use a `Trampoline` to place the values in the appropriate location for the calling convention; this should look a lot like what @alexcrichton is doing with `VMTrampoline` in wasmtime (see `3b7cb6ee64/crates/api/src/func.rs (L510-L526)`, `3b7cb6ee64/crates/jit/src/compiler.rs (L260)`). To avoid re-compiling `Trampoline`s for the same function signatures, `Trampoline`s are cached in the `SingleFunctionCompiler`.	2020-04-30 11:21:00 -07:00
Nick Fitzgerald	c4292fb2be	Allow setting the number of filetest threads via the CRANELIFT_FILETESTS_THREADS env var	2020-04-30 09:20:23 -07:00
Dan Gohman	864cf98c8d	Update release notes, wasmtime 0.16, cranelift 0.63.	2020-04-29 17:30:25 -07:00
Nick Fitzgerald	c9b27b484e	filecheck: Use `std::fs::read_to_string` instead of hand-rolled copy (#1627 )	2020-04-28 17:54:41 -07:00
Alex Crichton	74eda8090c	Implement stack limit checks for AArch64 (#1573 ) This commit implements the stack limit checks in cranelift for the AArch64 backend. This gets the `stack_limit` argument purpose as well as a function's global `stack_limit` directive working for the AArch64 backend. I've tested this locally on some hardware and in an emulator and it looks to be working for basic tests, but I've never really done AArch64 before so some scrutiny on the instructions would be most welcome!	2020-04-24 15:01:57 -05:00
Joey Gouly	f020f0812e	arm64: Implement checks in division / remainder This implements the divide by 0 and signed overflow checks that Wasm specifies. Copyright (c) 2020, Arm Limited.	2020-04-24 17:40:19 +02:00
Benjamin Bouvier	b6e6998713	aarch64: mask rotation counts and share code generation of left and right rotations; Given an integer size N, a left rotation of K places is the same as a right rotation of N - K places. This means we can use right rotations to implement left rotations too. The Cranelift's rotation semantics are inherited from WebAssembly, which mean the rotation count is truncated modulo the operand's bit size. Note the ROR aarch64 instruction has the same semantics, when both input operands are registers.	2020-04-24 12:36:59 +02:00
Benjamin Bouvier	de92b7e014	aarch64: implement correct float-to-int conversion semantics; These are inherited from wasm semantics.	2020-04-24 11:51:35 +02:00
Andrew Brown	d24f23285b	Legalize i8x16.sshr using pack/unpack instructions Due to arithmetic shift behavior, this legalization cannot easily use the masks for i8x16.ushr or i8x16.ishl	2020-04-23 10:55:54 -07:00
Andrew Brown	fb6e8f784d	Add x86 pack instructions	2020-04-23 10:55:54 -07:00
Andrew Brown	f5fc09f64a	Add x86 unpack instructions	2020-04-23 10:55:54 -07:00
Andrew Brown	5f0286696c	Add x86 implentation of 8x16 `ishl` This involves some large mask tables that may hurt code size but reduce the number of instructions. See https://github.com/WebAssembly/simd/issues/117 for a more in-depth discussion on this.	2020-04-23 10:55:54 -07:00
Alex Crichton	d1aa86f91a	Add AArch64 tests to CI (#1526 ) * Add AArch64 tests to CI This commit enhances our CI with an AArch64 builder. Currently we have no physical hardware to run on so for now we run all tests in an emulator. The AArch64 build is cross-compiled from x86_64 from Linux. Tests all happen in release mode with a recent version of QEMU (recent version because it's so much faster, and in release mode because debug mode tests take quite a long time in an emulator). The goal here was not to get all tests passing on CI, but rather to get AArch64 running on CI and get it green at the same time. To achieve that goal many tests are now ignored on aarch64 platforms. Many tests fail due to unimplemented functionality in the aarch64 backend (#1521), and all wasmtime tests involving compilation are also disabled due to panicking attempting to generate generate instruction offset information for trap symbolication (#1523). Despite this, though, all Cranelift tests and other wasmtime tests should be runnin on AArch64 through QEMU with this PR. Additionally we'll have an AArch64 binary release of Wasmtime for Linux, although it won't be too useful just yet since it will panic on almost all wasm modules. * Review comments	2020-04-22 12:56:54 -05:00

... 4 5 6 7 8 ...

930 Commits