Commit Graph

958 Commits

Author SHA1 Message Date
Nick Fitzgerald
9a1f8038b7 peepmatic: Do not transplant instructions whose results are potentially used elsewhere 2020-05-14 07:52:23 -07:00
Nick Fitzgerald
090d1c2d32 cranelift: Port most of simple_preopt.rs over to the peepmatic DSL
This ports all of the identity, no-op, simplification, and canonicalization
related optimizations over from being hand-coded to the `peepmatic` DSL. This
does not handle the branch-to-branch optimizations or most of the
divide-by-constant optimizations.
2020-05-14 07:52:23 -07:00
Dan Gohman
fb0b9e3ae6 Change proc_exit to unwind the stack rather than exiting the host process. (#1646)
* Remove Cranelift's OutOfBounds trap, which is no longer used.

* Change proc_exit to unwind instead of exit the host process.

This implements the semantics in https://github.com/WebAssembly/WASI/pull/235.

Fixes #783.
Fixes #993.

* Fix exit-status tests on Windows.

* Revert the wiggle changes and re-introduce the wasi-common implementations.

* Move `wasi_proc_exit` into the wasmtime-wasi crate.

* Revert the spec_testsuite change.

* Remove the old proc_exit implementations.

* Make `TrapReason` an implementation detail.

* Allow exit status 2 on Windows too.

* Fix a documentation link.

* Really fix a documentation link.
2020-05-13 15:59:43 -07:00
Nick Fitzgerald
9b867b09c7 cranelift: Sign extend Imm64 immediates
When an instruction has an `Imm64` immediate, but operates on values of a
narrower width, we need to sign extend the value.

Fixes #1095
2020-05-12 15:44:48 -07:00
whitequark
4ec16fa057 Legalize 64 bit shifts on x86_32 using PSLLQ/PSRLQ.
Co-authored-by: iximeow <git@iximeow.net>
2020-05-09 03:28:19 -07:00
Andrew Brown
b26ca3cbdd Add test interpret support to filetests 2020-05-07 16:51:09 -07:00
Andrew Brown
b4238229c2 Cast DataValues to and from native types
Also, returns a `Result` in the `RunCommand::run` helper.
2020-05-07 16:51:09 -07:00
Chris Fallin
6d73fdb70a Merge pull request #1607 from cfallin/aarch64-stack-frame
Rework aarch64 stack frame implementation to use positive offsets.
2020-05-06 10:29:30 -07:00
Chris Fallin
a66724aafd Rework aarch64 stack frame implementation.
This PR changes the aarch64 ABI implementation to use positive offsets
from SP, rather than negative offsets from FP, to refer to spill slots
and stack-local storage. This allows for better addressing-mode options,
and hence slightly better code: e.g., the unsigned scaled 12-bit offset
mode can be used to reach anywhere in a 32KB frame without extra
address-construction instructions, whereas negative offsets are limited
to a signed 9-bit unscaled mode (-256 bytes).

To enable this, the PR introduces a notion of "nominal SP offsets" as a
virtual addressing mode, lowered during the emission pass. The offsets
are relative to "SP after adjusting downward to allocate stack/spill
slots", but before pushing clobbers. This allows the addressing-mode
expressions to be generated before register allocation (or during it,
for spill/reload sequences).

To convert these offsets into *true* offsets from SP, we need to track
how much further SP is moved downward, and compensate for this. We do so
with "virtual SP offset adjustment" pseudo-instructions: these are seen
by the emission pass, and result in no instruction (0 byte output), but
update state that is now threaded through each instruction emission in
turn. In this way, we can push e.g. stack args for a call and adjust
the virtual SP offset, allowing reloads from nominal-SP-relative
spillslots while we do the argument setup with "real SP offsets" at the
same time.
2020-05-06 09:23:55 -07:00
whitequark
162fcd3d75 Legalize [su]extend.i64 to iconst/sshr_imm + iconcat.
This was already done for [su]extend.i128, and is necessary for
codegen for 32-bit x86.
2020-05-05 16:08:58 -07:00
whitequark
14bdaf3ce3 Legalize ireduce.iN.i2N to isplit. 2020-05-05 14:13:30 -07:00
Andrew Brown
cd49ed9582 Add x86 legalization for sshr.i64x2 2020-05-05 12:01:46 -07:00
Chris Fallin
e39b4aba1c Fix long-range (non-colocated) aarch64 calls to not use Arm64Call reloc, and fix simplejit to use it.
Previously, every call was lowered on AArch64 to a `call` instruction, which
takes a signed 26-bit PC-relative offset. Including the 2-bit left shift, this
gives a range of +/- 128 MB. Longer-distance offsets would cause an impossible
relocation record to be emitted (or rather, a record that a more sophisticated
linker would fix up by inserting a shim/veneer).

This commit adds a notion of "relocation distance" in the MachInst backends,
and provides this information for every call target and symbol reference. The
intent is that backends on architectures like AArch64, where there are different
offset sizes / addressing strategies to choose from, can either emit a regular
call or a load-64-bit-constant / call-indirect sequence, as necessary. This
avoids the need to implement complex linking behavior.

The MachInst driver code provides this information based on the "colocated" bit
in the CLIF symbol references, which appears to have been designed for this
purpose, or at least a similar one. Combined with the `use_colocated_libcalls`
setting, this allows client code to ensure that library calls can link to
library code at any location in the address space.

Separately, the `simplejit` example did not handle `Arm64Call`; rather than doing
so, it appears all that is necessary to get its tests to pass is to set the
`use_colocated_libcalls` flag to false, to make use of the above change. This
fixes the `libcall_function` unit-test in this crate.
2020-05-05 09:55:12 -07:00
Andrew Brown
d6796d0d23 Improve documentation of the filetest run command (#1645)
* Improve output display of RunCommand

The previous use of Debug for displaying `print` and `run` results was less than clear.

* Avoid checking the types of vectors during trampoline construction

Because DataValue only understands `V128` vectors, we avoid type-checking vector values when constructing the trampoline arguments.

* Improve the documentation of the filetest `run` command

Adds an up-to-date example of how to use the `run` and `print` directives and includes an actual use of the new directives in a SIMD arithmetic filetest.
2020-05-04 14:08:27 -05:00
Nick Fitzgerald
4471a82b0c Merge pull request #1635 from fitzgen/filetests-threads
Allow setting the number of filetest threads via the CRANELIFT_FILETESTS_THREADS env var
2020-05-01 10:06:26 -07:00
Andrew Brown
38dff29179 Add ability to call CLIF functions with arbitrary arguments in filetests
This resolves the work started in https://github.com/bytecodealliance/cranelift/pull/1231 and https://github.com/bytecodealliance/wasmtime/pull/1436. Cranelift filetests currently have the ability to run CLIF functions with a signature like `() -> b*` and check that the result is true under the `test run` directive. This PR adds the ability to call functions with arbitrary arguments and non-boolean returns and either print the result or check against a list of expected results:
 - `run` commands look like `; run: %add(2, 2) == 4` or `; run: %add(2, 2) != 5` and verify that the executed CLIF function returns the expected value
 - `print` commands look like `; print: %add(2, 2)` and print the result of the function to stdout

To make this work, this PR compiles a single Cranelift `Function` into a `CompiledFunction` using a `SingleFunctionCompiler`. Because we will not know the signature of the function until runtime, we use a `Trampoline` to place the values in the appropriate location for the calling convention; this should look a lot like what @alexcrichton is doing with `VMTrampoline` in wasmtime (see 3b7cb6ee64/crates/api/src/func.rs (L510-L526), 3b7cb6ee64/crates/jit/src/compiler.rs (L260)). To avoid re-compiling `Trampoline`s for the same function signatures, `Trampoline`s are cached in the `SingleFunctionCompiler`.
2020-04-30 11:21:00 -07:00
Nick Fitzgerald
c4292fb2be Allow setting the number of filetest threads via the CRANELIFT_FILETESTS_THREADS env var 2020-04-30 09:20:23 -07:00
Dan Gohman
864cf98c8d Update release notes, wasmtime 0.16, cranelift 0.63. 2020-04-29 17:30:25 -07:00
Nick Fitzgerald
c9b27b484e filecheck: Use std::fs::read_to_string instead of hand-rolled copy (#1627) 2020-04-28 17:54:41 -07:00
Alex Crichton
74eda8090c Implement stack limit checks for AArch64 (#1573)
This commit implements the stack limit checks in cranelift for the
AArch64 backend. This gets the `stack_limit` argument purpose as well as
a function's global `stack_limit` directive working for the AArch64
backend. I've tested this locally on some hardware and in an emulator
and it looks to be working for basic tests, but I've never really done
AArch64 before so some scrutiny on the instructions would be most
welcome!
2020-04-24 15:01:57 -05:00
Joey Gouly
f020f0812e arm64: Implement checks in division / remainder
This implements the divide by 0 and signed overflow checks that Wasm
specifies.

Copyright (c) 2020, Arm Limited.
2020-04-24 17:40:19 +02:00
Benjamin Bouvier
b6e6998713 aarch64: mask rotation counts and share code generation of left and right rotations;
Given an integer size N, a left rotation of K places is the same as a
right rotation of N - K places. This means we can use right rotations to
implement left rotations too.

The Cranelift's rotation semantics are inherited from WebAssembly, which
mean the rotation count is truncated modulo the operand's bit size. Note
the ROR aarch64 instruction has the same semantics, when both input
operands are registers.
2020-04-24 12:36:59 +02:00
Benjamin Bouvier
de92b7e014 aarch64: implement correct float-to-int conversion semantics;
These are inherited from wasm semantics.
2020-04-24 11:51:35 +02:00
Andrew Brown
d24f23285b Legalize i8x16.sshr using pack/unpack instructions
Due to arithmetic shift behavior, this legalization cannot easily use the masks for i8x16.ushr or i8x16.ishl
2020-04-23 10:55:54 -07:00
Andrew Brown
fb6e8f784d Add x86 pack instructions 2020-04-23 10:55:54 -07:00
Andrew Brown
f5fc09f64a Add x86 unpack instructions 2020-04-23 10:55:54 -07:00
Andrew Brown
5f0286696c Add x86 implentation of 8x16 ishl
This involves some large mask tables that may hurt code size but reduce the number of instructions. See https://github.com/WebAssembly/simd/issues/117 for a more in-depth discussion on this.
2020-04-23 10:55:54 -07:00
Alex Crichton
d1aa86f91a Add AArch64 tests to CI (#1526)
* Add AArch64 tests to CI

This commit enhances our CI with an AArch64 builder. Currently we have
no physical hardware to run on so for now we run all tests in an
emulator. The AArch64 build is cross-compiled from x86_64 from Linux.
Tests all happen in release mode with a recent version of QEMU (recent
version because it's so much faster, and in release mode because debug
mode tests take quite a long time in an emulator).

The goal here was not to get all tests passing on CI, but rather to get
AArch64 running on CI and get it green at the same time. To achieve that
goal many tests are now ignored on aarch64 platforms. Many tests fail
due to unimplemented functionality in the aarch64 backend (#1521), and
all wasmtime tests involving compilation are also disabled due to
panicking attempting to generate generate instruction offset information
for trap symbolication (#1523).

Despite this, though, all Cranelift tests and other wasmtime tests
should be runnin on AArch64 through QEMU with this PR. Additionally
we'll have an AArch64 binary release of Wasmtime for Linux, although it
won't be too useful just yet since it will panic on almost all wasm
modules.

* Review comments
2020-04-22 12:56:54 -05:00
Alex Crichton
c9a0ba81a0 Implement interrupting wasm code, reimplement stack overflow (#1490)
* Implement interrupting wasm code, reimplement stack overflow

This commit is a relatively large change for wasmtime with two main
goals:

* Primarily this enables interrupting executing wasm code with a trap,
  preventing infinite loops in wasm code. Note that resumption of the
  wasm code is not a goal of this commit.

* Additionally this commit reimplements how we handle stack overflow to
  ensure that host functions always have a reasonable amount of stack to
  run on. This fixes an issue where we might longjmp out of a host
  function, skipping destructors.

Lots of various odds and ends end up falling out in this commit once the
two goals above were implemented. The strategy for implementing this was
also lifted from Spidermonkey and existing functionality inside of
Cranelift. I've tried to write up thorough documentation of how this all
works in `crates/environ/src/cranelift.rs` where gnarly-ish bits are.

A brief summary of how this works is that each function and each loop
header now checks to see if they're interrupted. Interrupts and the
stack overflow check are actually folded into one now, where function
headers check to see if they've run out of stack and the sentinel value
used to indicate an interrupt, checked in loop headers, tricks functions
into thinking they're out of stack. An interrupt is basically just
writing a value to a location which is read by JIT code.

When interrupts are delivered and what triggers them has been left up to
embedders of the `wasmtime` crate. The `wasmtime::Store` type has a
method to acquire an `InterruptHandle`, where `InterruptHandle` is a
`Send` and `Sync` type which can travel to other threads (or perhaps
even a signal handler) to get notified from. It's intended that this
provides a good degree of flexibility when interrupting wasm code. Note
though that this does have a large caveat where interrupts don't work
when you're interrupting host code, so if you've got a host import
blocking for a long time an interrupt won't actually be received until
the wasm starts running again.

Some fallout included from this change is:

* Unix signal handlers are no longer registered with `SA_ONSTACK`.
  Instead they run on the native stack the thread was already using.
  This is possible since stack overflow isn't handled by hitting the
  guard page, but rather it's explicitly checked for in wasm now. Native
  stack overflow will continue to abort the process as usual.

* Unix sigaltstack management is now no longer necessary since we don't
  use it any more.

* Windows no longer has any need to reset guard pages since we no longer
  try to recover from faults on guard pages.

* On all targets probestack intrinsics are disabled since we use a
  different mechanism for catching stack overflow.

* The C API has been updated with interrupts handles. An example has
  also been added which shows off how to interrupt a module.

Closes #139
Closes #860
Closes #900

* Update comment about magical interrupt value

* Store stack limit as a global value, not a closure

* Run rustfmt

* Handle review comments

* Add a comment about SA_ONSTACK

* Use `usize` for type of `INTERRUPTED`

* Parse human-readable durations

* Bring back sigaltstack handling

Allows libstd to print out stack overflow on failure still.

* Add parsing and emission of stack limit-via-preamble

* Fix new example for new apis

* Fix host segfault test in release mode

* Fix new doc example
2020-04-21 11:03:28 -07:00
Chris Fallin
d88098744b Merge pull request #1527 from cfallin/aarch64-fp-vcode-test
Add vcode test for floating-point, and fix two FP bugs.
2020-04-21 09:35:23 -07:00
Joey Gouly
ad9be0d445 arm64: Support bool constants
Copyright (c) 2020, Arm Limited.
2020-04-21 12:24:57 +02:00
bjorn3
cb1c9ef085 Fix printing of LoadAddr 2020-04-18 13:24:06 +02:00
bjorn3
4960c9a0c6 Add tests for stack_{addr,load,store} 2020-04-18 13:24:06 +02:00
Chris Fallin
5e53482a13 arm64: Support less-than-64-bit integers in Bitrev, Clz, Cls, and Popcnt instructions.
Includes a temporary bugfix for popcnt with 32-bit operand. The popcnt
issue was initially identified by Benjamin Bouvier <public@benj.me>, and
the root cause was debugged by Joey Gouly <joey.gouly@arm.com>. This
patch is simply a quick fix that zero-extends the operand to 64 bits;
Joey plans to contribute a more permanent fix shortly (tracked in
 #1537).
2020-04-17 16:42:46 -07:00
Andrew Brown
3f47291f2e Add x86 implentation of 8x16 ushr
This involves some large mask tables that may hurt code size but reduce the number of instructions. See https://github.com/WebAssembly/simd/issues/117 for a more in-depth discussion on this.
2020-04-17 11:59:47 -07:00
Andrew Brown
65856987cd Add const_addr instruction
This new instruction calculates the effective address of a constant in the constant pool using LEA (x86).
2020-04-17 11:59:47 -07:00
Andrew Brown
fa35d88878 Verify that constant values are the correct size
Since we now allow constants of any size, we have to verify that `vconst` (currently the only user of the constant pool) is accessing constants that match its controlling type.
2020-04-17 11:59:47 -07:00
Andrew Brown
0672d1dc0f Declare constants in the function preamble
This allows us to give names to constants in the constant pool and then use these names in the function body. The original behavior, specifiying the constant value as an instruction immediate, is still supported as a shortcut but some filetests had to change since the canonical way of printing the CLIF constants is now in the preamble.
2020-04-17 11:59:47 -07:00
Chris Fallin
2b68abed6a Add vcode test for floating-point, and fix two FP bugs.
- Added a filetest for the vcode output of lowering every handled FP opcode.

- Fixed two bugs that were discovered while going through the lowerings:
  - Saturating FP->int operators would return `u{32,64}::MIN` rather than
    `0` for a NaN input.
  - `fcopysign` did not mask off the sign bit of the value whose sign is
    overwritten.

These probably would have been caught by Wasm conformance tests soon
(and the validity of these lowerings will ultimately be tested this way)
but let's get them right by inspection, too!
2020-04-16 13:43:52 -07:00
Peter Huene
f7e9f86ba9 Refactor unwind generation in Cranelift.
This commit makes the following changes to unwind information generation in
Cranelift:

* Remove frame layout change implementation in favor of processing the prologue
  and epilogue instructions when unwind information is requested.  This also
  means this work is no longer performed for Windows, which didn't utilize it.
  It also helps simplify the prologue and epilogue generation code.

* Remove the unwind sink implementation that required each unwind information
  to be represented in final form. For FDEs, this meant writing a
  complete frame table per function, which wastes 20 bytes or so for each
  function with duplicate CIEs.  This also enables Cranelift users to collect the
  unwind information and write it as a single frame table.

* For System V calling convention, the unwind information is no longer stored
  in code memory (it's only a requirement for Windows ABI to do so).  This allows
  for more compact code memory for modules with a lot of functions.

* Deletes some duplicate code relating to frame table generation.  Users can
  now simply use gimli to create a frame table from each function's unwind
  information.

Fixes #1181.
2020-04-16 11:15:32 -07:00
Chris Fallin
7da6101732 Merge pull request #1494 from cfallin/arm64-merge
Add new `MachInst` backend and ARM64 support.
2020-04-16 10:02:02 -07:00
Chris Fallin
48cf2c2f50 Address review comments:
- Undo temporary changes to default features (`all-arch`) and a
  signal-handler test.
- Remove `SIGTRAP` handler: no longer needed now that we've found an
  "undefined opcode" option on ARM64.
- Rename pp.rs to pretty_print.rs in machinst/.
- Only use empty stack-probe on non-x86. As per a comment in
  rust-lang/compiler-builtins [1], LLVM only supports stack probes on
  x86 and x86-64. Thus, on any other CPU architecture, we cannot refer
  to `__rust_probestack`, because it does not exist.
- Rename arm64 to aarch64.
- Use `target` directive in vcode filetests.
- Run the flags verifier, but without encinfo, when using new backends.
- Clean up warning overrides.
- Fix up use of casts: use u32::from(x) and siblings when possible,
  u32::try_from(x).unwrap() when not, to avoid silent truncation.
- Take immutable `Function` borrows as input; we don't actually
  mutate the input IR.
- Lots of other miscellaneous cleanups.

[1] cae3e6ea23/src/probestack.rs (L39)
2020-04-15 17:21:28 -07:00
Samrat Man Singh
4d34c22a1c Use F64X2 as type when saving and restoring XMM registers
When adding floating-point registers as callee-saved register to
block- and function parameter lists add them as `F64X2` arguments.
2020-04-13 09:48:08 -07:00
Chris Fallin
3de504c24c ARM64 backend, part 11 / 11: filetests for ARM64 VCode.
This patch, the last in the series, adds the filetests for the new ARM64
backend. The filetests cover most of the opcodes, except for the
recently-added floating point support.

This patch contains code written by Julian Seward <jseward@acm.org> and
Benjamin Bouvier <public@benj.me>, originally developed on a side-branch
before rebasing and condensing into this patch series. See the `arm64`
branch at `https://github.com/cfallin/wasmtime` for original development
history.

This patch also contains code written by Joey Gouly
<joey.gouly@arm.com> and contributed to the above branch. These
contributions are "Copyright (c) 2020, Arm Limited."

Co-authored-by: Julian Seward <jseward@acm.org>
Co-authored-by: Benjamin Bouvier <public@benj.me>
Co-authored-by: Joey Gouly <joey.gouly@arm.com>
2020-04-11 17:53:01 -07:00
Chris Fallin
402303f67a ARM64 backend, part 10 / 11: filetest support for VCode tests.
This patch adds support for filetests with the `vcode` type. This allows
test cases to feed CLIF into the new backend, produce VCode output with
machine instructions, and then perform matching against the
pretty-printed text representation of the VCode.

Tests for the new ARM64 backend using this infrastructure will come in a
followup patch.
2020-04-11 17:52:56 -07:00
iximeow
4cca510085 Windows FPRs preservation (#1216)
Preserve FPRs as required by the Windows fastcall calling convention.

This exposes an implementation limit due to Cranelift's approach to stack layout, which conflicts with expectations Windows makes in SEH layout - functions where the Cranelift user desires fastcall unwind information, that require preservation of an ABI-reserved FPR, that have a stack frame 240 bytes or larger, now produce an error when compiled. Several wasm spectests were disabled because they would trip this limit. This is a temporary constraint that should be fixed promptly.

Co-authored-by: bjorn3 <bjorn3@users.noreply.github.com>
2020-04-10 13:27:20 -07:00
Benjamin Bouvier
f4c4a84b84 cranelift codegen: pass source locations with external relocations; 2020-04-07 11:52:39 +02:00
Andrew Brown
5297466add Expose parsing of run commands and trivially use in testing framework
This is necessary to avoid build errors from dead code (and I didn't want to litter all of the structs with `#[allow(dead_code)]` just to remove in a subsequent PR).
2020-04-03 13:25:10 -07:00
Andrew Brown
f56f9fb01c Remove unparseable run commands in existing tests 2020-04-03 13:25:10 -07:00
Dan Gohman
fde5ddf159 Fixes for 0.15 (#1449)
* Wasmtime 0.15.0 and Cranelift 0.62.0. (#1398)

* Bump more ad-hoc versions.

* Add build.rs to wasi-common's Cargo.toml.

* Update the env var name in more places.

* Remove a redundant echo.
2020-04-03 13:13:37 -07:00