Commit Graph

724 Commits

Author SHA1 Message Date
Benjamin Bouvier
528d3c1355 machinst: Steal the used/defs Sets when emitting a call in ABICall; 2020-05-07 12:24:02 +02:00
Benjamin Bouvier
19d8a7f1fb machinst: Reuse memory across loop iterations in lowering; 2020-05-07 12:24:02 +02:00
Benjamin Bouvier
b24b711c16 machinst: Reduce the number of vec allocations for edge blocks; 2020-05-07 12:24:02 +02:00
Benjamin Bouvier
9215b610ef machinst: Avoid a lot of short-lived allocations in ABICall; 2020-05-07 12:24:02 +02:00
Benjamin Bouvier
4f919c6460 machinst: bump regalloc to 0.0.23 and return a slice of the successor indexes in block_succs; 2020-05-07 12:24:02 +02:00
Julian Seward
48521393ae Update to regalloc.rs version 0.0.22. 2020-05-06 20:16:31 +02:00
Chris Fallin
6d73fdb70a Merge pull request #1607 from cfallin/aarch64-stack-frame
Rework aarch64 stack frame implementation to use positive offsets.
2020-05-06 10:29:30 -07:00
Chris Fallin
a66724aafd Rework aarch64 stack frame implementation.
This PR changes the aarch64 ABI implementation to use positive offsets
from SP, rather than negative offsets from FP, to refer to spill slots
and stack-local storage. This allows for better addressing-mode options,
and hence slightly better code: e.g., the unsigned scaled 12-bit offset
mode can be used to reach anywhere in a 32KB frame without extra
address-construction instructions, whereas negative offsets are limited
to a signed 9-bit unscaled mode (-256 bytes).

To enable this, the PR introduces a notion of "nominal SP offsets" as a
virtual addressing mode, lowered during the emission pass. The offsets
are relative to "SP after adjusting downward to allocate stack/spill
slots", but before pushing clobbers. This allows the addressing-mode
expressions to be generated before register allocation (or during it,
for spill/reload sequences).

To convert these offsets into *true* offsets from SP, we need to track
how much further SP is moved downward, and compensate for this. We do so
with "virtual SP offset adjustment" pseudo-instructions: these are seen
by the emission pass, and result in no instruction (0 byte output), but
update state that is now threaded through each instruction emission in
turn. In this way, we can push e.g. stack args for a call and adjust
the virtual SP offset, allowing reloads from nominal-SP-relative
spillslots while we do the argument setup with "real SP offsets" at the
same time.
2020-05-06 09:23:55 -07:00
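A minimal sketch of the nominal-SP scheme described above, with illustrative names (`EmitState`, `Amode`, `resolve`) rather than Cranelift's actual types:

```rust
// Hypothetical model of "nominal SP offsets": state threaded through
// the emission pass converts virtual offsets into true SP offsets.

struct EmitState {
    // How far SP has moved below the nominal-SP point (pushed clobbers,
    // outgoing stack args, ...); updated by pseudo-instructions.
    virtual_sp_offset: i64,
}

enum Amode {
    // Offset from "SP after allocating stack/spill slots".
    NominalSpOffset(i64),
    // Offset from the real, current SP.
    SpOffset(i64),
}

// At emission time, a nominal-SP offset is compensated by the current
// virtual adjustment to yield a real SP offset.
fn resolve(amode: &Amode, state: &EmitState) -> i64 {
    match amode {
        Amode::NominalSpOffset(off) => *off + state.virtual_sp_offset,
        Amode::SpOffset(off) => *off,
    }
}

// The "virtual SP offset adjustment" pseudo-instruction emits no bytes;
// it only updates the threaded state.
fn emit_virtual_sp_adjust(state: &mut EmitState, amount: i64) {
    state.virtual_sp_offset += amount;
}

fn main() {
    let mut state = EmitState { virtual_sp_offset: 0 };
    // Pushing 16 bytes of stack args moves SP down by 16, so a slot at
    // nominal-SP+8 is now found at real-SP+24.
    emit_virtual_sp_adjust(&mut state, 16);
    assert_eq!(resolve(&Amode::NominalSpOffset(8), &state), 24);
}
```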
Benjamin Bouvier
1d90751ba9 machinst: Avoid a full instruction traversal of all the blocks when computing the final block ordering; 2020-05-06 15:13:25 +02:00
whitequark
162fcd3d75 Legalize [su]extend.i64 to iconst/sshr_imm + iconcat.
This was already done for [su]extend.i128, and is necessary for
codegen for 32-bit x86.
2020-05-05 16:08:58 -07:00
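In scalar terms, the legalization builds the 64-bit result from two 32-bit halves; a small sketch with hypothetical helpers mirroring the CLIF semantics:

```rust
// Sketch of [su]extend.i64 in terms of iconcat: the low half is the
// input, and the high half is iconst 0 (uextend) or the sign bits
// produced by sshr_imm x, 31 (sextend).

fn uextend_i64(x: u32) -> u64 {
    let hi = 0u32; // iconst 0
    ((hi as u64) << 32) | x as u64 // iconcat hi, lo
}

fn sextend_i64(x: u32) -> u64 {
    let hi = ((x as i32) >> 31) as u32; // sshr_imm x, 31 (all sign bits)
    ((hi as u64) << 32) | x as u64 // iconcat hi, lo
}

fn main() {
    assert_eq!(sextend_i64(0xffff_fff6), (-10i64) as u64);
    assert_eq!(uextend_i64(0xffff_fff6), 0x0000_0000_ffff_fff6);
}
```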
whitequark
14bdaf3ce3 Legalize ireduce.iN.i2N to isplit. 2020-05-05 14:13:30 -07:00
Andrew Brown
cd49ed9582 Add x86 legalization for sshr.i64x2 2020-05-05 12:01:46 -07:00
Chris Fallin
59039df001 Merge pull request #1570 from cfallin/fix-long-range-aarch64-call
Fix long-range (non-colocated) aarch64 calls to not use Arm64Call reloc, and fix simplejit to use new long-distance call.
2020-05-05 10:45:55 -07:00
Chris Fallin
e39b4aba1c Fix long-range (non-colocated) aarch64 calls to not use Arm64Call reloc, and fix simplejit to use it.
Previously, every call was lowered on AArch64 to a `call` instruction, which
takes a signed 26-bit PC-relative offset. Including the 2-bit left shift, this
gives a range of +/- 128 MB. Longer-distance offsets would cause an impossible
relocation record to be emitted (or rather, a record that a more sophisticated
linker would fix up by inserting a shim/veneer).

This commit adds a notion of "relocation distance" in the MachInst backends,
and provides this information for every call target and symbol reference. The
intent is that backends on architectures like AArch64, where there are different
offset sizes / addressing strategies to choose from, can either emit a regular
call or a load-64-bit-constant / call-indirect sequence, as necessary. This
avoids the need to implement complex linking behavior.

The MachInst driver code provides this information based on the "colocated" bit
in the CLIF symbol references, which appears to have been designed for this
purpose, or at least a similar one. Combined with the `use_colocated_libcalls`
setting, this allows client code to ensure that library calls can link to
library code at any location in the address space.

Separately, the `simplejit` example did not handle `Arm64Call`; rather than doing
so, it appears all that is necessary to get its tests to pass is to set the
`use_colocated_libcalls` flag to false, to make use of the above change. This
fixes the `libcall_function` unit-test in this crate.
2020-05-05 09:55:12 -07:00
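A hedged sketch of the idea: a "relocation distance" hint decides between a direct call and a load-address/call-indirect sequence. The names and the scratch register below are illustrative, not the backend's actual API:

```rust
// Hypothetical model of the call-lowering decision described above.

enum RelocDistance {
    Near, // within +/- 128 MB: a 26-bit PC-relative `bl` reaches it
    Far,  // may be anywhere in the address space
}

fn reloc_distance(colocated: bool) -> RelocDistance {
    // Colocated symbols are guaranteed to be emitted near the caller.
    if colocated { RelocDistance::Near } else { RelocDistance::Far }
}

fn lower_call(target: &str, dist: RelocDistance) -> String {
    match dist {
        // Direct call; the Arm64Call relocation fits in the instruction.
        RelocDistance::Near => format!("bl {}", target),
        // Materialize the full 64-bit address, then call indirectly;
        // no linker veneer is needed. x16 is an arbitrary scratch here.
        RelocDistance::Far => format!("ldr x16, ={}; blr x16", target),
    }
}

fn main() {
    assert_eq!(lower_call("memcpy", reloc_distance(false)),
               "ldr x16, =memcpy; blr x16");
}
```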
Benjamin Bouvier
fa54422854 Add a work-in-progress backend for x86_64 using the new instruction selection;
Most of the work is credited to Julian Seward.

Co-authored-by: Julian Seward <jseward@acm.org>
Co-authored-by: Chris Fallin <cfallin@mozilla.com>
2020-05-05 16:35:41 +02:00
Chris Fallin
8393412c40 Merge pull request #1632 from cfallin/aarch64-fix-srclocs
MachInst backend: attach SourceLoc span information to all ranges.
2020-04-30 16:13:55 -07:00
Chris Fallin
964c6087bd MachInst backend: attach SourceLoc span information to all ranges.
Previously, the SourceLoc information transferred in `VCode` only
included PC-spans for non-default SourceLocs. I realized that the
invariant we're supposed to keep here is that every PC is covered; if
there is no source information, just use `SourceLoc::default()`.

This was spurred by @bjorn3's comment in #1575 (thanks!).
2020-04-30 15:40:55 -07:00
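A minimal sketch of the coverage invariant, assuming hypothetical span bookkeeping (not the actual `VCode` representation):

```rust
// Every PC range must carry a SourceLoc; gaps between recorded spans
// are filled with SourceLoc::default().

#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct SourceLoc(u32);

type Span = (u32, u32, SourceLoc); // [start, end) in code offsets

fn cover_all_pcs(spans: &[Span], code_len: u32) -> Vec<Span> {
    let mut out = Vec::new();
    let mut pc = 0;
    for &(start, end, loc) in spans {
        if pc < start {
            out.push((pc, start, SourceLoc::default())); // fill the gap
        }
        out.push((start, end, loc));
        pc = end;
    }
    if pc < code_len {
        out.push((pc, code_len, SourceLoc::default())); // trailing gap
    }
    out
}

fn main() {
    let spans = [(4, 8, SourceLoc(42))];
    assert_eq!(
        cover_all_pcs(&spans, 12),
        vec![(0, 4, SourceLoc(0)), (4, 8, SourceLoc(42)), (8, 12, SourceLoc(0))]
    );
}
```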
Andrew Brown
49622bde58 Use complex load-extend instructions in optimize_complex_addresses; fixes #1186 2020-04-30 11:38:01 -07:00
Andrew Brown
a312506262 Add x86 complex encodings for SIMD load-extend instructions 2020-04-30 11:38:01 -07:00
Andrew Brown
2048d3d30c Add x86 encodings for same-size bint conversions up to 64 bits 2020-04-30 11:21:00 -07:00
Yury Delendik
1873c0ae46 Fix value label ranges resolution (#1572)
There was a bug in how value labels were resolved, which caused some DWARF expressions not to be transformed, e.g. those held in registers.

*    Implements FIXME in expression.rs
*    Move TargetIsa from CompiledExpression structure
*    Fix expression format for GDB
*    Add tests for parsing
*    Proper logic in ValueLabelRangesBuilder::process_label
*    Tests for ValueLabelRangesBuilder
*    Refactor build_with_locals to return Iterator instead of Vec<_>
*    Misc comments and magical numbers
2020-04-30 08:07:55 -05:00
Benjamin Bouvier
b7cfd39b53 aarch64: split emit tests into its own file;
This is done to satisfy a check on the maximum file size when
vendoring Rust source code into the mozilla-central repository.
2020-04-30 13:50:45 +02:00
Benjamin Bouvier
4c066b1c73 codegen: split lower.rs into multiple files;
This splits off lower.rs into two files: lower.rs keeps all the utility
functions, while lower_inst.rs contains the (gigantic!) function
lowering a single Cranelift instruction into vcode.

This is done to satisfy a check on the maximum file size when
vendoring Rust source code into the mozilla-central repository.
2020-04-30 13:50:45 +02:00
Benjamin Bouvier
a2b6c19861 Fix arm32 build: ensure that the expand group is always generated; 2020-04-30 13:50:45 +02:00
Dan Gohman
864cf98c8d Update release notes, wasmtime 0.16, cranelift 0.63. 2020-04-29 17:30:25 -07:00
Alex Crichton
363cd2d20f Expose memory-related options in Config (#1513)
* Expose memory-related options in `Config`

This commit was initially motivated by looking more into #1501, but it
ended up ballooning a bit after finding a few issues. The high-level
items in this commit are:

* New configuration options via `wasmtime::Config` are exposed to
  configure the tunable limits of how memories are allocated and such.
* The `MemoryCreator` trait has been updated to accurately reflect the
  required allocation characteristics that JIT code expects.
* A bug has been fixed in the cranelift wasm code generation where, if no
  guard page was present, bounds checks weren't performed accurately.

The new `Config` methods allow tuning the memory allocation
characteristics of wasmtime. Currently 64-bit platforms will reserve 6GB
chunks of memory for each linear memory, but by tweaking various config
options you can change how this is allocated, perhaps at the cost of
slower JIT code since it needs more bounds checks. The methods are
intended to be pretty thoroughly documented as to the effect they have
on the JIT code and what values you may wish to select. These new
methods have been added to the spectest fuzzer to ensure that various
configuration values for these methods don't affect correctness.

The `MemoryCreator` trait previously only allocated memories with a
`MemoryType`, but this didn't actually reflect the guarantees that JIT
code expected. JIT code is generated with an assumption about the
minimum size of the guard region, as well as whether memory is static or
dynamic (whether the base pointer can be relocated). These properties
must be upheld by custom allocation engines for JIT code to perform
correctly, so extra parameters have been added to
`MemoryCreator::new_memory` to reflect this.

Finally, the fuzzing with `Config` turned up an issue where, if no guard
pages are present, the wasm code wouldn't correctly bounds-check memory
accesses. The issue here was that with a guard page we only need to
bounds-check the first byte of access, but without a guard page we need
to bounds-check the last byte of access. This meant that the code
generation needed to account for the size of the memory operation
(load/store) and use this as the offset-to-check in the no-guard-page
scenario. I've attempted to make the various comments in cranelift a bit
more exhaustive too to hopefully make it a bit clearer for future
readers!

Closes #1501

* Review comments

* Update a comment
2020-04-29 17:10:00 -07:00
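A sketch of the guard-page distinction the commit describes, as a hypothetical predicate (the real check is emitted as CLIF by the code generator, not written like this):

```rust
// With a guard page, checking the first byte of the access suffices:
// any spill-over lands in the guard page and traps there. Without one,
// the last byte of the access must be checked, so the access size
// (load/store width) matters.

fn access_in_bounds(
    addr: u64,
    access_size: u64, // size in bytes of the load/store
    memory_size: u64,
    has_guard_page: bool,
) -> bool {
    if has_guard_page {
        addr < memory_size
    } else {
        addr.checked_add(access_size)
            .map_or(false, |end| end <= memory_size)
    }
}

fn main() {
    // A 4-byte load starting at the last valid byte: acceptable with a
    // guard page (the overflow becomes a trap via the guard), but it
    // must be rejected up front when no guard page exists.
    assert!(access_in_bounds(65535, 4, 65536, true));
    assert!(!access_in_bounds(65535, 4, 65536, false));
}
```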
Chris Fallin
be6f060abf Use new regalloc.rs version with dense vreg->rreg maps.
This PR updates Cranelift to use the new version of regalloc.rs
(bytecodealliance/regalloc.rs#55) that provides dense vreg->rreg maps to
the `map_reg()` function for each instruction, rather than the earlier
hashmap-based approach.

In one test (regex-rs.wasm), this PR results in a 15% reduction in
memory allocations (1245MB -> 1060MB) as measured by DHAT on `clif-util
wasm` runs.
2020-04-29 10:42:25 -07:00
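The dense-map idea in a minimal illustrative form (not regalloc.rs's actual interface): virtual registers have contiguous indices, so a plain `Vec` replaces per-instruction hashmaps:

```rust
// Hypothetical dense vreg -> rreg mapping: O(1) indexed lookup with no
// hashing and a single allocation for the whole function.

type RealReg = u8;

struct VRegMap {
    map: Vec<Option<RealReg>>, // indexed by virtual-register number
}

impl VRegMap {
    fn get(&self, vreg: usize) -> Option<RealReg> {
        self.map.get(vreg).copied().flatten()
    }
}

fn main() {
    let map = VRegMap { map: vec![Some(3), None, Some(7)] };
    assert_eq!(map.get(2), Some(7));
    assert_eq!(map.get(1), None); // vreg 1 was never assigned here
}
```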
Benjamin Bouvier
767bcaab29 aarch64: redefine is_move now that a regalloc.rs bug has been fixed; 2020-04-29 13:38:30 +02:00
Benjamin Bouvier
983c2a0873 Bump regalloc.rs to 0.0.20; 2020-04-29 13:38:30 +02:00
Benjamin Bouvier
698dc9c401 Fixes #1619: Properly bubble up errors when seeing an unexpected type during lowering. 2020-04-29 10:26:22 +02:00
Gabor Greif
d9d69299bb A few typofixes (#1623)
* a few typofixes

* more typos
2020-04-28 19:18:05 -05:00
Chris Fallin
b691770faa MachInst backend: pass through SourceLoc information.
This change adds SourceLoc information per instruction in a `VCode<Inst>`
container, and keeps this information up-to-date across register allocation
and branch reordering. The information is initially collected during
instruction lowering, eventually collected on the MachSection, and finally
provided to the environment that wraps the codegen crate for wasmtime.
2020-04-24 13:18:01 -07:00
Alex Crichton
74eda8090c Implement stack limit checks for AArch64 (#1573)
This commit implements the stack limit checks in cranelift for the
AArch64 backend. This gets the `stack_limit` argument purpose, as well as
a function's global `stack_limit` directive, working for the AArch64
backend. I've tested this locally on some hardware and in an emulator
and it looks to be working for basic tests, but I've never really done
AArch64 before so some scrutiny on the instructions would be most
welcome!
2020-04-24 15:01:57 -05:00
Benjamin Bouvier
19b5b0cc7b aarch64: pass a lowering context to gen_copy_reg_to_arg; 2020-04-24 17:41:14 +02:00
Benjamin Bouvier
0b13d8c848 aarch64: copy SP whenever it's involved in an address lowering with an explicit add; 2020-04-24 17:41:14 +02:00
Benjamin Bouvier
077556ac17 aarch64: correctly display fpu loads and stores; 2020-04-24 17:40:36 +02:00
Joey Gouly
f020f0812e arm64: Implement checks in division / remainder
This implements the divide-by-zero and signed-overflow checks that Wasm
specifies.

Copyright (c) 2020, Arm Limited.
2020-04-24 17:40:19 +02:00
Benjamin Bouvier
b6e6998713 aarch64: mask rotation counts and share code generation of left and right rotations;
Given an integer size N, a left rotation of K places is the same as a
right rotation of N - K places. This means we can use right rotations to
implement left rotations too.

Cranelift's rotation semantics are inherited from WebAssembly, which
means the rotation count is truncated modulo the operand's bit size. Note
that the aarch64 ROR instruction has the same semantics when both input
operands are registers.
2020-04-24 12:36:59 +02:00
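The identity being exploited, sketched on a 32-bit scalar with Rust's `rotate_right` standing in for aarch64 ROR:

```rust
// rotl x, k == rotr x, (N - k) mod N on an N-bit value, with the count
// first truncated modulo N per the WebAssembly-inherited semantics.

fn rotl_via_rotr(x: u32, k: u32) -> u32 {
    let n = 32u32;
    let k = k % n; // mask the rotation count, as described above
    x.rotate_right((n - k) % n)
}

fn main() {
    assert_eq!(rotl_via_rotr(0x8000_0001, 1), 0x0000_0003);
    // Counts >= 32 wrap around: rotating by 40 is rotating by 8.
    assert_eq!(rotl_via_rotr(0xdead_beef, 40), 0xdead_beef_u32.rotate_left(8));
}
```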
Benjamin Bouvier
de92b7e014 aarch64: implement correct float-to-int conversion semantics;
These are inherited from wasm semantics.
2020-04-24 11:51:35 +02:00
Chris Fallin
8f462db645 Fix ImmLogic.invert(), and with it, fcopysign and float_misc test.
Previously, `fcopysign` was mysteriously failing to pass the
`float_misc` spec test. This was tracked down to bad logical-immediate
masks used to separate the sign and not-sign bits. In particular, the
masks for the and-not operations were wrong. The `invert()` function on
an `ImmLogic` immediate, it turns out, assumed every immediate would be
used by a 64-bit instruction; `ImmLogic` immediates are subtly different
for 32-bit instructions. This change tracks the instruction size (32 or
64 bits) intended for use with each such immediate, and passes it back
into `maybe_from_u64` when computing the inverted immediate.

Addresses several of the failures (`float_misc`, `f32_bitwise`) for
 #1521 (test failures) and presumably helps #1519 (SpiderMonkey
integration).
2020-04-23 13:06:13 -07:00
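A sketch of the shape of the fix, with simplified types (the real `ImmLogic` encoder validates repeating bit patterns, elided here):

```rust
// The bug: invert() assumed 64-bit use. The fix: carry the intended
// operand size and re-encode the inverted value at that same size.

#[derive(Clone, Copy)]
enum OperandSize { Size32, Size64 }

#[derive(Clone, Copy)]
struct ImmLogic {
    value: u64,
    size: OperandSize, // instruction size this immediate was encoded for
}

impl ImmLogic {
    fn maybe_from_u64(value: u64, size: OperandSize) -> Option<ImmLogic> {
        // The real encoder only accepts valid logical-immediate bit
        // patterns, which differ between 32- and 64-bit instructions.
        Some(ImmLogic { value, size })
    }

    fn invert(&self) -> Option<ImmLogic> {
        let inverted = match self.size {
            // For 32-bit use, only the low 32 bits participate.
            OperandSize::Size32 => !self.value & 0xffff_ffff,
            OperandSize::Size64 => !self.value,
        };
        // Pass the size back in rather than assuming 64 bits.
        Self::maybe_from_u64(inverted, self.size)
    }
}

fn main() {
    let sign = ImmLogic::maybe_from_u64(0x8000_0000, OperandSize::Size32).unwrap();
    let not_sign = sign.invert().unwrap();
    assert_eq!(not_sign.value, 0x7fff_ffff); // correct 32-bit not-sign mask
}
```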
Andrew Brown
d24f23285b Legalize i8x16.sshr using pack/unpack instructions
Due to arithmetic shift behavior, this legalization cannot easily use the masks for i8x16.ushr or i8x16.ishl.
2020-04-23 10:55:54 -07:00
Andrew Brown
fb6e8f784d Add x86 pack instructions 2020-04-23 10:55:54 -07:00
Andrew Brown
341dc45cea Add DerivedFunction for splitting lane widths and doubling the number of lanes
Certain operations (e.g. x86_packss) will have operands with types like
`NxM` but will return results with types like `(N/2)x(M*2)` (halve the
lane width, double the number of lanes; maintain the same number of
vector bits). This is equivalent to applying two `DerivedFunction`s to
the type: `DerivedFunction::HalfWidth` then `DerivedFunction::DoubleVector`.
Since there is no easy way to apply multiple `DerivedFunction`s (e.g. most
of the logic is one level deep,
1d5a678124/cranelift/codegen/meta/src/gen_inst.rs (L618-L621)), I added
`DerivedFunction::SplitLanes` to do the necessary type conversion.
2020-04-23 10:55:54 -07:00
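The type transform itself is easy to state; a tiny sketch under illustrative types:

```rust
// SplitLanes: halve the lane width, double the lane count; the total
// number of vector bits stays the same.

#[derive(Debug, PartialEq)]
struct VectorType {
    lane_bits: u32,
    lanes: u32,
}

fn split_lanes(ty: VectorType) -> VectorType {
    VectorType { lane_bits: ty.lane_bits / 2, lanes: ty.lanes * 2 }
}

fn main() {
    // e.g. x86_packss on i32x4 operands yields an i16x8 result.
    assert_eq!(
        split_lanes(VectorType { lane_bits: 32, lanes: 4 }),
        VectorType { lane_bits: 16, lanes: 8 }
    );
}
```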
Andrew Brown
f5fc09f64a Add x86 unpack instructions 2020-04-23 10:55:54 -07:00
Andrew Brown
5f0286696c Add x86 implementation of 8x16 ishl
This involves some large mask tables that may hurt code size but reduce the number of instructions. See https://github.com/WebAssembly/simd/issues/117 for a more in-depth discussion on this.
2020-04-23 10:55:54 -07:00
Chris Fallin
2f1a2f4225 Merge pull request #1571 from cfallin/fix-aarch64-heap-oob
Fix aarch64 load trap info: HeapOutOfBounds, not OutOfBounds.
2020-04-22 12:15:32 -07:00
Benjamin Bouvier
65ef26b989 Add a setting to choose a register allocator algorithm to use with MachBackend; 2020-04-22 14:47:18 +02:00
Chris Fallin
dacadc8a34 Fix aarch64 load trap info: HeapOutOfBounds, not OutOfBounds.
This halfway solves a test failure: with another assert (triggered by
lack of debug info) temporarily disabled, this change makes the
`custom_trap_handler` test pass.
2020-04-21 15:30:58 -07:00
Alex Crichton
c9a0ba81a0 Implement interrupting wasm code, reimplement stack overflow (#1490)
* Implement interrupting wasm code, reimplement stack overflow

This commit is a relatively large change for wasmtime with two main
goals:

* Primarily this enables interrupting executing wasm code with a trap,
  preventing infinite loops in wasm code. Note that resumption of the
  wasm code is not a goal of this commit.

* Additionally this commit reimplements how we handle stack overflow to
  ensure that host functions always have a reasonable amount of stack to
  run on. This fixes an issue where we might longjmp out of a host
  function, skipping destructors.

Lots of various odds and ends end up falling out in this commit once the
two goals above were implemented. The strategy for implementing this was
also lifted from SpiderMonkey and existing functionality inside of
Cranelift. I've tried to write up thorough documentation of how this all
works in `crates/environ/src/cranelift.rs` where gnarly-ish bits are.

A brief summary of how this works is that each function and each loop
header now checks to see if it's been interrupted. Interrupts and the
stack overflow check are actually folded into one now: function headers
check whether they've run out of stack, and the sentinel value used to
indicate an interrupt, checked in loop headers, tricks functions into
thinking they're out of stack. An interrupt is basically just
writing a value to a location which is read by JIT code.

When interrupts are delivered and what triggers them has been left up to
embedders of the `wasmtime` crate. The `wasmtime::Store` type has a
method to acquire an `InterruptHandle`, where `InterruptHandle` is a
`Send` and `Sync` type which can travel to other threads (or perhaps
even a signal handler) to get notified from. It's intended that this
provides a good degree of flexibility when interrupting wasm code. Note
though that this does have a large caveat where interrupts don't work
when you're interrupting host code, so if you've got a host import
blocking for a long time an interrupt won't actually be received until
the wasm starts running again.

Some fallout included from this change is:

* Unix signal handlers are no longer registered with `SA_ONSTACK`.
  Instead they run on the native stack the thread was already using.
  This is possible since stack overflow isn't handled by hitting the
  guard page, but rather it's explicitly checked for in wasm now. Native
  stack overflow will continue to abort the process as usual.

* Unix sigaltstack management is now no longer necessary since we don't
  use it any more.

* Windows no longer has any need to reset guard pages since we no longer
  try to recover from faults on guard pages.

* On all targets probestack intrinsics are disabled since we use a
  different mechanism for catching stack overflow.

* The C API has been updated with interrupts handles. An example has
  also been added which shows off how to interrupt a module.

Closes #139
Closes #860
Closes #900

* Update comment about magical interrupt value

* Store stack limit as a global value, not a closure

* Run rustfmt

* Handle review comments

* Add a comment about SA_ONSTACK

* Use `usize` for type of `INTERRUPTED`

* Parse human-readable durations

* Bring back sigaltstack handling

Allows libstd to print out stack overflow on failure still.

* Add parsing and emission of stack limit-via-preamble

* Fix new example for new apis

* Fix host segfault test in release mode

* Fix new doc example
2020-04-21 11:03:28 -07:00
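A hedged sketch of the folded check (illustrative only; the real stack-limit word lives in VM context memory read by JIT code, not in an `AtomicUsize`):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Sentinel that no real stack pointer can exceed: any SP comparison
// against it fails, so loop/function checks take the trap path.
const INTERRUPTED: usize = usize::MAX;

struct InterruptHandle {
    stack_limit: Arc<AtomicUsize>, // shared with the executing store
}

impl InterruptHandle {
    // Callable from another thread (or, with care, a signal handler)
    // to request a trap at the next check.
    fn interrupt(&self) {
        self.stack_limit.store(INTERRUPTED, Ordering::SeqCst);
    }
}

// What the generated checks do, expressed in Rust: trap if SP has
// dropped to (or below) the limit -- which is also how an interrupt
// "tricks" the code into thinking it is out of stack.
fn check(sp: usize, stack_limit: &AtomicUsize) -> Result<(), &'static str> {
    if sp <= stack_limit.load(Ordering::Relaxed) {
        Err("trap: stack overflow or interrupt requested")
    } else {
        Ok(())
    }
}

fn main() {
    let limit = Arc::new(AtomicUsize::new(16 * 1024));
    let handle = InterruptHandle { stack_limit: limit.clone() };
    assert!(check(1 << 20, &limit).is_ok()); // plenty of stack left
    handle.interrupt();
    assert!(check(1 << 20, &limit).is_err()); // sentinel forces a trap
}
```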
Chris Fallin
7b24ff15db Merge pull request #1538 from bnjbvr/all-ones-unpatched
implement emit_all_ones_funcaddrs in new backend
2020-04-21 09:51:17 -07:00