This commit removes the "set frame pointer" unwind code and frame
pointer information from Windows x64 unwind information.
In Windows x64 unwind information, a "frame pointer" is actually the
*base address* of the static part of the local frame and would be at some
negative offset to RSP upon establishing the frame pointer.
Currently Cranelift uses a "traditional" notion of a frame pointer, one
that is the highest address in the local frame (i.e. pointing at the
previous frame pointer on the stack).
Windows x64 unwind doesn't describe such frame pointers and only needs
one described if the frame contains a dynamic stack allocation.
Fixes#1967.
- put the division in the synthetic instruction as well,
- put the branch table check in the inst's emission code,
- replace OneWayCondJmp by TrapIf vcode instruction,
- add comments describing code generated by the synthetic instructions
In discussions with @bnjbvr, it came up that generating `OneWayCondBr`s
with explicit, hardcoded PC-offsets as part of lowered instruction
sequences is actually unsafe, because the register allocator *might*
insert a spill or reload into the middle of our sequence. We were
careful about this in some cases but somehow missed that it was a
general restriction. Conceptually, all inter-instruction references
should be via labels at the VCode level; explicit offsets are only ever
known at emission time, and resolved by the `MachBuffer`.
To allow for conditional trap checks without modifying the CFG (as seen
by regalloc) during lowering, this PR instead adds a `TrapIf`
pseudo-instruction that conditionally skips a single embedded trap
instruction. It lowers to the same `condbr label ; trap ; label: ...`
sequence, but without the hardcoded branch-target offset in the lowering
code.
The failure to mask the amount triggered a panic due to a subtraction
overflow check; see
https://bugzilla.mozilla.org/show_bug.cgi?id=1649432. Attempting to
shift by an out-of-range amount should be defined to shift by an amount
mod the operand size (i.e., masked to 5 bits for 32-bit shifts, or 6
bits for 64-bit shifts).
This PR adds a conditional move following a heap bounds check through
which the address to be accessed flows. This conditional move ensures
that even if the branch is mispredicted (access is actually out of
bounds, but speculation goes down in-bounds path), the acually accessed
address is zero (a NULL pointer) rather than the out-of-bounds address.
The mitigation is controlled by a flag that is off by default, but can
be set by the embedding. Note that in order to turn it on by default,
we would need to add conditional-move support to the current x86
backend; this does not appear to be present. Once the deprecated
backend is removed in favor of the new backend, IMHO we should turn
this flag on by default.
Note that the mitigation is unneccessary when we use the "huge heap"
technique on 64-bit systems, in which we allocate a range of virtual
address space such that no 32-bit offset can reach other data. Hence,
this only affects small-heap configurations.
These instructions have fast, inline JIT paths for the common cases, and only
call out to host VM functions for the slow paths. This required some changes to
`cranelift-wasm`'s `FuncEnvironment`: instead of taking a `FuncCursor` to insert
an instruction sequence within the current basic block,
`FuncEnvironment::translate_table_{get,set}` now take a `&mut FunctionBuilder`
so that they can create whole new basic blocks. This is necessary for
implementing GC read/write barriers that involve branching (e.g. checking for
null, or whether a store buffer is at capacity).
Furthermore, it required that the `load`, `load_complex`, and `store`
instructions handle loading and storing through an `r{32,64}` rather than just
`i{32,64}` addresses. This involved making `r{32,64}` types acceptable
instantiations of the `iAddr` type variable, plus a few new instruction
encodings.
Part of #929
Also add configuration to CI to fail doc generation if any links are
broken. Unfortunately we can't blanket deny all warnings in rustdoc
since some are unconditional warnings, but for now this is hopefully
good enough.
Closes#1947
* This PR is against a branch called `main`
* Internally all docs/CI/etc is updated
* The default branch of the repo is now `main`
* All active PRs have been updated to retarget `main`
Closes#1914
Removes unneeded data structure that was holding instructions for
xmm based move instructions. These instructions can should be categorized
as rm not just r. This change is intended to simplify organization and
cases when lowering.
This patch implements the required but not already available
x64 instructions for copysign as well as the actual lowering sequence
and tests for the newly implemented x64 instructions.
Those instructions include:
andps,
andnps,
movaps,
movd,
orps,
The lowering sequence is based on the lowering for f32.copysign
in the current cranelift backend. movd does not have a test yet
due to some logic needed express a 32-bit register as a source
for xmm_rm_r instructions. This code also begins some
rethinking/refactoring of how the sse move instuctions
are written and so also includes new emit cases that will replace
current ones that match a different enum used to describe sse moves.
From discussion with Julian and Ben, this PR makes a few documentation-
and naming-level changes (no functionality change):
- Document that the `LowerCtx`-provided output register can be used as a
scratch register during the lowered instruction sequence before
placing the final result in it.
- Rename `input_to_*` helpers in the AArch64 backend to
`put_input_in_*`, emphasizing that these are side-effecting helpers
that potentially generate code (e.g., sign/zero-extensions) to ensure
an input value is in a register.
Adds support for lowering clif instructions Fdiv and Fmul
for new vcode backend. Misc adds lowering and test for
sqrtss and removes a redundant to_string() func for the
SseOpcode struct.
This introduces two changes:
- first, a Cargo feature is added to make it possible to use the
Cranelift x64 backend directly from wasmtime's CLI.
- second, when passing a `cranelift-flags` parameter, and the given
parameter's name doesn't exist at the target-independent flag level, try
to set it as a target-dependent setting.
These two changes make it possible to try out the new x64 backend with:
cargo run --features experimental_x64 -- run --cranelift-flags use_new_backend=true -- /path/to/a.wasm
Right now, this will fail because most opcodes required by the
trampolines are actually not implemented yet.
For host VM code, we use plain reference counting, where cloning increments
the reference count, and dropping decrements it. We can avoid many of the
on-stack increment/decrement operations that typically plague the
performance of reference counting via Rust's ownership and borrowing system.
Moving a `VMExternRef` avoids mutating its reference count, and borrowing it
either avoids the reference count increment or delays it until if/when the
`VMExternRef` is cloned.
When passing a `VMExternRef` into compiled Wasm code, we don't want to do
reference count mutations for every compiled `local.{get,set}`, nor for
every function call. Therefore, we use a variation of **deferred reference
counting**, where we only mutate reference counts when storing
`VMExternRef`s somewhere that outlives the activation: into a global or
table. Simultaneously, we over-approximate the set of `VMExternRef`s that
are inside Wasm function activations. Periodically, we walk the stack at GC
safe points, and use stack map information to precisely identify the set of
`VMExternRef`s inside Wasm activations. Then we take the difference between
this precise set and our over-approximation, and decrement the reference
count for each of the `VMExternRef`s that are in our over-approximation but
not in the precise set. Finally, the over-approximation is replaced with the
precise set.
The `VMExternRefActivationsTable` implements the over-approximized set of
`VMExternRef`s referenced by Wasm activations. Calling a Wasm function and
passing it a `VMExternRef` moves the `VMExternRef` into the table, and the
compiled Wasm function logically "borrows" the `VMExternRef` from the
table. Similarly, `global.get` and `table.get` operations clone the gotten
`VMExternRef` into the `VMExternRefActivationsTable` and then "borrow" the
reference out of the table.
When a `VMExternRef` is returned to host code from a Wasm function, the host
increments the reference count (because the reference is logically
"borrowed" from the `VMExternRefActivationsTable` and the reference count
from the table will be dropped at the next GC).
For more general information on deferred reference counting, see *An
Examination of Deferred Reference Counting and Cycle Detection* by Quinane:
https://openresearch-repository.anu.edu.au/bitstream/1885/42030/2/hon-thesis.pdf
cc #929Fixes#1804