Lucet uses stack probes rather than explicit stack limit checks as
Wasmtime does. In bytecodealliance/lucet#616, I have discovered that I
previously was not running some Lucet runtime tests with the new
backend, so was missing some test failures due to missing pieces in the
new backend.
This PR adds (i) calls to probestack, when enabled, in the prologue of
every function with a stack frame larger than one page (configurable via
flags); and (ii) trap metadata for every instruction on x86-64 that can
access the stack, hence be the first point at which a stack overflow is
detected when the stack pointer is decremented.
This end result was previously enacted by carrying a `SourceLoc` on
every load/store, which was somewhat cumbersome, and only indirectly
encoded metadata about a memory reference (can it trap) by its presence
or absence. We have a type for this -- `MemFlags` -- that tells us
everything we might want to know about a load or store, and we should
plumb it through to code emission instead.
This PR attaches a `MemFlags` to an `Amode` on x64, and puts it on load
and store `Inst` variants on aarch64. These two choices seem to factor
things out in the nicest way: there are relatively few load/store insts
on aarch64 but many addressing modes, while the opposite is true on x64.
In existing MachInst backends, many instructions -- any that can trap or
result in a relocation -- carry `SourceLoc` values in order to propagate
the location-in-original-source to use to describe resulting traps or
relocation errors.
This is quite tedious, and also error-prone: it is likely that the
necessary plumbing will be missed in some cases, and in any case, it's
unnecessarily verbose.
This PR factors out the `SourceLoc` handling so that it is tracked
during emission as part of the `EmitState`, and plumbed through
automatically by the machine-independent framework. Instruction emission
code that directly emits trap or relocation records can query the
current location as necessary. Then we only need to ensure that memory
references and trap instructions, at their (one) emission point rather
than their (many) lowering/generation points, are wired up correctly.
This does have the side-effect that some loads and stores that do not
correspond directly to user code's heap accesses will have unnecessary
but harmless trap metadata. For example, the load that fetches a code
offset from a jump table will have a 'heap out of bounds' trap record
attached to it; but because it is bounds-checked, and will never
actually trap if the lowering is correct, this should be harmless. The
simplicity improvement here seemed more worthwhile to me than plumbing
through a "corresponds to user-level load/store" bit, because the latter
is a bit complex when we allow for op merging.
Closes#2290: though it does not implement a full "metadata" scheme as
described in that issue, this seems simpler overall.
There has been some confusion over the meaning of the "sign-extend"
(`sext`) and "zero-extend" (`uext`) attributes on parameters and return
values in signatures. According to the three implemented backends, these
attributes indicate that a value narrower than a full register should
always be extended in the way specified. However, they are much more
useful if they mean "extend in this way if the ABI requires extending":
only the ABI backend knows whether or not a particular ABI (e.g., x64
SysV vs. x64 Baldrdash) requires extensions, while only the frontend
(CLIF generator) knows whether or not a value is signed, so the two have
to work in concert.
This is the result of some very helpful discussion in #2354 (thanks to
@uweigand for raising the issue and @bjorn3 for helping to reason about
it).
This change respects the extension attributes in the above way, rather
than unconditionally extending, to avoid potential performance
degradation as we introduce more extension attributes on signatures.
When performing a function call, the platform ABI may require space
on the stack to hold outgoing arguments and/or return values.
Currently, this is supported via decrementing the stack pointer
before the call and incrementing it afterwards, using the
emit_stack_pre_adjust and emit_stack_post_adjust methods of
ABICaller. However, on some platforms it would be preferable
to just allocate enough space for any call done in the function
in the caller's prologue instead.
This patch adds support to allow back-ends to choose that method.
Instead of calling emit_stack_pre/post_adjust around a call, they
simply call a new accumulate_outgoing_args_size method of
ABICaller instead. This will pass on the required size to the
ABICallee structure of the calling function, which will accumulate
the maximum size required for all function calls.
That accumulated size is then passed to the gen_clobber_save
and gen_clobber_restore functions so they can include the size
in the stack allocation / deallocation that already happens in
the prologue / epilogue code.
The ABI common code currently passes the fixed frame size to
the gen_clobber_save back-end routine, which is required to
emit code to allocate the required stack space in the prologue.
Similarly, the back-end needs to emit code to de-allocate the
stack in the epilogue. However, at this point the back-end
does not have access to that fixed frame size value any more.
With targets that use a frame pointer, this does not matter,
since de-allocation can be done simply by assigning the frame
pointer back to the stack pointer. However, on targets that
do not use a frame pointer, the frame size is required.
To allow back-ends that option, this patch changes ABI common
code to pass the fixed frame size to get_clobber_restore as
well (the same value as is passed to get_clobber_save).
The common gen_prologue code currently assumes that the stack
pointer has to be aligned to twice the word size. While this
is true for many ABIs, it does not hold universally.
This patch adds a new callback stack_align that back-ends can
provide to define the specific stack alignment required by the
ABI on that platform.
This PR updates the AArch64 ABI implementation so that it (i) properly
respects that v8-v15 inclusive have callee-save lower halves, and
caller-save upper halves, by conservatively approximating (to full
registers) in the appropriate directions when generating prologue
caller-saves and when informing the regalloc of clobbered regs across
callsites.
In order to prevent saving all of these vector registers in the prologue
of every non-leaf function due to the above approximation, this also
makes use of a new regalloc.rs feature to exclude call instructions'
writes from the clobber set returned by register allocation. This is
safe whenever the caller and callee have the same ABI (because anything
the callee could clobber, the caller is allowed to clobber as well
without saving it in the prologue).
Fixes#2254.
This also passes `fixed_frame_storage_size` (previously `total_sp_adjust`)
into `gen_clobber_save` so that it can be combined with other stack
adjustments.
Copyright (c) 2020, Arm Limited.
As part of a Wasm JIT update, SpiderMonkey is changing its internal
WebAssembly function ABI. The new ABI's frame format includes "caller
TLS" and "callee TLS" slots. The details of where these come from are
not important; from Cranelift's point of view, the only relevant
requirement is that we have two on-stack args that are always present
(offsetting other on-stack args), and that we define special argument
purposes so that we can supply values for these slots.
Note that this adds a *new* ABI (a variant of the Baldrdash ABI) because
we do not want to tightly couple the landing of this PR to the landing
of the changes in SpiderMonkey; it's better if both the old and new
behavior remain available in Cranelift, so SpiderMonkey can continue to
vendor Cranelift even if it does not land (or backs out) the ABI change.
Furthermore, note that this needs to be a Cranelift-level change (i.e.
cannot be done purely from the translator environment implementation)
because the special TLS arguments must always go on the stack, which
would not otherwise happen with the usual argument-placement logic; and
there is no primitive to push a value directly in CLIF code (the notion
of a stack frame is a lower-level concept).
This commit adds arm32 code generation for some IR insts.
Floating-point instructions are not supported, because regalloc
does not allow to represent overlapping register classes,
which are needed by VFP/Neon.
There is also no support for big-endianness, I64 and I128 types.
Previously, in #2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.
This also changes some register allocations in the aarch64 code because
call support in the shared ABI infra now passes a temp vreg in, rather
than requiring use of a fixed, non-allocable temp; tests have been
updated, and the runtime behavior is unchanged.
This change primarily adds the ability to lower packed `[move|load|store]` instructions (the vector types were previously unimplemented), but with the addition of the utility `Inst::[move|load|store]` functions it became possible to remove duplicated code (e.g. `stack_load` and `stack_store`) and use these utility functions elsewhere (though not exhaustively).
This change is a pure refactoring--no change to functionality. It removes `use crate::ir::types::*` imports and uses instead `types::I32`, e.g., throughout the x64 code. Though it increases code verbosity, this change makes it more clear where the type identifiers come from (they are generated by `cranelif-codegen-meta` so without a prefix it is difficult to find their origin), avoids IDE confusion (e.g. CLion flags the un-prefixed identifiers as errors), and avoids importing unwanted identifiers into the namespace.
This change is a pure refactoring--no change to functionality. It removes newlines between the `use ...` statements in the x64 backend so that rustfmt can format them according to its convention. I noticed some files had followed a manual convention but subsequent additions did not seem to fit; this change fixes that and lightly coalesces some of the occurrences of `use a::b; use a::c;` into `use::{b, c}`.
In the Baldrdash (SpiderMonkey) embedding, we must take care to
zero-extend all function arguments to callees in integer registers when
the types are narrower than 64 bits. This is because, unlike the native
SysV ABI, the Baldrdash ABI expects high bits to be cleared. Not doing
so leads to difficult-to-trace errors where high bits falsely tag an
int32 as e.g. an object pointer, leading to potential security issues.
This commit adds support for generating stackmaps at safepoints to the
new backend framework and to the AArch64 backend in particular. It has
been tested to work with SpiderMonkey.
Removes unneeded data structure that was holding instructions for
xmm based move instructions. These instructions can should be categorized
as rm not just r. This change is intended to simplify organization and
cases when lowering.
Adds support for addss and subss. This is the first lowering for
sse floating point alu and some move operations. The changes here do
some renaming of data structures and adds a couple of new ones
to support sse specific operations. The work done here will likely
evolve as needed to support an efficient, inituative, and consistent
framework.