Commit Graph

30 Commits

Author SHA1 Message Date
Ulrich Weigand
10efe8e780 cranelift: Fix spillslot regression on big-endian platforms
PR 2840 changed the store_spillslot routine to always store
integer registers in full word size to a spill slot.  However,
the load_spillslot routine was not updated, which may causes
the contents to be reloaded in a different type.  On big-endian
systems this will fetch wrong data.

Fixed by using the same type override in load_spillslot.
2021-04-15 21:39:14 +02:00
Chris Fallin
36c667d58d Merge pull request #2837 from uweigand/outgoing-args
Add back support for accumulating outgoing arguments
2021-04-14 12:54:06 -07:00
Chris Fallin
fd4bfbe5a7 Merge pull request #2836 from uweigand/framesizefix
Fix frame size after unwind rework
2021-04-14 12:19:38 -07:00
Benjamin Bouvier
e7bced9512 cranelift: always spill i32 with i64 stores;
Fixes #2839. See also the issue description and comments in this commits for
details of what the fix is about here.
2021-04-14 18:08:52 +02:00
Ulrich Weigand
336c6369b4 Add back support for accumulating outgoing arguments
The unwind rework (commit 2d5db92a) removed support for the
feature to allow a target to allocate the space for outgoing
function arguments right in the prologue (originally added
via commit 80c2d70d).   This patch adds it back.
2021-04-14 13:51:16 +02:00
Ulrich Weigand
e3bb36ba77 Fix frame size after unwind rework
After the unwind rework (commit 2d5db92a) the space used to save
clobbered registers now lies between the nominal SP and the FP.
Therefore, the size of that space should now be included in the
frame size as reported by frame_size(), since this value is used
to compute the nominal_sp_to_fp offset.
2021-04-14 13:46:08 +02:00
Alex Crichton
195bf0e29a Fully support multiple returns in Wasmtime (#2806)
* Fully support multiple returns in Wasmtime

For quite some time now Wasmtime has "supported" multiple return values,
but only in the mose bare bones ways. Up until recently you couldn't get
a typed version of functions with multiple return values, and never have
you been able to use `Func::wrap` with functions that return multiple
values. Even recently where `Func::typed` can call functions that return
multiple values it uses a double-indirection by calling a trampoline
which calls the real function.

The underlying reason for this lack of support is that cranelift's ABI
for returning multiple values is not possible to write in Rust. For
example if a wasm function returns two `i32` values there is no Rust (or
C!) function you can write to correspond to that. This commit, however
fixes that.

This commit adds two new ABIs to Cranelift: `WasmtimeSystemV` and
`WasmtimeFastcall`. The intention is that these Wasmtime-specific ABIs
match their corresponding ABI (e.g. `SystemV` or `WindowsFastcall`) for
everything *except* how multiple values are returned. For multiple
return values we simply define our own version of the ABI which Wasmtime
implements, which is that for N return values the first is returned as
if the function only returned that and the latter N-1 return values are
returned via an out-ptr that's the last parameter to the function.

These custom ABIs provides the ability for Wasmtime to bind these in
Rust meaning that `Func::wrap` can now wrap functions that return
multiple values and `Func::typed` no longer uses trampolines when
calling functions that return multiple values. Although there's lots of
internal changes there's no actual changes in the API surface area of
Wasmtime, just a few more impls of more public traits which means that
more types are supported in more places!

Another change made with this PR is a consolidation of how the ABI of
each function in a wasm module is selected. The native `SystemV` ABI,
for example, is more efficient at returning multiple values than the
wasmtime version of the ABI (since more things are in more registers).
To continue to take advantage of this Wasmtime will now classify some
functions in a wasm module with the "fast" ABI. Only functions that are
not reachable externally from the module are classified with the fast
ABI (e.g. those not exported, used in tables, or used with `ref.func`).
This should enable purely internal functions of modules to have a faster
calling convention than those which might be exposed to Wasmtime itself.

Closes #1178

* Tweak some names and add docs

* "fix" lightbeam compile

* Fix TODO with dummy environ

* Unwind info is a property of the target, not the ABI

* Remove lightbeam unused imports

* Attempt to fix arm64

* Document new ABIs aren't stable

* Fix filetests to use the right target

* Don't always do 64-bit stores with cranelift

This was overwriting upper bits when 32-bit registers were being stored
into return values, so fix the code inline to do a sized store instead
of one-size-fits-all store.

* At least get tests passing on the old backend

* Fix a typo

* Add some filetests with mixed abi calls

* Get `multi` example working

* Fix doctests on old x86 backend

* Add a mixture of wasmtime/system_v tests
2021-04-07 12:34:26 -05:00
Alex Crichton
30d9164b6e Fix a number of warnings cropping up on nightly Rust (#2767)
Various small issues here and there, nothing major
2021-03-25 13:19:37 -05:00
Benjamin Bouvier
6e6713ae0b cranelift: add support for the Mac aarch64 calling convention
This bumps target-lexicon and adds support for the AppleAarch64 calling
convention. Specifically for WebAssembly support, we only have to worry
about the new stack slots convention. Stack slots don't need to be at
least 8-bytes, they can be as small as the data type's size. For
instance, if we need stack slots for (i32, i32), they can be located at
offsets (+0, +4). Note that they still need to be properly aligned on
the data type they're containing, though, so if we need stack slots for
(i32, i64), we can't start the i64 slot at the +4 offset (it must start
at the +8 offset).

Added one test that was failing on the Mac M1, as well as other tests
stressing different yet similar situations.
2021-03-22 10:06:13 +01:00
Chris Fallin
2d5db92a9e Rework/simplify unwind infrastructure and implement Windows unwind.
Our previous implementation of unwind infrastructure was somewhat
complex and brittle: it parsed generated instructions in order to
reverse-engineer unwind info from prologues. It also relied on some
fragile linkage to communicate instruction-layout information that VCode
was not designed to provide.

A much simpler, more reliable, and easier-to-reason-about approach is to
embed unwind directives as pseudo-instructions in the prologue as we
generate it. That way, we can say what we mean and just emit it
directly.

The usual reasoning that leads to the reverse-engineering approach is
that metadata is hard to keep in sync across optimization passes; but
here, (i) prologues are generated at the very end of the pipeline, and
(ii) if we ever do a post-prologue-gen optimization, we can treat unwind
directives as black boxes with unknown side-effects, just as we do for
some other pseudo-instructions today.

It turns out that it was easier to just build this for both x64 and
aarch64 (since they share a factored-out ABI implementation), and wire
up the platform-specific unwind-info generation for Windows and SystemV.
Now we have simpler unwind on all platforms and we can delete the old
unwind infra as soon as we remove the old backend.

There were a few consequences to supporting Fastcall unwind in
particular that led to a refactor of the common ABI. Windows only
supports naming clobbered-register save locations within 240 bytes of
the frame-pointer register, whatever one chooses that to be (RSP or
RBP). We had previously saved clobbers below the fixed frame (and below
nominal-SP). The 240-byte range has to include the old RBP too, so we're
forced to place clobbers at the top of the frame, just below saved
RBP/RIP. This is fine; we always keep a frame pointer anyway because we
use it to refer to stack args. It does mean that offsets of fixed-frame
slots (spillslots, stackslots) from RBP are no longer known before we do
regalloc, so if we ever want to index these off of RBP rather than
nominal-SP because we add support for `alloca` (dynamic frame growth),
then we'll need a "nominal-BP" mode that is resolved after regalloc and
clobber-save code is generated. I added a comment to this effect in
`abi_impl.rs`.

The above refactor touched both x64 and aarch64 because of shared code.
This had a further effect in that the old aarch64 prologue generation
subtracted from `sp` once to allocate space, then used stores to `[sp,
offset]` to save clobbers. Unfortunately the offset only has 7-bit
range, so if there are enough clobbered registers (and there can be --
aarch64 has 384 bytes of registers; at least one unit test hits this)
the stores/loads will be out-of-range. I really don't want to synthesize
large-offset sequences here; better to go back to the simpler
pre-index/post-index `stp r1, r2, [sp, #-16]` form that works just like
a "push". It's likely not much worse microarchitecturally (dependence
chain on SP, but oh well) and it actually saves an instruction if
there's no other frame to allocate. As a further advantage, it's much
simpler to understand; simpler is usually better.

This PR adds the new backend on Windows to CI as well.
2021-03-11 20:03:52 -08:00
Chris Fallin
6c94eb82aa x86-64 Windows fastcall ABI support.
This adds support for the "fastcall" ABI, which is the native C/C++ ABI
on Windows platforms on x86-64. It is similar to but not exactly like
System V; primarily, its argument register assignments are different,
and it requires stack shadow space.

Note that this also adjusts the handling of multi-register values in the
shared ABI implementation, and with this change, adjusts handling of
`i128`s on *both* Fastcall/x64 *and* SysV/x64 platforms. This was done
to align with actual behavior by the "rustc ABI" on both platforms, as
mapped out experimentally (Compiler Explorer link in comments). This
behavior is gated under the `enable_llvm_abi_extensions` flag.

Note also that this does *not* add x64 unwind info on Windows. That will
come in a future PR (but is planned!).
2021-03-03 19:53:18 -08:00
bjorn3
76d615049d Make the stackslot offsets available for debuginfo 2021-02-03 17:48:52 +01:00
Chris Fallin
456561f431 x64 and aarch64: allow StructArgument and StructReturn args.
The StructReturn ABI is fairly simple at the codegen/isel level: we only
need to take care to return the sret pointer as one of the return values
if that wasn't specified in the initial function signature.

Struct arguments are a little more complex. A struct argument is stored
as a chunk of memory in the stack-args space. However, the CLIF
semantics are slightly special: on the caller side, the parameter passed
in is a pointer to an arbitrary memory block, and we must memcpy this
data to the on-stack struct-argument; and on the callee side, we provide
a pointer to the passed-in struct-argument as the CLIF block param
value.

This is necessary to support various ABIs other than Wasm, such as that
of Rust (with the cg_clif codegen backend).
2021-01-17 23:11:45 -08:00
Chris Fallin
6eea015d6c Multi-register value support: framework for Values wider than machine regs.
This will allow for support for `I128` values everywhere, and `I64`
values on 32-bit targets (e.g., ARM32 and x86-32). It does not alter the
machine backends to build such support; it just adds the framework for
the MachInst backends to *reason* about a `Value` residing in more than
one register.
2021-01-05 17:45:02 -08:00
Chris Fallin
3a01d14712 Two Lucet-related fixes to stack overflow handling.
Lucet uses stack probes rather than explicit stack limit checks as
Wasmtime does. In bytecodealliance/lucet#616, I have discovered that I
previously was not running some Lucet runtime tests with the new
backend, so was missing some test failures due to missing pieces in the
new backend.

This PR adds (i) calls to probestack, when enabled, in the prologue of
every function with a stack frame larger than one page (configurable via
flags); and (ii) trap metadata for every instruction on x86-64 that can
access the stack, hence be the first point at which a stack overflow is
detected when the stack pointer is decremented.
2020-12-07 16:08:53 -08:00
Chris Fallin
89dbc4590d Merge pull request #2363 from cfallin/extend-only-if-abi
Do value-extensions at ABI boundaries only when ABI requires it.
2020-11-12 12:26:20 -08:00
Chris Fallin
4dce51096d MachInst backends: handle SourceLocs out-of-band, not in Insts.
In existing MachInst backends, many instructions -- any that can trap or
result in a relocation -- carry `SourceLoc` values in order to propagate
the location-in-original-source to use to describe resulting traps or
relocation errors.

This is quite tedious, and also error-prone: it is likely that the
necessary plumbing will be missed in some cases, and in any case, it's
unnecessarily verbose.

This PR factors out the `SourceLoc` handling so that it is tracked
during emission as part of the `EmitState`, and plumbed through
automatically by the machine-independent framework. Instruction emission
code that directly emits trap or relocation records can query the
current location as necessary. Then we only need to ensure that memory
references and trap instructions, at their (one) emission point rather
than their (many) lowering/generation points, are wired up correctly.

This does have the side-effect that some loads and stores that do not
correspond directly to user code's heap accesses will have unnecessary
but harmless trap metadata. For example, the load that fetches a code
offset from a jump table will have a 'heap out of bounds' trap record
attached to it; but because it is bounds-checked, and will never
actually trap if the lowering is correct, this should be harmless.  The
simplicity improvement here seemed more worthwhile to me than plumbing
through a "corresponds to user-level load/store" bit, because the latter
is a bit complex when we allow for op merging.

Closes #2290: though it does not implement a full "metadata" scheme as
described in that issue, this seems simpler overall.
2020-11-10 15:46:53 -08:00
Chris Fallin
a2bbb198de Do value-extensions at ABI boundaries only when ABI requires it.
There has been some confusion over the meaning of the "sign-extend"
(`sext`) and "zero-extend" (`uext`) attributes on parameters and return
values in signatures. According to the three implemented backends, these
attributes indicate that a value narrower than a full register should
always be extended in the way specified. However, they are much more
useful if they mean "extend in this way if the ABI requires extending":
only the ABI backend knows whether or not a particular ABI (e.g., x64
SysV vs. x64 Baldrdash) requires extensions, while only the frontend
(CLIF generator) knows whether or not a value is signed, so the two have
to work in concert.

This is the result of some very helpful discussion in #2354 (thanks to
@uweigand for raising the issue and @bjorn3 for helping to reason about
it).

This change respects the extension attributes in the above way, rather
than unconditionally extending, to avoid potential performance
degradation as we introduce more extension attributes on signatures.
2020-11-05 11:54:35 -08:00
Ulrich Weigand
80c2d70d2d machinst ABI: Support for accumulating outgoing args
When performing a function call, the platform ABI may require space
on the stack to hold outgoing arguments and/or return values.

Currently, this is supported via decrementing the stack pointer
before the call and incrementing it afterwards, using the
emit_stack_pre_adjust and emit_stack_post_adjust methods of
ABICaller.  However, on some platforms it would be preferable
to just allocate enough space for any call done in the function
in the caller's prologue instead.

This patch adds support to allow back-ends to choose that method.
Instead of calling emit_stack_pre/post_adjust around a call, they
simply call a new accumulate_outgoing_args_size method of
ABICaller instead.  This will pass on the required size to the
ABICallee structure of the calling function, which will accumulate
the maximum size required for all function calls.

That accumulated size is then passed to the gen_clobber_save
and gen_clobber_restore functions so they can include the size
in the stack allocation / deallocation that already happens in
the prologue / epilogue code.
2020-11-03 18:49:34 +01:00
Chris Fallin
5ab7b4aa7f Merge pull request #2345 from uweigand/abi-stackalign
machinst ABI: Allow back-end to define stack alignment
2020-11-03 09:02:41 -08:00
Ulrich Weigand
c9bc4edd08 machinst ABI: Pass fixed frame size to gen_clobber_restore
The ABI common code currently passes the fixed frame size to
the gen_clobber_save back-end routine, which is required to
emit code to allocate the required stack space in the prologue.

Similarly, the back-end needs to emit code to de-allocate the
stack in the epilogue.  However, at this point the back-end
does not have access to that fixed frame size value any more.
With targets that use a frame pointer, this does not matter,
since de-allocation can be done simply by assigning the frame
pointer back to the stack pointer.  However, on targets that
do not use a frame pointer, the frame size is required.

To allow back-ends that option, this patch changes ABI common
code to pass the fixed frame size to get_clobber_restore as
well (the same value as is passed to get_clobber_save).
2020-11-03 11:15:03 +01:00
Ulrich Weigand
d02ae3940c machinst ABI: Allow back-end to define stack alignment
The common gen_prologue code currently assumes that the stack
pointer has to be aligned to twice the word size.  While this
is true for many ABIs, it does not hold universally.

This patch adds a new callback stack_align that back-ends can
provide to define the specific stack alignment required by the
ABI on that platform.
2020-11-03 09:43:55 +01:00
Yury Delendik
de4af90af6 machinst x64: New backend unwind (#2266)
Addresses unwind for experimental x64 backend. The preliminary code enables backtrace on SystemV call convension.
2020-10-23 15:19:41 -05:00
Chris Fallin
71768bb6cf Fix AArch64 ABI to respect half-caller-save, half-callee-save vec regs.
This PR updates the AArch64 ABI implementation so that it (i) properly
respects that v8-v15 inclusive have callee-save lower halves, and
caller-save upper halves, by conservatively approximating (to full
registers) in the appropriate directions when generating prologue
caller-saves and when informing the regalloc of clobbered regs across
callsites.

In order to prevent saving all of these vector registers in the prologue
of every non-leaf function due to the above approximation, this also
makes use of a new regalloc.rs feature to exclude call instructions'
writes from the clobber set returned by register allocation. This is
safe whenever the caller and callee have the same ABI (because anything
the callee could clobber, the caller is allowed to clobber as well
without saving it in the prologue).

Fixes #2254.
2020-10-06 14:44:02 -07:00
Joey Gouly
eec60c9b06 arm64: Use SignedOffset rather than PreIndexed addressing mode for callee-saved registers
This also passes `fixed_frame_storage_size` (previously `total_sp_adjust`)
into `gen_clobber_save` so that it can be combined with other stack
adjustments.

Copyright (c) 2020, Arm Limited.
2020-10-02 16:22:55 +01:00
Chris Fallin
835db11bea Support for SpiderMonkey's "Wasm ABI 2020".
As part of a Wasm JIT update, SpiderMonkey is changing its internal
WebAssembly function ABI. The new ABI's frame format includes "caller
TLS" and "callee TLS" slots. The details of where these come from are
not important; from Cranelift's point of view, the only relevant
requirement is that we have two on-stack args that are always present
(offsetting other on-stack args), and that we define special argument
purposes so that we can supply values for these slots.

Note that this adds a *new* ABI (a variant of the Baldrdash ABI) because
we do not want to tightly couple the landing of this PR to the landing
of the changes in SpiderMonkey; it's better if both the old and new
behavior remain available in Cranelift, so SpiderMonkey can continue to
vendor Cranelift even if it does not land (or backs out) the ABI change.

Furthermore, note that this needs to be a Cranelift-level change (i.e.
cannot be done purely from the translator environment implementation)
because the special TLS arguments must always go on the stack, which
would not otherwise happen with the usual argument-placement logic; and
there is no primitive to push a value directly in CLIF code (the notion
of a stack frame is a lower-level concept).
2020-09-30 14:55:56 -07:00
Jakub Krauz
f6a140a662 arm32 codegen
This commit adds arm32 code generation for some IR insts.
Floating-point instructions are not supported, because regalloc
does not allow to represent overlapping register classes,
which are needed by VFP/Neon.

There is also no support for big-endianness, I64 and I128 types.
2020-09-22 12:49:42 +02:00
Chris Fallin
e8f772c1ac x64 new backend: port ABI implementation to shared infrastructure with AArch64.
Previously, in #2128, we factored out a common "vanilla 64-bit ABI"
implementation from the AArch64 ABI code, with the idea that this should
be largely compatible with x64. This PR alters the new x64 backend to
make use of the shared infrastructure, removing the duplication that
existed previously. The generated code is nearly (not exactly) the same;
the only difference relates to how the clobber-save region is padded in
the prologue.

This also changes some register allocations in the aarch64 code because
call support in the shared ABI infra now passes a temp vreg in, rather
than requiring use of a fixed, non-allocable temp; tests have been
updated, and the runtime behavior is unchanged.
2020-09-08 17:59:01 -07:00
Chris Fallin
5fa0be3515 AArch64 ABI: properly store full 64-bit width of extended args/retvals.
When storing an argument to a stack location for consumption by a
callee, or storing a return value to an on-stack return slot for
consumption by the caller, the ABI implementation was properly extending
the value but was then performing a store with only the original width.
This fixes the issue by always performing a 64-bit store of the extended
value.

Issue reported by @uweigand (thanks!).
2020-08-17 15:00:04 -07:00
Chris Fallin
5cf3fba3da Refactor AArch64 ABI support to extract common bits for shared impl with x64.
We have observed that the ABI implementations for AArch64 and x64 are
very similar; in fact, x64's implementation started as a modified copy
of AArch64's implementation. This is an artifact of both a similar ABI
(both machines pass args and return values in registers first, then the
stack, and both machines give considerable freedom with stack-frame
layout) and a too-low-level ABI abstraction in the existing design. For
machines that fit the mainstream or most common ABI-design idioms, we
should be able to do much better.

This commit factors AArch64 into machine-specific and
machine-independent parts, but does not yet modify x64; that will come
next.

This should be completely neutral with respect to compile time and
generated code performance.
2020-08-14 16:27:39 -07:00