Files
wasmtime/crates/fiber/src/arch/aarch64.S
Alex Crichton 7795a230f2 Implement support for async functions in Wasmtime (#2434)
* Implement support for `async` functions in Wasmtime

This is an implementation of [RFC 2] in Wasmtime which is to support
`async`-defined host functions. At a high level support is added by
executing WebAssembly code that might invoke an asynchronous host
function on a separate native stack. When the host function's future is
not ready we switch back to the main native stack to continue execution.

There's a whole bunch of details in this commit, and it's a bit much to
go over them all here in this commit message. The most important changes
here are:

* A new `wasmtime-fiber` crate has been written to manage the low-level
  details of stack-switching. Unixes use `mmap` to allocate a stack and
  Windows uses the native fibers implementation. We'll surely want to
  refactor this to move stack allocation elsewhere in the future. Fibers
  are intended to be relatively general with a lot of type paremters to
  fling values back and forth across suspension points. The whole crate
  is a giant wad of `unsafe` unfortunately and involves handwritten
  assembly with custom dwarf CFI directives to boot. Definitely deserves
  a close eye in review!

* The `Store` type has two new methods -- `block_on` and `on_fiber`
  which bridge between the async and non-async worlds. Lots of unsafe
  fiddly bits here as we're trying to communicate context pointers
  between disparate portions of the code. Extra eyes and care in review
  is greatly appreciated.

* The APIs for binding `async` functions are unfortunately pretty ugly
  in `Func`. This is mostly due to language limitations and compiler
  bugs (I believe) in Rust. Instead of `Func::wrap` we have a
  `Func::wrapN_async` family of methods, and we've also got a whole
  bunch of `Func::getN_async` methods now too. It may be worth
  rethinking the API of `Func` to try to make the documentation page
  actually grok'able.

This isn't super heavily tested but the various test should suffice for
engaging hopefully nearly all the infrastructure in one form or another.
This is just the start though!

[RFC 2]: https://github.com/bytecodealliance/rfcs/pull/2

* Add wasmtime-fiber to publish script

* Save vector/float registers on ARM too.

* Fix a typo

* Update lock file

* Implement periodically yielding with fuel consumption

This commit implements APIs on `Store` to periodically yield execution
of futures through the consumption of fuel. When fuel runs out a
future's execution is yielded back to the caller, and then upon
resumption fuel is re-injected. The goal of this is to allow cooperative
multi-tasking with futures.

* Fix compile without async

* Save/restore the frame pointer in fiber switching

Turns out this is another caller-saved register!

* Simplify x86_64 fiber asm

Take a leaf out of aarch64's playbook and don't have extra memory to
load/store these arguments, instead leverage how `wasmtime_fiber_switch`
already loads a bunch of data into registers which we can then
immediately start using on a fiber's start without any extra memory
accesses.

* Add x86 support to wasmtime-fiber

* Add ARM32 support to fiber crate

* Make fiber build file probing more flexible

* Use CreateFiberEx on Windows

* Remove a stray no-longer-used trait declaration

* Don't reach into `Caller` internals

* Tweak async fuel to eventually run out.

With fuel it's probably best to not provide any way to inject infinite
fuel.

* Fix some typos

* Cleanup asm a bit

* Use a shared header file to deduplicate some directives
* Guarantee hidden visibility for functions
* Enable gc-sections on macOS x86_64
* Add `.type` annotations for ARM

* Update lock file

* Fix compile error

* Review comments
2021-02-26 16:19:56 -06:00

123 lines
4.0 KiB
ArmAsm

// A WORD OF CAUTION
//
// This entire file basically needs to be kept in sync with itself. It's not
// really possible to modify just one bit of this file without understanding
// all the other bits. Documentation tries to reference various bits here and
// there but try to make sure to read over everything before tweaking things!
//
// Also at this time this file is heavily based off the x86_64 file, so you'll
// probably want to read that one as well.
#include "header.h"
// fn(top_of_stack(%x0): *mut u8)
HIDDEN(wasmtime_fiber_switch)
GLOBL(wasmtime_fiber_switch)
.p2align 2
TYPE(wasmtime_fiber_switch)
FUNCTION(wasmtime_fiber_switch):
// Save all callee-saved registers on the stack since we're assuming
// they're clobbered as a result of the stack switch.
stp lr, fp, [sp, -16]!
stp x20, x19, [sp, -16]!
stp x22, x21, [sp, -16]!
stp x24, x23, [sp, -16]!
stp x26, x25, [sp, -16]!
stp x28, x27, [sp, -16]!
stp d9, d8, [sp, -16]!
stp d11, d10, [sp, -16]!
stp d13, d12, [sp, -16]!
stp d15, d14, [sp, -16]!
// Load our previously saved stack pointer to resume to, and save off our
// current stack pointer on where to come back to eventually.
ldr x8, [x0, -0x10]
mov x9, sp
str x9, [x0, -0x10]
// Switch to the new stack and restore all our callee-saved registers after
// the switch and return to our new stack.
mov sp, x8
ldp d15, d14, [sp], 16
ldp d13, d12, [sp], 16
ldp d11, d10, [sp], 16
ldp d9, d8, [sp], 16
ldp x28, x27, [sp], 16
ldp x26, x25, [sp], 16
ldp x24, x23, [sp], 16
ldp x22, x21, [sp], 16
ldp x20, x19, [sp], 16
ldp lr, fp, [sp], 16
ret
SIZE(wasmtime_fiber_switch)
// fn(
// top_of_stack(%x0): *mut u8,
// entry_point(%x1): extern fn(*mut u8, *mut u8),
// entry_arg0(%x2): *mut u8,
// )
HIDDEN(wasmtime_fiber_init)
GLOBL(wasmtime_fiber_init)
.p2align 2
TYPE(wasmtime_fiber_init)
FUNCTION(wasmtime_fiber_init):
adr x8, FUNCTION(wasmtime_fiber_start)
stp x0, x8, [x0, -0x28] // x0 => x19, x8 => lr
stp x2, x1, [x0, -0x38] // x1 => x20, x2 => x21
// `wasmtime_fiber_switch` has an 0xa0 byte stack, and we add 0x10 more for
// the original reserved 16 bytes.
add x8, x0, -0xb0
str x8, [x0, -0x10]
ret
SIZE(wasmtime_fiber_init)
.p2align 2
TYPE(wasmtime_fiber_start)
FUNCTION(wasmtime_fiber_start):
.cfi_startproc simple
// See the x86_64 file for more commentary on what these CFI directives are
// doing. Like over there note that the relative offsets to registers here
// match the frame layout in `wasmtime_fiber_switch`.
.cfi_escape 0x0f, /* DW_CFA_def_cfa_expression */ \
5, /* the byte length of this expression */ \
0x6f, /* DW_OP_reg31(%sp) */ \
0x06, /* DW_OP_deref */ \
0x23, 0xa0, 0x1 /* DW_OP_plus_uconst 0xa0 */
.cfi_rel_offset lr, -0x10
.cfi_rel_offset x19, -0x18
.cfi_rel_offset x20, -0x20
.cfi_rel_offset x21, -0x28
.cfi_rel_offset x22, -0x30
.cfi_rel_offset x23, -0x38
.cfi_rel_offset x24, -0x40
.cfi_rel_offset x25, -0x48
.cfi_rel_offset x26, -0x50
.cfi_rel_offset x27, -0x58
.cfi_rel_offset x29, -0x60
// Load our two arguments from the stack, where x1 is our start procedure
// and x0 is its first argument. This also blows away the stack space used
// by those two arguments.
mov x0, x21
mov x1, x19
// ... and then we call the function! Note that this is a function call so
// our frame stays on the stack to backtrace through.
blr x20
// .. technically we shouldn't get here, and I would like to write in an
// instruction which just aborts, but I don't know such an instruction in
// aarch64 land.
.cfi_endproc
SIZE(wasmtime_fiber_start)
// This omits the `.subsections_via_symbols` directive on macOS which means we
// can't GC specific intrinsics from this file, but it enables usage of the
// `adr` instruction above in lieu of figuring out a slightly more complicated
// way of implementing that.
#ifndef CFG_TARGET_OS_macos
FOOTER
#endif