Implement interrupting wasm code, reimplement stack overflow (#1490)

* Implement interrupting wasm code, reimplement stack overflow

This commit is a relatively large change for wasmtime with two main
goals:

* Primarily this enables interrupting executing wasm code with a trap,
  preventing infinite loops in wasm code. Note that resumption of the
  wasm code is not a goal of this commit.

* Additionally this commit reimplements how we handle stack overflow to
  ensure that host functions always have a reasonable amount of stack to
  run on. This fixes an issue where we might longjmp out of a host
  function, skipping destructors.

Various odds and ends fall out of this commit once the two goals above are
implemented. The strategy for implementing this was also lifted from
SpiderMonkey and existing functionality inside Cranelift. I've tried to write
up thorough documentation of how this all works in
`crates/environ/src/cranelift.rs`, where the gnarly bits live.

A brief summary of how this works: each function header and each loop header
now checks whether execution has been interrupted. The interrupt check and the
stack overflow check are actually folded into one, where function headers
check whether they've run out of stack, and the sentinel value used to
indicate an interrupt, checked in loop headers, tricks functions into thinking
they're out of stack. Delivering an interrupt is then just a matter of writing
that sentinel value to a location which is read by JIT code.

When interrupts are delivered, and what triggers them, is left up to embedders
of the `wasmtime` crate. The `wasmtime::Store` type has a method to acquire an
`InterruptHandle`, where `InterruptHandle` is a `Send` and `Sync` type which
can travel to other threads (or perhaps even a signal handler) and request an
interrupt from there. The intent is to provide a good degree of flexibility
when interrupting wasm code. One large caveat, though, is that interrupts are
only witnessed by running wasm code, so if a host import blocks for a long
time the interrupt won't actually be received until the wasm starts running
again.

Some of the fallout from this change:

* Unix signal handlers are no longer registered with `SA_ONSTACK`.
  Instead they run on the native stack the thread was already using.
  This is possible since stack overflow isn't handled by hitting the
  guard page, but rather it's explicitly checked for in wasm now. Native
  stack overflow will continue to abort the process as usual.

* Unix sigaltstack management is no longer necessary since we no longer use
  an alternate signal stack.

* Windows no longer has any need to reset guard pages since we no longer
  try to recover from faults on guard pages.

* On all targets probestack intrinsics are disabled since we use a
  different mechanism for catching stack overflow.

* The C API has been updated with interrupt handles. An example has also been
  added which shows how to interrupt a module.

Closes #139
Closes #860
Closes #900

* Update comment about magical interrupt value

* Store stack limit as a global value, not a closure

* Run rustfmt

* Handle review comments

* Add a comment about SA_ONSTACK

* Use `usize` for type of `INTERRUPTED`

* Parse human-readable durations

* Bring back sigaltstack handling

Allows libstd to print out stack overflow on failure still.

* Add parsing and emission of stack limit-via-preamble

* Fix new example for new apis

* Fix host segfault test in release mode

* Fix new doc example
Author: Alex Crichton
Date: 2020-04-21 13:03:28 -05:00 (committed by GitHub)
Parent: 4a63a4d86e
Commit: c9a0ba81a0
45 changed files with 1361 additions and 143 deletions


@@ -685,21 +685,32 @@ fn insert_common_prologue(
     fpr_slot: Option<&StackSlot>,
     isa: &dyn TargetIsa,
 ) {
-    if stack_size > 0 {
-        // Check if there is a special stack limit parameter. If so insert stack check.
-        if let Some(stack_limit_arg) = pos.func.special_param(ArgumentPurpose::StackLimit) {
-            // Total stack size is the size of all stack area used by the function, including
-            // pushed CSRs, frame pointer.
-            // Also, the size of a return address, implicitly pushed by a x86 `call` instruction,
-            // also should be accounted for.
-            // If any FPR are present, count them as well as necessary alignment space.
-            // TODO: Check if the function body actually contains a `call` instruction.
-            let mut total_stack_size =
-                (csrs.iter(GPR).len() + 1 + 1) as i64 * (isa.pointer_bytes() as isize) as i64;
-            total_stack_size += csrs.iter(FPR).len() as i64 * types::F64X2.bytes() as i64;
-            insert_stack_check(pos, total_stack_size, stack_limit_arg);
+    // If this is a leaf function with zero stack, then there's no need to
+    // insert a stack check since it can't overflow anything and
+    // forward-progress is guaranteed so long as loops are handled anyway.
+    //
+    // If this has a stack size it could stack overflow, or if it isn't a leaf
+    // it could be part of a long call chain which we need to check anyway.
+    //
+    // First we look for the stack limit as a special argument to the function,
+    // and failing that we see if a custom stack limit has been provided, which
+    // will likely be used to calculate the stack limit from the arguments or
+    // perhaps constants.
+    if stack_size > 0 || !pos.func.is_leaf() {
+        let scratch = ir::ValueLoc::Reg(RU::rax as RegUnit);
+        let stack_limit_arg = match pos.func.special_param(ArgumentPurpose::StackLimit) {
+            Some(arg) => {
+                let copy = pos.ins().copy(arg);
+                pos.func.locations[copy] = scratch;
+                Some(copy)
+            }
+            None => pos
+                .func
+                .stack_limit
+                .map(|gv| interpret_gv(pos, gv, scratch)),
+        };
+        if let Some(stack_limit_arg) = stack_limit_arg {
+            insert_stack_check(pos, stack_size, stack_limit_arg);
         }
     }
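The `None` arm above falls back to `pos.func.stack_limit`, a global value
recorded in the function preamble (the "stack limit-via-preamble" parsing and
emission added later in this commit's history). In CLIF text that might look
something like the following sketch, where the `0x60` offset into the vmctx
is made up for illustration:

```
;; Illustrative CLIF only: the stack limit is loaded from some field of the
;; vmctx pointer; the exact offset is determined by the embedder's layout.
function %f(i64 vmctx) {
    gv0 = vmctx
    gv1 = load.i64 notrap aligned gv0+0x60
    stack_limit = gv1
    ...
}
```

`interpret_gv` below exists precisely to evaluate such a `vmctx`/`load` chain
this late in the backend, where normal legalization is no longer available.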
@@ -811,16 +822,76 @@ fn insert_common_prologue(
         );
     }
 }

+/// Inserts code necessary to calculate `gv`.
+///
+/// Note that this is typically done with `ins().global_value(...)` but that
+/// requires legalization to run to encode it, and we're running super late
+/// here in the backend where legalization isn't possible. To get around this
+/// we manually interpret the `gv` specified and do register allocation for
+/// intermediate values.
+///
+/// This is an incomplete implementation of loading `GlobalValue` values to get
+/// compared to the stack pointer, but currently it serves enough functionality
+/// to get this implemented in `wasmtime` itself. This'll likely get expanded a
+/// bit over time!
+fn interpret_gv(pos: &mut EncCursor, gv: ir::GlobalValue, scratch: ir::ValueLoc) -> ir::Value {
+    match pos.func.global_values[gv] {
+        ir::GlobalValueData::VMContext => pos
+            .func
+            .special_param(ir::ArgumentPurpose::VMContext)
+            .expect("no vmcontext parameter found"),
+        ir::GlobalValueData::Load {
+            base,
+            offset,
+            global_type,
+            readonly: _,
+        } => {
+            let base = interpret_gv(pos, base, scratch);
+            let ret = pos
+                .ins()
+                .load(global_type, ir::MemFlags::trusted(), base, offset);
+            pos.func.locations[ret] = scratch;
+            return ret;
+        }
+        ref other => panic!("global value for stack limit not supported: {}", other),
+    }
+}

 /// Insert a check that generates a trap if the stack pointer goes
 /// below a value in `stack_limit_arg`.
 fn insert_stack_check(pos: &mut EncCursor, stack_size: i64, stack_limit_arg: ir::Value) {
     use crate::ir::condcodes::IntCC;

+    // Our stack pointer, after subtracting `stack_size`, must not be below
+    // `stack_limit_arg`. To do this we're going to add `stack_size` to
+    // `stack_limit_arg` and see if the stack pointer is below that. The
+    // `stack_size + stack_limit_arg` computation might overflow, however, due
+    // to how stack limits may be loaded and set externally to trigger a trap.
+    //
+    // To handle this we'll need an extra comparison to see if the stack
+    // pointer is already below `stack_limit_arg`. Most of the time this
+    // isn't necessary though since the stack limit which triggers a trap is
+    // likely a sentinel somewhere around `usize::max_value()`. In that case
+    // we only conditionally emit this pre-flight check. That way most
+    // functions only have the one comparison, but are also guaranteed that if
+    // we add `stack_size` to `stack_limit_arg` it won't overflow.
+    //
+    // This does mean that code generators which use this stack check
+    // functionality need to ensure that values stored into the stack limit
+    // will never overflow if this threshold is added.
+    if stack_size >= 32 * 1024 {
+        let cflags = pos.ins().ifcmp_sp(stack_limit_arg);
+        pos.func.locations[cflags] = ir::ValueLoc::Reg(RU::rflags as RegUnit);
+        pos.ins().trapif(
+            IntCC::UnsignedGreaterThanOrEqual,
+            cflags,
+            ir::TrapCode::StackOverflow,
+        );
+    }

-    // Copy `stack_limit_arg` into a %rax and use it for calculating
-    // a SP threshold.
-    let stack_limit_copy = pos.ins().copy(stack_limit_arg);
-    pos.func.locations[stack_limit_copy] = ir::ValueLoc::Reg(RU::rax as RegUnit);
-    let sp_threshold = pos.ins().iadd_imm(stack_limit_copy, stack_size);
+    let sp_threshold = pos.ins().iadd_imm(stack_limit_arg, stack_size);
     pos.func.locations[sp_threshold] = ir::ValueLoc::Reg(RU::rax as RegUnit);

     // If the stack pointer currently reaches the SP threshold or below it then after opening