Various improvements to differential fuzzing (#4845)

* Improve wasmi differential fuzzer * Support modules with a `start` function * Implement trap-matching to ensure that wasmi and Wasmtime both report the same flavor of trap. * Support differential fuzzing where no engines match Locally I was attempting to run against just one wasm engine with `ALLOWED_ENGINES=wasmi` but the fuzzer quickly panicked because the generated test case didn't match wasmi's configuration. This commit updates engine-selection in the differential fuzzer to return `None` if no engine is applicable, throwing out the test case. This won't be hit at all with oss-fuzz-based runs but for local runs it'll be useful to have. * Improve proposal support in differential fuzzer * De-prioritize unstable wasm proposals such as multi-memory and memory64 by making them more unlikely with `Unstructured::ratio`. * Allow fuzzing multi-table (reference types) and multi-memory by avoiding setting their maximums to 1 in `set_differential_config`. * Update selection of the pooling strategy to unconditionally support the selected module config rather than the other way around. * Improve handling of traps in differential fuzzing This commit fixes an issue found via local fuzzing where engines were reporting different results but the underlying reason for this was that one engine was hitting stack overflow before the other. To fix the underlying issue I updated the execution to check for stack overflow and, if hit, it discards the entire fuzz test case from then on. The rationale behind this is that each engine can have unique limits for stack overflow. One test case I was looking at for example would stack overflow at less than 1000 frames with epoch interruption enabled but would stack overflow at more than 1000 frames with it disabled. This means that the state after the trap started to diverge and it looked like the engines produced different results. While I was at it I also improved the "function call returned a trap" case to compare traps to make sure the same trap reason popped out. * Fix fuzzer tests
2022-09-02 14:16:02 -05:00
parent 3afb711a51
commit 10dbb19983
9 changed files with 192 additions and 55 deletions
--- a/crates/fuzzing/src/oracles.rs
+++ b/crates/fuzzing/src/oracles.rs
@@ -19,7 +19,7 @@ pub mod engine;
 mod stacks;

 use self::diff_wasmtime::WasmtimeInstance;
-use self::engine::DiffInstance;
+use self::engine::{DiffEngine, DiffInstance};
 use crate::generators::{self, DiffValue, DiffValueType};
 use arbitrary::Arbitrary;
 pub use stacks::check_stacks;
@@ -330,24 +330,28 @@ pub fn instantiate_with_dummy(store: &mut Store<StoreLimits>, module: &Module) -
 /// Evaluate the function identified by `name` in two different engine
 /// instances--`lhs` and `rhs`.
 ///
+/// Returns `Ok(true)` if more evaluations can happen or `Ok(false)` if the
+/// instances may have drifted apart and no more evaluations can happen.
+///
 /// # Panics
 ///
 /// This will panic if the evaluation is different between engines (e.g.,
 /// results are different, hashed instance is different, one side traps, etc.).
 pub fn differential(
    lhs: &mut dyn DiffInstance,
+    lhs_engine: &dyn DiffEngine,
    rhs: &mut WasmtimeInstance,
    name: &str,
    args: &[DiffValue],
    result_tys: &[DiffValueType],
-) -> anyhow::Result<()> {
+) -> anyhow::Result<bool> {
    log::debug!("Evaluating: `{}` with {:?}", name, args);
    let lhs_results = match lhs.evaluate(name, args, result_tys) {
        Ok(Some(results)) => Ok(results),
        Err(e) => Err(e),
        // this engine couldn't execute this type signature, so discard this
        // execution by returning success.
-        Ok(None) => return Ok(()),
+        Ok(None) => return Ok(true),
    };
    log::debug!(" -> results on {}: {:?}", lhs.name(), &lhs_results);

@@ -360,11 +364,24 @@ pub fn differential(
    match (lhs_results, rhs_results) {
        // If the evaluation succeeds, we compare the results.
        (Ok(lhs_results), Ok(rhs_results)) => assert_eq!(lhs_results, rhs_results),
-        // Both sides failed--this is an acceptable result (e.g., both sides
-        // trap at a divide by zero). We could compare the error strings perhaps
-        // (since the `lhs` and `rhs` could be failing for different reasons)
-        // but this seems good enough for now.
-        (Err(_), Err(_)) => {}
+
+        // Both sides failed. If either one hits a stack overflow then that's an
+        // engine defined limit which means we can no longer compare the state
+        // of the two instances, so `false` is returned and nothing else is
+        // compared.
+        //
+        // Otherwise, though, the same error should have popped out and this
+        // falls through to checking the intermediate state otherwise.
+        (Err(lhs), Err(rhs)) => {
+            let err = rhs.downcast::<Trap>().expect("not a trap");
+            let poisoned = err.trap_code() == Some(TrapCode::StackOverflow)
+                || lhs_engine.is_stack_overflow(&lhs);
+
+            if poisoned {
+                return Ok(false);
+            }
+            lhs_engine.assert_error_match(&err, &lhs);
+        }
        // A real bug is found if only one side fails.
        (Ok(_), Err(_)) => panic!("only the `rhs` ({}) failed for this input", rhs.name()),
        (Err(_), Ok(_)) => panic!("only the `lhs` ({}) failed for this input", lhs.name()),
@@ -392,7 +409,7 @@ pub fn differential(
        panic!("memories have differing values");
    }

-    Ok(())
+    Ok(true)
 }

 /// Invoke the given API calls.