Add initial support for fused adapter trampolines (#4501)
* Add initial support for fused adapter trampolines

This commit lands a significant new piece of functionality for Wasmtime's implementation of the component model: fused adapter trampolines. Internally within a component, core wasm modules can communicate with each other by having their exports `canon lift`'d and then `canon lower`'d into a different component. This signifies that two components are communicating through a statically known interface via the canonical ABI. Previously Wasmtime was able to identify that this communication was happening, but it simply panicked with `unimplemented!` upon seeing it. This commit is the beginning of filling out that panic location with an actual implementation.

The implementation route chosen here is to use a WebAssembly module itself for each fused adapter. This means that, at compile time of a component, Wasmtime generates core WebAssembly modules which then get recursively compiled within Wasmtime as well. The choice to use WebAssembly itself as the implementation of fused adapters stems from a few motivations:

* This does not represent a significant increase in the "trusted compiler base" of Wasmtime. Getting the wasm -> CLIF translation correct once is hard enough, much less a translation from an entirely different IR to CLIF. By generating WebAssembly no new interactions with Cranelift are added, which drastically reduces the possibilities for mistakes.

* Using WebAssembly means that component adapters are insulated from miscompilations and mistakes. If something goes wrong, the WebAssembly specification defines precisely how it goes wrong and what happens as a result. This means that the "blast zone" for a buggy adapter is the component instance, not the entire host itself. Accesses to linear memory are guaranteed to be in-bounds or otherwise handled via well-defined traps.

* A fully-finished fused adapter compiler is expected to be a significant and quite complex component of Wasmtime. Functionality along these lines is expected to be needed for Web-based polyfills of the component model, and by using core WebAssembly there's an opportunity to share code between Wasmtime and those polyfills.

* Finally, the runtime machinery for managing WebAssembly modules is already implemented and quite easy to integrate with, so representing fused adapters with WebAssembly requires very little extra runtime support for instantiating and managing a component.

The compiler added in this commit is dubbed Wasmtime's Fused Adapter Compiler of Trampolines (FACT) because who doesn't like deriving a name from an acronym. Currently the trampoline compiler is limited in its support for interface types and only supports a few primitives; I plan on filing future PRs to flesh out the support for all the variants of `InterfaceType`. For now this PR is primarily focused on all of the other infrastructure needed to add a trampoline compiler.

With the choice to use core WebAssembly to implement fused adapters, the adapters need to be inserted into modules. Unfortunately the adapters cannot all go into a single WebAssembly module, because adapters themselves have dependencies which may be provided transitively through instances that were instantiated with other adapters. This means that a significant chunk of this PR (`adapt.rs`) is dedicated to determining precisely which adapters go into precisely which adapter modules. This partitioning process attempts to make large modules wherever it can to cut down on core wasm instantiations, but it is likely not optimal as it's just a simple heuristic today.

With all of this added together it's now possible to start writing `*.wast` tests that internally have adapted modules communicating with one another. A `fused.wast` test suite was added as part of this PR as the beginning of tests for the fused adapter compiler. Currently it primarily tests various topologies of adapters along with direct/indirect modes; it will grow many more tests over time as more types are supported.

Overall I'm not 100% satisfied with the testing story of this PR. When a test fails it's very difficult to debug, since everything is written in the WebAssembly text format, meaning there are no "conveniences" for printing out the state of the world when things go wrong. I think this will become even more apparent as more tests are written for more types in subsequent PRs. At this time, though, I know of no better alternative other than leaning pretty heavily on fuzz-testing to ensure this is all exercised.

* Fix an unused field warning

* Fix tests in `wasmtime-runtime`

* Add some more tests for compiled trampolines

* Remap exports when injecting adapters

The exports of a component were accidentally left unmapped, which meant that they used the instance indexes from before adapter-module insertion.

* Fix typo

* Rebase conflicts
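To make the shape of a generated adapter concrete, here is a minimal, illustrative sketch (not code from this PR) of building a trivial forwarding adapter with the `wasm_encoder` crate that FACT uses. The names "callee", "f", and "adapter" are invented for illustration, and exact method signatures vary across `wasm-encoder` versions:

    use wasm_encoder::{
        CodeSection, EntityType, ExportKind, ExportSection, Function, FunctionSection,
        ImportSection, Instruction, Module, TypeSection, ValType,
    };

    fn build_forwarding_adapter() -> Vec<u8> {
        let mut module = Module::new();

        // Type 0: (i32) -> i32, shared by the imported callee and the adapter.
        let mut types = TypeSection::new();
        types.function([ValType::I32], [ValType::I32]);
        module.section(&types);

        // Import the lifted callee; a real adapter would also import memories,
        // `realloc` functions, and instance-flag globals.
        let mut imports = ImportSection::new();
        imports.import("callee", "f", EntityType::Function(0));
        module.section(&imports);

        // Declare the adapter function (index 1, after the import) ...
        let mut funcs = FunctionSection::new();
        funcs.function(0);
        module.section(&funcs);

        // ... and export it for the adapted instance to use.
        let mut exports = ExportSection::new();
        exports.export("adapter", ExportKind::Func, 1);
        module.section(&exports);

        // The body: translate arguments (a no-op for a single i32), call the
        // callee, then translate results back. FACT additionally emits flag
        // checks and canonical ABI validation around this core.
        let mut code = CodeSection::new();
        let mut body = Function::new([]);
        body.instruction(&Instruction::LocalGet(0));
        body.instruction(&Instruction::Call(0));
        body.instruction(&Instruction::End);
        code.function(&body);
        module.section(&code);

        module.finish()
    }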
crates/environ/src/fact/core_types.rs (new file, 24 lines)
@@ -0,0 +1,24 @@
use std::collections::HashMap;

use wasm_encoder::{TypeSection, ValType};

/// A simple representation of the type section which automatically interns
/// types and ensures they're only defined once.
#[derive(Default)]
pub struct CoreTypes {
    pub section: TypeSection,
    intern: HashMap<(Vec<ValType>, Vec<ValType>), u32>,
}

impl CoreTypes {
    pub fn function(&mut self, params: &[ValType], results: &[ValType]) -> u32 {
        *self
            .intern
            .entry((params.to_vec(), results.to_vec()))
            .or_insert_with(|| {
                let idx = self.section.len();
                self.section
                    .function(params.iter().copied(), results.iter().copied());
                idx
            })
    }
}
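A quick usage sketch (assuming the `fact` module above is in scope): requesting the same signature twice returns the same interned index, so only distinct signatures are encoded.

    use wasm_encoder::ValType;

    let mut types = CoreTypes::default();
    let a = types.function(&[ValType::I32, ValType::I32], &[ValType::I32]);
    let b = types.function(&[ValType::I32, ValType::I32], &[ValType::I32]);
    let c = types.function(&[ValType::I64], &[]);
    assert_eq!(a, b); // identical signatures share one type-section entry
    assert_ne!(a, c);
    assert_eq!(types.section.len(), 2); // only two types were actually encoded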
crates/environ/src/fact/signature.rs (new file, 265 lines)
@@ -0,0 +1,265 @@
//! Size, align, and flattening information about component model types.

use crate::component::{InterfaceType, TypeFuncIndex, MAX_FLAT_PARAMS, MAX_FLAT_RESULTS};
use crate::fact::{Context, Module};
use wasm_encoder::ValType;

/// Metadata about a core wasm signature which is created for a component model
/// signature.
pub struct Signature {
    /// Core wasm parameters.
    pub params: Vec<ValType>,
    /// Core wasm results.
    pub results: Vec<ValType>,
    /// Indicates whether parameters are indirect, meaning that the first
    /// entry of `params` is a pointer type which all parameters are loaded
    /// through.
    pub params_indirect: bool,
    /// Indicates whether results are passed indirectly. This may mean that
    /// `results` is an `i32` or that `params` ends with an `i32`, depending on
    /// the `Context`.
    pub results_indirect: bool,
}

pub(crate) fn align_to(n: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    (n + (align - 1)) & !(align - 1)
}
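For example, `align_to` rounds its first argument up to the next multiple of the (power-of-two) alignment:

    assert_eq!(align_to(0, 4), 0);
    assert_eq!(align_to(1, 4), 4);
    assert_eq!(align_to(5, 8), 8);
    assert_eq!(align_to(8, 8), 8);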
impl Module<'_> {
    /// Calculates the core wasm function signature for the component function
    /// type specified within `Context`.
    ///
    /// This is used to generate the core wasm signatures for functions that are
    /// imported (matching whatever was `canon lift`'d) and functions that are
    /// exported (matching the generated function from `canon lower`).
    pub(super) fn signature(&self, ty: TypeFuncIndex, context: Context) -> Signature {
        let ty = &self.types[ty];

        let mut params = self.flatten_types(ty.params.iter().map(|(_, ty)| *ty));
        let mut params_indirect = false;
        if params.len() > MAX_FLAT_PARAMS {
            params = vec![ValType::I32];
            params_indirect = true;
        }

        let mut results = self.flatten_types([ty.result]);
        let mut results_indirect = false;
        if results.len() > MAX_FLAT_RESULTS {
            results_indirect = true;
            match context {
                // For a lifted function too-many-results gets translated to a
                // returned pointer where results are read from. The callee
                // allocates space here.
                Context::Lift => results = vec![ValType::I32],
                // For a lowered function too-many-results becomes a return
                // pointer which is passed as the last argument. The caller
                // allocates space here.
                Context::Lower => {
                    results.truncate(0);
                    params.push(ValType::I32);
                }
            }
        }
        Signature {
            params,
            results,
            params_indirect,
            results_indirect,
        }
    }
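The canonical ABI caps flat parameters at 16 and flat results at 1; here is a standalone sketch of the spilling rule above, with that limit written out as an assumption about `MAX_FLAT_PARAMS`:

    use wasm_encoder::ValType;

    const MAX_FLAT_PARAMS: usize = 16; // assumed canonical ABI limit

    fn spill_params(flat: Vec<ValType>) -> (Vec<ValType>, bool) {
        if flat.len() > MAX_FLAT_PARAMS {
            // Too many flat parameters: pass one pointer to a tuple in memory.
            (vec![ValType::I32], true)
        } else {
            (flat, false)
        }
    }

    // Seventeen i32 parameters no longer fit and collapse to one pointer.
    let (params, indirect) = spill_params(vec![ValType::I32; 17]);
    assert_eq!(params, vec![ValType::I32]);
    assert!(indirect);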
    /// Pushes the flat version of a list of component types into a final result
    /// list.
    pub(crate) fn flatten_types(
        &self,
        tys: impl IntoIterator<Item = InterfaceType>,
    ) -> Vec<ValType> {
        let mut result = Vec::new();
        for ty in tys {
            self.push_flat(&ty, &mut result);
        }
        result
    }

    fn push_flat(&self, ty: &InterfaceType, dst: &mut Vec<ValType>) {
        match ty {
            InterfaceType::Unit => {}

            InterfaceType::Bool
            | InterfaceType::S8
            | InterfaceType::U8
            | InterfaceType::S16
            | InterfaceType::U16
            | InterfaceType::S32
            | InterfaceType::U32
            | InterfaceType::Char => dst.push(ValType::I32),

            InterfaceType::S64 | InterfaceType::U64 => dst.push(ValType::I64),

            InterfaceType::Float32 => dst.push(ValType::F32),
            InterfaceType::Float64 => dst.push(ValType::F64),

            InterfaceType::String | InterfaceType::List(_) => {
                dst.push(ValType::I32);
                dst.push(ValType::I32);
            }
            InterfaceType::Record(r) => {
                for field in self.types[*r].fields.iter() {
                    self.push_flat(&field.ty, dst);
                }
            }
            InterfaceType::Tuple(t) => {
                for ty in self.types[*t].types.iter() {
                    self.push_flat(ty, dst);
                }
            }
            InterfaceType::Flags(f) => {
                let flags = &self.types[*f];
                let nflags = align_to(flags.names.len(), 32) / 32;
                for _ in 0..nflags {
                    dst.push(ValType::I32);
                }
            }
            InterfaceType::Enum(_) => dst.push(ValType::I32),
            InterfaceType::Option(t) => {
                dst.push(ValType::I32);
                self.push_flat(&self.types[*t], dst);
            }
            InterfaceType::Variant(t) => {
                dst.push(ValType::I32);
                let pos = dst.len();
                let mut tmp = Vec::new();
                for case in self.types[*t].cases.iter() {
                    self.push_flat_variant(&case.ty, pos, &mut tmp, dst);
                }
            }
            InterfaceType::Union(t) => {
                dst.push(ValType::I32);
                let pos = dst.len();
                let mut tmp = Vec::new();
                for ty in self.types[*t].types.iter() {
                    self.push_flat_variant(ty, pos, &mut tmp, dst);
                }
            }
            InterfaceType::Expected(t) => {
                dst.push(ValType::I32);
                let e = &self.types[*t];
                let pos = dst.len();
                let mut tmp = Vec::new();
                self.push_flat_variant(&e.ok, pos, &mut tmp, dst);
                self.push_flat_variant(&e.err, pos, &mut tmp, dst);
            }
        }
    }

    fn push_flat_variant(
        &self,
        ty: &InterfaceType,
        pos: usize,
        tmp: &mut Vec<ValType>,
        dst: &mut Vec<ValType>,
    ) {
        tmp.truncate(0);
        self.push_flat(ty, tmp);
        for (i, a) in tmp.iter().enumerate() {
            match dst.get_mut(pos + i) {
                Some(b) => join(*a, b),
                None => dst.push(*a),
            }
        }

        fn join(a: ValType, b: &mut ValType) {
            if a == *b {
                return;
            }
            match (a, *b) {
                (ValType::I32, ValType::F32) | (ValType::F32, ValType::I32) => *b = ValType::I32,
                _ => *b = ValType::I64,
            }
        }
    }
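The payload-slot unification done by `join` can be modeled standalone (returning the joined type instead of mutating in place): identical types are kept, i32 and f32 unify to i32, and everything else widens to i64. So `variant { a(f32), b(u64) }` flattens to `[i32, i64]`: a discriminant plus one payload slot wide enough for both cases.

    use wasm_encoder::ValType;

    fn join(a: ValType, b: ValType) -> ValType {
        if a == b {
            return a;
        }
        match (a, b) {
            (ValType::I32, ValType::F32) | (ValType::F32, ValType::I32) => ValType::I32,
            _ => ValType::I64,
        }
    }

    assert_eq!(join(ValType::F32, ValType::I64), ValType::I64);
    assert_eq!(join(ValType::I32, ValType::F32), ValType::I32);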
    pub(crate) fn align(&self, ty: &InterfaceType) -> usize {
        self.size_align(ty).1
    }

    /// Returns a (size, align) pair corresponding to the byte-size and
    /// byte-alignment of the type specified.
    //
    // TODO: this is probably inefficient to entirely recalculate at all phases,
    // seems like it would be best to intern this in some sort of map somewhere.
    pub(crate) fn size_align(&self, ty: &InterfaceType) -> (usize, usize) {
        match ty {
            InterfaceType::Unit => (0, 1),
            InterfaceType::Bool | InterfaceType::S8 | InterfaceType::U8 => (1, 1),
            InterfaceType::S16 | InterfaceType::U16 => (2, 2),
            InterfaceType::S32
            | InterfaceType::U32
            | InterfaceType::Char
            | InterfaceType::Float32 => (4, 4),
            InterfaceType::S64 | InterfaceType::U64 | InterfaceType::Float64 => (8, 8),
            InterfaceType::String | InterfaceType::List(_) => (8, 4),

            InterfaceType::Record(r) => {
                self.record_size_align(self.types[*r].fields.iter().map(|f| &f.ty))
            }
            InterfaceType::Tuple(t) => self.record_size_align(self.types[*t].types.iter()),
            InterfaceType::Flags(f) => match self.types[*f].names.len() {
                n if n <= 8 => (1, 1),
                n if n <= 16 => (2, 2),
                n if n <= 32 => (4, 4),
                n => (4 * (align_to(n, 32) / 32), 4),
            },
            InterfaceType::Enum(t) => self.discrim_size_align(self.types[*t].names.len()),
            InterfaceType::Option(t) => {
                let ty = &self.types[*t];
                self.variant_size_align([&InterfaceType::Unit, ty].into_iter())
            }
            InterfaceType::Variant(t) => {
                self.variant_size_align(self.types[*t].cases.iter().map(|c| &c.ty))
            }
            InterfaceType::Union(t) => self.variant_size_align(self.types[*t].types.iter()),
            InterfaceType::Expected(t) => {
                let e = &self.types[*t];
                self.variant_size_align([&e.ok, &e.err].into_iter())
            }
        }
    }
    pub(crate) fn record_size_align<'a>(
        &self,
        fields: impl Iterator<Item = &'a InterfaceType>,
    ) -> (usize, usize) {
        let mut size = 0;
        let mut align = 1;
        for ty in fields {
            let (fsize, falign) = self.size_align(ty);
            size = align_to(size, falign) + fsize;
            align = align.max(falign);
        }
        (align_to(size, align), align)
    }
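As a worked example of the loop above (reusing `align_to`): `record { a: u8, b: u32 }` places `a` at offset 0, aligns up to offset 4 for `b`, and rounds the final size up to the overall alignment, yielding `(8, 4)`.

    // (size, align) of each field per `size_align`: u8 -> (1, 1), u32 -> (4, 4).
    let mut size = 0;
    let mut align = 1;
    for (fsize, falign) in [(1, 1), (4, 4)] {
        size = align_to(size, falign) + fsize;
        align = align.max(falign);
    }
    assert_eq!((align_to(size, align), align), (8, 4));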
    fn variant_size_align<'a>(
        &self,
        cases: impl ExactSizeIterator<Item = &'a InterfaceType>,
    ) -> (usize, usize) {
        let (discrim_size, mut align) = self.discrim_size_align(cases.len());
        let mut payload_size = 0;
        for ty in cases {
            let (csize, calign) = self.size_align(ty);
            payload_size = payload_size.max(csize);
            align = align.max(calign);
        }
        (align_to(discrim_size, align) + payload_size, align)
    }

    fn discrim_size_align<'a>(&self, cases: usize) -> (usize, usize) {
        match cases {
            n if n <= u8::MAX as usize => (1, 1),
            n if n <= u16::MAX as usize => (2, 2),
            _ => (4, 4),
        }
    }
}
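Variants work similarly: `expected<u32, unit>` has a one-byte discriminant (two cases), a payload of `max(4, 0) = 4` bytes, and an overall alignment of 4, so the payload starts at the aligned offset 4 and the total is `(8, 4)`. Again reusing `align_to`:

    let (discrim_size, mut align) = (1, 1); // two cases fit in a u8 discriminant
    let mut payload_size = 0;
    for (csize, calign) in [(4, 4), (0, 1)] { // ok: u32, err: unit
        payload_size = payload_size.max(csize);
        align = align.max(calign);
    }
    assert_eq!((align_to(discrim_size, align) + payload_size, align), (8, 4));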
crates/environ/src/fact/trampoline.rs (new file, 763 lines)
@@ -0,0 +1,763 @@
//! Low-level compilation of a fused adapter function.
//!
//! This module is tasked with the top-level `compile` function which creates a
//! single WebAssembly function which will perform the steps of the fused
//! adapter for an `AdapterData` provided. This is the "meat" of compilation
//! where the validation of the canonical ABI and the like all happens to
//! translate arguments from one module to another.
//!
//! ## Traps and their ordering
//!
//! Currently this compiler is pretty "loose" about the ordering of precisely
//! what trap happens where. The main reason for this is that to core wasm all
//! traps are the same, and for fused adapters if a trap happens no intermediate
//! side effects are visible (as designed by the canonical ABI itself). Because
//! of this, some of the precise choices of control flow here are somewhat
//! arbitrary, and intentionally so.

use crate::component::{
    InterfaceType, TypeRecordIndex, TypeTupleIndex, FLAG_MAY_ENTER, FLAG_MAY_LEAVE,
    MAX_FLAT_PARAMS, MAX_FLAT_RESULTS,
};
use crate::fact::signature::{align_to, Signature};
use crate::fact::traps::Trap;
use crate::fact::{AdapterData, Context, Module, Options};
use crate::GlobalIndex;
use std::collections::HashMap;
use std::mem;
use std::ops::Range;
use wasm_encoder::{BlockType, Encode, Instruction, Instruction::*, MemArg, ValType};

struct Compiler<'a> {
    /// The module that the adapter will eventually be inserted into.
    module: &'a Module<'a>,

    /// Metadata about the adapter that is being compiled.
    adapter: &'a AdapterData,

    /// The encoded WebAssembly function body so far, not including locals.
    code: Vec<u8>,

    /// Generated locals that this function will use.
    ///
    /// The first entry in the tuple is the number of locals and the second
    /// entry is the type of those locals. This is pushed during compilation as
    /// locals become necessary.
    locals: Vec<(u32, ValType)>,

    /// Total number of locals generated so far.
    nlocals: u32,

    /// Metadata about all `unreachable` trap instructions in this function and
    /// what the trap represents. The offset within `self.code` is recorded as
    /// well.
    traps: Vec<(usize, Trap)>,

    /// The function signature of the lowered half of this trampoline, or the
    /// signature of the function that's being generated.
    lower_sig: &'a Signature,

    /// The function signature of the lifted half of this trampoline, or the
    /// signature of the imported function that the trampoline will call.
    lift_sig: &'a Signature,
}

pub(super) fn compile(module: &Module<'_>, adapter: &AdapterData) -> (Vec<u8>, Vec<(usize, Trap)>) {
    let lower_sig = &module.signature(adapter.lower.ty, Context::Lower);
    let lift_sig = &module.signature(adapter.lift.ty, Context::Lift);
    Compiler {
        module,
        adapter,
        code: Vec::new(),
        locals: Vec::new(),
        nlocals: lower_sig.params.len() as u32,
        traps: Vec::new(),
        lower_sig,
        lift_sig,
    }
    .compile()
}
/// Possible ways that an interface value is represented in the core wasm
/// canonical ABI.
enum Source<'a> {
    /// This value is stored on the "stack" in wasm locals.
    ///
    /// This could mean that it's inline from the parameters to the function or
    /// that after a function call the results were stored in locals and the
    /// locals are the inline results.
    Stack(Stack<'a>),

    /// This value is stored in linear memory described by the `Memory`
    /// structure.
    Memory(Memory),
}

/// Same as `Source` but for where values are translated into.
enum Destination {
    /// This value is destined for the WebAssembly stack which means that
    /// results are simply pushed as we go along.
    Stack,

    /// This value is to be placed in linear memory described by `Memory`.
    Memory(Memory),
}

struct Stack<'a> {
    /// The locals that comprise a particular value.
    ///
    /// The length of this list represents the flattened list of types that make
    /// up the component value. Each entry has the index of the local being
    /// accessed as well as the type of the local itself.
    locals: &'a [(u32, ValType)],
}

/// Representation of where a value is going to be stored in linear memory.
struct Memory {
    /// The index of the local that contains the base address of where the
    /// storage is happening.
    addr_local: u32,
    /// A "static" offset that will be baked into wasm instructions for where
    /// memory loads/stores happen.
    offset: u32,
    /// The index of memory in the wasm module memory index space that this
    /// memory is referring to.
    memory_idx: u32,
}
impl Compiler<'_> {
    fn compile(&mut self) -> (Vec<u8>, Vec<(usize, Trap)>) {
        // Check the instance flags required for this trampoline.
        //
        // This inserts the initial check required by `canon_lower` that the
        // caller instance can be left and additionally, if necessary, checks
        // the flags on the callee to see whether it can be entered.
        self.trap_if_not_flag(self.adapter.lower.flags, FLAG_MAY_LEAVE, Trap::CannotLeave);
        if self.adapter.called_as_export {
            self.trap_if_not_flag(self.adapter.lift.flags, FLAG_MAY_ENTER, Trap::CannotEnter);
            self.set_flag(self.adapter.lift.flags, FLAG_MAY_ENTER, false);
        } else if self.module.debug {
            self.assert_not_flag(
                self.adapter.lift.flags,
                FLAG_MAY_ENTER,
                "may_enter should be unset",
            );
        }

        // Perform the translation of arguments. Note that `FLAG_MAY_LEAVE` is
        // cleared around this invocation for the callee as per the
        // `canon_lift` definition in the spec. Additionally note that the
        // precise ordering of traps here is not required since internal state
        // is not visible to either instance and a trap will "lock down" both
        // instances to no longer be visible. This means that we're free to
        // reorder lifts/lowers and flags and such as is necessary and
        // convenient here.
        //
        // TODO: if translation doesn't actually call any functions in either
        // instance then there's no need to set/clear the flag here and that can
        // be optimized away.
        self.set_flag(self.adapter.lift.flags, FLAG_MAY_LEAVE, false);
        let param_locals = self
            .lower_sig
            .params
            .iter()
            .enumerate()
            .map(|(i, ty)| (i as u32, *ty))
            .collect::<Vec<_>>();
        self.translate_params(&param_locals);
        self.set_flag(self.adapter.lift.flags, FLAG_MAY_LEAVE, true);

        // With all the arguments on the stack the actual target function is
        // now invoked. The core wasm results of the function are then placed
        // into locals for result translation afterwards.
        self.instruction(Call(self.adapter.callee.as_u32()));
        let mut result_locals = Vec::with_capacity(self.lift_sig.results.len());
        for ty in self.lift_sig.results.iter().rev() {
            let local = self.gen_local(*ty);
            self.instruction(LocalSet(local));
            result_locals.push((local, *ty));
        }
        result_locals.reverse();

        // Like above, during the translation of results the caller cannot be
        // left (as we might invoke things like `realloc`). Again the precise
        // order of everything doesn't matter since intermediate states cannot
        // be witnessed, hence the setting of flags here to encapsulate both
        // liftings and lowerings.
        //
        // TODO: like above the management of the `MAY_LEAVE` flag can probably
        // be elided here for "simple" results.
        self.set_flag(self.adapter.lower.flags, FLAG_MAY_LEAVE, false);
        self.translate_results(&param_locals, &result_locals);
        self.set_flag(self.adapter.lower.flags, FLAG_MAY_LEAVE, true);

        // And finally post-return state is handled here once all results/etc
        // are all translated.
        if let Some(func) = self.adapter.lift.post_return {
            for (result, _) in result_locals.iter() {
                self.instruction(LocalGet(*result));
            }
            self.instruction(Call(func.as_u32()));
        }
        if self.adapter.called_as_export {
            self.set_flag(self.adapter.lift.flags, FLAG_MAY_ENTER, true);
        }

        self.finish()
    }
    fn translate_params(&mut self, param_locals: &[(u32, ValType)]) {
        let src_tys = &self.module.types[self.adapter.lower.ty].params;
        let src_tys = src_tys.iter().map(|(_, ty)| *ty).collect::<Vec<_>>();
        let dst_tys = &self.module.types[self.adapter.lift.ty].params;
        let dst_tys = dst_tys.iter().map(|(_, ty)| *ty).collect::<Vec<_>>();

        // TODO: handle subtyping
        assert_eq!(src_tys.len(), dst_tys.len());

        let src_flat = self.module.flatten_types(src_tys.iter().copied());
        let dst_flat = self.module.flatten_types(dst_tys.iter().copied());

        let src = if src_flat.len() <= MAX_FLAT_PARAMS {
            Source::Stack(Stack {
                locals: &param_locals[..src_flat.len()],
            })
        } else {
            // If there are too many parameters then that means the parameters
            // are actually a tuple stored in linear memory addressed by the
            // first parameter local.
            let (addr, ty) = param_locals[0];
            assert_eq!(ty, self.adapter.lower.ptr());
            let align = src_tys
                .iter()
                .map(|t| self.module.align(t))
                .max()
                .unwrap_or(1);
            Source::Memory(self.memory_operand(&self.adapter.lower, addr, align))
        };

        let dst = if dst_flat.len() <= MAX_FLAT_PARAMS {
            Destination::Stack
        } else {
            // If there are too many parameters then space is allocated in the
            // destination module for the parameters via its `realloc` function.
            let (size, align) = self.module.record_size_align(dst_tys.iter());
            Destination::Memory(self.malloc(&self.adapter.lift, size, align))
        };

        let srcs = src
            .record_field_sources(self.module, src_tys.iter().copied())
            .zip(src_tys.iter());
        let dsts = dst
            .record_field_sources(self.module, dst_tys.iter().copied())
            .zip(dst_tys.iter());
        for ((src, src_ty), (dst, dst_ty)) in srcs.zip(dsts) {
            self.translate(&src_ty, &src, &dst_ty, &dst);
        }

        // If the destination was linear memory instead of the stack then the
        // actual parameter that we're passing is the address of the values
        // stored, so ensure that's happening in the wasm body here.
        if let Destination::Memory(mem) = dst {
            self.instruction(LocalGet(mem.addr_local));
        }
    }

    fn translate_results(
        &mut self,
        param_locals: &[(u32, ValType)],
        result_locals: &[(u32, ValType)],
    ) {
        let src_ty = self.module.types[self.adapter.lift.ty].result;
        let dst_ty = self.module.types[self.adapter.lower.ty].result;

        let src_flat = self.module.flatten_types([src_ty]);
        let dst_flat = self.module.flatten_types([dst_ty]);

        let src = if src_flat.len() <= MAX_FLAT_RESULTS {
            Source::Stack(Stack {
                locals: result_locals,
            })
        } else {
            // The original results to read from in this case come from the
            // return value of the function itself. The imported function will
            // return a linear memory address at which the values can be read
            // from.
            let align = self.module.align(&src_ty);
            assert_eq!(result_locals.len(), 1);
            let (addr, ty) = result_locals[0];
            assert_eq!(ty, self.adapter.lift.ptr());
            Source::Memory(self.memory_operand(&self.adapter.lift, addr, align))
        };

        let dst = if dst_flat.len() <= MAX_FLAT_RESULTS {
            Destination::Stack
        } else {
            // This is slightly different than `translate_params` where the
            // return pointer was provided by the caller of this function,
            // meaning the last parameter local is a pointer into linear memory.
            let align = self.module.align(&dst_ty);
            let (addr, ty) = *param_locals.last().expect("no retptr");
            assert_eq!(ty, self.adapter.lower.ptr());
            Destination::Memory(self.memory_operand(&self.adapter.lower, addr, align))
        };

        self.translate(&src_ty, &src, &dst_ty, &dst);
    }
    fn translate(
        &mut self,
        src_ty: &InterfaceType,
        src: &Source<'_>,
        dst_ty: &InterfaceType,
        dst: &Destination,
    ) {
        if let Source::Memory(mem) = src {
            self.assert_aligned(src_ty, mem);
        }
        if let Destination::Memory(mem) = dst {
            self.assert_aligned(dst_ty, mem);
        }
        match src_ty {
            InterfaceType::Unit => self.translate_unit(src, dst_ty, dst),
            InterfaceType::Bool => self.translate_bool(src, dst_ty, dst),
            InterfaceType::U8 => self.translate_u8(src, dst_ty, dst),
            InterfaceType::U32 => self.translate_u32(src, dst_ty, dst),
            InterfaceType::Record(t) => self.translate_record(*t, src, dst_ty, dst),
            InterfaceType::Tuple(t) => self.translate_tuple(*t, src, dst_ty, dst),

            InterfaceType::String => {
                // Consider this field used for now until this is fully
                // implemented.
                drop(&self.adapter.lift.string_encoding);
                unimplemented!("don't know how to translate strings")
            }

            // TODO: this needs to be filled out for all the other interface
            // types.
            ty => unimplemented!("don't know how to translate {ty:?}"),
        }
    }

    fn translate_unit(&mut self, src: &Source<'_>, dst_ty: &InterfaceType, dst: &Destination) {
        // TODO: subtyping
        assert!(matches!(dst_ty, InterfaceType::Unit));
        drop((src, dst));
    }

    fn translate_bool(&mut self, src: &Source<'_>, dst_ty: &InterfaceType, dst: &Destination) {
        // TODO: subtyping
        assert!(matches!(dst_ty, InterfaceType::Bool));
        self.push_dst_addr(dst);

        // Booleans are canonicalized to 0 or 1 as they pass through the
        // component boundary, so use a `select` instruction to do so.
        self.instruction(I32Const(1));
        self.instruction(I32Const(0));
        match src {
            Source::Memory(mem) => self.i32_load8u(mem),
            Source::Stack(stack) => self.stack_get(stack, ValType::I32),
        }
        self.instruction(Select);

        match dst {
            Destination::Memory(mem) => self.i32_store8(mem),
            Destination::Stack => {}
        }
    }
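The `select` sequence in `translate_bool` above is the core wasm equivalent of this canonicalization: any non-zero source byte becomes exactly 1.

    fn canonicalize_bool(raw: i32) -> i32 {
        // `select` picks its first operand (1) when the condition is non-zero.
        if raw != 0 { 1 } else { 0 }
    }

    assert_eq!(canonicalize_bool(42), 1);
    assert_eq!(canonicalize_bool(0), 0);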
    fn translate_u8(&mut self, src: &Source<'_>, dst_ty: &InterfaceType, dst: &Destination) {
        // TODO: subtyping
        assert!(matches!(dst_ty, InterfaceType::U8));
        self.push_dst_addr(dst);
        match src {
            Source::Memory(mem) => self.i32_load8u(mem),
            Source::Stack(stack) => self.stack_get(stack, ValType::I32),
        }
        match dst {
            Destination::Memory(mem) => self.i32_store8(mem),
            Destination::Stack => {}
        }
    }

    fn translate_u32(&mut self, src: &Source<'_>, dst_ty: &InterfaceType, dst: &Destination) {
        // TODO: subtyping
        assert!(matches!(dst_ty, InterfaceType::U32));
        self.push_dst_addr(dst);
        match src {
            Source::Memory(mem) => self.i32_load(mem),
            Source::Stack(stack) => self.stack_get(stack, ValType::I32),
        }
        match dst {
            Destination::Memory(mem) => self.i32_store(mem),
            Destination::Stack => {}
        }
    }

    fn translate_record(
        &mut self,
        src_ty: TypeRecordIndex,
        src: &Source<'_>,
        dst_ty: &InterfaceType,
        dst: &Destination,
    ) {
        let src_ty = &self.module.types[src_ty];
        let dst_ty = match dst_ty {
            InterfaceType::Record(r) => &self.module.types[*r],
            _ => panic!("expected a record"),
        };

        // TODO: subtyping
        assert_eq!(src_ty.fields.len(), dst_ty.fields.len());

        // First a map is made of the source fields to where they're coming
        // from (e.g. which offset or which locals). This map is keyed by the
        // fields' names.
        let mut src_fields = HashMap::new();
        for (i, src) in src
            .record_field_sources(self.module, src_ty.fields.iter().map(|f| f.ty))
            .enumerate()
        {
            let field = &src_ty.fields[i];
            src_fields.insert(&field.name, (src, &field.ty));
        }

        // .. and next translation is performed in the order of the destination
        // fields in case the destination is the stack, to ensure that the stack
        // has the fields all in the right order.
        //
        // Note that the lookup in `src_fields` is an infallible lookup which
        // will panic if the field isn't found.
        //
        // TODO: should that lookup be fallible with subtyping?
        for (i, dst) in dst
            .record_field_sources(self.module, dst_ty.fields.iter().map(|f| f.ty))
            .enumerate()
        {
            let field = &dst_ty.fields[i];
            let (src, src_ty) = &src_fields[&field.name];
            self.translate(src_ty, src, &field.ty, &dst);
        }
    }

    fn translate_tuple(
        &mut self,
        src_ty: TypeTupleIndex,
        src: &Source<'_>,
        dst_ty: &InterfaceType,
        dst: &Destination,
    ) {
        let src_ty = &self.module.types[src_ty];
        let dst_ty = match dst_ty {
            InterfaceType::Tuple(t) => &self.module.types[*t],
            _ => panic!("expected a tuple"),
        };

        // TODO: subtyping
        assert_eq!(src_ty.types.len(), dst_ty.types.len());

        let srcs = src
            .record_field_sources(self.module, src_ty.types.iter().copied())
            .zip(src_ty.types.iter());
        let dsts = dst
            .record_field_sources(self.module, dst_ty.types.iter().copied())
            .zip(dst_ty.types.iter());
        for ((src, src_ty), (dst, dst_ty)) in srcs.zip(dsts) {
            self.translate(src_ty, &src, dst_ty, &dst);
        }
    }

    fn trap_if_not_flag(&mut self, flags_global: GlobalIndex, flag_to_test: i32, trap: Trap) {
        self.instruction(GlobalGet(flags_global.as_u32()));
        self.instruction(I32Const(flag_to_test));
        self.instruction(I32And);
        self.instruction(I32Eqz);
        self.instruction(If(BlockType::Empty));
        self.trap(trap);
        self.instruction(End);
    }

    fn assert_not_flag(&mut self, flags_global: GlobalIndex, flag_to_test: i32, msg: &'static str) {
        self.instruction(GlobalGet(flags_global.as_u32()));
        self.instruction(I32Const(flag_to_test));
        self.instruction(I32And);
        self.instruction(If(BlockType::Empty));
        self.trap(Trap::AssertFailed(msg));
        self.instruction(End);
    }

    fn set_flag(&mut self, flags_global: GlobalIndex, flag_to_set: i32, value: bool) {
        self.instruction(GlobalGet(flags_global.as_u32()));
        if value {
            self.instruction(I32Const(flag_to_set));
            self.instruction(I32Or);
        } else {
            self.instruction(I32Const(!flag_to_set));
            self.instruction(I32And);
        }
        self.instruction(GlobalSet(flags_global.as_u32()));
    }
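A standalone model of the flag helpers above, assuming the flags global is a plain bitset of `FLAG_*` constants:

    fn set_flag(flags: i32, flag: i32, value: bool) -> i32 {
        if value { flags | flag } else { flags & !flag }
    }

    fn flag_is_set(flags: i32, flag: i32) -> bool {
        flags & flag != 0 // `trap_if_not_flag` traps exactly when this is false
    }

    let flags = set_flag(0, 0b10, true);
    assert!(flag_is_set(flags, 0b10));
    assert!(!flag_is_set(set_flag(flags, 0b10, false), 0b10));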
    fn verify_aligned(&mut self, local: u32, align: usize) {
        // If the alignment is 1 then everything is trivially aligned and the
        // check can be omitted.
        if align == 1 {
            return;
        }
        self.instruction(LocalGet(local));
        assert!(align.is_power_of_two());
        let mask = i32::try_from(align - 1).unwrap();
        self.instruction(I32Const(mask));
        self.instruction(I32And);
        self.instruction(If(BlockType::Empty));
        self.trap(Trap::UnalignedPointer);
        self.instruction(End);
    }

    fn assert_aligned(&mut self, ty: &InterfaceType, mem: &Memory) {
        if !self.module.debug {
            return;
        }
        let align = self.module.align(ty);
        if align == 1 {
            return;
        }
        assert!(align.is_power_of_two());
        self.instruction(LocalGet(mem.addr_local));
        self.instruction(I32Const(mem.i32_offset()));
        self.instruction(I32Add);
        let mask = i32::try_from(align - 1).unwrap();
        self.instruction(I32Const(mask));
        self.instruction(I32And);
        self.instruction(If(BlockType::Empty));
        self.trap(Trap::AssertFailed("pointer not aligned"));
        self.instruction(End);
    }

    fn malloc(&mut self, opts: &Options, size: usize, align: usize) -> Memory {
        let addr_local = self.gen_local(opts.ptr());
        let realloc = opts.realloc.unwrap();
        if opts.memory64 {
            self.instruction(I64Const(0));
            self.instruction(I64Const(0));
            self.instruction(I64Const(i64::try_from(align).unwrap()));
            self.instruction(I64Const(i64::try_from(size).unwrap()));
        } else {
            self.instruction(I32Const(0));
            self.instruction(I32Const(0));
            self.instruction(I32Const(i32::try_from(align).unwrap()));
            self.instruction(I32Const(i32::try_from(size).unwrap()));
        }
        self.instruction(Call(realloc.as_u32()));
        self.instruction(LocalSet(addr_local));
        self.memory_operand(opts, addr_local, align)
    }

    fn memory_operand(&mut self, opts: &Options, addr_local: u32, align: usize) -> Memory {
        let memory = opts.memory.unwrap();
        self.verify_aligned(addr_local, align);
        Memory {
            addr_local,
            offset: 0,
            memory_idx: memory.as_u32(),
        }
    }

    fn gen_local(&mut self, ty: ValType) -> u32 {
        // TODO: see if local reuse is necessary, right now this always
        // generates a new local.
        match self.locals.last_mut() {
            Some((cnt, prev_ty)) if ty == *prev_ty => *cnt += 1,
            _ => self.locals.push((1, ty)),
        }
        self.nlocals += 1;
        self.nlocals - 1
    }

    fn instruction(&mut self, instr: Instruction) {
        instr.encode(&mut self.code);
    }

    fn trap(&mut self, trap: Trap) {
        self.traps.push((self.code.len(), trap));
        self.instruction(Unreachable);
    }

    fn finish(&mut self) -> (Vec<u8>, Vec<(usize, Trap)>) {
        self.instruction(End);

        let mut bytes = Vec::new();

        // Encode all locals used for this function
        self.locals.len().encode(&mut bytes);
        for (count, ty) in self.locals.iter() {
            count.encode(&mut bytes);
            ty.encode(&mut bytes);
        }

        // Factor in the size of the encodings of locals into the offsets of
        // traps.
        for (offset, _) in self.traps.iter_mut() {
            *offset += bytes.len();
        }

        // Then append the function we built and return
        bytes.extend_from_slice(&self.code);
        (bytes, mem::take(&mut self.traps))
    }

    fn stack_get(&mut self, stack: &Stack<'_>, dst_ty: ValType) {
        assert_eq!(stack.locals.len(), 1);
        let (idx, src_ty) = stack.locals[0];
        self.instruction(LocalGet(idx));
        match (src_ty, dst_ty) {
            (ValType::I32, ValType::I32)
            | (ValType::I64, ValType::I64)
            | (ValType::F32, ValType::F32)
            | (ValType::F64, ValType::F64) => {}

            (ValType::I32, ValType::F32) => self.instruction(F32ReinterpretI32),
            (ValType::I64, ValType::I32) => self.instruction(I32WrapI64),
            (ValType::I64, ValType::F64) => self.instruction(F64ReinterpretI64),
            (ValType::F64, ValType::F32) => self.instruction(F32DemoteF64),
            (ValType::I64, ValType::F32) => {
                self.instruction(F64ReinterpretI64);
                self.instruction(F32DemoteF64);
            }

            // should not be possible given the `join` function for variants
            (ValType::I32, ValType::I64)
            | (ValType::I32, ValType::F64)
            | (ValType::F32, ValType::I32)
            | (ValType::F32, ValType::I64)
            | (ValType::F32, ValType::F64)
            | (ValType::F64, ValType::I32)
            | (ValType::F64, ValType::I64)

            // not used in the component model
            | (ValType::ExternRef, _)
            | (_, ValType::ExternRef)
            | (ValType::FuncRef, _)
            | (_, ValType::FuncRef)
            | (ValType::V128, _)
            | (_, ValType::V128) => {
                panic!("cannot get {dst_ty:?} from {src_ty:?} local");
            }
        }
    }

    fn i32_load8u(&mut self, mem: &Memory) {
        self.instruction(LocalGet(mem.addr_local));
        self.instruction(I32Load8_U(mem.memarg(0)));
    }

    fn i32_load(&mut self, mem: &Memory) {
        self.instruction(LocalGet(mem.addr_local));
        self.instruction(I32Load(mem.memarg(2)));
    }

    fn push_dst_addr(&mut self, dst: &Destination) {
        if let Destination::Memory(mem) = dst {
            self.instruction(LocalGet(mem.addr_local));
        }
    }

    fn i32_store8(&mut self, mem: &Memory) {
        self.instruction(I32Store8(mem.memarg(0)));
    }

    fn i32_store(&mut self, mem: &Memory) {
        self.instruction(I32Store(mem.memarg(2)));
    }
}

impl<'a> Source<'a> {
    /// Given this `Source` returns an iterator over the `Source` for each of
    /// the component `fields` specified.
    ///
    /// This will automatically slice stack-based locals to the appropriate
    /// width for each component type and additionally calculate the appropriate
    /// offset for each memory-based type.
    fn record_field_sources<'b>(
        &'b self,
        module: &'b Module,
        fields: impl IntoIterator<Item = InterfaceType> + 'b,
    ) -> impl Iterator<Item = Source<'a>> + 'b
    where
        'a: 'b,
    {
        let mut offset = 0;
        fields.into_iter().map(move |ty| match self {
            Source::Memory(mem) => {
                let (size, align) = module.size_align(&ty);
                offset = align_to(offset, align) + size;
                Source::Memory(mem.bump(offset - size))
            }
            Source::Stack(stack) => {
                let cnt = module.flatten_types([ty]).len();
                offset += cnt;
                Source::Stack(stack.slice(offset - cnt..offset))
            }
        })
    }
}

impl Destination {
    /// Same as `Source::record_field_sources` but for destinations.
    fn record_field_sources<'a>(
        &'a self,
        module: &'a Module,
        fields: impl IntoIterator<Item = InterfaceType> + 'a,
    ) -> impl Iterator<Item = Destination> + 'a {
        let mut offset = 0;
        fields.into_iter().map(move |ty| match self {
            // TODO: dedupe with above?
            Destination::Memory(mem) => {
                let (size, align) = module.size_align(&ty);
                offset = align_to(offset, align) + size;
                Destination::Memory(mem.bump(offset - size))
            }
            Destination::Stack => Destination::Stack,
        })
    }
}

impl Memory {
    fn i32_offset(&self) -> i32 {
        self.offset as i32
    }

    fn memarg(&self, align: u32) -> MemArg {
        MemArg {
            offset: u64::from(self.offset),
            align,
            memory_index: self.memory_idx,
        }
    }

    fn bump(&self, offset: usize) -> Memory {
        Memory {
            addr_local: self.addr_local,
            memory_idx: self.memory_idx,
            offset: self.offset + u32::try_from(offset).unwrap(),
        }
    }
}

impl<'a> Stack<'a> {
    fn slice(&self, range: Range<usize>) -> Stack<'a> {
        Stack {
            locals: &self.locals[range],
        }
    }
}

impl Options {
    fn ptr(&self) -> ValType {
        if self.memory64 {
            ValType::I64
        } else {
            ValType::I32
        }
    }
}
crates/environ/src/fact/traps.rs (new file, 105 lines)
@@ -0,0 +1,105 @@
//! Module used to encode failure messages associated with traps in an adapter
//! module.
//!
//! This module is a bit forward-looking in an attempt to help assist with
//! debugging issues with adapter modules and their implementation. This isn't
//! actually wired up to any decoder at this time and may end up getting deleted
//! entirely depending on how things go.
//!
//! Currently in core wasm the `unreachable` instruction and other traps have no
//! way to carry a failure message. This means that if an adapter fails all you
//! have is the program counter into the wasm function, and there isn't actually
//! any source corresponding to wasm adapters either. This module is an attempt
//! to assign optional string messages to `unreachable` trap instructions so,
//! when sufficient debugging options are enabled, these trap messages could be
//! displayed instead of a bland "unreachable" trap message.
//!
//! This information is currently encoded as a custom section in the wasm
//! module.

use std::collections::HashMap;
use std::fmt;
use wasm_encoder::Encode;

#[derive(Hash, PartialEq, Eq, Copy, Clone)]
pub enum Trap {
    CannotLeave,
    CannotEnter,
    UnalignedPointer,
    AssertFailed(&'static str),
}

#[derive(Default)]
pub struct TrapSection {
    trap_to_index: HashMap<Trap, usize>,
    trap_list: Vec<Trap>,
    function_traps: Vec<(u32, Vec<(usize, usize)>)>,
}

impl TrapSection {
    /// Appends a list of traps found within a function.
    ///
    /// The `func` is the core wasm function index that is being described. The
    /// `traps` is a list of `(offset, trap)` where `offset` is the offset
    /// within the function body itself and `trap` is the description of the
    /// trap of the opcode at that offset.
    pub fn append(&mut self, func: u32, traps: Vec<(usize, Trap)>) {
        if traps.is_empty() {
            return;
        }

        // Deduplicate `Trap` annotations to avoid repeating the trap string
        // internally within the custom section.
        let traps = traps
            .into_iter()
            .map(|(offset, trap)| {
                let trap = *self.trap_to_index.entry(trap).or_insert_with(|| {
                    let idx = self.trap_list.len();
                    self.trap_list.push(trap);
                    idx
                });
                (offset, trap)
            })
            .collect();
        self.function_traps.push((func, traps));
    }

    /// Creates the custom section payload of this section to be encoded into a
    /// core wasm module.
    pub fn finish(self) -> Vec<u8> {
        let mut data = Vec::new();

        // First append all trap messages which will be indexed below.
        self.trap_list.len().encode(&mut data);
        for trap in self.trap_list.iter() {
            trap.to_string().encode(&mut data);
        }

        // Afterwards encode trap information for all known functions where
        // offsets are relative to the body of the function index specified and
        // the trap message is a pointer into the table built above.
        self.function_traps.len().encode(&mut data);
        for (func, traps) in self.function_traps.iter() {
            func.encode(&mut data);
            traps.len().encode(&mut data);
            for (func_offset, trap_message) in traps {
                func_offset.encode(&mut data);
                trap_message.encode(&mut data);
            }
        }

        data
    }
}
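A usage sketch (the function index and offsets are invented for illustration): traps recorded while compiling each adapter are appended per function, messages are interned once, and the finished payload is what would be embedded as a custom section in the adapter module.

    let mut section = TrapSection::default();
    section.append(3, vec![(10, Trap::CannotEnter), (24, Trap::UnalignedPointer)]);
    section.append(4, vec![(0, Trap::CannotEnter)]); // message interned above
    let payload: Vec<u8> = section.finish();
    // `payload` would then be added to the module as a custom section whose
    // name is an implementation detail (hypothetical here).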
impl fmt::Display for Trap {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Trap::CannotLeave => "cannot leave instance".fmt(f),
            Trap::CannotEnter => "cannot enter instance".fmt(f),
            Trap::UnalignedPointer => "pointer not aligned correctly".fmt(f),
            Trap::AssertFailed(s) => write!(f, "assertion failure: {}", s),
        }
    }
}