wasmtime/crates/environ/src/component/translate/adapt.rs
Alex Crichton bc8e36a6af Refactor and optimize the flat type calculations (#4708)
* Optimize flat type representation calculations

Previously calculating the flat type representation would be done
recursively for an entire type tree every time it was visited.
Additionally the flat type representation was entirely built only to be
thrown away if it was too large at the end. This chiefly presented a
source of recursion based on the type structure in the component model
which fuzzing does not like as it reports stack overflows.

This commit overhauls the representation of flat types in Wasmtime by
caching the representation for each type in the compile-time
`ComponentTypesBuilder` structure. This avoids recalculating each time
the flat representation is queried and additionally allows opportunity
to have more short-circuiting to avoid building overly-large vectors.

* Remove duplicate flat count calculation in wasmtime

Roughly share the infrastructure in the `wasmtime-environ` crate, namely
the non-recursive and memoizing nature of the calculation.

* Fix component fuzz build

* Fix example compile
2022-08-16 13:31:47 -05:00


//! Identification and creation of fused adapter modules in Wasmtime.
//!
//! A major piece of the component model is the ability for core wasm modules to
//! talk to each other through the use of lifted and lowered functions. For
//! example one core wasm module can export a function which is lifted. Another
//! component could import that lifted function, lower it, and pass it as the
//! import to another core wasm module. This is what Wasmtime calls "adapter
//! fusion" where two core wasm functions are coming together through the
//! component model.
//!
//! There are a few ingredients during adapter fusion:
//!
//! * A core wasm function which is "lifted".
//! * A "lift type" which is the type that the component model function had in
//!   the original component.
//! * A "lower type" which is the type that the component model function has
//!   in the destination component (the one that uses `canon lower`).
//! * Configuration options for both the lift and the lower operations such as
//!   memories, reallocs, etc.
//!
//! With these ingredients combined Wasmtime must produce a function which
//! connects the two components through the options specified. The fused adapter
//! performs tasks such as validation of passed values, copying data between
//! linear memories, etc.
//!
//! Wasmtime's current implementation of fused adapters is designed to reduce
//! complexity elsewhere as much as possible while also being suitable for being
//! used as a polyfill for the component model in JS environments as well. To
//! that end Wasmtime implements a fused adapter with another wasm module that
//! it itself generates on the fly. The usage of WebAssembly for fused adapters
//! has a number of advantages:
//!
//! * There is no need to create a raw Cranelift-based compiler. This is where
//!   the majority of "unsafety" lives in Wasmtime, so reducing the need to
//!   lean on this or audit another compiler is predicted to weed out a whole
//!   class of bugs in the fused adapter compiler.
//!
//! * As mentioned above generation of WebAssembly modules means that this is
//!   suitable for use in JS environments. For example a hypothetical tool which
//!   polyfills a component onto the web today would need to do something for
//!   adapter modules, and ideally the adapters themselves are speedy. While
//!   this could all be written in JS the adapting process is quite nontrivial
//!   so sharing code with Wasmtime would be ideal.
//!
//! * Using WebAssembly insulates the implementation from bugs to a certain
//!   degree. While logic bugs are still possible it should be much more
//!   difficult to have segfaults or things like that. With adapters exclusively
//!   executing inside a WebAssembly sandbox like everything else the failure
//!   modes to the host at least should be minimized.
//!
//! * Integration into the runtime is relatively simple, the adapter modules are
//!   just another kind of wasm module to instantiate and wire up at runtime.
//!   The goal is that the `GlobalInitializer` list that is processed at runtime
//!   will have all of its `Adapter`-using variants erased by the time it makes
//!   its way all the way up to Wasmtime. This means that the support in
//!   Wasmtime prior to adapter modules is actually the same as the support
//!   after adapter modules are added, keeping the runtime fiddly bits quite
//!   minimal.
//!
//! This isn't to say that this approach is without its disadvantages of
//! course. For now though this seems to be a reasonable set of tradeoffs for
//! the development stage of the component model proposal.
//!
//! ## Creating adapter modules
//!
//! With WebAssembly itself being used to implement fused adapters, Wasmtime
//! still has the question of how to organize the adapter functions into actual
//! wasm modules.
//!
//! The first thing you might reach for is to put all the adapters into the same
//! wasm module. This cannot be done, however, because some adapters may depend
//! on other adapters (transitively) to be created. This means that if
//! everything were in the same module there would be no way to instantiate the
//! module. An example of this dependency is an adapter (A) used to create a
//! core wasm instance (M) whose exported memory is then referenced by another
//! adapter (B). In this situation the adapter B cannot be in the same module
//! as adapter A because B needs the memory of M but M is created with A which
//! would otherwise create a circular dependency.
//!
//! The second possibility of organizing adapter modules would be to place each
//! fused adapter into its own module. Each `canon lower` would effectively
//! become a core wasm module instantiation at that point. While this works it's
//! currently believed to be a bit too fine-grained. For example it would mean
//! that importing a dozen lowered functions into a module could possibly result
//! in up to a dozen different adapter modules. While this possibility could
//! work it has been ruled out as "probably too expensive at runtime".
//!
//! Thus the purpose and existence of this module is now evident -- this module
//! exists to identify what exactly goes into which adapter module. This will
//! evaluate the `GlobalInitializer` lists coming out of the `inline` pass and
//! insert `InstantiateModule` entries for where adapter modules should be
//! created.
//!
//! ## Partitioning adapter modules
//!
//! Currently this module does not attempt to be really all that fancy about
//! grouping adapters into adapter modules. The main idea is that most items
//! within an adapter module are likely to be close together since they're
//! theoretically going to be used for an instantiation of a core wasm module
//! just after the fused adapter was declared. With that in mind the current
//! algorithm is a one-pass approach to partitioning everything into adapter
//! modules.
//!
//! Adapters were identified in-order as part of the inlining phase of
//! translation where we're guaranteed that once an adapter is identified
//! it can't depend on anything identified later. The pass implemented here is
//! to visit all transitive dependencies of an adapter. If one of the
//! dependencies of an adapter is an adapter in the current adapter module
//! being built then the current module is finished and a new adapter module is
//! started. This should quickly partition adapters into contiguous chunks of
//! their index space which can be in adapter modules together.
//!
//! There are probably more general algorithms for this but for now this should
//! be fast enough as it's "just" a linear pass. As we get more components over
//! time this may want to be revisited if too many adapter modules are being
//! created.
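
The one-pass partitioning described above can be sketched in isolation. This is a minimal, hypothetical model and not Wasmtime's implementation: adapters are plain `usize` indices, `deps[i]` is assumed to already hold the flattened, transitive adapter dependencies of adapter `i` (all with smaller indices), and the name `partition` is invented for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified model of the partitioning pass.
///
/// `deps[i]` lists the adapters that adapter `i` transitively depends on.
/// Every dependency is assumed to have a smaller index than `i`.
fn partition(deps: &[Vec<usize>]) -> Vec<Vec<usize>> {
    // Finished adapter modules, never modified after insertion.
    let mut modules: Vec<Vec<usize>> = Vec::new();
    // The adapter module currently being built; may be empty.
    let mut current: Vec<usize> = Vec::new();
    // Adapters that live in an already finished module.
    let mut defined: HashSet<usize> = HashSet::new();

    for (id, adapter_deps) in deps.iter().enumerate() {
        for dep in adapter_deps {
            if !defined.contains(dep) {
                // The dependency sits in the in-progress module, so that
                // module is finished and a fresh one is started.
                defined.extend(current.iter().copied());
                if !current.is_empty() {
                    modules.push(std::mem::take(&mut current));
                }
            }
            assert!(defined.contains(dep), "dependencies must come earlier");
        }
        current.push(id);
    }

    // Commit whatever is left in the in-progress module.
    if !current.is_empty() {
        modules.push(current);
    }
    modules
}
```

With `deps = [[], [], [0], [1], [2]]` the sketch places adapters 0 and 1 together, splits before adapter 2 because it needs adapter 0, and splits again before adapter 4 because it needs adapter 2. Each module is a contiguous chunk of the index space, matching the description above.
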
use crate::component::translate::*;
use crate::fact;
use crate::EntityType;
use std::collections::HashSet;
use wasmparser::WasmFeatures;

/// Metadata information about a fused adapter.
#[derive(Debug, Clone, Hash, Eq, PartialEq)]
pub struct Adapter {
    /// The type used when the original core wasm function was lifted.
    ///
    /// Note that this could be different than `lower_ty` (but still matches
    /// according to subtyping rules).
    pub lift_ty: TypeFuncIndex,
    /// Canonical ABI options used when the function was lifted.
    pub lift_options: AdapterOptions,
    /// The type used when the function was lowered back into a core wasm
    /// function.
    ///
    /// Note that this could be different than `lift_ty` (but still matches
    /// according to subtyping rules).
    pub lower_ty: TypeFuncIndex,
    /// Canonical ABI options used when the function was lowered.
    pub lower_options: AdapterOptions,
    /// The original core wasm function which was lifted.
    pub func: dfg::CoreDef,
}

/// Configuration options which can be specified as part of the canonical ABI
/// in the component model.
#[derive(Debug, Clone, Hash, Eq, PartialEq)]
pub struct AdapterOptions {
    /// The Wasmtime-assigned component instance index where the options were
    /// originally specified.
    pub instance: RuntimeComponentInstanceIndex,
    /// How strings are encoded.
    pub string_encoding: StringEncoding,
    /// An optional memory definition supplied.
    pub memory: Option<dfg::CoreExport<MemoryIndex>>,
    /// If `memory` is specified, whether it's a 64-bit memory.
    pub memory64: bool,
    /// An optional definition of `realloc` to use.
    pub realloc: Option<dfg::CoreDef>,
    /// An optional definition of a `post-return` to use.
    pub post_return: Option<dfg::CoreDef>,
}

impl<'data> Translator<'_, 'data> {
    /// This is the entrypoint of functionality within this module which
    /// performs all the work of identifying adapter usages and organizing
    /// everything into adapter modules.
    ///
    /// This will mutate the provided `component` in-place and fill out the dfg
    /// metadata for adapter modules.
    pub(super) fn partition_adapter_modules(&mut self, component: &mut dfg::ComponentDfg) {
        // Visit each adapter, in order of its original definition, during the
        // partitioning. This allows for the guarantee that dependencies are
        // visited in a topological fashion ideally.
        let mut state = PartitionAdapterModules::default();
        for (id, adapter) in component.adapters.iter() {
            state.adapter(component, id, adapter);
        }
        state.finish_adapter_module();

        // Now that all adapters have been partitioned into modules this loop
        // generates a core wasm module for each adapter module, translates
        // the module using standard core wasm translation, and then fills out
        // the dfg metadata for each adapter.
        for (module_id, adapter_module) in state.adapter_modules.iter() {
            let mut module = fact::Module::new(self.types, self.tunables.debug_adapter_modules);
            let mut names = Vec::with_capacity(adapter_module.adapters.len());
            for adapter in adapter_module.adapters.iter() {
                let name = format!("adapter{}", adapter.as_u32());
                module.adapt(&name, &component.adapters[*adapter]);
                names.push(name);
            }
            let wasm = module.encode();
            let imports = module.imports().to_vec();

            // Extend the lifetime of the owned `wasm: Vec<u8>` on the stack to
            // a higher scope defined by our original caller. That allows
            // transforming `wasm` into `&'data [u8]` which is much easier to
            // work with here.
            let wasm = &*self.scope_vec.push(wasm);
            if log::log_enabled!(log::Level::Trace) {
                match wasmprinter::print_bytes(wasm) {
                    Ok(s) => log::trace!("generated adapter module:\n{}", s),
                    Err(e) => log::trace!("failed to print adapter module: {}", e),
                }
            }

            // With the wasm binary this is then pushed through general
            // translation, validation, etc. Note that multi-memory is
            // specifically enabled here since the adapter module is highly
            // likely to use that if anything is actually indirected through
            // memory.
            let mut validator = Validator::new_with_features(WasmFeatures {
                multi_memory: true,
                ..*self.validator.features()
            });
            let translation = ModuleEnvironment::new(
                self.tunables,
                &mut validator,
                self.types.module_types_builder(),
            )
            .translate(Parser::new(0), wasm)
            .expect("invalid adapter module generated");

            // Record, for each adapter in this adapter module, the module that
            // the adapter was placed within as well as the function index of
            // the adapter in the wasm module generated. Note that adapters are
            // partitioned in-order so we're guaranteed to push the adapters
            // in-order here as well. (with an assert to double-check)
            for (adapter, name) in adapter_module.adapters.iter().zip(&names) {
                let index = translation.module.exports[name];
                let i = component.adapter_paritionings.push((module_id, index));
                assert_eq!(i, *adapter);
            }

            // Finally the metadata necessary to instantiate this adapter
            // module is also recorded in the dfg. This metadata will be used
            // to generate `GlobalInitializer` entries during the final
            // linearization phase.
            assert_eq!(imports.len(), translation.module.imports().len());
            let args = imports
                .iter()
                .zip(translation.module.imports())
                .map(|(arg, (_, _, ty))| fact_import_to_core_def(component, arg, ty))
                .collect::<Vec<_>>();
            let static_index = self.static_modules.push(translation);
            let id = component.adapter_modules.push((static_index, args.into()));
            assert_eq!(id, module_id);
        }
    }
}

fn fact_import_to_core_def(
    dfg: &mut dfg::ComponentDfg,
    import: &fact::Import,
    ty: EntityType,
) -> dfg::CoreDef {
    match import {
        fact::Import::CoreDef(def) => def.clone(),
        fact::Import::Transcode {
            op,
            from,
            from64,
            to,
            to64,
        } => {
            fn unwrap_memory(def: &dfg::CoreDef) -> dfg::CoreExport<MemoryIndex> {
                match def {
                    dfg::CoreDef::Export(e) => e.clone().map_index(|i| match i {
                        EntityIndex::Memory(i) => i,
                        _ => unreachable!(),
                    }),
                    _ => unreachable!(),
                }
            }

            let from = dfg.memories.push_uniq(unwrap_memory(from));
            let to = dfg.memories.push_uniq(unwrap_memory(to));
            dfg::CoreDef::Transcoder(dfg.transcoders.push_uniq(dfg::Transcoder {
                op: *op,
                from,
                from64: *from64,
                to,
                to64: *to64,
                signature: match ty {
                    EntityType::Function(signature) => signature,
                    _ => unreachable!(),
                },
            }))
        }
    }
}

#[derive(Default)]
struct PartitionAdapterModules {
    /// The next adapter module that's being created. This may be empty.
    next_module: AdapterModuleInProgress,

    /// The set of items which are known to be defined which the adapter module
    /// in progress is allowed to depend on.
    defined_items: HashSet<Def>,

    /// Finished adapter modules that won't be added to.
    ///
    /// In theory items could be added to preexisting modules here but to keep
    /// this pass linear this is never modified after insertion.
    adapter_modules: PrimaryMap<dfg::AdapterModuleId, AdapterModuleInProgress>,
}

#[derive(Default)]
struct AdapterModuleInProgress {
    /// The adapters which have been placed into this module.
    adapters: Vec<dfg::AdapterId>,
}

/// Items that adapters can depend on.
///
/// Note that this is somewhat of a flat list and is intended to mostly model
/// core wasm instances which are side-effectful unlike other host items like
/// lowerings or always-trapping functions.
#[derive(Copy, Clone, Hash, Eq, PartialEq)]
enum Def {
    Adapter(dfg::AdapterId),
    Instance(dfg::InstanceId),
}

impl PartitionAdapterModules {
    fn adapter(&mut self, dfg: &dfg::ComponentDfg, id: dfg::AdapterId, adapter: &Adapter) {
        // Visit all dependencies of this adapter and if anything depends on
        // the current adapter module in progress then a new adapter module is
        // started.
        self.adapter_options(dfg, &adapter.lift_options);
        self.adapter_options(dfg, &adapter.lower_options);
        self.core_def(dfg, &adapter.func);

        // With all dependencies visited this adapter is added to the next
        // module.
        //
        // This will either get added to the preexisting module if this adapter
        // didn't depend on anything in that module itself or it will be added
        // to a fresh module if this adapter depended on something that the
        // current adapter module created.
        log::debug!("adding {id:?} to adapter module");
        self.next_module.adapters.push(id);
    }

    fn adapter_options(&mut self, dfg: &dfg::ComponentDfg, options: &AdapterOptions) {
        if let Some(memory) = &options.memory {
            self.core_export(dfg, memory);
        }
        if let Some(def) = &options.realloc {
            self.core_def(dfg, def);
        }
        if let Some(def) = &options.post_return {
            self.core_def(dfg, def);
        }
    }

    fn core_def(&mut self, dfg: &dfg::ComponentDfg, def: &dfg::CoreDef) {
        match def {
            dfg::CoreDef::Export(e) => self.core_export(dfg, e),
            dfg::CoreDef::Adapter(id) => {
                // If this adapter is already defined then we can safely depend
                // on it with no consequences.
                if self.defined_items.contains(&Def::Adapter(*id)) {
                    log::debug!("using existing adapter {id:?} ");
                    return;
                }

                log::debug!("splitting module needing {id:?} ");

                // .. otherwise we found a case of an adapter depending on an
                // adapter-module-in-progress meaning that the current adapter
                // module must be completed and then a new one is started.
                self.finish_adapter_module();
                assert!(self.defined_items.contains(&Def::Adapter(*id)));
            }

            // These items can't transitively depend on an adapter
            dfg::CoreDef::Lowered(_)
            | dfg::CoreDef::AlwaysTrap(_)
            | dfg::CoreDef::InstanceFlags(_) => {}

            // should not be in the dfg yet
            dfg::CoreDef::Transcoder(_) => unreachable!(),
        }
    }

    fn core_export<T>(&mut self, dfg: &dfg::ComponentDfg, export: &dfg::CoreExport<T>) {
        // When an adapter depends on an exported item it actually depends on
        // the instance of that exported item. The caveat here is that the
        // adapter not only depends on that particular instance, but also all
        // prior instances to that instance as well because instance
        // instantiation order is fixed and cannot change.
        //
        // To model this the instance index space is looped over here and while
        // an instance hasn't been visited it's visited. Note that if an
        // instance has already been visited then all prior instances have
        // already been visited so there's no need to continue.
        let mut instance = export.instance;
        while self.defined_items.insert(Def::Instance(instance)) {
            self.instance(dfg, instance);
            if instance.as_u32() == 0 {
                break;
            }
            instance = dfg::InstanceId::from_u32(instance.as_u32() - 1);
        }
    }

    fn instance(&mut self, dfg: &dfg::ComponentDfg, instance: dfg::InstanceId) {
        log::debug!("visiting instance {instance:?}");

        // ... otherwise if this is the first time the instance has been seen
        // then the instance's own arguments are recursively visited to find
        // transitive dependencies on adapters.
        match &dfg.instances[instance] {
            dfg::Instance::Static(_, args) => {
                for arg in args.iter() {
                    self.core_def(dfg, arg);
                }
            }
            dfg::Instance::Import(_, args) => {
                for (_, values) in args {
                    for (_, def) in values {
                        self.core_def(dfg, def);
                    }
                }
            }
        }
    }

    fn finish_adapter_module(&mut self) {
        if self.next_module.adapters.is_empty() {
            return;
        }

        // Reset the state of the current module-in-progress and then flag all
        // pending adapters as now defined since the current module is being
        // committed.
        let module = mem::take(&mut self.next_module);
        for adapter in module.adapters.iter() {
            let inserted = self.defined_items.insert(Def::Adapter(*adapter));
            assert!(inserted);
        }
        let idx = self.adapter_modules.push(module);
        log::debug!("finishing adapter module {idx:?}");
    }
}
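
The backwards walk over the instance index space in `core_export` can be modeled on its own. This is a minimal sketch, assuming instances are plain `u32` indices. The name `mark_instance_and_predecessors` is hypothetical, and the real pass additionally visits each instance's arguments as it goes.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the loop in `core_export`.
///
/// Marks `start` and every earlier instance as visited, stopping as soon as
/// an already visited instance is found. Returns the instances that were
/// newly visited, in visiting order.
fn mark_instance_and_predecessors(visited: &mut HashSet<u32>, start: u32) -> Vec<u32> {
    let mut newly_visited = Vec::new();
    let mut instance = start;
    // `insert` returns `false` when the instance was already present, which
    // implies that all prior instances were visited by an earlier walk.
    while visited.insert(instance) {
        newly_visited.push(instance);
        if instance == 0 {
            break;
        }
        instance -= 1;
    }
    newly_visited
}
```

Because each walk stops at the first instance that is already in the set, every instance is visited at most once across all calls, which keeps the overall pass linear.
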