Implement strings in adapter modules (#4623)
* Implement strings in adapter modules

This commit is a hefty addition to Wasmtime's support for the component model. This implements the final remaining type (in the current type hierarchy) unimplemented in adapter module trampolines: strings. Strings are the most complicated type to implement in adapter trampolines because they are highly structured chunks of data in memory (according to specific encodings). Additionally each lift/lower operation can choose its own encoding for strings, meaning that Wasmtime, the host, may have to convert between any ordered pair of string encodings.

The `CanonicalABI.md` in the component-model repo in general specifies all the fiddly bits of string encoding so there's not a ton of wiggle room for Wasmtime to get creative. This PR largely "just" implements that. The high-level architecture of this implementation is:

* Fused adapters are first identified to determine src/dst string encodings. This statically fixes what transcoding operation is being performed.

* The generated adapter will be responsible for managing calls to `realloc` and performing bounds checks. The adapter itself does not perform memory copies or validation of string contents, however. Instead each transcoding operation is modeled as an imported function into the adapter module. This means that the adapter module determines, while it is being compiled, which string transcoders are needed. Note that an imported transcoder is not only parameterized over the transcoding operation but additionally over which memory is the source and which is the destination.

* The imported core wasm functions are modeled as a new `CoreDef::Transcoder` structure. These transcoders end up being small Cranelift-compiled trampolines. The Cranelift-compiled trampoline will load the actual base pointer of memory and add it to the relative pointers passed as function arguments. This trampoline then calls a transcoder "libcall" which enters Rust-defined functions for actual transcoding operations.

* Each possible transcoding operation is implemented in Rust with a unique name and a unique signature depending on the needs of the transcoder. I've tried to document inline what each transcoder does.

This means that the `Module::translate_string` in adapter modules is by far the largest translation method. The main reason for this is the management around calling the imported transcoder functions in the face of validating string pointer/lengths and performing the dance of `realloc`-vs-transcode at the right time. I've tried to ensure that each individual case in transcoding is documented well enough to understand what's going on as well.

Additionally in this PR is a full implementation in the host for the `latin1+utf16` encoding, which means that both lifting and lowering host strings now work with this encoding.

Currently the implementation of each transcoder function is likely far from optimal. Where possible I've leaned on the standard library itself and for latin1-related things I'm leaning on the `encoding_rs` crate. I initially tried to implement everything with `encoding_rs` but was unable to uniformly do so easily. For now I settled on trying to get a known-correct (even in the face of endianness) implementation for all of these transcoders. If and when performance becomes an issue it should be possible to implement more optimized versions of each of these transcoding operations.

Testing this commit has been somewhat difficult and my general plan, like with the `(list T)` type, is to rely heavily on fuzzing to cover the various cases here. In this PR though I've added a simple test that pushes some statically known strings through all the pairs of encodings between source and destination. I've attempted to pick "interesting" strings that one way or another stress the various paths in each transcoding operation to ideally get full branch coverage there. Additionally a suite of "negative" tests has also been added to ensure that validity of encoding is actually checked.

* Fix a temporarily commented out case

* Fix wasmtime-runtime tests

* Update deny.toml configuration

* Add `BSD-3-Clause` for the `encoding_rs` crate
* Remove some unused licenses
* Add an exemption for `encoding_rs` for now

* Split up the `translate_string` method

Move out all the closures and package up captured state into smaller lists of arguments.

* Test out-of-bounds for zero-length strings
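To make the `latin1+utf16` case more concrete, here is a minimal sketch of the choice a host-side lowering has to make: store the string as Latin-1 when every code point fits in one byte, otherwise fall back to UTF-16. The `Encoded` type and the `encode_latin1_or_utf16` function are hypothetical names for illustration and are not taken from Wasmtime. The sketch uses only the standard library, writes UTF-16 code units as little-endian bytes, and leaves out how the canonical ABI signals the chosen representation to the guest as well as all `realloc` and bounds-check handling.

```rust
/// Hypothetical result type: which representation was chosen, plus the
/// raw bytes that would be written into the destination memory.
#[derive(Debug, PartialEq)]
enum Encoded {
    /// One byte per code point, every code point <= U+00FF.
    Latin1(Vec<u8>),
    /// UTF-16 code units serialized as little-endian byte pairs.
    Utf16(Vec<u8>),
}

/// Encode `s` as Latin-1 if possible, otherwise as UTF-16.
/// A sketch only; not Wasmtime's actual implementation.
fn encode_latin1_or_utf16(s: &str) -> Encoded {
    if s.chars().all(|c| (c as u32) <= 0xFF) {
        // Every code point fits in a single Latin-1 byte, so the cast
        // below cannot truncate.
        Encoded::Latin1(s.chars().map(|c| c as u8).collect())
    } else {
        // At least one code point needs more than 8 bits: emit UTF-16.
        // `to_le_bytes` keeps the output identical on big-endian hosts.
        let mut bytes = Vec::with_capacity(s.len() * 2);
        for unit in s.encode_utf16() {
            bytes.extend_from_slice(&unit.to_le_bytes());
        }
        Encoded::Utf16(bytes)
    }
}
```

Calling `encode_latin1_or_utf16("café")` takes the Latin-1 path because `é` is U+00E9, while a string containing `日` (U+65E5) takes the UTF-16 path.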
@@ -20,7 +20,7 @@

 use crate::component::dfg::CoreDef;
 use crate::component::{Adapter, AdapterOptions, ComponentTypes, StringEncoding, TypeFuncIndex};
-use crate::{FuncIndex, GlobalIndex, MemoryIndex};
+use crate::{FuncIndex, GlobalIndex, MemoryIndex, PrimaryMap};
 use std::collections::HashMap;
 use std::mem;
 use wasm_encoder::*;
@@ -28,8 +28,11 @@ use wasm_encoder::*;
 mod core_types;
 mod signature;
 mod trampoline;
+mod transcode;
 mod traps;

+pub use self::transcode::{FixedEncoding, Transcode};
+
 /// Representation of an adapter module.
 pub struct Module<'a> {
     /// Whether or not debug code is inserted into the adapters themselves.
@@ -46,9 +49,10 @@ pub struct Module<'a> {
     core_imports: ImportSection,
     /// Final list of imports that this module ended up using, in the same order
     /// as the imports in the import section.
-    imports: Vec<CoreDef>,
+    imports: Vec<Import>,
     /// Intern'd imports and what index they were assigned.
     imported: HashMap<CoreDef, u32>,
+    imported_memories: PrimaryMap<MemoryIndex, CoreDef>,

     // Current status of index spaces from the imports generated so far.
     core_funcs: u32,
@@ -100,6 +104,7 @@ impl<'a> Module<'a> {
             imported: Default::default(),
             adapters: Default::default(),
             imports: Default::default(),
+            imported_memories: PrimaryMap::new(),
             core_funcs: 0,
             core_memories: 0,
             core_globals: 0,
@@ -246,19 +251,50 @@ impl<'a> Module<'a> {
         let ret = *cnt - 1;
         self.core_imports.import(module, name, ty);
         self.imported.insert(def.clone(), ret);
-        self.imports.push(def);
+        if let EntityType::Memory(_) = ty {
+            self.imported_memories.push(def.clone());
+        }
+        self.imports.push(Import::CoreDef(def));
         ret
     }

     /// Encodes this module into a WebAssembly binary.
     pub fn encode(&mut self) -> Vec<u8> {
+        let mut types = mem::take(&mut self.core_types);
+        let mut transcoders = transcode::Transcoders::new(self.core_funcs);
+        let mut adapter_funcs = Vec::new();
+        for adapter in self.adapters.iter() {
+            adapter_funcs.push(trampoline::compile(
+                self,
+                &mut types,
+                &mut transcoders,
+                adapter,
+            ));
+        }
+
+        // If any string transcoding imports were needed add imported items
+        // associated with them.
+        for (module, name, ty, transcoder) in transcoders.imports() {
+            self.core_imports.import(module, name, ty);
+            let from = self.imported_memories[transcoder.from_memory].clone();
+            let to = self.imported_memories[transcoder.to_memory].clone();
+            self.imports.push(Import::Transcode {
+                op: transcoder.op,
+                from,
+                from64: transcoder.from_memory64,
+                to,
+                to64: transcoder.to_memory64,
+            });
+            self.core_funcs += 1;
+        }
+
+        // Now that all functions are known as well as all imports the actual
+        // bodies of all adapters are assembled into a final module.
         let mut funcs = FunctionSection::new();
         let mut code = CodeSection::new();
         let mut exports = ExportSection::new();
         let mut traps = traps::TrapSection::default();
-
-        let mut types = mem::take(&mut self.core_types);
-        for adapter in self.adapters.iter() {
+        for (adapter, (function, func_traps)) in self.adapters.iter().zip(adapter_funcs) {
             let idx = self.core_funcs + funcs.len();
             exports.export(&adapter.name, ExportKind::Func, idx);

@@ -266,7 +302,6 @@ impl<'a> Module<'a> {
             let ty = types.function(&signature.params, &signature.results);
             funcs.function(ty);

-            let (function, func_traps) = trampoline::compile(self, &mut types, adapter);
             code.raw(&function);
             traps.append(idx, func_traps);
         }
@@ -290,11 +325,31 @@ impl<'a> Module<'a> {

     /// Returns the imports that were used, in order, to create this adapter
     /// module.
-    pub fn imports(&self) -> &[CoreDef] {
+    pub fn imports(&self) -> &[Import] {
         &self.imports
     }
 }

+/// Possible imports into an adapter module.
+#[derive(Clone)]
+pub enum Import {
+    /// A definition required in the configuration of an `Adapter`.
+    CoreDef(CoreDef),
+    /// A transcoding function from the host to convert between string encodings.
+    Transcode {
+        /// The transcoding operation this performs.
+        op: Transcode,
+        /// The memory being read
+        from: CoreDef,
+        /// Whether or not `from` is a 64-bit memory
+        from64: bool,
+        /// The memory being written
+        to: CoreDef,
+        /// Whether or not `to` is a 64-bit memory
+        to64: bool,
+    },
+}
+
 impl Options {
     fn ptr(&self) -> ValType {
         if self.memory64 {
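The `encode` hunk above creates `transcode::Transcoders::new(self.core_funcs)`, lets each adapter request transcoders while it is compiled, and afterwards walks `transcoders.imports()` to append one function import per distinct transcoder. The body of `Transcoders` is not part of this diff, so the following is only a sketch of how such bookkeeping could work, assuming that a transcoder is identified by its operation plus its source and destination memories. `Op`, `TranscodeKey`, and `TranscoderImports` are hypothetical names and the memory indices are simplified to `u32`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the transcoding operation (`Transcode` in the diff).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Op {
    Copy,
    Utf8ToUtf16,
    Utf16ToUtf8,
}

/// Everything that distinguishes one imported transcoder from another:
/// the operation and which memories it reads from and writes to.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct TranscodeKey {
    op: Op,
    from_memory: u32,
    to_memory: u32,
}

/// Hands out function indices for transcoder imports, reusing the index
/// when the same transcoder is requested again.
struct TranscoderImports {
    /// Index of the first transcoder import, i.e. the number of function
    /// imports that already exist.
    first_index: u32,
    assigned: HashMap<TranscodeKey, u32>,
    /// Keys in the order their imports must be emitted.
    ordered: Vec<TranscodeKey>,
}

impl TranscoderImports {
    fn new(first_index: u32) -> Self {
        TranscoderImports {
            first_index,
            assigned: HashMap::new(),
            ordered: Vec::new(),
        }
    }

    /// Returns the function index for `key`, allocating a new import on
    /// first use.
    fn import(&mut self, key: TranscodeKey) -> u32 {
        if let Some(&index) = self.assigned.get(&key) {
            return index;
        }
        let index = self.first_index + self.ordered.len() as u32;
        self.assigned.insert(key, index);
        self.ordered.push(key);
        index
    }

    /// All distinct transcoders requested so far, in import order.
    fn imports(&self) -> &[TranscodeKey] {
        &self.ordered
    }
}
```

Handing out indices starting at the existing function-import count matches the diff's `self.core_funcs += 1` per transcoder: adapter bodies can refer to a transcoder by index before its import is actually appended to the import section.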