* Implement strings in adapter modules

This commit is a hefty addition to Wasmtime's support for the component
model. This implements the final remaining type (in the current type
hierarchy) unimplemented in adapter module trampolines: strings. Strings
are the most complicated type to implement in adapter trampolines
because they are highly structured chunks of data in memory (according
to specific encodings). Additionally each lift/lower operation can
choose its own encoding for strings meaning that Wasmtime, the host, may
have to convert between any pairwise ordering of string encodings.

The `CanonicalABI.md` in the component-model repo in general specifies
all the fiddly bits of string encoding so there's not a ton of wiggle
room for Wasmtime to get creative. This PR largely "just" implements
that. The high-level architecture of this implementation is:

* Fused adapters are first identified to determine src/dst string
  encodings. This statically fixes what transcoding operation is being
  performed.

* The generated adapter will be responsible for managing calls to
  `realloc` and performing bounds checks. The adapter itself does not
  perform memory copies or validation of string contents, however.
  Instead each transcoding operation is modeled as an imported function
  into the adapter module. This means that the adapter module
  dynamically, during compile time, determines what string transcoders
  are needed. Note that an imported transcoder is not only parameterized
  over the transcoding operation but additionally which memory is the
  source and which is the destination.

* The imported core wasm functions are modeled as a new
  `CoreDef::Transcoder` structure. These transcoders end up being small
  Cranelift-compiled trampolines. The Cranelift-compiled trampoline will
  load the actual base pointer of memory and add it to the relative
  pointers passed as function arguments. This trampoline then calls a
  transcoder "libcall" which enters Rust-defined functions for actual
  transcoding operations.

* Each possible transcoding operation is implemented in Rust with a
  unique name and a unique signature depending on the needs of the
  transcoder. I've tried to document inline what each transcoder does.

This means that the `Module::translate_string` in adapter modules is by
far the largest translation method. The main reason for this is due to
the management around calling the imported transcoder functions in the
face of validating string pointer/lengths and performing the dance of
`realloc`-vs-transcode at the right time. I've tried to ensure that each
individual case in transcoding is documented well enough to understand
what's going on as well.

Additionally in this PR is a full implementation in the host for the
`latin1+utf16` encoding which means that both lifting and lowering host
strings now works with this encoding.

Currently the implementation of each transcoder function is likely far
from optimal. Where possible I've leaned on the standard library itself
and for latin1-related things I'm leaning on the `encoding_rs` crate. I
initially tried to implement everything with `encoding_rs` but was
unable to uniformly do so easily. For now I settled on trying to get a
known-correct (even in the face of endianness) implementation for all of
these transcoders. If and when performance becomes an issue it should be
possible to implement more optimized versions of each of these
transcoding operations.

Testing this commit has been somewhat difficult and my general plan,
like with the `(list T)` type, is to rely heavily on fuzzing to cover
the various cases here. In this PR though I've added a simple test that
pushes some statically known strings through all the pairs of encodings
between source and destination. I've attempted to pick "interesting"
strings that one way or another stress the various paths in each
transcoding operation to ideally get full branch coverage there.
Additionally a suite of "negative" tests has also been added to ensure
that validity of encoding is actually checked.

* Fix a temporarily commented out case

* Fix wasmtime-runtime tests

* Update deny.toml configuration

* Add `BSD-3-Clause` for the `encoding_rs` crate
* Remove some unused licenses
* Add an exemption for `encoding_rs` for now

* Split up the `translate_string` method

Move out all the closures and package up captured state into smaller
lists of arguments.

* Test out-of-bounds for zero-length strings
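As a rough illustration of the "one Rust function per transcoding operation" idea described above, here is a minimal sketch using only the standard library. The names `utf8_to_utf16_le` and `latin1_to_utf16_le`, the slice-based signatures, and the `String` error type are hypothetical and chosen for illustration; they are not Wasmtime's actual libcalls, which operate on raw pointers into linear memory. The sketch only shows the shape of such a transcoder: validate the source encoding, then write destination code units in an explicitly little-endian layout so the result does not depend on host endianness.

```rust
/// Hypothetical transcoder: validates `src` as UTF-8 and writes it into
/// `dst` as little-endian UTF-16. Returns the number of 16-bit code units
/// written. On error, `dst` may already have been partially written.
pub fn utf8_to_utf16_le(src: &[u8], dst: &mut [u8]) -> Result<usize, String> {
    // Validation of the source encoding happens here, not in the adapter.
    let s = std::str::from_utf8(src).map_err(|e| format!("invalid utf-8: {e}"))?;
    let mut units = 0;
    for unit in s.encode_utf16() {
        // Each code unit occupies two bytes; bail out if `dst` is too small.
        let slot = dst
            .get_mut(units * 2..units * 2 + 2)
            .ok_or_else(|| "destination too small".to_string())?;
        // Explicit little-endian bytes keep the result host-independent.
        slot.copy_from_slice(&unit.to_le_bytes());
        units += 1;
    }
    Ok(units)
}

/// Hypothetical transcoder: widens latin1 bytes to little-endian UTF-16.
/// Every latin1 byte maps to the code point of the same value, so no
/// validation is needed and the output length is known up front.
pub fn latin1_to_utf16_le(src: &[u8], dst: &mut [u8]) -> Result<usize, String> {
    if dst.len() < src.len() * 2 {
        return Err("destination too small".to_string());
    }
    for (i, byte) in src.iter().enumerate() {
        dst[i * 2..i * 2 + 2].copy_from_slice(&u16::from(*byte).to_le_bytes());
    }
    Ok(src.len())
}
```

In this sketch the caller is assumed to have sized `dst` ahead of time, which loosely mirrors the split described above where the adapter handles `realloc` and bounds checks and the transcoder handles copying and validation.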
255 lines
8.5 KiB
Rust
//! A simple fuzzer for FACT
//!
//! This is an intentionally small fuzzer which is intended to only really be
//! used during the development of FACT itself when generating adapter modules.
//! This creates arbitrary adapter signatures and then generates the required
//! trampoline for that adapter ensuring that the final output wasm module is a
//! valid wasm module. This doesn't actually validate anything about the
//! correctness of the trampoline, only that it's valid wasm.

#![no_main]

use arbitrary::Arbitrary;
use component_fuzz_util::Type as ValType;
use libfuzzer_sys::fuzz_target;
use wasmparser::{Validator, WasmFeatures};
use wasmtime_environ::component::*;
use wasmtime_environ::fact::Module;

#[derive(Arbitrary, Debug)]
struct GenAdapterModule {
    debug: bool,
    adapters: Vec<GenAdapter>,
}

#[derive(Arbitrary, Debug)]
struct GenAdapter {
    ty: FuncType,
    post_return: bool,
    lift_memory64: bool,
    lower_memory64: bool,
    lift_encoding: GenStringEncoding,
    lower_encoding: GenStringEncoding,
}

#[derive(Arbitrary, Debug)]
struct FuncType {
    params: Vec<ValType>,
    result: ValType,
}

#[derive(Copy, Clone, Arbitrary, Debug)]
enum GenStringEncoding {
    Utf8,
    Utf16,
    CompactUtf16,
}

fuzz_target!(|module: GenAdapterModule| {
    target(module);
});

fn target(module: GenAdapterModule) {
    drop(env_logger::try_init());

    let mut types = ComponentTypesBuilder::default();

    // Manufactures a unique `CoreDef` so that every function import is
    // distinct.
    let mut next_def = 0;
    let mut dummy_def = || {
        next_def += 1;
        dfg::CoreDef::Adapter(dfg::AdapterId::from_u32(next_def))
    };

    // Manufactures a `CoreExport` for a memory with the shape specified. Note
    // that we can't import as many memories as functions so these are
    // intentionally limited. Once a handful of memories are generated of each
    // type then they start getting reused.
    let mut next_memory = 0;
    let mut memories32 = Vec::new();
    let mut memories64 = Vec::new();
    let mut dummy_memory = |memory64: bool| {
        let dst = if memory64 {
            &mut memories64
        } else {
            &mut memories32
        };
        let idx = if dst.len() < 5 {
            next_memory += 1;
            dst.push(next_memory - 1);
            next_memory - 1
        } else {
            dst[0]
        };
        dfg::CoreExport {
            instance: dfg::InstanceId::from_u32(idx),
            item: ExportItem::Name(String::new()),
        }
    };

    let mut adapters = Vec::new();
    for adapter in module.adapters.iter() {
        let mut params = Vec::new();
        for param in adapter.ty.params.iter() {
            params.push((None, intern(&mut types, param)));
        }
        let result = intern(&mut types, &adapter.ty.result);
        let signature = types.add_func_type(TypeFunc {
            params: params.into(),
            result,
        });
        adapters.push(Adapter {
            lift_ty: signature,
            lower_ty: signature,
            lower_options: AdapterOptions {
                instance: RuntimeComponentInstanceIndex::from_u32(0),
                string_encoding: adapter.lower_encoding.into(),
                memory64: adapter.lower_memory64,
                // Pessimistically assume that memory/realloc are going to be
                // required for this trampoline and provide it. Avoids doing
                // calculations to figure out whether they're necessary and
                // simplifies the fuzzer here without reducing coverage within FACT
                // itself.
                memory: Some(dummy_memory(adapter.lower_memory64)),
                realloc: Some(dummy_def()),
                // Lowering never allows `post-return`
                post_return: None,
            },
            lift_options: AdapterOptions {
                instance: RuntimeComponentInstanceIndex::from_u32(1),
                string_encoding: adapter.lift_encoding.into(),
                memory64: adapter.lift_memory64,
                memory: Some(dummy_memory(adapter.lift_memory64)),
                realloc: Some(dummy_def()),
                post_return: if adapter.post_return {
                    Some(dummy_def())
                } else {
                    None
                },
            },
            func: dummy_def(),
        });
    }
    let types = types.finish();
    let mut fact_module = Module::new(&types, module.debug);
    for (i, adapter) in adapters.iter().enumerate() {
        fact_module.adapt(&format!("adapter{i}"), adapter);
    }
    let wasm = fact_module.encode();
    let result = Validator::new_with_features(WasmFeatures {
        multi_memory: true,
        memory64: true,
        ..WasmFeatures::default()
    })
    .validate_all(&wasm);

    let err = match result {
        Ok(_) => return,
        Err(e) => e,
    };
    eprintln!("invalid wasm module: {err:?}");
    for adapter in module.adapters.iter() {
        eprintln!("adapter: {adapter:?}");
    }
    std::fs::write("invalid.wasm", &wasm).unwrap();
    match wasmprinter::print_bytes(&wasm) {
        Ok(s) => std::fs::write("invalid.wat", &s).unwrap(),
        Err(_) => drop(std::fs::remove_file("invalid.wat")),
    }

    panic!()
}

fn intern(types: &mut ComponentTypesBuilder, ty: &ValType) -> InterfaceType {
    match ty {
        ValType::Unit => InterfaceType::Unit,
        ValType::Bool => InterfaceType::Bool,
        ValType::U8 => InterfaceType::U8,
        ValType::S8 => InterfaceType::S8,
        ValType::U16 => InterfaceType::U16,
        ValType::S16 => InterfaceType::S16,
        ValType::U32 => InterfaceType::U32,
        ValType::S32 => InterfaceType::S32,
        ValType::U64 => InterfaceType::U64,
        ValType::S64 => InterfaceType::S64,
        ValType::Float32 => InterfaceType::Float32,
        ValType::Float64 => InterfaceType::Float64,
        ValType::Char => InterfaceType::Char,
        ValType::String => InterfaceType::String,
        ValType::List(ty) => {
            let ty = intern(types, ty);
            InterfaceType::List(types.add_interface_type(ty))
        }
        ValType::Record(tys) => {
            let ty = TypeRecord {
                fields: tys
                    .iter()
                    .enumerate()
                    .map(|(i, ty)| RecordField {
                        name: format!("f{i}"),
                        ty: intern(types, ty),
                    })
                    .collect(),
            };
            InterfaceType::Record(types.add_record_type(ty))
        }
        ValType::Flags(size) => {
            let ty = TypeFlags {
                names: (0..size.as_usize()).map(|i| format!("f{i}")).collect(),
            };
            InterfaceType::Flags(types.add_flags_type(ty))
        }
        ValType::Tuple(tys) => {
            let ty = TypeTuple {
                types: tys.iter().map(|ty| intern(types, ty)).collect(),
            };
            InterfaceType::Tuple(types.add_tuple_type(ty))
        }
        ValType::Variant(cases) => {
            let ty = TypeVariant {
                cases: cases
                    .iter()
                    .enumerate()
                    .map(|(i, ty)| VariantCase {
                        name: format!("c{i}"),
                        ty: intern(types, ty),
                    })
                    .collect(),
            };
            InterfaceType::Variant(types.add_variant_type(ty))
        }
        ValType::Union(tys) => {
            let ty = TypeUnion {
                types: tys.iter().map(|ty| intern(types, ty)).collect(),
            };
            InterfaceType::Union(types.add_union_type(ty))
        }
        ValType::Enum(size) => {
            let ty = TypeEnum {
                names: (0..size.as_usize()).map(|i| format!("c{i}")).collect(),
            };
            InterfaceType::Enum(types.add_enum_type(ty))
        }
        ValType::Option(ty) => {
            let ty = intern(types, ty);
            InterfaceType::Option(types.add_interface_type(ty))
        }
        ValType::Expected { ok, err } => {
            let ok = intern(types, ok);
            let err = intern(types, err);
            InterfaceType::Expected(types.add_expected_type(TypeExpected { ok, err }))
        }
    }
}

impl From<GenStringEncoding> for StringEncoding {
    fn from(gen: GenStringEncoding) -> StringEncoding {
        match gen {
            GenStringEncoding::Utf8 => StringEncoding::Utf8,
            GenStringEncoding::Utf16 => StringEncoding::Utf16,
            GenStringEncoding::CompactUtf16 => StringEncoding::CompactUtf16,
        }
    }
}