* Improve the `component_api` fuzzer on a few dimensions

  * Update the generated component to use an adapter module. This involves
    two core wasm instances communicating with each other to test that data
    flows through everything correctly. The intention here is to fuzz the
    fused adapter compiler. String encoding options have been plumbed here
    to exercise differences in string encodings.
  * Use `Cow<'static, ...>` and `static` declarations for each static test
    case to try to cut down on rustc codegen time.
  * Add `Copy` to derivation of fuzzed enums to make `derive(Clone)` smaller.
  * Use `Store<Box<dyn Any>>` to try to cut down on codegen by
    monomorphizing fewer `Store<T>` implementations.
  * Add debug logging to print out what's flowing in and what's flowing out
    for debugging failures.
  * Improve `Debug` representation of dynamic value types to more closely
    match their Rust counterparts.

* Fix a variant issue with adapter trampolines

  Previously the offset of the payload was calculated as the discriminant
  aligned up to the alignment of a singular case, but instead this needs to
  be aligned up to the alignment of all cases to ensure all cases start at
  the same location.

* Fix a copy/paste error when copying masked integers

  A 32-bit load was actually doing a 16-bit load by accident since it was
  copied from the 16-bit load-and-mask case.

* Fix f32/i64 conversions in adapter modules

  The adapter previously erroneously converted the f32 to f64 and then to
  i64, where instead it should go from f32 to i32 to i64.

* Fix zero-sized flags in adapter modules

  This commit corrects the size calculation for zero-sized flags in adapter
  modules.

  cc #4592

* Fix a variant size calculation bug in adapters

  This fixes the same issue found with variants during normal host-side
  fuzzing earlier where the size of a variant needs to align up the
  summation of the discriminant and the maximum case size.

* Implement memory growth in libc bump realloc

  Some fuzz-generated test cases are copying lists large enough to exceed
  one page of memory so bake in a `memory.grow` to the bump allocator as
  well.

* Avoid adapters of exponential size

  This commit is an attempt to avoid adapters being exponentially sized with
  respect to the type hierarchy of the input. Previously all adaptation was
  done inline within each adapter which meant that if something was
  structured as `tuple<T, T, T, T, ...>` the translation of `T` would be
  inlined N times. For very deeply nested types this can quickly create an
  exponentially sized adapter with types of the form:

      (type $t0 (list u8))
      (type $t1 (tuple $t0 $t0))
      (type $t2 (tuple $t1 $t1))
      (type $t3 (tuple $t2 $t2))
      ;; ...

  where the translation of `$t3` has 8 different copies of translating
  `$t0`.

  This commit changes the translation of types through memory to almost
  always go through a helper function. The hope here is that it doesn't lose
  too much performance because types already reside in memory.

  This can still lead to exponentially sized adapter modules to a lesser
  degree: if the translation all happens on the "stack", e.g. via
  `variant`s and their flat representation, then many copies of one
  translation could still be made. For now this commit at least gets the
  problem under control for fuzzing, where fuzzing no longer trivially finds
  type hierarchies that take over a minute to codegen the adapter module.

  One of the main tricky parts of this implementation is that when a
  function is generated the index that it will be placed at in the final
  module is not known at that time. To solve this the encoded form of the
  `Call` instruction is saved in a relocation-style format where the `Call`
  isn't encoded but instead saved into a different area for encoding later.
  When the entire adapter module is encoded to wasm these pseudo-`Call`
  instructions are encoded as real instructions at that time.

* Fix some memory64 issues with string encodings

  Just before #4623 I introduced a few mistakes related to 64-bit memories
  and mixing 32/64-bit memories.

* Actually insert into the `translate_mem_funcs` map

  This... was the whole point of having the map!

* Assert memory growth succeeds in bump allocator
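The two variant fixes above come down to the same layout rule: the payload starts at the discriminant size aligned up to the largest alignment of any case, and the total size is the payload offset plus the largest case size, aligned up to the variant's own alignment. The following is a minimal sketch of that rule, not the adapter compiler's actual code; `CaseLayout`, `VariantLayout`, `align_to`, and `variant_layout` are hypothetical names introduced for illustration, and the discriminant's alignment is assumed to equal its size.

```rust
/// Rounds `value` up to the next multiple of `align`.
/// Assumption: `align` is a non-zero power of two.
fn align_to(value: u32, align: u32) -> u32 {
    debug_assert!(align.is_power_of_two());
    (value + align - 1) & !(align - 1)
}

/// Size and alignment of a single case payload (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CaseLayout {
    size: u32,
    align: u32,
}

/// Computed layout of a whole variant (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct VariantLayout {
    payload_offset: u32,
    size: u32,
    align: u32,
}

/// Computes the layout of a variant whose discriminant occupies `disc_size`
/// bytes (1, 2, or 4) and is assumed to be aligned to its own size.
fn variant_layout(disc_size: u32, cases: &[CaseLayout]) -> VariantLayout {
    // Use the maximum over *all* cases, not the alignment of one case, so
    // that every case's payload starts at the same offset.
    let max_case_align = cases.iter().map(|c| c.align).max().unwrap_or(1);
    let max_case_size = cases.iter().map(|c| c.size).max().unwrap_or(0);

    let align = disc_size.max(max_case_align);
    let payload_offset = align_to(disc_size, max_case_align);
    // The sum of discriminant and largest payload is aligned up once more so
    // that consecutive variants in memory stay aligned.
    let size = align_to(payload_offset + max_case_size, align);

    VariantLayout {
        payload_offset,
        size,
        align,
    }
}
```

With a one-byte discriminant and cases holding a `u8` and a `u64`, the payload offset is 8 for both cases. Aligning only to the `u8` case would have placed its payload at offset 1.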
147 lines
5.4 KiB
Rust
use crate::fact::core_types::CoreTypes;
use crate::MemoryIndex;
use serde::{Deserialize, Serialize};
use wasm_encoder::{EntityType, ValType};

#[derive(Copy, Clone, Hash, Eq, PartialEq)]
pub struct Transcoder {
    pub from_memory: MemoryIndex,
    pub from_memory64: bool,
    pub to_memory: MemoryIndex,
    pub to_memory64: bool,
    pub op: Transcode,
}

/// Possible transcoding operations that must be provided by the host.
///
/// Note that each transcoding operation may have a unique signature depending
/// on the precise operation.
#[allow(missing_docs)]
#[derive(Debug, Copy, Clone, Hash, Eq, PartialEq, Serialize, Deserialize)]
pub enum Transcode {
    Copy(FixedEncoding),
    Latin1ToUtf16,
    Latin1ToUtf8,
    Utf16ToCompactProbablyUtf16,
    Utf16ToCompactUtf16,
    Utf16ToLatin1,
    Utf16ToUtf8,
    Utf8ToCompactUtf16,
    Utf8ToLatin1,
    Utf8ToUtf16,
}

#[derive(Debug, Copy, Clone, Hash, Eq, PartialEq, Serialize, Deserialize)]
#[allow(missing_docs)]
pub enum FixedEncoding {
    Utf8,
    Utf16,
    Latin1,
}

impl Transcoder {
    pub fn name(&self) -> String {
        format!(
            "{} (mem{} => mem{})",
            self.op.desc(),
            self.from_memory.as_u32(),
            self.to_memory.as_u32(),
        )
    }

    pub fn ty(&self, types: &mut CoreTypes) -> EntityType {
        let from_ptr = if self.from_memory64 {
            ValType::I64
        } else {
            ValType::I32
        };
        let to_ptr = if self.to_memory64 {
            ValType::I64
        } else {
            ValType::I32
        };

        let ty = match self.op {
            // These direct transcodings take the source pointer, the source
            // code units, and the destination pointer.
            //
            // The memories being copied between are part of each intrinsic and
            // the destination code units are the same as the source.
            // Note that the pointers are dynamically guaranteed to be aligned
            // and in-bounds for the code units length as defined by the string
            // encoding.
            Transcode::Copy(_) | Transcode::Latin1ToUtf16 => {
                types.function(&[from_ptr, from_ptr, to_ptr], &[])
            }

            // Transcoding from utf8 to utf16 takes the from ptr/len as well as
            // a destination. The destination is valid for len*2 bytes. The
            // return value is how many code units were written to the
            // destination.
            Transcode::Utf8ToUtf16 => types.function(&[from_ptr, from_ptr, to_ptr], &[to_ptr]),

            // Transcoding to utf8 as a smaller format takes all the parameters
            // and returns the amount of space consumed in the src/destination
            Transcode::Utf16ToUtf8 | Transcode::Latin1ToUtf8 => {
                types.function(&[from_ptr, from_ptr, to_ptr, to_ptr], &[from_ptr, to_ptr])
            }

            // The return type is a tagged length which indicates which was
            // used
            Transcode::Utf16ToCompactProbablyUtf16 => {
                types.function(&[from_ptr, from_ptr, to_ptr], &[to_ptr])
            }

            // The initial step of transcoding from a fixed format to a compact
            // format. Takes the ptr/len of the source and the destination
            // pointer. The destination length is implicitly the same. Returns
            // how many code units were consumed in the source, which is also
            // how many bytes were written to the destination.
            Transcode::Utf8ToLatin1 | Transcode::Utf16ToLatin1 => {
                types.function(&[from_ptr, from_ptr, to_ptr], &[from_ptr, to_ptr])
            }

            // The final step of transcoding to a compact format when the fixed
            // transcode has failed. This takes the ptr/len of the source that's
            // remaining to transcode. Then this takes the destination ptr/len
            // as well as the destination bytes written so far with latin1.
            // Finally this returns the number of code units written to the
            // destination.
            Transcode::Utf8ToCompactUtf16 | Transcode::Utf16ToCompactUtf16 => {
                types.function(&[from_ptr, from_ptr, to_ptr, to_ptr, to_ptr], &[to_ptr])
            }
        };
        EntityType::Function(ty)
    }
}

impl Transcode {
    /// Returns a human-readable description for this transcoding operation.
    pub fn desc(&self) -> &'static str {
        match self {
            Transcode::Copy(FixedEncoding::Utf8) => "utf8-to-utf8",
            Transcode::Copy(FixedEncoding::Utf16) => "utf16-to-utf16",
            Transcode::Copy(FixedEncoding::Latin1) => "latin1-to-latin1",
            Transcode::Latin1ToUtf16 => "latin1-to-utf16",
            Transcode::Latin1ToUtf8 => "latin1-to-utf8",
            Transcode::Utf16ToCompactProbablyUtf16 => "utf16-to-compact-probably-utf16",
            Transcode::Utf16ToCompactUtf16 => "utf16-to-compact-utf16",
            Transcode::Utf16ToLatin1 => "utf16-to-latin1",
            Transcode::Utf16ToUtf8 => "utf16-to-utf8",
            Transcode::Utf8ToCompactUtf16 => "utf8-to-compact-utf16",
            Transcode::Utf8ToLatin1 => "utf8-to-latin1",
            Transcode::Utf8ToUtf16 => "utf8-to-utf16",
        }
    }
}

impl FixedEncoding {
    pub(crate) fn width(&self) -> u8 {
        match self {
            FixedEncoding::Utf8 => 1,
            FixedEncoding::Utf16 => 2,
            FixedEncoding::Latin1 => 1,
        }
    }
}
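The comments in `Transcoder::ty` describe the contract each host intrinsic is expected to follow. As a rough illustration only, the sketch below models two of those contracts over plain Rust slices instead of raw pointers into wasm linear memory. `utf8_to_utf16` and `latin1_to_utf8` are hypothetical names, not functions from this crate, and the error handling shown is an assumption.

```rust
use std::str::Utf8Error;

/// Models the `Utf8ToUtf16` contract: takes the source bytes and a
/// destination with room for `src.len()` code units (`len * 2` bytes), and
/// returns how many UTF-16 code units were written.
fn utf8_to_utf16(src: &[u8], dst: &mut [u16]) -> Result<usize, Utf8Error> {
    // N bytes of UTF-8 never produce more than N UTF-16 code units.
    assert!(dst.len() >= src.len());
    let text = std::str::from_utf8(src)?;
    let mut written = 0;
    for unit in text.encode_utf16() {
        dst[written] = unit;
        written += 1;
    }
    Ok(written)
}

/// Models the `Latin1ToUtf8` contract: takes source and destination buffers
/// and returns `(source bytes consumed, destination bytes written)`. It
/// stops early once the next character would not fit in `dst`.
fn latin1_to_utf8(src: &[u8], dst: &mut [u8]) -> (usize, usize) {
    let mut read = 0;
    let mut written = 0;
    for &byte in src {
        // Each Latin-1 byte maps directly to U+0000..=U+00FF.
        let ch = char::from(byte);
        let len = ch.len_utf8();
        if written + len > dst.len() {
            break;
        }
        ch.encode_utf8(&mut dst[written..written + len]);
        read += 1;
        written += len;
    }
    (read, written)
}
```

Returning both counts from `latin1_to_utf8` mirrors the two-result signature in `ty`: a caller can see that the destination filled up, grow it, and resume from the unconsumed part of the source.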