Implement strings in adapter modules (#4623)
* Implement strings in adapter modules

This commit is a hefty addition to Wasmtime's support for the component model. This implements the final remaining type (in the current type hierarchy) unimplemented in adapter module trampolines: strings. Strings are the most complicated type to implement in adapter trampolines because they are highly structured chunks of data in memory (according to specific encodings). Additionally each lift/lower operation can choose its own encoding for strings, meaning that Wasmtime, the host, may have to convert between any pairwise ordering of string encodings.

The `CanonicalABI.md` in the component-model repo in general specifies all the fiddly bits of string encoding so there's not a ton of wiggle room for Wasmtime to get creative. This PR largely "just" implements that. The high-level architecture of this implementation is:

* Fused adapters are first identified to determine src/dst string encodings. This statically fixes what transcoding operation is being performed.

* The generated adapter will be responsible for managing calls to `realloc` and performing bounds checks. The adapter itself does not perform memory copies or validation of string contents, however. Instead each transcoding operation is modeled as an imported function into the adapter module. This means that the adapter module determines, while it is being compiled, which string transcoders are needed. Note that an imported transcoder is not only parameterized over the transcoding operation but additionally over which memory is the source and which is the destination.

* The imported core wasm functions are modeled as a new `CoreDef::Transcoder` structure. These transcoders end up being small Cranelift-compiled trampolines. The Cranelift-compiled trampoline will load the actual base pointer of memory and add it to the relative pointers passed as function arguments. This trampoline then calls a transcoder "libcall" which enters Rust-defined functions for actual transcoding operations.

* Each possible transcoding operation is implemented in Rust with a unique name and a unique signature depending on the needs of the transcoder. I've tried to document inline what each transcoder does.

This means that the `Module::translate_string` in adapter modules is by far the largest translation method. The main reason for this is the management around calling the imported transcoder functions in the face of validating string pointer/lengths and performing the dance of `realloc`-vs-transcode at the right time. I've tried to ensure that each individual case in transcoding is documented well enough to understand what's going on as well.

Additionally in this PR is a full implementation in the host for the `latin1+utf16` encoding, which means that both lifting and lowering host strings now work with this encoding.

Currently the implementation of each transcoder function is likely far from optimal. Where possible I've leaned on the standard library itself and for latin1-related things I'm leaning on the `encoding_rs` crate. I initially tried to implement everything with `encoding_rs` but was unable to uniformly do so easily. For now I settled on trying to get a known-correct (even in the face of endianness) implementation for all of these transcoders. If and when performance becomes an issue it should be possible to implement more optimized versions of each of these transcoding operations.

Testing this commit has been somewhat difficult and my general plan, like with the `(list T)` type, is to rely heavily on fuzzing to cover the various cases here. In this PR though I've added a simple test that pushes some statically known strings through all the pairs of encodings between source and destination. I've attempted to pick "interesting" strings that one way or another stress the various paths in each transcoding operation to ideally get full branch coverage there. Additionally a suite of "negative" tests has also been added to ensure that validity of encoding is actually checked.

* Fix a temporarily commented out case

* Fix wasmtime-runtime tests

* Update deny.toml configuration

* Add `BSD-3-Clause` for the `encoding_rs` crate
* Remove some unused licenses

* Add an exemption for `encoding_rs` for now

* Split up the `translate_string` method

Move out all the closures and package up captured state into smaller lists of arguments.

* Test out-of-bounds for zero-length strings
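
As a rough illustration of what one of these Rust-defined transcoders does, here is a minimal sketch of two conversions. It is not Wasmtime's implementation: the function names `latin1_to_utf16_le` and `utf8_to_utf16_le`, their slice-based signatures, and the choice to write little-endian bytes explicitly are assumptions made for this example. It also assumes the caller (the adapter, in the design described above) has already bounds-checked and sized the destination.

```rust
/// Hypothetical helper: widen each Latin-1 byte into one UTF-16 code unit.
/// Latin-1 code points map 1:1 onto the first 256 Unicode scalar values,
/// so no validation is needed. Bytes are written little-endian explicitly
/// so the result does not depend on host endianness.
fn latin1_to_utf16_le(src: &[u8], dst: &mut [u8]) {
    // Assumption: the caller allocated exactly two bytes per source byte.
    assert_eq!(dst.len(), src.len() * 2);
    for (byte, out) in src.iter().zip(dst.chunks_exact_mut(2)) {
        out.copy_from_slice(&u16::from(*byte).to_le_bytes());
    }
}

/// Hypothetical helper: validate `src` as UTF-8 and write it as UTF-16LE.
/// Returns the number of 16-bit code units written, or an error if the
/// source is not valid UTF-8.
fn utf8_to_utf16_le(src: &[u8], dst: &mut [u8]) -> Result<usize, std::str::Utf8Error> {
    // Worst case is one code unit per source byte, so this sketch requires
    // the destination to be sized for that up front.
    assert!(dst.len() >= src.len() * 2);
    let s = std::str::from_utf8(src)?;
    let mut units = 0;
    for (unit, out) in s.encode_utf16().zip(dst.chunks_exact_mut(2)) {
        out.copy_from_slice(&unit.to_le_bytes());
        units += 1;
    }
    Ok(units)
}
```

The second function returns a length because, unlike the Latin-1 case, the number of code units produced is only known after transcoding. That is presumably why `utf8_to_utf16` in the diff below has a `-> size` result while `latin1_to_utf16` does not.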
@@ -48,3 +48,25 @@ pub use self::info::*;
 pub use self::translate::*;
 pub use self::types::*;
 pub use self::vmcomponent_offsets::*;
+
+/// Helper macro to iterate over the transcoders that the host will provide
+/// adapter modules through libcalls.
+#[macro_export]
+macro_rules! foreach_transcoder {
+    ($mac:ident) => {
+        $mac! {
+            utf8_to_utf8(src: ptr_u8, len: size, dst: ptr_u8);
+            utf16_to_utf16(src: ptr_u16, len: size, dst: ptr_u16);
+            latin1_to_latin1(src: ptr_u8, len: size, dst: ptr_u8);
+            latin1_to_utf16(src: ptr_u8, len: size, dst: ptr_u16);
+            utf8_to_utf16(src: ptr_u8, len: size, dst: ptr_u16) -> size;
+            utf16_to_utf8(src: ptr_u16, src_len: size, dst: ptr_u8, dst_len: size) -> size_pair;
+            latin1_to_utf8(src: ptr_u8, src_len: size, dst: ptr_u8, dst_len: size) -> size_pair;
+            utf16_to_compact_probably_utf16(src: ptr_u16, len: size, dst: ptr_u16) -> size;
+            utf8_to_latin1(src: ptr_u8, len: size, dst: ptr_u8) -> size_pair;
+            utf16_to_latin1(src: ptr_u16, len: size, dst: ptr_u8) -> size_pair;
+            utf8_to_compact_utf16(src: ptr_u8, src_len: size, dst: ptr_u16, dst_len: size, bytes_so_far: size) -> size;
+            utf16_to_compact_utf16(src: ptr_u16, src_len: size, dst: ptr_u16, dst_len: size, bytes_so_far: size) -> size;
+        }
+    };
+}

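The callback-macro pattern behind `foreach_transcoder!` can be sketched in isolation as follows. This is only an illustration under stated assumptions: `foreach_demo_transcoder!`, `transcoder_table!` and the `TRANSCODERS` constant are hypothetical names invented here, and the list is cut down to two entries. How the real consumers of `foreach_transcoder!` expand each entry is not shown in this part of the diff.

```rust
// Hypothetical stand-in for `foreach_transcoder!`, reduced to two entries.
// It takes the *name* of another macro and hands it the whole list.
macro_rules! foreach_demo_transcoder {
    ($mac:ident) => {
        $mac! {
            utf8_to_utf8(src: ptr_u8, len: size, dst: ptr_u8);
            utf8_to_utf16(src: ptr_u8, len: size, dst: ptr_u16) -> size;
        }
    };
}

// Hypothetical callback macro: turns the list into a table of
// (name, number of parameters, return type name or "" if none).
macro_rules! transcoder_table {
    ($( $name:ident($( $pname:ident : $pty:ident ),*) $( -> $ret:ident )? ; )*) => {
        pub const TRANSCODERS: &[(&str, usize, &str)] = &[
            $((
                stringify!($name),
                // Count parameters by building an array of their names.
                [$( stringify!($pname) ),*].len(),
                // `concat!()` with no arguments yields the empty string.
                concat!($( stringify!($ret) )?),
            ),)*
        ];
    };
}

// Expands to the `TRANSCODERS` constant defined by the callback above.
foreach_demo_transcoder!(transcoder_table);
```

Keeping the list in one place and letting each consumer supply its own callback means a signature only has to be written once, which seems to be the motivation for exporting the macro with `#[macro_export]`.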
@@ -1,5 +1,6 @@
 use crate::component::{
     Component, ComponentTypes, LowerImport, LoweredIndex, RuntimeAlwaysTrapIndex,
+    RuntimeTranscoderIndex, Transcoder,
 };
 use crate::{PrimaryMap, SignatureIndex, Trampoline, WasmFuncType};
 use anyhow::Result;
@@ -61,6 +62,24 @@ pub trait ComponentCompiler: Send + Sync {
     /// `canon lift`'d function immediately being `canon lower`'d.
     fn compile_always_trap(&self, ty: &WasmFuncType) -> Result<Box<dyn Any + Send>>;

+    /// Compiles a trampoline to implement string transcoding from adapter
+    /// modules.
+    ///
+    /// The generated trampoline will invoke the `transcoder.op` libcall with
+    /// the various memory configuration provided in `transcoder`. This is used
+    /// to pass raw pointers to host functions to avoid the host having to deal
+    /// with base pointers, offsets, memory32-vs-64, etc.
+    ///
+    /// Note that all bounds checks for memories are present in adapters
+    /// themselves, and the host libcalls simply assume that the pointers are
+    /// valid.
+    fn compile_transcoder(
+        &self,
+        component: &Component,
+        transcoder: &Transcoder,
+        types: &ComponentTypes,
+    ) -> Result<Box<dyn Any + Send>>;
+
     /// Emits the `lowerings` and `trampolines` specified into the in-progress
     /// ELF object specified by `obj`.
     ///
@@ -73,11 +92,13 @@ pub trait ComponentCompiler: Send + Sync {
         &self,
         lowerings: PrimaryMap<LoweredIndex, Box<dyn Any + Send>>,
         always_trap: PrimaryMap<RuntimeAlwaysTrapIndex, Box<dyn Any + Send>>,
+        transcoders: PrimaryMap<RuntimeTranscoderIndex, Box<dyn Any + Send>>,
         tramplines: Vec<(SignatureIndex, Box<dyn Any + Send>)>,
         obj: &mut Object<'static>,
     ) -> Result<(
         PrimaryMap<LoweredIndex, FunctionInfo>,
         PrimaryMap<RuntimeAlwaysTrapIndex, AlwaysTrapInfo>,
+        PrimaryMap<RuntimeTranscoderIndex, FunctionInfo>,
         Vec<Trampoline>,
     )>;
 }

@@ -71,6 +71,9 @@ pub struct ComponentDfg {
     /// out of the inlining pass of translation.
     pub adapters: Intern<AdapterId, Adapter>,

+    /// Metadata about string transcoders needed by adapter modules.
+    pub transcoders: Intern<TranscoderId, Transcoder>,
+
     /// Metadata about all known core wasm instances created.
     ///
     /// This is mostly an ordered list and is not deduplicated based on contents
@@ -125,6 +128,7 @@ id! {
     pub struct PostReturnId(u32);
     pub struct AlwaysTrapId(u32);
     pub struct AdapterModuleId(u32);
+    pub struct TranscoderId(u32);
 }

 /// Same as `info::InstantiateModule`
@@ -158,6 +162,7 @@ pub enum CoreDef {
     Lowered(LowerImportId),
     AlwaysTrap(AlwaysTrapId),
     InstanceFlags(RuntimeComponentInstanceIndex),
+    Transcoder(TranscoderId),

     /// This is a special variant not present in `info::CoreDef` which
     /// represents that this definition refers to a fused adapter function. This
@@ -220,6 +225,18 @@ pub struct CanonicalOptions {
     pub post_return: Option<PostReturnId>,
 }

+/// Same as `info::Transcoder`
+#[derive(Clone, Hash, Eq, PartialEq)]
+#[allow(missing_docs)]
+pub struct Transcoder {
+    pub op: Transcode,
+    pub from: MemoryId,
+    pub from64: bool,
+    pub to: MemoryId,
+    pub to64: bool,
+    pub signature: SignatureIndex,
+}
+
 /// A helper structure to "intern" and deduplicate values of type `V` with an
 /// identifying key `K`.
 ///
@@ -292,6 +309,7 @@ impl ComponentDfg {
             runtime_instances: Default::default(),
             runtime_always_trap: Default::default(),
             runtime_lowerings: Default::default(),
+            runtime_transcoders: Default::default(),
         };

         // First the instances are all processed for instantiation. This will,
@@ -324,6 +342,7 @@ impl ComponentDfg {
             num_runtime_instances: linearize.runtime_instances.len() as u32,
             num_always_trap: linearize.runtime_always_trap.len() as u32,
             num_lowerings: linearize.runtime_lowerings.len() as u32,
+            num_transcoders: linearize.runtime_transcoders.len() as u32,

             imports: self.imports,
             import_types: self.import_types,
@@ -342,6 +361,7 @@ struct LinearizeDfg<'a> {
     runtime_instances: HashMap<RuntimeInstance, RuntimeInstanceIndex>,
     runtime_always_trap: HashMap<AlwaysTrapId, RuntimeAlwaysTrapIndex>,
     runtime_lowerings: HashMap<LowerImportId, LoweredIndex>,
+    runtime_transcoders: HashMap<TranscoderId, RuntimeTranscoderIndex>,
 }

 #[derive(Copy, Clone, Hash, Eq, PartialEq)]
@@ -460,6 +480,7 @@ impl LinearizeDfg<'_> {
             CoreDef::Lowered(id) => info::CoreDef::Lowered(self.runtime_lowering(*id)),
             CoreDef::InstanceFlags(i) => info::CoreDef::InstanceFlags(*i),
             CoreDef::Adapter(id) => info::CoreDef::Export(self.adapter(*id)),
+            CoreDef::Transcoder(id) => info::CoreDef::Transcoder(self.runtime_transcoder(*id)),
         }
     }

@@ -497,6 +518,35 @@ impl LinearizeDfg<'_> {
         )
     }

+    fn runtime_transcoder(&mut self, id: TranscoderId) -> RuntimeTranscoderIndex {
+        self.intern(
+            id,
+            |me| &mut me.runtime_transcoders,
+            |me, id| {
+                let info = &me.dfg.transcoders[id];
+                (
+                    info.op,
+                    me.runtime_memory(info.from),
+                    info.from64,
+                    me.runtime_memory(info.to),
+                    info.to64,
+                    info.signature,
+                )
+            },
+            |index, (op, from, from64, to, to64, signature)| {
+                GlobalInitializer::Transcoder(info::Transcoder {
+                    index,
+                    op,
+                    from,
+                    from64,
+                    to,
+                    to64,
+                    signature,
+                })
+            },
+        )
+    }
+
     fn core_export<T>(&mut self, export: &CoreExport<T>) -> info::CoreExport<T>
     where
         T: Clone,

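The deduplication that `Intern` and `push_uniq` provide for transcoders can be approximated by the sketch below. It is a simplified stand-in written for this note, not the `Intern<K, V>` type from the diff: the `Interner` name, the plain `u32` keys and the `len` method are all assumptions.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical, simplified interner: equal values always receive the same
/// `u32` key, so asking for the same transcoder twice yields one entry.
struct Interner<V> {
    /// Reverse lookup from a value to the key it was assigned.
    map: HashMap<V, u32>,
    /// Values in the order their keys were handed out.
    values: Vec<V>,
}

impl<V: Clone + Hash + Eq> Interner<V> {
    fn new() -> Self {
        Interner {
            map: HashMap::new(),
            values: Vec::new(),
        }
    }

    /// Returns the existing key for `value`, or assigns the next free one.
    fn push_uniq(&mut self, value: V) -> u32 {
        if let Some(&id) = self.map.get(&value) {
            return id;
        }
        let id = self.values.len() as u32;
        self.values.push(value.clone());
        self.map.insert(value, id);
        id
    }

    /// Number of distinct values interned so far.
    fn len(&self) -> usize {
        self.values.len()
    }
}
```

Under this scheme two adapters that need the same operation between the same pair of memories would share one key, which is consistent with `dfg::Transcoder` deriving `Hash`, `Eq` and `PartialEq` in the diff.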
@@ -147,6 +147,10 @@ pub struct Component {
     /// The number of functions which "always trap" used to implement
     /// `canon.lower` of `canon.lift`'d functions within the same component.
     pub num_always_trap: u32,
+
+    /// The number of host transcoder functions needed for strings in adapter
+    /// modules.
+    pub num_transcoders: u32,
 }

 /// GlobalInitializer instructions to get processed when instantiating a component
@@ -207,6 +211,11 @@ pub enum GlobalInitializer {

     /// Same as `SaveModuleUpvar`, but for imports.
     SaveModuleImport(RuntimeImportIndex),
+
+    /// Similar to `ExtractMemory` and friends and indicates that a
+    /// `VMCallerCheckedAnyfunc` needs to be initialized for a transcoder
+    /// function and this will later be used to instantiate an adapter module.
+    Transcoder(Transcoder),
 }

 /// Metadata for extraction of a memory of what's being extracted and where it's
@@ -316,6 +325,9 @@ pub enum CoreDef {
     /// This is a reference to a wasm global which represents the
     /// runtime-managed flags for a wasm instance.
     InstanceFlags(RuntimeComponentInstanceIndex),
+    /// This refers to a cranelift-generated trampoline which calls to a
+    /// host-defined transcoding function.
+    Transcoder(RuntimeTranscoderIndex),
 }

 impl<T> From<CoreExport<T>> for CoreDef
@@ -433,3 +445,42 @@ pub enum StringEncoding {
     Utf16,
     CompactUtf16,
 }
+
+/// Information about a string transcoding function required by an adapter
+/// module.
+///
+/// A transcoder is used when strings are passed between adapter modules,
+/// optionally changing string encodings at the same time. The transcoder is
+/// implemented in a few different layers:
+///
+/// * Each generated adapter module has some glue around invoking the transcoder
+///   represented by this item. This involves bounds-checks and handling
+///   `realloc` for example.
+/// * Each transcoder gets a cranelift-generated trampoline which has the
+///   appropriate signature for the adapter module in question. Existence of
+///   this initializer indicates that this should be compiled by Cranelift.
+/// * The cranelift-generated trampoline will invoke a "transcoder libcall"
+///   which is implemented natively in Rust that has a signature independent of
+///   memory64 configuration options for example.
+#[derive(Debug, Clone, Serialize, Deserialize, Hash, Eq, PartialEq)]
+pub struct Transcoder {
+    /// The index of the transcoder being defined and initialized.
+    ///
+    /// This indicates which `VMCallerCheckedAnyfunc` slot is written to in a
+    /// `VMComponentContext`.
+    pub index: RuntimeTranscoderIndex,
+    /// The transcoding operation being performed.
+    pub op: Transcode,
+    /// The linear memory that the string is being read from.
+    pub from: RuntimeMemoryIndex,
+    /// Whether or not the source linear memory is 64-bit or not.
+    pub from64: bool,
+    /// The linear memory that the string is being written to.
+    pub to: RuntimeMemoryIndex,
+    /// Whether or not the destination linear memory is 64-bit or not.
+    pub to64: bool,
+    /// The wasm signature of the cranelift-generated trampoline.
+    pub signature: SignatureIndex,
+}
+
+pub use crate::fact::{FixedEncoding, Transcode};

@@ -116,7 +116,8 @@
 //! created.

 use crate::component::translate::*;
-use crate::fact::Module;
+use crate::fact;
+use crate::EntityType;
 use std::collections::HashSet;
 use wasmparser::WasmFeatures;

@@ -183,7 +184,7 @@ impl<'data> Translator<'_, 'data> {
         // the module using standard core wasm translation, and then fills out
         // the dfg metadata for each adapter.
         for (module_id, adapter_module) in state.adapter_modules.iter() {
-            let mut module = Module::new(
+            let mut module = fact::Module::new(
                 self.types.component_types(),
                 self.tunables.debug_adapter_modules,
             );
@@ -194,7 +195,7 @@ impl<'data> Translator<'_, 'data> {
                 names.push(name);
             }
             let wasm = module.encode();
-            let args = module.imports().to_vec();
+            let imports = module.imports().to_vec();

             // Extend the lifetime of the owned `wasm: Vec<u8>` on the stack to
             // a higher scope defined by our original caller. That allows to
@@ -240,6 +241,12 @@ impl<'data> Translator<'_, 'data> {
             // module is also recorded in the dfg. This metadata will be used
             // to generate `GlobalInitializer` entries during the linearization
             // final phase.
+            assert_eq!(imports.len(), translation.module.imports().len());
+            let args = imports
+                .iter()
+                .zip(translation.module.imports())
+                .map(|(arg, (_, _, ty))| fact_import_to_core_def(component, arg, ty))
+                .collect::<Vec<_>>();
             let static_index = self.static_modules.push(translation);
             let id = component.adapter_modules.push((static_index, args.into()));
             assert_eq!(id, module_id);
@@ -247,6 +254,47 @@ impl<'data> Translator<'_, 'data> {
     }
 }

+fn fact_import_to_core_def(
+    dfg: &mut dfg::ComponentDfg,
+    import: &fact::Import,
+    ty: EntityType,
+) -> dfg::CoreDef {
+    match import {
+        fact::Import::CoreDef(def) => def.clone(),
+        fact::Import::Transcode {
+            op,
+            from,
+            from64,
+            to,
+            to64,
+        } => {
+            fn unwrap_memory(def: &dfg::CoreDef) -> dfg::CoreExport<MemoryIndex> {
+                match def {
+                    dfg::CoreDef::Export(e) => e.clone().map_index(|i| match i {
+                        EntityIndex::Memory(i) => i,
+                        _ => unreachable!(),
+                    }),
+                    _ => unreachable!(),
+                }
+            }
+
+            let from = dfg.memories.push_uniq(unwrap_memory(from));
+            let to = dfg.memories.push_uniq(unwrap_memory(to));
+            dfg::CoreDef::Transcoder(dfg.transcoders.push_uniq(dfg::Transcoder {
+                op: *op,
+                from,
+                from64: *from64,
+                to,
+                to64: *to64,
+                signature: match ty {
+                    EntityType::Function(signature) => signature,
+                    _ => unreachable!(),
+                },
+            }))
+        }
+    }
+}
+
 #[derive(Default)]
 struct PartitionAdapterModules {
     /// The next adapter module that's being created. This may be empty.
@@ -336,6 +384,9 @@ impl PartitionAdapterModules {
             dfg::CoreDef::Lowered(_)
             | dfg::CoreDef::AlwaysTrap(_)
             | dfg::CoreDef::InstanceFlags(_) => {}
+
+            // should not be in the dfg yet
+            dfg::CoreDef::Transcoder(_) => unreachable!(),
         }
     }


@@ -166,6 +166,13 @@ indices! {
     /// Index that represents an exported module from a component since that's
     /// currently the only use for saving the entire module state at runtime.
     pub struct RuntimeModuleIndex(u32);
+
+    /// Index into the list of transcoders identified during compilation.
+    ///
+    /// This is used to index the `VMCallerCheckedAnyfunc` slots reserved for
+    /// string encoders which reference linear memories defined within a
+    /// component.
+    pub struct RuntimeTranscoderIndex(u32);
 }

 // Reexport for convenience some core-wasm indices which are also used in the

@@ -2,11 +2,13 @@
 //
 // struct VMComponentContext {
 //      magic: u32,
+//      transcode_libcalls: &'static VMBuiltinTranscodeArray,
 //      store: *mut dyn Store,
 //      limits: *const VMRuntimeLimits,
 //      flags: [VMGlobalDefinition; component.num_runtime_component_instances],
 //      lowering_anyfuncs: [VMCallerCheckedAnyfunc; component.num_lowerings],
 //      always_trap_anyfuncs: [VMCallerCheckedAnyfunc; component.num_always_trap],
+//      transcoder_anyfuncs: [VMCallerCheckedAnyfunc; component.num_transcoders],
 //      lowerings: [VMLowering; component.num_lowerings],
 //      memories: [*mut VMMemoryDefinition; component.num_memories],
 //      reallocs: [*mut VMCallerCheckedAnyfunc; component.num_reallocs],
@@ -15,7 +17,7 @@

 use crate::component::{
     Component, LoweredIndex, RuntimeAlwaysTrapIndex, RuntimeComponentInstanceIndex,
-    RuntimeMemoryIndex, RuntimePostReturnIndex, RuntimeReallocIndex,
+    RuntimeMemoryIndex, RuntimePostReturnIndex, RuntimeReallocIndex, RuntimeTranscoderIndex,
 };
 use crate::PtrSize;

@@ -57,14 +59,18 @@ pub struct VMComponentOffsets<P> {
     /// Number of "always trap" functions which have their
     /// `VMCallerCheckedAnyfunc` stored inline in the `VMComponentContext`.
     pub num_always_trap: u32,
+    /// Number of transcoders needed for string conversion.
+    pub num_transcoders: u32,

     // precalculated offsets of various member fields
     magic: u32,
+    transcode_libcalls: u32,
     store: u32,
     limits: u32,
     flags: u32,
     lowering_anyfuncs: u32,
     always_trap_anyfuncs: u32,
+    transcoder_anyfuncs: u32,
     lowerings: u32,
     memories: u32,
     reallocs: u32,
@@ -93,12 +99,15 @@ impl<P: PtrSize> VMComponentOffsets<P> {
                 .try_into()
                 .unwrap(),
             num_always_trap: component.num_always_trap,
+            num_transcoders: component.num_transcoders,
             magic: 0,
+            transcode_libcalls: 0,
             store: 0,
             limits: 0,
             flags: 0,
             lowering_anyfuncs: 0,
             always_trap_anyfuncs: 0,
+            transcoder_anyfuncs: 0,
             lowerings: 0,
             memories: 0,
             reallocs: 0,
@@ -133,6 +142,7 @@ impl<P: PtrSize> VMComponentOffsets<P> {
         fields! {
             size(magic) = 4u32,
             align(u32::from(ret.ptr.size())),
+            size(transcode_libcalls) = ret.ptr.size(),
             size(store) = cmul(2, ret.ptr.size()),
             size(limits) = ret.ptr.size(),
             align(16),
@@ -140,6 +150,7 @@ impl<P: PtrSize> VMComponentOffsets<P> {
             align(u32::from(ret.ptr.size())),
             size(lowering_anyfuncs) = cmul(ret.num_lowerings, ret.ptr.size_of_vmcaller_checked_anyfunc()),
             size(always_trap_anyfuncs) = cmul(ret.num_always_trap, ret.ptr.size_of_vmcaller_checked_anyfunc()),
+            size(transcoder_anyfuncs) = cmul(ret.num_transcoders, ret.ptr.size_of_vmcaller_checked_anyfunc()),
             size(lowerings) = cmul(ret.num_lowerings, ret.ptr.size() * 2),
             size(memories) = cmul(ret.num_runtime_memories, ret.ptr.size()),
             size(reallocs) = cmul(ret.num_runtime_reallocs, ret.ptr.size()),
@@ -168,6 +179,12 @@ impl<P: PtrSize> VMComponentOffsets<P> {
         self.magic
     }

+    /// The offset of the `transcode_libcalls` field.
+    #[inline]
+    pub fn transcode_libcalls(&self) -> u32 {
+        self.transcode_libcalls
+    }
+
     /// The offset of the `flags` field.
     #[inline]
     pub fn instance_flags(&self, index: RuntimeComponentInstanceIndex) -> u32 {
@@ -215,6 +232,20 @@ impl<P: PtrSize> VMComponentOffsets<P> {
             + index.as_u32() * u32::from(self.ptr.size_of_vmcaller_checked_anyfunc())
     }

+    /// The offset of the `transcoder_anyfuncs` field.
+    #[inline]
+    pub fn transcoder_anyfuncs(&self) -> u32 {
+        self.transcoder_anyfuncs
+    }
+
+    /// The offset of `VMCallerCheckedAnyfunc` for the `index` specified.
+    #[inline]
+    pub fn transcoder_anyfunc(&self, index: RuntimeTranscoderIndex) -> u32 {
+        assert!(index.as_u32() < self.num_transcoders);
+        self.transcoder_anyfuncs()
+            + index.as_u32() * u32::from(self.ptr.size_of_vmcaller_checked_anyfunc())
+    }
+
     /// The offset of the `lowerings` field.
     #[inline]
     pub fn lowerings(&self) -> u32 {

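The offset arithmetic in `transcoder_anyfunc` boils down to "base of the array plus index times element size". The sketch below shows that calculation on its own. The `Offsets` struct, the `align_up` helper and the numbers used in the usage note are hypothetical. In particular, the real element size comes from `size_of_vmcaller_checked_anyfunc()` and is not restated here.

```rust
/// Hypothetical miniature of an offsets table: only the fields needed to
/// locate one slot inside a contiguous array of fixed-size entries.
struct Offsets {
    /// Byte offset at which the transcoder array starts.
    transcoder_anyfuncs: u32,
    /// Number of entries in the array.
    num_transcoders: u32,
    /// Size in bytes of one entry (an assumed value, supplied by the caller).
    anyfunc_size: u32,
}

impl Offsets {
    /// Byte offset of the entry at `index`, panicking on out-of-range
    /// indices in the same spirit as the `assert!` in the diff.
    fn transcoder_anyfunc(&self, index: u32) -> u32 {
        assert!(index < self.num_transcoders);
        self.transcoder_anyfuncs + index * self.anyfunc_size
    }
}

/// Rounds `offset` up to the next multiple of `align`, which must be a
/// power of two. Layout tables typically apply this before placing a
/// field that contains pointers.
fn align_up(offset: u32, align: u32) -> u32 {
    debug_assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}
```

With an assumed array base of 64 and an assumed entry size of 24 bytes, the entry at index 2 would sit at byte 64 + 2 * 24 = 112.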
@@ -20,7 +20,7 @@

 use crate::component::dfg::CoreDef;
 use crate::component::{Adapter, AdapterOptions, ComponentTypes, StringEncoding, TypeFuncIndex};
-use crate::{FuncIndex, GlobalIndex, MemoryIndex};
+use crate::{FuncIndex, GlobalIndex, MemoryIndex, PrimaryMap};
 use std::collections::HashMap;
 use std::mem;
 use wasm_encoder::*;
@@ -28,8 +28,11 @@ use wasm_encoder::*;
 mod core_types;
 mod signature;
 mod trampoline;
+mod transcode;
 mod traps;

+pub use self::transcode::{FixedEncoding, Transcode};
+
 /// Representation of an adapter module.
 pub struct Module<'a> {
     /// Whether or not debug code is inserted into the adapters themselves.
@@ -46,9 +49,10 @@ pub struct Module<'a> {
     core_imports: ImportSection,
     /// Final list of imports that this module ended up using, in the same order
     /// as the imports in the import section.
-    imports: Vec<CoreDef>,
+    imports: Vec<Import>,
     /// Intern'd imports and what index they were assigned.
     imported: HashMap<CoreDef, u32>,
+    imported_memories: PrimaryMap<MemoryIndex, CoreDef>,

     // Current status of index spaces from the imports generated so far.
     core_funcs: u32,
@@ -100,6 +104,7 @@ impl<'a> Module<'a> {
             imported: Default::default(),
             adapters: Default::default(),
             imports: Default::default(),
+            imported_memories: PrimaryMap::new(),
             core_funcs: 0,
             core_memories: 0,
             core_globals: 0,
@@ -246,19 +251,50 @@ impl<'a> Module<'a> {
         let ret = *cnt - 1;
         self.core_imports.import(module, name, ty);
         self.imported.insert(def.clone(), ret);
-        self.imports.push(def);
+        if let EntityType::Memory(_) = ty {
+            self.imported_memories.push(def.clone());
+        }
+        self.imports.push(Import::CoreDef(def));
         ret
     }

     /// Encodes this module into a WebAssembly binary.
     pub fn encode(&mut self) -> Vec<u8> {
+        let mut types = mem::take(&mut self.core_types);
+        let mut transcoders = transcode::Transcoders::new(self.core_funcs);
+        let mut adapter_funcs = Vec::new();
+        for adapter in self.adapters.iter() {
+            adapter_funcs.push(trampoline::compile(
+                self,
+                &mut types,
+                &mut transcoders,
+                adapter,
+            ));
+        }
+
+        // If any string transcoding imports were needed add imported items
+        // associated with them.
+        for (module, name, ty, transcoder) in transcoders.imports() {
+            self.core_imports.import(module, name, ty);
+            let from = self.imported_memories[transcoder.from_memory].clone();
+            let to = self.imported_memories[transcoder.to_memory].clone();
+            self.imports.push(Import::Transcode {
+                op: transcoder.op,
+                from,
+                from64: transcoder.from_memory64,
+                to,
+                to64: transcoder.to_memory64,
+            });
+            self.core_funcs += 1;
+        }
+
+        // Now that all functions are known as well as all imports the actual
+        // bodies of all adapters are assembled into a final module.
         let mut funcs = FunctionSection::new();
         let mut code = CodeSection::new();
         let mut exports = ExportSection::new();
         let mut traps = traps::TrapSection::default();
-
-        let mut types = mem::take(&mut self.core_types);
-        for adapter in self.adapters.iter() {
+        for (adapter, (function, func_traps)) in self.adapters.iter().zip(adapter_funcs) {
             let idx = self.core_funcs + funcs.len();
             exports.export(&adapter.name, ExportKind::Func, idx);

@@ -266,7 +302,6 @@ impl<'a> Module<'a> {
             let ty = types.function(&signature.params, &signature.results);
             funcs.function(ty);

-            let (function, func_traps) = trampoline::compile(self, &mut types, adapter);
             code.raw(&function);
             traps.append(idx, func_traps);
         }
@@ -290,11 +325,31 @@ impl<'a> Module<'a> {

     /// Returns the imports that were used, in order, to create this adapter
     /// module.
-    pub fn imports(&self) -> &[CoreDef] {
+    pub fn imports(&self) -> &[Import] {
         &self.imports
     }
 }

+/// Possible imports into an adapter module.
+#[derive(Clone)]
+pub enum Import {
+    /// A definition required in the configuration of an `Adapter`.
+    CoreDef(CoreDef),
+    /// A transcoding function from the host to convert between string encodings.
+    Transcode {
+        /// The transcoding operation this performs.
+        op: Transcode,
+        /// The memory being read
+        from: CoreDef,
+        /// Whether or not `from` is a 64-bit memory
+        from64: bool,
+        /// The memory being written
+        to: CoreDef,
+        /// Whether or not `to` is a 64-bit memory
+        to64: bool,
+    },
+}
+
 impl Options {
     fn ptr(&self) -> ValType {
         if self.memory64 {

@@ -16,12 +16,13 @@
 //! can be somewhat arbitrary, an intentional decision.

 use crate::component::{
-    InterfaceType, TypeEnumIndex, TypeExpectedIndex, TypeFlagsIndex, TypeInterfaceIndex,
-    TypeRecordIndex, TypeTupleIndex, TypeUnionIndex, TypeVariantIndex, FLAG_MAY_ENTER,
-    FLAG_MAY_LEAVE, MAX_FLAT_PARAMS, MAX_FLAT_RESULTS,
+    InterfaceType, StringEncoding, TypeEnumIndex, TypeExpectedIndex, TypeFlagsIndex,
+    TypeInterfaceIndex, TypeRecordIndex, TypeTupleIndex, TypeUnionIndex, TypeVariantIndex,
+    FLAG_MAY_ENTER, FLAG_MAY_LEAVE, MAX_FLAT_PARAMS, MAX_FLAT_RESULTS,
 };
 use crate::fact::core_types::CoreTypes;
 use crate::fact::signature::{align_to, Signature};
+use crate::fact::transcode::{FixedEncoding as FE, Transcode, Transcoder, Transcoders};
 use crate::fact::traps::Trap;
 use crate::fact::{AdapterData, Context, Module, Options};
 use crate::GlobalIndex;
@@ -31,6 +32,9 @@ use std::ops::Range;
 use wasm_encoder::{BlockType, Encode, Instruction, Instruction::*, MemArg, ValType};
 use wasmtime_component_util::{DiscriminantSize, FlagsSize};

+const MAX_STRING_BYTE_LENGTH: u32 = 1 << 31;
+const UTF16_TAG: u32 = 1 << 31;
+
 struct Compiler<'a, 'b> {
     /// The module that the adapter will eventually be inserted into.
     module: &'a Module<'a>,
@@ -38,6 +42,9 @@ struct Compiler<'a, 'b> {
     /// The type section of `module`
     types: &'b mut CoreTypes,

+    /// Imported functions to transcode between various string encodings.
+    transcoders: &'b mut Transcoders,
+
     /// Metadata about the adapter that is being compiled.
     adapter: &'a AdapterData,

@@ -71,6 +78,7 @@ struct Compiler<'a, 'b> {
 pub(super) fn compile(
     module: &Module<'_>,
     types: &mut CoreTypes,
+    transcoders: &mut Transcoders,
     adapter: &AdapterData,
 ) -> (Vec<u8>, Vec<(usize, Trap)>) {
     let lower_sig = &module.signature(&adapter.lower, Context::Lower);
@@ -79,6 +87,7 @@ pub(super) fn compile(
         module,
         types,
         adapter,
+        transcoders,
         code: Vec::new(),
         locals: Vec::new(),
         nlocals: lower_sig.params.len() as u32,
@@ -356,6 +365,7 @@ impl Compiler<'_, '_> {
             InterfaceType::Float32 => self.translate_f32(src, dst_ty, dst),
             InterfaceType::Float64 => self.translate_f64(src, dst_ty, dst),
             InterfaceType::Char => self.translate_char(src, dst_ty, dst),
+            InterfaceType::String => self.translate_string(src, dst_ty, dst),
             InterfaceType::List(t) => self.translate_list(*t, src, dst_ty, dst),
             InterfaceType::Record(t) => self.translate_record(*t, src, dst_ty, dst),
             InterfaceType::Flags(f) => self.translate_flags(*f, src, dst_ty, dst),
@@ -365,13 +375,6 @@ impl Compiler<'_, '_> {
             InterfaceType::Enum(t) => self.translate_enum(*t, src, dst_ty, dst),
             InterfaceType::Option(t) => self.translate_option(*t, src, dst_ty, dst),
             InterfaceType::Expected(t) => self.translate_expected(*t, src, dst_ty, dst),
-
-            InterfaceType::String => {
-                // consider this field used for now until this is fully
-                // implemented.
-                drop(&self.adapter.lift.string_encoding);
-                unimplemented!("don't know how to translate strings")
-            }
         }
     }

@@ -636,6 +639,768 @@ impl Compiler<'_, '_> {
|
||||
}
|
||||
}
|
||||
|
||||
fn translate_string(&mut self, src: &Source<'_>, dst_ty: &InterfaceType, dst: &Destination) {
|
||||
assert!(matches!(dst_ty, InterfaceType::String));
|
||||
let src_opts = src.opts();
|
||||
let dst_opts = dst.opts();
|
||||
|
||||
// Load the pointer/length of this string into temporary locals. These
|
||||
// will be referenced a good deal so this just makes it easier to deal
|
||||
// with them consistently below rather than trying to reload from memory
|
||||
// for example.
|
||||
let src_ptr = self.gen_local(src_opts.ptr());
|
||||
let src_len = self.gen_local(src_opts.ptr());
|
||||
match src {
|
||||
Source::Stack(s) => {
|
||||
assert_eq!(s.locals.len(), 2);
|
||||
self.stack_get(&s.slice(0..1), src_opts.ptr());
|
||||
self.instruction(LocalSet(src_ptr));
|
||||
self.stack_get(&s.slice(1..2), src_opts.ptr());
|
||||
self.instruction(LocalSet(src_len));
|
||||
}
|
||||
Source::Memory(mem) => {
|
||||
self.ptr_load(mem);
|
||||
self.instruction(LocalSet(src_ptr));
|
||||
self.ptr_load(&mem.bump(src_opts.ptr_size().into()));
|
||||
self.instruction(LocalSet(src_len));
|
||||
}
|
||||
}
|
||||
let src_str = &WasmString {
|
||||
ptr: src_ptr,
|
||||
len: src_len,
|
||||
opts: src_opts,
|
||||
};
|
||||
|
||||
let dst_str = match src_opts.string_encoding {
|
||||
StringEncoding::Utf8 => match dst_opts.string_encoding {
|
||||
StringEncoding::Utf8 => self.string_copy(src_str, FE::Utf8, dst_opts, FE::Utf8),
|
||||
StringEncoding::Utf16 => self.string_utf8_to_utf16(src_str, dst_opts),
|
||||
StringEncoding::CompactUtf16 => self.string_to_compact(src_str, FE::Utf8, dst_opts),
|
||||
},
|
||||
|
||||
StringEncoding::Utf16 => {
|
||||
self.verify_aligned(src_opts, src_ptr, 2);
|
||||
match dst_opts.string_encoding {
|
||||
StringEncoding::Utf8 => {
|
||||
self.string_deflate_to_utf8(src_str, FE::Utf16, dst_opts)
|
||||
}
|
||||
StringEncoding::Utf16 => {
|
||||
self.string_copy(src_str, FE::Utf16, dst_opts, FE::Utf16)
|
||||
}
|
||||
StringEncoding::CompactUtf16 => {
|
||||
self.string_to_compact(src_str, FE::Utf16, dst_opts)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
StringEncoding::CompactUtf16 => {
|
||||
self.verify_aligned(src_opts, src_ptr, 2);
|
||||
|
||||
// Test the tag bit to see if this is a utf16 or a latin1 string
|
||||
// at runtime...
|
||||
self.instruction(LocalGet(src_len));
|
||||
self.ptr_uconst(src_opts, UTF16_TAG);
|
||||
self.ptr_and(src_opts);
|
||||
self.ptr_if(src_opts, BlockType::Empty);
|
||||
|
||||
// In the utf16 block unset the upper bit from the length local
|
||||
// so further calculations have the right value. Afterwards the
|
||||
// string transcode proceeds assuming utf16.
|
||||
self.instruction(LocalGet(src_len));
|
||||
self.ptr_uconst(src_opts, UTF16_TAG);
|
||||
self.ptr_xor(src_opts);
|
||||
self.instruction(LocalSet(src_len));
|
||||
let s1 = match dst_opts.string_encoding {
|
||||
StringEncoding::Utf8 => {
|
||||
self.string_deflate_to_utf8(src_str, FE::Utf16, dst_opts)
|
||||
}
|
||||
StringEncoding::Utf16 => {
|
||||
self.string_copy(src_str, FE::Utf16, dst_opts, FE::Utf16)
|
||||
}
|
||||
StringEncoding::CompactUtf16 => {
|
||||
self.string_compact_utf16_to_compact(src_str, dst_opts)
|
||||
}
|
||||
};
|
||||
|
||||
self.instruction(Else);
|
||||
|
||||
// In the latin1 block the `src_len` local is already the number
|
||||
// of code units, so the string transcoding is all that needs to
|
||||
// happen.
|
||||
let s2 = match dst_opts.string_encoding {
|
||||
StringEncoding::Utf16 => {
|
||||
self.string_copy(src_str, FE::Latin1, dst_opts, FE::Utf16)
|
||||
}
|
||||
StringEncoding::Utf8 => {
|
||||
self.string_deflate_to_utf8(src_str, FE::Latin1, dst_opts)
|
||||
}
|
||||
StringEncoding::CompactUtf16 => {
|
||||
self.string_copy(src_str, FE::Latin1, dst_opts, FE::Latin1)
|
||||
}
|
||||
};
|
||||
// Set our `s1` generated locals to the `s2` generated locals
|
||||
// as the resulting pointer of this transcode.
|
||||
self.instruction(LocalGet(s2.ptr));
|
||||
self.instruction(LocalSet(s1.ptr));
|
||||
self.instruction(LocalGet(s2.len));
|
||||
self.instruction(LocalSet(s1.len));
|
||||
self.instruction(End);
|
||||
s1
|
||||
}
|
||||
};
|
||||
|
||||
// Store the ptr/length in the desired destination
|
||||
match dst {
|
||||
Destination::Stack(s, _) => {
|
||||
self.instruction(LocalGet(dst_str.ptr));
|
||||
self.stack_set(&s[..1], dst_opts.ptr());
|
||||
self.instruction(LocalGet(dst_str.len));
|
||||
self.stack_set(&s[1..], dst_opts.ptr());
|
||||
}
|
||||
Destination::Memory(mem) => {
|
||||
self.instruction(LocalGet(mem.addr_local));
|
||||
self.instruction(LocalGet(dst_str.ptr));
|
||||
self.ptr_store(mem);
|
||||
self.instruction(LocalGet(mem.addr_local));
|
||||
self.instruction(LocalGet(dst_str.len));
|
||||
self.ptr_store(&mem.bump(dst_opts.ptr_size().into()));
|
||||
}
|
||||
}
|
||||
}
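The `CompactUtf16` arm above branches at runtime on the upper bit of the length. A minimal standalone sketch of that decoding follows, assuming a 32-bit length. Only `UTF16_TAG` comes from this diff; `CompactLen` and `decode_compact_len` are hypothetical names used for illustration.

```rust
// Value copied from the diff: the upper bit of the length marks utf16.
const UTF16_TAG: u32 = 1 << 31;

// Hypothetical type: which encoding a tagged `latin1+utf16` length denotes.
#[derive(Debug, PartialEq)]
enum CompactLen {
    Latin1 { code_units: u32 },
    Utf16 { code_units: u32 },
}

// Hypothetical helper mirroring the and/xor sequence the adapter emits.
fn decode_compact_len(tagged: u32) -> CompactLen {
    if tagged & UTF16_TAG != 0 {
        // The tag is known to be set here, so XOR clears it and leaves
        // the number of utf16 code units.
        CompactLen::Utf16 { code_units: tagged ^ UTF16_TAG }
    } else {
        // Without the tag the length is already the latin1 code unit count.
        CompactLen::Latin1 { code_units: tagged }
    }
}
```

XOR only clears the bit because the branch has already established that it is set, which is the same reason the generated code can use `ptr_xor` inside the utf16 block.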
|
||||
|
||||
// Corresponding function for `store_string_copy` in the spec.
|
||||
//
|
||||
// This performs a transcoding of the string with a one-pass copy from
|
||||
// the `src` encoding to the `dst` encoding. This is only possible for
|
||||
// fixed encodings where the first allocation is guaranteed to be an
|
||||
// appropriate fit so it's not suitable for all encodings.
|
||||
//
|
||||
// Imported host transcoding functions here take the src/dst pointers as
|
||||
// well as the number of code units in the source (which always matches
|
||||
// the number of code units in the destination). There is no return
|
||||
// value from the transcode function since the encoding should always
|
||||
// work on the first pass.
|
||||
fn string_copy<'a>(
|
||||
&mut self,
|
||||
src: &WasmString<'_>,
|
||||
src_enc: FE,
|
||||
dst_opts: &'a Options,
|
||||
dst_enc: FE,
|
||||
) -> WasmString<'a> {
|
||||
assert!(dst_enc.width() >= src_enc.width());
|
||||
self.validate_string_length(src, dst_enc);
|
||||
|
||||
// Calculate the source byte length given the size of each code
|
||||
// unit. Note that this shouldn't overflow given
|
||||
// `validate_string_length` above.
|
||||
let src_byte_len = if src_enc.width() == 1 {
|
||||
src.len
|
||||
} else {
|
||||
assert_eq!(src_enc.width(), 2);
|
||||
let tmp = self.gen_local(src.opts.ptr());
|
||||
self.instruction(LocalGet(src.len));
|
||||
self.ptr_uconst(src.opts, 1);
|
||||
self.ptr_shl(src.opts);
|
||||
self.instruction(LocalSet(tmp));
|
||||
tmp
|
||||
};
|
||||
|
||||
// Convert the source length in code units to the destination's pointer
// type, then scale it to the destination byte length.
|
||||
self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst_opts.ptr());
|
||||
let dst_len = self.gen_local(dst_opts.ptr());
|
||||
self.instruction(LocalTee(dst_len));
|
||||
if dst_enc.width() > 1 {
|
||||
assert_eq!(dst_enc.width(), 2);
|
||||
self.ptr_uconst(dst_opts, 1);
|
||||
self.ptr_shl(dst_opts);
|
||||
}
|
||||
let dst_byte_len = self.gen_local(dst_opts.ptr());
|
||||
self.instruction(LocalSet(dst_byte_len));
|
||||
|
||||
// Allocate space in the destination using the calculated byte
|
||||
// length.
|
||||
let dst = {
|
||||
let dst_mem = self.malloc(
|
||||
dst_opts,
|
||||
MallocSize::Local(dst_byte_len),
|
||||
dst_enc.width().into(),
|
||||
);
|
||||
WasmString {
|
||||
ptr: dst_mem.addr_local,
|
||||
len: dst_len,
|
||||
opts: dst_opts,
|
||||
}
|
||||
};
|
||||
|
||||
// Validate that `src_ptr + src_byte_len` and
// `dst_mem.addr_local + dst_byte_len` are both in-bounds. This
// is done by comparing each end address against the current byte
// size of the respective memory and trapping if it lies beyond it.
|
||||
self.validate_string_inbounds(src, src_byte_len);
|
||||
self.validate_string_inbounds(&dst, dst_byte_len);
|
||||
|
||||
// If the validations pass then the host `transcode` intrinsic
|
||||
// is invoked. This will either raise a trap or otherwise succeed
|
||||
// in which case we're done.
|
||||
let op = if src_enc == dst_enc {
|
||||
Transcode::Copy(src_enc)
|
||||
} else {
|
||||
assert_eq!(src_enc, FE::Latin1);
|
||||
assert_eq!(dst_enc, FE::Utf16);
|
||||
Transcode::Latin1ToUtf16
|
||||
};
|
||||
let transcode = self.transcoder(src, &dst, op);
|
||||
self.instruction(LocalGet(src.ptr));
|
||||
self.instruction(LocalGet(src.len));
|
||||
self.instruction(LocalGet(dst.ptr));
|
||||
self.instruction(Call(transcode));
|
||||
|
||||
dst
|
||||
}
|
||||
// Corresponding function for `store_string_to_utf8` in the spec.
|
||||
//
|
||||
// This translation works by possibly performing a number of
|
||||
// reallocations. First a buffer of size input-code-units is used to try
|
||||
// to get the transcoding correct on the first try. If that fails the
|
||||
// maximum worst-case size is used and then that is resized down if it's
|
||||
// too large.
|
||||
//
|
||||
// The host transcoding function imported here will receive src ptr/len
|
||||
// and dst ptr/len and return how many code units were consumed on both
|
||||
// sides. The amount of code units consumed in the source dictates which
|
||||
// branches are taken in this conversion.
|
||||
fn string_deflate_to_utf8<'a>(
|
||||
&mut self,
|
||||
src: &WasmString<'_>,
|
||||
src_enc: FE,
|
||||
dst_opts: &'a Options,
|
||||
) -> WasmString<'a> {
|
||||
self.validate_string_length(src, src_enc);
|
||||
|
||||
// Optimistically assume that the code unit length of the source is
|
||||
// all that's needed in the destination. Perform that allocation
|
||||
// here and proceed to transcoding below.
|
||||
self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst_opts.ptr());
|
||||
let dst_len = self.gen_local(dst_opts.ptr());
|
||||
self.instruction(LocalTee(dst_len));
|
||||
let dst_byte_len = self.gen_local(dst_opts.ptr());
|
||||
self.instruction(LocalSet(dst_byte_len));
|
||||
|
||||
let dst = {
|
||||
let dst_mem = self.malloc(dst_opts, MallocSize::Local(dst_byte_len), 1);
|
||||
WasmString {
|
||||
ptr: dst_mem.addr_local,
|
||||
len: dst_len,
|
||||
opts: dst_opts,
|
||||
}
|
||||
};
|
||||
|
||||
// Ensure buffers are all in-bounds
|
||||
let src_byte_len = match src_enc {
|
||||
FE::Latin1 => src.len,
|
||||
FE::Utf16 => {
|
||||
let tmp = self.gen_local(src.opts.ptr());
|
||||
self.instruction(LocalGet(src.len));
|
||||
self.ptr_uconst(src.opts, 1);
|
||||
self.ptr_shl(src.opts);
|
||||
self.instruction(LocalSet(tmp));
|
||||
tmp
|
||||
}
|
||||
FE::Utf8 => unreachable!(),
|
||||
};
|
||||
self.validate_string_inbounds(src, src_byte_len);
|
||||
self.validate_string_inbounds(&dst, dst_byte_len);
|
||||
|
||||
// Perform the initial transcode
|
||||
let op = match src_enc {
|
||||
FE::Latin1 => Transcode::Latin1ToUtf8,
|
||||
FE::Utf16 => Transcode::Utf16ToUtf8,
|
||||
FE::Utf8 => unreachable!(),
|
||||
};
|
||||
let transcode = self.transcoder(src, &dst, op);
|
||||
self.instruction(LocalGet(src.ptr));
|
||||
self.instruction(LocalGet(src.len));
|
||||
self.instruction(LocalGet(dst.ptr));
|
||||
self.instruction(LocalGet(dst_byte_len));
|
||||
self.instruction(Call(transcode));
|
||||
self.instruction(LocalSet(dst.len));
|
||||
let src_len_tmp = self.gen_local(src.opts.ptr());
|
||||
self.instruction(LocalSet(src_len_tmp));
|
||||
|
||||
// Test if the source was entirely transcoded by comparing
|
||||
// `src_len_tmp`, the number of code units transcoded from the
|
||||
// source, with `src_len`, the original number of code units.
|
||||
self.instruction(LocalGet(src_len_tmp));
|
||||
self.instruction(LocalGet(src.len));
|
||||
self.ptr_ne(src.opts);
|
||||
self.instruction(If(BlockType::Empty));
|
||||
|
||||
// Here a worst-case reallocation is performed to grow `dst_mem`.
|
||||
// In-line a check is also performed that the worst-case byte size
|
||||
// fits within the maximum size of strings.
|
||||
self.instruction(LocalGet(dst.ptr)); // old_ptr
|
||||
self.instruction(LocalGet(dst_byte_len)); // old_size
|
||||
self.ptr_uconst(dst.opts, 1); // align
|
||||
let factor = match src_enc {
|
||||
FE::Latin1 => 2,
|
||||
FE::Utf16 => 3,
|
||||
_ => unreachable!(),
|
||||
};
|
||||
self.validate_string_length_u8(src, factor);
|
||||
self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst_opts.ptr());
|
||||
self.ptr_uconst(dst_opts, factor.into());
|
||||
self.ptr_mul(dst_opts);
|
||||
self.instruction(LocalTee(dst_byte_len));
|
||||
self.instruction(Call(dst_opts.realloc.unwrap().as_u32()));
|
||||
self.instruction(LocalSet(dst.ptr));
|
||||
|
||||
// Verify that the destination is still in-bounds
|
||||
self.validate_string_inbounds(&dst, dst_byte_len);
|
||||
|
||||
// Perform another round of transcoding that should be guaranteed
|
||||
// to succeed. Note that all the parameters here are offset by the
|
||||
// results of the first transcoding to only perform the remaining
|
||||
// transcode on the final units.
|
||||
self.instruction(LocalGet(src.ptr));
|
||||
self.instruction(LocalGet(src_len_tmp));
|
||||
if let FE::Utf16 = src_enc {
|
||||
self.ptr_uconst(src.opts, 1);
|
||||
self.ptr_shl(src.opts);
|
||||
}
|
||||
self.ptr_add(src.opts);
|
||||
self.instruction(LocalGet(src.len));
|
||||
self.instruction(LocalGet(src_len_tmp));
|
||||
self.ptr_sub(src.opts);
|
||||
self.instruction(LocalGet(dst.ptr));
|
||||
self.instruction(LocalGet(dst.len));
|
||||
self.ptr_add(dst.opts);
|
||||
self.instruction(LocalGet(dst_byte_len));
|
||||
self.instruction(LocalGet(dst.len));
|
||||
self.ptr_sub(dst.opts);
|
||||
self.instruction(Call(transcode));
|
||||
|
||||
// Add the second result, the amount of destination units encoded,
|
||||
// to `dst_len` so it's an accurate reflection of the final size of
|
||||
// the destination buffer.
|
||||
self.instruction(LocalGet(dst.len));
|
||||
self.ptr_add(dst.opts);
|
||||
self.instruction(LocalSet(dst.len));
|
||||
|
||||
// In debug mode verify the first result consumed the entire string,
|
||||
// otherwise simply discard it.
|
||||
if self.module.debug {
|
||||
self.instruction(LocalGet(src.len));
|
||||
self.instruction(LocalGet(src_len_tmp));
|
||||
self.ptr_sub(src.opts);
|
||||
self.ptr_ne(src.opts);
|
||||
self.instruction(If(BlockType::Empty));
|
||||
self.trap(Trap::AssertFailed("should have finished encoding"));
|
||||
self.instruction(End);
|
||||
} else {
|
||||
self.instruction(Drop);
|
||||
}
|
||||
|
||||
// Perform a downsizing if the worst-case size was too large
|
||||
self.instruction(LocalGet(dst.len));
|
||||
self.instruction(LocalGet(dst_byte_len));
|
||||
self.ptr_ne(dst.opts);
|
||||
self.instruction(If(BlockType::Empty));
|
||||
self.instruction(LocalGet(dst.ptr)); // old_ptr
|
||||
self.instruction(LocalGet(dst_byte_len)); // old_size
|
||||
self.ptr_uconst(dst.opts, 1); // align
|
||||
self.instruction(LocalGet(dst.len)); // new_size
|
||||
self.instruction(Call(dst.opts.realloc.unwrap().as_u32()));
|
||||
self.instruction(LocalSet(dst.ptr));
|
||||
self.instruction(End);
|
||||
|
||||
// If the first transcode was enough then assert that the returned
|
||||
// amount of destination items written equals the byte size.
|
||||
if self.module.debug {
|
||||
self.instruction(Else);
|
||||
|
||||
self.instruction(LocalGet(dst.len));
|
||||
self.instruction(LocalGet(dst_byte_len));
|
||||
self.ptr_ne(dst_opts);
|
||||
self.instruction(If(BlockType::Empty));
|
||||
self.trap(Trap::AssertFailed("should have finished encoding"));
|
||||
self.instruction(End);
|
||||
}
|
||||
|
||||
self.instruction(End); // end of "first transcode not enough"
|
||||
|
||||
dst
|
||||
}
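The grow-then-shrink strategy of `string_deflate_to_utf8` can be sketched in plain Rust, outside of any wasm code generation. This is an illustration only: `transcode` and `utf16_to_utf8` are hypothetical names, a `Vec` stands in for `realloc`, and the sketch panics on malformed utf16 where a real transcoder would presumably trap.

```rust
// Hypothetical stand-in for the imported host transcoder: encodes as many
// whole characters as fit in `dst` and returns
// (source code units consumed, destination bytes written).
fn transcode(src: &[u16], dst: &mut [u8]) -> (usize, usize) {
    let (mut read, mut written) = (0, 0);
    for ch in char::decode_utf16(src.iter().copied()) {
        let ch = ch.expect("sketch assumes well-formed utf16");
        let n = ch.len_utf8();
        if written + n > dst.len() {
            break; // out of space, report partial progress
        }
        ch.encode_utf8(&mut dst[written..written + n]);
        read += ch.len_utf16();
        written += n;
    }
    (read, written)
}

fn utf16_to_utf8(src: &[u16]) -> Vec<u8> {
    // First pass: optimistically allocate one byte per source code unit.
    let mut dst = vec![0u8; src.len()];
    let (read, mut written) = transcode(src, &mut dst);
    if read != src.len() {
        // Not everything fit: grow to the worst case of 3 bytes per
        // utf16 code unit and transcode only the remaining units.
        dst.resize(src.len() * 3, 0);
        let (read2, written2) = transcode(&src[read..], &mut dst[written..]);
        debug_assert_eq!(read2, src.len() - read);
        written += written2;
        // Shrink back down if the worst case was too large.
        dst.truncate(written);
    }
    dst
}
```

For an all-ASCII input the first pass consumes everything and the buffer is already exactly sized, which is the case the debug-mode assertion in the `Else` branch checks.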
|
||||
|
||||
// Corresponds to the `store_utf8_to_utf16` function in the spec.
|
||||
//
|
||||
// When converting utf-8 to utf-16 a pessimistic allocation is
|
||||
// done which is twice the byte length of the utf-8 string.
|
||||
// The host then transcodes and returns how many code units were
|
||||
// actually used during the transcoding and if it's beneath the
|
||||
// pessimistic maximum then the buffer is reallocated down to
|
||||
// a smaller amount.
|
||||
//
|
||||
// The host-imported transcoding function takes the src/dst pointer as
|
||||
// well as the code unit size of both the source and destination. The
|
||||
// destination should always be big enough to hold the result of the
|
||||
// transcode and so the result of the host function is how many code
|
||||
// units were written to the destination.
|
||||
fn string_utf8_to_utf16<'a>(
|
||||
&mut self,
|
||||
src: &WasmString<'_>,
|
||||
dst_opts: &'a Options,
|
||||
) -> WasmString<'a> {
|
||||
self.validate_string_length(src, FE::Utf16);
|
||||
self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst_opts.ptr());
|
||||
let dst_len = self.gen_local(dst_opts.ptr());
|
||||
self.instruction(LocalTee(dst_len));
|
||||
self.ptr_uconst(dst_opts, 1);
|
||||
self.ptr_shl(dst_opts);
|
||||
let dst_byte_len = self.gen_local(dst_opts.ptr());
|
||||
self.instruction(LocalSet(dst_byte_len));
|
||||
let dst = {
|
||||
let dst_mem = self.malloc(dst_opts, MallocSize::Local(dst_byte_len), 2);
|
||||
WasmString {
|
||||
ptr: dst_mem.addr_local,
|
||||
len: dst_len,
|
||||
opts: dst_opts,
|
||||
}
|
||||
};
|
||||
|
||||
self.validate_string_inbounds(src, src.len);
|
||||
self.validate_string_inbounds(&dst, dst_byte_len);
|
||||
|
||||
let transcode = self.transcoder(src, &dst, Transcode::Utf8ToUtf16);
|
||||
self.instruction(LocalGet(src.ptr));
|
||||
self.instruction(LocalGet(src.len));
|
||||
self.instruction(LocalGet(dst.ptr));
|
||||
self.instruction(Call(transcode));
|
||||
self.instruction(LocalSet(dst.len));
|
||||
|
||||
// If the number of code units returned by transcode is not
|
||||
// equal to the original number of code units then
|
||||
// the buffer must be shrunk.
|
||||
//
|
||||
// Note that the byte length of the final allocation we
|
||||
// want is twice the code unit length returned by the
|
||||
// transcoding function.
|
||||
self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst.opts.ptr());
|
||||
self.instruction(LocalGet(dst.len));
|
||||
self.ptr_ne(dst_opts);
|
||||
self.instruction(If(BlockType::Empty));
|
||||
self.instruction(LocalGet(dst.ptr));
|
||||
self.instruction(LocalGet(dst_byte_len));
|
||||
self.ptr_uconst(dst.opts, 2);
|
||||
self.instruction(LocalGet(dst.len));
|
||||
self.ptr_uconst(dst.opts, 1);
|
||||
self.ptr_shl(dst.opts);
|
||||
self.instruction(Call(dst.opts.realloc.unwrap().as_u32()));
|
||||
self.instruction(LocalSet(dst.ptr));
|
||||
self.instruction(End); // end of shrink-to-fit
|
||||
|
||||
dst
|
||||
}
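The pessimistic sizing used by `string_utf8_to_utf16` relies on the fact that a utf8 string never needs more utf16 code units than it has bytes. A rough sketch under that assumption, with a `Vec` standing in for the `realloc` calls; `utf8_to_utf16_shrink` is a hypothetical name.

```rust
// Hypothetical helper: allocate one utf16 code unit per utf8 byte
// (2 * byte_len bytes in total), transcode, then shrink to fit.
fn utf8_to_utf16_shrink(src: &str) -> Vec<u16> {
    // Pessimistic allocation: `src.len()` code units.
    let mut dst: Vec<u16> = vec![0; src.len()];
    let mut written = 0;
    for unit in src.encode_utf16() {
        dst[written] = unit;
        written += 1;
    }
    // If fewer code units were written than allocated, resize down.
    if written != src.len() {
        dst.truncate(written);
        dst.shrink_to_fit();
    }
    dst
}
```

Only ASCII input fills the pessimistic buffer exactly; any multi-byte utf8 sequence leaves slack and takes the shrink path.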
|
||||
|
||||
// Corresponds to `store_probably_utf16_to_latin1_or_utf16` in the spec.
|
||||
//
|
||||
// This will try to transcode the input utf16 string to utf16 in the
|
||||
// destination. If utf16 isn't needed though and latin1 could be used
|
||||
// then that's used instead and a reallocation to downsize occurs
|
||||
// afterwards.
|
||||
//
|
||||
// The host transcode function here will take the src/dst pointers as
|
||||
// well as src length. The destination byte length is twice the src code
|
||||
// unit length. The return value is the tagged length of the returned
|
||||
// string. If the upper bit is set then utf16 was used and the
|
||||
// conversion is done. If the upper bit is not set then latin1 was used
|
||||
// and a downsizing needs to happen.
|
||||
fn string_compact_utf16_to_compact<'a>(
|
||||
&mut self,
|
||||
src: &WasmString<'_>,
|
||||
dst_opts: &'a Options,
|
||||
) -> WasmString<'a> {
|
||||
self.validate_string_length(src, FE::Utf16);
|
||||
self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst_opts.ptr());
|
||||
let dst_len = self.gen_local(dst_opts.ptr());
|
||||
self.instruction(LocalTee(dst_len));
|
||||
self.ptr_uconst(dst_opts, 1);
|
||||
self.ptr_shl(dst_opts);
|
||||
let dst_byte_len = self.gen_local(dst_opts.ptr());
|
||||
self.instruction(LocalSet(dst_byte_len));
|
||||
let dst = {
|
||||
let dst_mem = self.malloc(dst_opts, MallocSize::Local(dst_byte_len), 2);
|
||||
WasmString {
|
||||
ptr: dst_mem.addr_local,
|
||||
len: dst_len,
|
||||
opts: dst_opts,
|
||||
}
|
||||
};
|
||||
|
||||
self.validate_string_inbounds(src, dst_byte_len);
|
||||
self.validate_string_inbounds(&dst, dst_byte_len);
|
||||
|
||||
let transcode = self.transcoder(src, &dst, Transcode::Utf16ToCompactProbablyUtf16);
|
||||
self.instruction(LocalGet(src.ptr));
|
||||
self.instruction(LocalGet(src.len));
|
||||
self.instruction(LocalGet(dst.ptr));
|
||||
self.instruction(Call(transcode));
|
||||
self.instruction(LocalSet(dst.len));
|
||||
|
||||
// Assert that the untagged code unit length is the same as the
|
||||
// source code unit length.
|
||||
if self.module.debug {
|
||||
self.instruction(LocalGet(dst.len));
|
||||
self.ptr_uconst(dst.opts, !UTF16_TAG);
|
||||
self.ptr_and(dst.opts);
|
||||
self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst.opts.ptr());
|
||||
self.ptr_ne(dst.opts);
|
||||
self.instruction(If(BlockType::Empty));
|
||||
self.trap(Trap::AssertFailed("expected equal code units"));
|
||||
self.instruction(End);
|
||||
}
|
||||
|
||||
// If the UTF16_TAG is set then utf16 was used and the destination
|
||||
// should be appropriately sized. Bail out of the "is this string
|
||||
// empty" block and fall through otherwise to resizing.
|
||||
self.instruction(LocalGet(dst.len));
|
||||
self.ptr_uconst(dst.opts, UTF16_TAG);
|
||||
self.ptr_and(dst.opts);
|
||||
self.ptr_br_if(dst.opts, 0);
|
||||
|
||||
// Here `realloc` is used to downsize the string
|
||||
self.instruction(LocalGet(dst.ptr)); // old_ptr
|
||||
self.instruction(LocalGet(dst_byte_len)); // old_size
|
||||
self.ptr_uconst(dst.opts, 2); // align
|
||||
self.instruction(LocalGet(dst.len)); // new_size
|
||||
self.instruction(Call(dst.opts.realloc.unwrap().as_u32()));
|
||||
self.instruction(LocalSet(dst.ptr));
|
||||
|
||||
dst
|
||||
}
|
||||
|
||||
// Corresponds to `store_string_to_latin1_or_utf16` in the spec.
|
||||
//
|
||||
// This will attempt a first pass of transcoding to latin1 and on
|
||||
// failure a larger buffer is allocated for utf16 and then utf16 is
|
||||
// encoded in-place into the buffer. After either latin1 or utf16 the
|
||||
// buffer is then resized to fit the final string allocation.
|
||||
fn string_to_compact<'a>(
|
||||
&mut self,
|
||||
src: &WasmString<'_>,
|
||||
src_enc: FE,
|
||||
dst_opts: &'a Options,
|
||||
) -> WasmString<'a> {
|
||||
self.validate_string_length(src, src_enc);
|
||||
self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst_opts.ptr());
|
||||
let dst_len = self.gen_local(dst_opts.ptr());
|
||||
self.instruction(LocalTee(dst_len));
|
||||
let dst_byte_len = self.gen_local(dst_opts.ptr());
|
||||
self.instruction(LocalSet(dst_byte_len));
|
||||
let dst = {
|
||||
let dst_mem = self.malloc(dst_opts, MallocSize::Local(dst_byte_len), 2);
|
||||
WasmString {
|
||||
ptr: dst_mem.addr_local,
|
||||
len: dst_len,
|
||||
opts: dst_opts,
|
||||
}
|
||||
};
|
||||
|
||||
self.validate_string_inbounds(src, src.len);
|
||||
self.validate_string_inbounds(&dst, dst_byte_len);
|
||||
|
||||
// Perform the initial latin1 transcode. This returns the number of
|
||||
// source code units consumed and the number of destination code
|
||||
// units (bytes) written.
|
||||
let (latin1, utf16) = match src_enc {
|
||||
FE::Utf8 => (Transcode::Utf8ToLatin1, Transcode::Utf8ToCompactUtf16),
|
||||
FE::Utf16 => (Transcode::Utf16ToLatin1, Transcode::Utf16ToCompactUtf16),
|
||||
FE::Latin1 => unreachable!(),
|
||||
};
|
||||
let transcode_latin1 = self.transcoder(src, &dst, latin1);
|
||||
let transcode_utf16 = self.transcoder(src, &dst, utf16);
|
||||
self.instruction(LocalGet(src.ptr));
|
||||
self.instruction(LocalGet(src.len));
|
||||
self.instruction(LocalGet(dst.ptr));
|
||||
self.instruction(Call(transcode_latin1));
|
||||
self.instruction(LocalSet(dst.len));
|
||||
let src_len_tmp = self.gen_local(src.opts.ptr());
|
||||
self.instruction(LocalSet(src_len_tmp));
|
||||
|
||||
// If the source was entirely consumed then the transcode completed
|
||||
// and all that's necessary is to optionally shrink the buffer.
|
||||
self.instruction(LocalGet(src_len_tmp));
|
||||
self.instruction(LocalGet(src.len));
|
||||
self.ptr_eq(src.opts);
|
||||
self.instruction(If(BlockType::Empty)); // if latin1-or-utf16 block
|
||||
|
||||
// Test if the original byte length of the allocation is the same as
|
||||
// the number of written bytes, and if not then shrink the buffer
|
||||
// with a call to `realloc`.
|
||||
self.instruction(LocalGet(dst_byte_len));
|
||||
self.instruction(LocalGet(dst.len));
|
||||
self.ptr_ne(dst.opts);
|
||||
self.instruction(If(BlockType::Empty));
|
||||
self.instruction(LocalGet(dst.ptr)); // old_ptr
|
||||
self.instruction(LocalGet(dst_byte_len)); // old_size
|
||||
self.ptr_uconst(dst.opts, 2); // align
|
||||
self.instruction(LocalGet(dst.len)); // new_size
|
||||
self.instruction(Call(dst.opts.realloc.unwrap().as_u32()));
|
||||
self.instruction(LocalSet(dst.ptr));
|
||||
self.instruction(End);
|
||||
|
||||
// In this block the latin1 encoding failed. The host transcode
|
||||
// returned how many units were consumed from the source and how
|
||||
// many bytes were written to the destination. Here the buffer is
|
||||
// inflated and sized and the second utf16 intrinsic is invoked to
|
||||
// perform the final inflation.
|
||||
self.instruction(Else); // else latin1-or-utf16 block
|
||||
|
||||
// For utf8 validate that the inflated size is still within bounds.
|
||||
if src_enc.width() == 1 {
|
||||
self.validate_string_length_u8(src, 2);
|
||||
}
|
||||
|
||||
// Reallocate the buffer with twice the source code units in byte
|
||||
// size.
|
||||
self.instruction(LocalGet(dst.ptr)); // old_ptr
|
||||
self.instruction(LocalGet(dst_byte_len)); // old_size
|
||||
self.ptr_uconst(dst.opts, 2); // align
|
||||
self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst.opts.ptr());
|
||||
self.ptr_uconst(dst.opts, 1);
|
||||
self.ptr_shl(dst.opts);
|
||||
self.instruction(LocalTee(dst_byte_len));
|
||||
self.instruction(Call(dst.opts.realloc.unwrap().as_u32()));
|
||||
self.instruction(LocalSet(dst.ptr));
|
||||
|
||||
// Call the host utf16 transcoding function. This will inflate the
|
||||
// prior latin1 bytes and then encode the rest of the source string
|
||||
// as utf16 into the remaining space in the destination buffer.
|
||||
self.instruction(LocalGet(src.ptr));
|
||||
self.instruction(LocalGet(src_len_tmp));
|
||||
if let FE::Utf16 = src_enc {
|
||||
self.ptr_uconst(src.opts, 1);
|
||||
self.ptr_shl(src.opts);
|
||||
}
|
||||
self.ptr_add(src.opts);
|
||||
self.instruction(LocalGet(src.len));
|
||||
self.instruction(LocalGet(src_len_tmp));
|
||||
self.ptr_sub(src.opts);
|
||||
self.instruction(LocalGet(dst.ptr));
|
||||
self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst.opts.ptr());
|
||||
self.instruction(LocalGet(dst.len));
|
||||
self.instruction(Call(transcode_utf16));
|
||||
self.instruction(LocalSet(dst.len));
|
||||
|
||||
// If the returned number of code units written to the destination
|
||||
// is not equal to the size of the allocation then the allocation is
|
||||
// resized down to the appropriate size.
|
||||
//
|
||||
// Note that the byte size desired is `2*dst_len` and the current
|
||||
// byte buffer size is `2*src_len` so the `2` factor isn't checked
|
||||
// here, just the lengths.
|
||||
self.instruction(LocalGet(dst.len));
|
||||
self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst.opts.ptr());
|
||||
self.ptr_ne(dst.opts);
|
||||
self.instruction(If(BlockType::Empty));
|
||||
self.instruction(LocalGet(dst.ptr)); // old_ptr
|
||||
self.instruction(LocalGet(dst_byte_len)); // old_size
|
||||
self.ptr_uconst(dst.opts, 2); // align
|
||||
self.instruction(LocalGet(dst.len));
|
||||
self.ptr_uconst(dst.opts, 1);
|
||||
self.ptr_shl(dst.opts);
|
||||
self.instruction(Call(dst.opts.realloc.unwrap().as_u32()));
|
||||
self.instruction(LocalSet(dst.ptr));
|
||||
self.instruction(End);
|
||||
|
||||
// Tag the returned length as utf16
|
||||
self.instruction(LocalGet(dst.len));
|
||||
self.ptr_uconst(dst.opts, UTF16_TAG);
|
||||
self.ptr_or(dst.opts);
|
||||
self.instruction(LocalSet(dst.len));
|
||||
|
||||
self.instruction(End); // end latin1-or-utf16 block
|
||||
|
||||
dst
|
||||
}
|
||||
|
||||
fn validate_string_length(&mut self, src: &WasmString<'_>, dst: FE) {
|
||||
self.validate_string_length_u8(src, dst.width())
|
||||
}
|
||||
|
||||
fn validate_string_length_u8(&mut self, s: &WasmString<'_>, dst: u8) {
|
||||
// Check to see if the source byte length is out of bounds in
|
||||
// which case a trap is generated.
|
||||
self.instruction(LocalGet(s.len));
|
||||
let max = MAX_STRING_BYTE_LENGTH / u32::from(dst);
|
||||
self.ptr_uconst(s.opts, max);
|
||||
self.ptr_ge_u(s.opts);
|
||||
self.instruction(If(BlockType::Empty));
|
||||
self.trap(Trap::StringLengthTooBig);
|
||||
self.instruction(End);
|
||||
}
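The check emitted by `validate_string_length_u8` amounts to a single comparison against a precomputed maximum. A small sketch of the same arithmetic in plain Rust; `length_too_big` is a hypothetical name, and the constant is copied from the top of this file.

```rust
const MAX_STRING_BYTE_LENGTH: u32 = 1 << 31;

// Hypothetical helper: returns true when a string of `code_units` units
// must be rejected because, at `width` bytes per unit, its byte length
// would reach MAX_STRING_BYTE_LENGTH.
fn length_too_big(code_units: u32, width: u8) -> bool {
    // Dividing the limit up front avoids multiplying the untrusted length,
    // so the comparison itself cannot overflow.
    let max = MAX_STRING_BYTE_LENGTH / u32::from(width);
    code_units >= max
}
```

The same helper covers the worst-case factors of 2 and 3 used before the reallocations in `string_deflate_to_utf8`.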
|
||||
|
||||
fn transcoder(&mut self, src: &WasmString<'_>, dst: &WasmString<'_>, op: Transcode) -> u32 {
|
||||
self.transcoders.import(
|
||||
self.types,
|
||||
Transcoder {
|
||||
from_memory: src.opts.memory.unwrap(),
|
||||
from_memory64: src.opts.memory64,
|
||||
to_memory: dst.opts.memory.unwrap(),
|
||||
to_memory64: dst.opts.memory64,
|
||||
op,
|
||||
},
|
||||
)
|
||||
}
|
||||
|
||||
fn validate_string_inbounds(&mut self, s: &WasmString<'_>, byte_len: u32) {
|
||||
let extend_to_64 = |me: &mut Self| {
|
||||
if !s.opts.memory64 {
|
||||
me.instruction(I64ExtendI32U);
|
||||
}
|
||||
};
|
||||
|
||||
self.instruction(Block(BlockType::Empty));
|
||||
self.instruction(Block(BlockType::Empty));
|
||||
|
||||
// Calculate the full byte size of memory with `memory.size`. Note that
|
||||
// arithmetic here is done always in 64-bits to accommodate 4G memories.
|
||||
// Additionally it's assumed that 64-bit memories never fill up
|
||||
// entirely.
|
||||
self.instruction(MemorySize(s.opts.memory.unwrap().as_u32()));
|
||||
extend_to_64(self);
|
||||
self.instruction(I64Const(16));
|
||||
self.instruction(I64Shl);
|
||||
|
||||
// Calculate the end address of the string. This is done by adding the
|
||||
// base pointer to the byte length. For 32-bit memories there's no need
|
||||
// to check for overflow since everything is extended to 64-bit, but for
|
||||
// 64-bit memories overflow is checked.
|
||||
self.instruction(LocalGet(s.ptr));
|
||||
extend_to_64(self);
|
||||
self.instruction(LocalGet(byte_len));
|
||||
extend_to_64(self);
|
||||
self.instruction(I64Add);
|
||||
if s.opts.memory64 {
|
||||
let tmp = self.gen_local(ValType::I64);
|
||||
self.instruction(LocalTee(tmp));
|
||||
self.instruction(LocalGet(s.ptr));
|
||||
self.ptr_lt_u(s.opts);
|
||||
self.ptr_br_if(s.opts, 0);
|
||||
self.instruction(LocalGet(tmp));
|
||||
}
|
||||
|
||||
// If the byte size of memory is greater than or equal to the final
// address of the string then the string is in-bounds and the trap below
// is skipped. Note that if it's precisely equal then that's ok.
self.instruction(I64GeU);
|
||||
self.instruction(BrIf(1));
|
||||
|
||||
self.instruction(End);
|
||||
self.trap(Trap::StringLengthOverflow);
|
||||
self.instruction(End);
|
||||
}
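The arithmetic that `validate_string_inbounds` emits can be written out as ordinary Rust. This sketch does everything in 64 bits as the comments describe, treats an end address precisely equal to the memory size as ok, and reports overflow of the end address as out of bounds. `string_in_bounds` and `WASM_PAGE_SHIFT` are hypothetical names.

```rust
// A wasm page is 64 KiB, hence the shift by 16 applied to `memory.size`.
const WASM_PAGE_SHIFT: u32 = 16;

// Hypothetical helper: is `[ptr, ptr + byte_len)` inside a memory of
// `memory_pages` pages? Like the adapter, this assumes the page count is
// small enough that converting it to bytes does not overflow.
fn string_in_bounds(memory_pages: u64, ptr: u64, byte_len: u64) -> bool {
    let memory_bytes = memory_pages << WASM_PAGE_SHIFT;
    match ptr.checked_add(byte_len) {
        // An end address equal to the memory size is still in bounds.
        Some(end) => end <= memory_bytes,
        // Overflow can only occur with 64-bit pointers and always traps.
        None => false,
    }
}
```

For 32-bit memories both operands are zero-extended first, so the `None` case cannot occur, which is why the generated code only emits the overflow check when `memory64` is set.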
|
||||
|
||||
fn translate_list(
|
||||
&mut self,
|
||||
src_ty: TypeInterfaceIndex,
|
||||
@@ -1467,17 +2232,17 @@ impl Compiler<'_, '_> {
         self.instruction(GlobalSet(flags_global.as_u32()));
     }

-    fn verify_aligned(&mut self, memory: &Memory, align: usize) {
+    fn verify_aligned(&mut self, opts: &Options, addr_local: u32, align: usize) {
         // If the alignment is 1 then everything is trivially aligned and the
         // check can be omitted.
         if align == 1 {
             return;
         }
-        self.instruction(LocalGet(memory.addr_local));
+        self.instruction(LocalGet(addr_local));
         assert!(align.is_power_of_two());
-        self.ptr_uconst(memory.opts, u32::try_from(align - 1).unwrap());
-        self.ptr_and(memory.opts);
-        self.ptr_if(memory.opts, BlockType::Empty);
+        self.ptr_uconst(opts, u32::try_from(align - 1).unwrap());
+        self.ptr_and(opts);
+        self.ptr_if(opts, BlockType::Empty);
         self.trap(Trap::UnalignedPointer);
         self.instruction(End);
     }
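The alignment test that `verify_aligned` emits relies on the usual power-of-two mask trick. A minimal sketch of the same predicate in ordinary Rust follows; the helper name `is_aligned` is hypothetical and not part of this change.

```rust
/// Hypothetical host-side equivalent of the emitted check: an address is
/// aligned when none of the low `log2(align)` bits are set.
fn is_aligned(addr: u64, align: u64) -> bool {
    // The mask `align - 1` only isolates the low bits for powers of two.
    assert!(align.is_power_of_two());
    addr & (align - 1) == 0
}
```

The generated code traps with `Trap::UnalignedPointer` exactly when this predicate would be false, and skips the check entirely for `align == 1`, where the mask is zero.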
@@ -1527,7 +2292,7 @@ impl Compiler<'_, '_> {
             offset: 0,
             opts,
         };
-        self.verify_aligned(&ret, align);
+        self.verify_aligned(opts, ret.addr_local, align);
         ret
     }

@@ -1711,6 +2476,46 @@ impl Compiler<'_, '_> {
         }
     }

+    fn ptr_sub(&mut self, opts: &Options) {
+        if opts.memory64 {
+            self.instruction(I64Sub);
+        } else {
+            self.instruction(I32Sub);
+        }
+    }
+
+    fn ptr_mul(&mut self, opts: &Options) {
+        if opts.memory64 {
+            self.instruction(I64Mul);
+        } else {
+            self.instruction(I32Mul);
+        }
+    }
+
+    fn ptr_ge_u(&mut self, opts: &Options) {
+        if opts.memory64 {
+            self.instruction(I64GeU);
+        } else {
+            self.instruction(I32GeU);
+        }
+    }
+
+    fn ptr_lt_u(&mut self, opts: &Options) {
+        if opts.memory64 {
+            self.instruction(I64LtU);
+        } else {
+            self.instruction(I32LtU);
+        }
+    }
+
+    fn ptr_shl(&mut self, opts: &Options) {
+        if opts.memory64 {
+            self.instruction(I64Shl);
+        } else {
+            self.instruction(I32Shl);
+        }
+    }
+
     fn ptr_eqz(&mut self, opts: &Options) {
         if opts.memory64 {
             self.instruction(I64Eqz);
@@ -1735,6 +2540,22 @@ impl Compiler<'_, '_> {
         }
     }

+    fn ptr_eq(&mut self, opts: &Options) {
+        if opts.memory64 {
+            self.instruction(I64Eq);
+        } else {
+            self.instruction(I32Eq);
+        }
+    }
+
+    fn ptr_ne(&mut self, opts: &Options) {
+        if opts.memory64 {
+            self.instruction(I64Ne);
+        } else {
+            self.instruction(I32Ne);
+        }
+    }
+
     fn ptr_and(&mut self, opts: &Options) {
         if opts.memory64 {
             self.instruction(I64And);
@@ -1743,6 +2564,22 @@ impl Compiler<'_, '_> {
         }
     }

+    fn ptr_or(&mut self, opts: &Options) {
+        if opts.memory64 {
+            self.instruction(I64Or);
+        } else {
+            self.instruction(I32Or);
+        }
+    }
+
+    fn ptr_xor(&mut self, opts: &Options) {
+        if opts.memory64 {
+            self.instruction(I64Xor);
+        } else {
+            self.instruction(I32Xor);
+        }
+    }
+
     fn ptr_if(&mut self, opts: &Options, ty: BlockType) {
         if opts.memory64 {
             self.instruction(I64Const(0));
@@ -1974,3 +2811,9 @@ enum MallocSize {
     Const(usize),
     Local(u32),
 }
+
+struct WasmString<'a> {
+    ptr: u32,
+    len: u32,
+    opts: &'a Options,
+}
178
crates/environ/src/fact/transcode.rs
Normal file
@@ -0,0 +1,178 @@
+use crate::fact::core_types::CoreTypes;
+use crate::MemoryIndex;
+use serde::{Deserialize, Serialize};
+use std::collections::HashMap;
+use wasm_encoder::{EntityType, ValType};
+
+pub struct Transcoders {
+    imported: HashMap<Transcoder, u32>,
+    prev_func_imports: u32,
+    imports: Vec<(String, EntityType, Transcoder)>,
+}
+
+#[derive(Copy, Clone, Hash, Eq, PartialEq)]
+pub struct Transcoder {
+    pub from_memory: MemoryIndex,
+    pub from_memory64: bool,
+    pub to_memory: MemoryIndex,
+    pub to_memory64: bool,
+    pub op: Transcode,
+}
+
+/// Possible transcoding operations that must be provided by the host.
+///
+/// Note that each transcoding operation may have a unique signature depending
+/// on the precise operation.
+#[allow(missing_docs)]
+#[derive(Debug, Copy, Clone, Hash, Eq, PartialEq, Serialize, Deserialize)]
+pub enum Transcode {
+    Copy(FixedEncoding),
+    Latin1ToUtf16,
+    Latin1ToUtf8,
+    Utf16ToCompactProbablyUtf16,
+    Utf16ToCompactUtf16,
+    Utf16ToLatin1,
+    Utf16ToUtf8,
+    Utf8ToCompactUtf16,
+    Utf8ToLatin1,
+    Utf8ToUtf16,
+}
+
+#[derive(Debug, Copy, Clone, Hash, Eq, PartialEq, Serialize, Deserialize)]
+#[allow(missing_docs)]
+pub enum FixedEncoding {
+    Utf8,
+    Utf16,
+    Latin1,
+}
+
+impl Transcoders {
+    pub fn new(prev_func_imports: u32) -> Transcoders {
+        Transcoders {
+            imported: HashMap::new(),
+            prev_func_imports,
+            imports: Vec::new(),
+        }
+    }
+
+    pub fn import(&mut self, types: &mut CoreTypes, transcoder: Transcoder) -> u32 {
+        *self.imported.entry(transcoder).or_insert_with(|| {
+            let idx = self.prev_func_imports + (self.imports.len() as u32);
+            self.imports
+                .push((transcoder.name(), transcoder.ty(types), transcoder));
+            idx
+        })
+    }
+
+    pub fn imports(&self) -> impl Iterator<Item = (&str, &str, EntityType, &Transcoder)> {
+        self.imports
+            .iter()
+            .map(|(name, ty, transcoder)| ("transcode", &name[..], *ty, transcoder))
+    }
+}
+
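The deduplication in `Transcoders::import` can be illustrated with a stripped-down sketch. The `Key` and `Registry` types below are hypothetical stand-ins (they drop the `CoreTypes` and `EntityType` parts entirely); the point is only that a repeated request returns the same function index, and that indices start after the adapter's earlier function imports.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for `Transcoder`.
#[derive(Copy, Clone, Debug, Hash, Eq, PartialEq)]
struct Key {
    from_memory: u32,
    to_memory: u32,
    op: &'static str,
}

/// Hypothetical, simplified stand-in for `Transcoders`.
struct Registry {
    imported: HashMap<Key, u32>,
    prev_func_imports: u32,
    imports: Vec<Key>,
}

impl Registry {
    fn new(prev_func_imports: u32) -> Registry {
        Registry {
            imported: HashMap::new(),
            prev_func_imports,
            imports: Vec::new(),
        }
    }

    /// Returns the function index for `key`, importing it on first use.
    fn import(&mut self, key: Key) -> u32 {
        if let Some(&idx) = self.imported.get(&key) {
            return idx;
        }
        // New imports are numbered after all previously imported functions.
        let idx = self.prev_func_imports + self.imports.len() as u32;
        self.imports.push(key);
        self.imported.insert(key, idx);
        idx
    }
}
```

Because the memory indices are part of the key, the same operation between a different pair of memories is treated as a distinct import.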
+impl Transcoder {
+    fn name(&self) -> String {
+        format!(
+            "{} (mem{} => mem{})",
+            self.op.desc(),
+            self.from_memory.as_u32(),
+            self.to_memory.as_u32(),
+        )
+    }
+
+    fn ty(&self, types: &mut CoreTypes) -> EntityType {
+        let from_ptr = if self.from_memory64 {
+            ValType::I64
+        } else {
+            ValType::I32
+        };
+        let to_ptr = if self.to_memory64 {
+            ValType::I64
+        } else {
+            ValType::I32
+        };
+
+        let ty = match self.op {
+            // These direct transcodings take the source pointer, the source
+            // code units, and the destination pointer.
+            //
+            // The memories being copied between are part of each intrinsic and
+            // the destination code units are the same as the source.
+            // Note that the pointers are dynamically guaranteed to be aligned
+            // and in-bounds for the code units length as defined by the string
+            // encoding.
+            Transcode::Copy(_) | Transcode::Latin1ToUtf16 => {
+                types.function(&[from_ptr, from_ptr, to_ptr], &[])
+            }
+
+            // Transcoding from utf8 to utf16 takes the from ptr/len as well as
+            // a destination. The destination is valid for len*2 bytes. The
+            // return value is how many code units were written to the
+            // destination.
+            Transcode::Utf8ToUtf16 => types.function(&[from_ptr, from_ptr, to_ptr], &[to_ptr]),
+
+            // Transcoding to utf8 as a smaller format takes all the parameters
+            // and returns the amount of space consumed in the src/destination.
+            Transcode::Utf16ToUtf8 | Transcode::Latin1ToUtf8 => {
+                types.function(&[from_ptr, from_ptr, to_ptr, to_ptr], &[from_ptr, to_ptr])
+            }
+
+            // The return type is a tagged length which indicates which
+            // encoding was used.
+            Transcode::Utf16ToCompactProbablyUtf16 => {
+                types.function(&[from_ptr, from_ptr, to_ptr], &[to_ptr])
+            }
+
+            // The initial step of transcoding from a fixed format to a compact
+            // format. Takes the ptr/len of the source and the destination
+            // pointer. The destination length is implicitly the same. Returns
+            // how many code units were consumed in the source, which is also
+            // how many bytes were written to the destination.
+            Transcode::Utf8ToLatin1 | Transcode::Utf16ToLatin1 => {
+                types.function(&[from_ptr, from_ptr, to_ptr], &[from_ptr, to_ptr])
+            }
+
+            // The final step of transcoding to a compact format when the fixed
+            // transcode has failed. This takes the ptr/len of the source that's
+            // remaining to transcode. Then this takes the destination ptr/len
+            // as well as the destination bytes written so far with latin1.
+            // Finally this returns the number of code units written to the
+            // destination.
+            Transcode::Utf8ToCompactUtf16 | Transcode::Utf16ToCompactUtf16 => {
+                types.function(&[from_ptr, from_ptr, to_ptr, to_ptr, to_ptr], &[to_ptr])
+            }
+        };
+        EntityType::Function(ty)
+    }
+}
+
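The signature shapes chosen by `Transcoder::ty` can be summarized in a small standalone table. This sketch is illustrative only: `Ptr`, `Op`, and `signature` are hypothetical names that mirror the match arms above without depending on `wasm_encoder` or `CoreTypes`.

```rust
/// Hypothetical stand-in for the pointer-sized `ValType` of a memory.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum Ptr {
    I32,
    I64,
}

/// Hypothetical mirror of the `Transcode` variants (payloads omitted).
#[derive(Copy, Clone, Debug)]
enum Op {
    Copy,
    Latin1ToUtf16,
    Latin1ToUtf8,
    Utf16ToCompactProbablyUtf16,
    Utf16ToCompactUtf16,
    Utf16ToLatin1,
    Utf16ToUtf8,
    Utf8ToCompactUtf16,
    Utf8ToLatin1,
    Utf8ToUtf16,
}

/// Returns `(params, results)` for an operation between two memories.
fn signature(op: Op, from64: bool, to64: bool) -> (Vec<Ptr>, Vec<Ptr>) {
    let f = if from64 { Ptr::I64 } else { Ptr::I32 };
    let t = if to64 { Ptr::I64 } else { Ptr::I32 };
    match op {
        // src ptr, src code units, dst ptr; nothing returned.
        Op::Copy | Op::Latin1ToUtf16 => (vec![f, f, t], vec![]),
        // Same parameters, returning a (possibly tagged) destination length.
        Op::Utf8ToUtf16 | Op::Utf16ToCompactProbablyUtf16 => (vec![f, f, t], vec![t]),
        // src ptr/len and dst ptr/len; returns amounts consumed and written.
        Op::Utf16ToUtf8 | Op::Latin1ToUtf8 => (vec![f, f, t, t], vec![f, t]),
        // First step toward the compact format; dst length is implicit.
        Op::Utf8ToLatin1 | Op::Utf16ToLatin1 => (vec![f, f, t], vec![f, t]),
        // Final step: remaining src, dst ptr/len, latin1 bytes written so far.
        Op::Utf8ToCompactUtf16 | Op::Utf16ToCompactUtf16 => (vec![f, f, t, t, t], vec![t]),
    }
}
```

For example, a utf16-to-utf8 transcoder from a 32-bit memory into a 64-bit memory takes `(i32, i32, i64, i64)` and returns `(i32, i64)`.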
+impl Transcode {
+    /// Returns a human-readable description for this transcoding operation.
+    pub fn desc(&self) -> &'static str {
+        match self {
+            Transcode::Copy(FixedEncoding::Utf8) => "utf8-to-utf8",
+            Transcode::Copy(FixedEncoding::Utf16) => "utf16-to-utf16",
+            Transcode::Copy(FixedEncoding::Latin1) => "latin1-to-latin1",
+            Transcode::Latin1ToUtf16 => "latin1-to-utf16",
+            Transcode::Latin1ToUtf8 => "latin1-to-utf8",
+            Transcode::Utf16ToCompactProbablyUtf16 => "utf16-to-compact-probably-utf16",
+            Transcode::Utf16ToCompactUtf16 => "utf16-to-compact-utf16",
+            Transcode::Utf16ToLatin1 => "utf16-to-latin1",
+            Transcode::Utf16ToUtf8 => "utf16-to-utf8",
+            Transcode::Utf8ToCompactUtf16 => "utf8-to-compact-utf16",
+            Transcode::Utf8ToLatin1 => "utf8-to-latin1",
+            Transcode::Utf8ToUtf16 => "utf8-to-utf16",
+        }
+    }
+}
+
+impl FixedEncoding {
+    pub(crate) fn width(&self) -> u8 {
+        match self {
+            FixedEncoding::Utf8 => 1,
+            FixedEncoding::Utf16 => 2,
+            FixedEncoding::Latin1 => 1,
+        }
+    }
+}
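One plausible use of `FixedEncoding::width` is scaling a code-unit count into a byte length before bounds checking. The sketch below assumes that usage; the `byte_len` helper is hypothetical, and the enum is redeclared locally so the example stands alone.

```rust
/// Local copy of the encoding enum so this sketch is self-contained.
#[derive(Copy, Clone, Debug)]
enum FixedEncoding {
    Utf8,
    Utf16,
    Latin1,
}

impl FixedEncoding {
    /// Size in bytes of one code unit in this encoding.
    fn width(self) -> u8 {
        match self {
            FixedEncoding::Utf8 => 1,
            FixedEncoding::Utf16 => 2,
            FixedEncoding::Latin1 => 1,
        }
    }
}

/// Hypothetical helper: converts a code-unit count to a byte length for a
/// 32-bit memory, returning `None` if the multiplication overflows.
fn byte_len(encoding: FixedEncoding, code_units: u32) -> Option<u32> {
    code_units.checked_mul(u32::from(encoding.width()))
}
```

An overflowing result here corresponds to the kind of condition the string-length traps in this change are meant to report.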
@@ -30,6 +30,8 @@ pub enum Trap {
     InvalidDiscriminant,
     InvalidChar,
     ListByteLengthOverflow,
+    StringLengthTooBig,
+    StringLengthOverflow,
     AssertFailed(&'static str),
 }

@@ -105,6 +107,8 @@ impl fmt::Display for Trap {
             Trap::InvalidDiscriminant => "invalid variant discriminant".fmt(f),
             Trap::InvalidChar => "invalid char value specified".fmt(f),
             Trap::ListByteLengthOverflow => "byte size of list too large for i32".fmt(f),
+            Trap::StringLengthTooBig => "string byte size exceeds maximum".fmt(f),
+            Trap::StringLengthOverflow => "string byte size overflows i32".fmt(f),
             Trap::AssertFailed(s) => write!(f, "assertion failure: {}", s),
         }
     }
@@ -974,7 +974,7 @@ impl Module {

     /// Returns an iterator of all the imports in this module, along with their
     /// module name, field name, and type that's being imported.
-    pub fn imports(&self) -> impl Iterator<Item = (&str, &str, EntityType)> {
+    pub fn imports(&self) -> impl ExactSizeIterator<Item = (&str, &str, EntityType)> {
         self.initializers.iter().map(move |i| match i {
             Initializer::Import { name, field, index } => {
                 (name.as_str(), field.as_str(), self.type_of(*index))
@@ -136,6 +136,8 @@ pub trait PtrSize {
         16
     }

+    // Offsets within `VMRuntimeLimits`
+
     /// Return the offset of the `stack_limit` field of `VMRuntimeLimits`
     #[inline]
     fn vmruntime_limits_stack_limit(&self) -> u8 {
@@ -168,6 +170,34 @@ pub trait PtrSize {
     fn vmruntime_limits_last_wasm_entry_sp(&self) -> u8 {
         self.vmruntime_limits_last_wasm_exit_pc() + self.size()
     }
+
+    // Offsets within `VMMemoryDefinition`
+
+    /// The offset of the `base` field.
+    #[allow(clippy::erasing_op)]
+    #[inline]
+    fn vmmemory_definition_base(&self) -> u8 {
+        0 * self.size()
+    }
+
+    /// The offset of the `current_length` field.
+    #[allow(clippy::identity_op)]
+    #[inline]
+    fn vmmemory_definition_current_length(&self) -> u8 {
+        1 * self.size()
+    }
+
+    /// Return the size of `VMMemoryDefinition`.
+    #[inline]
+    fn size_of_vmmemory_definition(&self) -> u8 {
+        2 * self.size()
+    }
+
+    /// Return the size of `*mut VMMemoryDefinition`.
+    #[inline]
+    fn size_of_vmmemory_pointer(&self) -> u8 {
+        self.size()
+    }
 }

 /// Type representing the size of a pointer for the current compilation host
@@ -395,9 +425,9 @@ impl<P: PtrSize> From<VMOffsetsFields<P>> for VMOffsets<P> {
             size(defined_tables)
                 = cmul(ret.num_defined_tables, ret.size_of_vmtable_definition()),
             size(defined_memories)
-                = cmul(ret.num_defined_memories, ret.size_of_vmmemory_pointer()),
+                = cmul(ret.num_defined_memories, ret.ptr.size_of_vmmemory_pointer()),
             size(owned_memories)
-                = cmul(ret.num_owned_memories, ret.size_of_vmmemory_definition()),
+                = cmul(ret.num_owned_memories, ret.ptr.size_of_vmmemory_definition()),
             align(16),
             size(defined_globals)
                 = cmul(ret.num_defined_globals, ret.ptr.size_of_vmglobal_definition()),
@@ -523,35 +553,6 @@ impl<P: PtrSize> VMOffsets<P> {
     }
 }

-/// Offsets for `VMMemoryDefinition`.
-impl<P: PtrSize> VMOffsets<P> {
-    /// The offset of the `base` field.
-    #[allow(clippy::erasing_op)]
-    #[inline]
-    pub fn vmmemory_definition_base(&self) -> u8 {
-        0 * self.pointer_size()
-    }
-
-    /// The offset of the `current_length` field.
-    #[allow(clippy::identity_op)]
-    #[inline]
-    pub fn vmmemory_definition_current_length(&self) -> u8 {
-        1 * self.pointer_size()
-    }
-
-    /// Return the size of `VMMemoryDefinition`.
-    #[inline]
-    pub fn size_of_vmmemory_definition(&self) -> u8 {
-        2 * self.pointer_size()
-    }
-
-    /// Return the size of `*mut VMMemoryDefinition`.
-    #[inline]
-    pub fn size_of_vmmemory_pointer(&self) -> u8 {
-        self.pointer_size()
-    }
-}
-
 /// Offsets for `VMGlobalImport`.
 impl<P: PtrSize> VMOffsets<P> {
     /// The offset of the `from` field.
@@ -733,7 +734,8 @@ impl<P: PtrSize> VMOffsets<P> {
     #[inline]
     pub fn vmctx_vmmemory_pointer(&self, index: DefinedMemoryIndex) -> u32 {
         assert!(index.as_u32() < self.num_defined_memories);
-        self.vmctx_memories_begin() + index.as_u32() * u32::from(self.size_of_vmmemory_pointer())
+        self.vmctx_memories_begin()
+            + index.as_u32() * u32::from(self.ptr.size_of_vmmemory_pointer())
     }

     /// Return the offset to the owned `VMMemoryDefinition` at index `index`.
@@ -741,7 +743,7 @@ impl<P: PtrSize> VMOffsets<P> {
     pub fn vmctx_vmmemory_definition(&self, index: OwnedMemoryIndex) -> u32 {
         assert!(index.as_u32() < self.num_owned_memories);
         self.vmctx_owned_memories_begin()
-            + index.as_u32() * u32::from(self.size_of_vmmemory_definition())
+            + index.as_u32() * u32::from(self.ptr.size_of_vmmemory_definition())
     }

     /// Return the offset to the `VMGlobalDefinition` index `index`.
@@ -807,13 +809,14 @@ impl<P: PtrSize> VMOffsets<P> {
     /// Return the offset to the `base` field in `VMMemoryDefinition` index `index`.
     #[inline]
     pub fn vmctx_vmmemory_definition_base(&self, index: OwnedMemoryIndex) -> u32 {
-        self.vmctx_vmmemory_definition(index) + u32::from(self.vmmemory_definition_base())
+        self.vmctx_vmmemory_definition(index) + u32::from(self.ptr.vmmemory_definition_base())
     }

     /// Return the offset to the `current_length` field in `VMMemoryDefinition` index `index`.
     #[inline]
     pub fn vmctx_vmmemory_definition_current_length(&self, index: OwnedMemoryIndex) -> u32 {
-        self.vmctx_vmmemory_definition(index) + u32::from(self.vmmemory_definition_current_length())
+        self.vmctx_vmmemory_definition(index)
+            + u32::from(self.ptr.vmmemory_definition_current_length())
     }

     /// Return the offset to the `from` field in `VMGlobalImport` index `index`.
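The offset arithmetic touched by these hunks can be checked with a small worked example. The `Offsets` struct below is a hypothetical stand-in for the handful of `VMOffsets` fields involved; the field names and the sample layout values are assumptions chosen only for illustration.

```rust
/// Hypothetical, reduced stand-in for `VMOffsets`.
struct Offsets {
    /// Pointer size in bytes for the compilation target.
    ptr_size: u8,
    /// Byte offset where the owned `VMMemoryDefinition` array begins.
    owned_memories_begin: u32,
    num_owned_memories: u32,
}

impl Offsets {
    /// `VMMemoryDefinition` is two pointer-sized fields: base and length.
    fn size_of_vmmemory_definition(&self) -> u8 {
        2 * self.ptr_size
    }

    /// `current_length` is the second field, one pointer past `base`.
    fn vmmemory_definition_current_length(&self) -> u8 {
        self.ptr_size
    }

    /// Offset of the `index`-th owned memory definition.
    fn vmctx_vmmemory_definition(&self, index: u32) -> u32 {
        assert!(index < self.num_owned_memories);
        self.owned_memories_begin + index * u32::from(self.size_of_vmmemory_definition())
    }

    /// Offset of the `current_length` field of the `index`-th definition.
    fn vmctx_vmmemory_definition_current_length(&self, index: u32) -> u32 {
        self.vmctx_vmmemory_definition(index)
            + u32::from(self.vmmemory_definition_current_length())
    }
}
```

With 8-byte pointers and the array starting at offset 64, the third definition (index 2) begins at 64 + 2 * 16 = 96 and its `current_length` field sits at 104.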