Implement strings in adapter modules (#4623)

* Implement strings in adapter modules This commit is a hefty addition to Wasmtime's support for the component model. This implements the final remaining type (in the current type hierarchy) unimplemented in adapter module trampolines: strings. Strings are the most complicated type to implement in adapter trampolines because they are highly structured chunks of data in memory (according to specific encodings). Additionally each lift/lower operation can choose its own encoding for strings meaning that Wasmtime, the host, may have to convert between any pairwise ordering of string encodings. The `CanonicalABI.md` in the component-model repo in general specifies all the fiddly bits of string encoding so there's not a ton of wiggle room for Wasmtime to get creative. This PR largely "just" implements that. The high-level architecture of this implementation is: * Fused adapters are first identified to determine src/dst string encodings. This statically fixes what transcoding operation is being performed. * The generated adapter will be responsible for managing calls to `realloc` and performing bounds checks. The adapter itself does not perform memory copies or validation of string contents, however. Instead each transcoding operation is modeled as an imported function into the adapter module. This means that the adapter module dynamically, during compile time, determines what string transcoders are needed. Note that an imported transcoder is not only parameterized over the transcoding operation but additionally which memory is the source and which is the destination. * The imported core wasm functions are modeled as a new `CoreDef::Transcoder` structure. These transcoders end up being small Cranelift-compiled trampolines. The Cranelift-compiled trampoline will load the actual base pointer of memory and add it to the relative pointers passed as function arguments. This trampoline then calls a transcoder "libcall" which enters Rust-defined functions for actual transcoding operations. * Each possible transcoding operation is implemented in Rust with a unique name and a unique signature depending on the needs of the transcoder. I've tried to document inline what each transcoder does. This means that the `Module::translate_string` in adapter modules is by far the largest translation method. The main reason for this is due to the management around calling the imported transcoder functions in the face of validating string pointer/lengths and performing the dance of `realloc`-vs-transcode at the right time. I've tried to ensure that each individual case in transcoding is documented well enough to understand what's going on as well. Additionally in this PR is a full implementation in the host for the `latin1+utf16` encoding which means that both lifting and lowering host strings now works with this encoding. Currently the implementation of each transcoder function is likely far from optimal. Where possible I've leaned on the standard library itself and for latin1-related things I'm leaning on the `encoding_rs` crate. I initially tried to implement everything with `encoding_rs` but was unable to uniformly do so easily. For now I settled on trying to get a known-correct (even in the face of endianness) implementation for all of these transcoders. If an when performance becomes an issue it should be possible to implement more optimized versions of each of these transcoding operations. Testing this commit has been somewhat difficult and my general plan, like with the `(list T)` type, is to rely heavily on fuzzing to cover the various cases here. In this PR though I've added a simple test that pushes some statically known strings through all the pairs of encodings between source and destination. I've attempted to pick "interesting" strings that one way or another stress the various paths in each transcoding operation to ideally get full branch coverage there. Additionally a suite of "negative" tests have also been added to ensure that validity of encoding is actually checked. * Fix a temporarily commented out case * Fix wasmtime-runtime tests * Update deny.toml configuration * Add `BSD-3-Clause` for the `encoding_rs` crate * Remove some unused licenses * Add an exemption for `encoding_rs` for now * Split up the `translate_string` method Move out all the closures and package up captured state into smaller lists of arguments. * Test out-of-bounds for zero-length strings
2022-08-08 11:01:57 -05:00
parent e6d339b6ac
commit 650979ae40
33 changed files with 3239 additions and 190 deletions
--- a/crates/environ/src/fact/trampoline.rs
+++ b/crates/environ/src/fact/trampoline.rs
@@ -16,12 +16,13 @@
 //! can be somewhat arbitrary, an intentional decision.

 use crate::component::{
-    InterfaceType, TypeEnumIndex, TypeExpectedIndex, TypeFlagsIndex, TypeInterfaceIndex,
-    TypeRecordIndex, TypeTupleIndex, TypeUnionIndex, TypeVariantIndex, FLAG_MAY_ENTER,
-    FLAG_MAY_LEAVE, MAX_FLAT_PARAMS, MAX_FLAT_RESULTS,
+    InterfaceType, StringEncoding, TypeEnumIndex, TypeExpectedIndex, TypeFlagsIndex,
+    TypeInterfaceIndex, TypeRecordIndex, TypeTupleIndex, TypeUnionIndex, TypeVariantIndex,
+    FLAG_MAY_ENTER, FLAG_MAY_LEAVE, MAX_FLAT_PARAMS, MAX_FLAT_RESULTS,
 };
 use crate::fact::core_types::CoreTypes;
 use crate::fact::signature::{align_to, Signature};
+use crate::fact::transcode::{FixedEncoding as FE, Transcode, Transcoder, Transcoders};
 use crate::fact::traps::Trap;
 use crate::fact::{AdapterData, Context, Module, Options};
 use crate::GlobalIndex;
@@ -31,6 +32,9 @@ use std::ops::Range;
 use wasm_encoder::{BlockType, Encode, Instruction, Instruction::*, MemArg, ValType};
 use wasmtime_component_util::{DiscriminantSize, FlagsSize};

+const MAX_STRING_BYTE_LENGTH: u32 = 1 << 31;
+const UTF16_TAG: u32 = 1 << 31;
+
 struct Compiler<'a, 'b> {
    /// The module that the adapter will eventually be inserted into.
    module: &'a Module<'a>,
@@ -38,6 +42,9 @@ struct Compiler<'a, 'b> {
    /// The type section of `module`
    types: &'b mut CoreTypes,

+    /// Imported functions to transcode between various string encodings.
+    transcoders: &'b mut Transcoders,
+
    /// Metadata about the adapter that is being compiled.
    adapter: &'a AdapterData,

@@ -71,6 +78,7 @@ struct Compiler<'a, 'b> {
 pub(super) fn compile(
    module: &Module<'_>,
    types: &mut CoreTypes,
+    transcoders: &mut Transcoders,
    adapter: &AdapterData,
 ) -> (Vec<u8>, Vec<(usize, Trap)>) {
    let lower_sig = &module.signature(&adapter.lower, Context::Lower);
@@ -79,6 +87,7 @@ pub(super) fn compile(
        module,
        types,
        adapter,
+        transcoders,
        code: Vec::new(),
        locals: Vec::new(),
        nlocals: lower_sig.params.len() as u32,
@@ -356,6 +365,7 @@ impl Compiler<'_, '_> {
            InterfaceType::Float32 => self.translate_f32(src, dst_ty, dst),
            InterfaceType::Float64 => self.translate_f64(src, dst_ty, dst),
            InterfaceType::Char => self.translate_char(src, dst_ty, dst),
+            InterfaceType::String => self.translate_string(src, dst_ty, dst),
            InterfaceType::List(t) => self.translate_list(*t, src, dst_ty, dst),
            InterfaceType::Record(t) => self.translate_record(*t, src, dst_ty, dst),
            InterfaceType::Flags(f) => self.translate_flags(*f, src, dst_ty, dst),
@@ -365,13 +375,6 @@ impl Compiler<'_, '_> {
            InterfaceType::Enum(t) => self.translate_enum(*t, src, dst_ty, dst),
            InterfaceType::Option(t) => self.translate_option(*t, src, dst_ty, dst),
            InterfaceType::Expected(t) => self.translate_expected(*t, src, dst_ty, dst),
-
-            InterfaceType::String => {
-                // consider this field used for now until this is fully
-                // implemented.
-                drop(&self.adapter.lift.string_encoding);
-                unimplemented!("don't know how to translate strings")
-            }
        }
    }

@@ -636,6 +639,768 @@ impl Compiler<'_, '_> {
        }
    }

+    fn translate_string(&mut self, src: &Source<'_>, dst_ty: &InterfaceType, dst: &Destination) {
+        assert!(matches!(dst_ty, InterfaceType::String));
+        let src_opts = src.opts();
+        let dst_opts = dst.opts();
+
+        // Load the pointer/length of this string into temporary locals. These
+        // will be referenced a good deal so this just makes it easier to deal
+        // with them consistently below rather than trying to reload from memory
+        // for example.
+        let src_ptr = self.gen_local(src_opts.ptr());
+        let src_len = self.gen_local(src_opts.ptr());
+        match src {
+            Source::Stack(s) => {
+                assert_eq!(s.locals.len(), 2);
+                self.stack_get(&s.slice(0..1), src_opts.ptr());
+                self.instruction(LocalSet(src_ptr));
+                self.stack_get(&s.slice(1..2), src_opts.ptr());
+                self.instruction(LocalSet(src_len));
+            }
+            Source::Memory(mem) => {
+                self.ptr_load(mem);
+                self.instruction(LocalSet(src_ptr));
+                self.ptr_load(&mem.bump(src_opts.ptr_size().into()));
+                self.instruction(LocalSet(src_len));
+            }
+        }
+        let src_str = &WasmString {
+            ptr: src_ptr,
+            len: src_len,
+            opts: src_opts,
+        };
+
+        let dst_str = match src_opts.string_encoding {
+            StringEncoding::Utf8 => match dst_opts.string_encoding {
+                StringEncoding::Utf8 => self.string_copy(src_str, FE::Utf8, dst_opts, FE::Utf8),
+                StringEncoding::Utf16 => self.string_utf8_to_utf16(src_str, dst_opts),
+                StringEncoding::CompactUtf16 => self.string_to_compact(src_str, FE::Utf8, dst_opts),
+            },
+
+            StringEncoding::Utf16 => {
+                self.verify_aligned(src_opts, src_ptr, 2);
+                match dst_opts.string_encoding {
+                    StringEncoding::Utf8 => {
+                        self.string_deflate_to_utf8(src_str, FE::Utf16, dst_opts)
+                    }
+                    StringEncoding::Utf16 => {
+                        self.string_copy(src_str, FE::Utf16, dst_opts, FE::Utf16)
+                    }
+                    StringEncoding::CompactUtf16 => {
+                        self.string_to_compact(src_str, FE::Utf16, dst_opts)
+                    }
+                }
+            }
+
+            StringEncoding::CompactUtf16 => {
+                self.verify_aligned(src_opts, src_ptr, 2);
+
+                // Test the tag big to see if this is a utf16 or a latin1 string
+                // at runtime...
+                self.instruction(LocalGet(src_len));
+                self.ptr_uconst(src_opts, UTF16_TAG);
+                self.ptr_and(src_opts);
+                self.ptr_if(src_opts, BlockType::Empty);
+
+                // In the utf16 block unset the upper bit from the length local
+                // so further calculations have the right value. Afterwards the
+                // string transcode proceeds assuming utf16.
+                self.instruction(LocalGet(src_len));
+                self.ptr_uconst(src_opts, UTF16_TAG);
+                self.ptr_xor(src_opts);
+                self.instruction(LocalSet(src_len));
+                let s1 = match dst_opts.string_encoding {
+                    StringEncoding::Utf8 => {
+                        self.string_deflate_to_utf8(src_str, FE::Utf16, dst_opts)
+                    }
+                    StringEncoding::Utf16 => {
+                        self.string_copy(src_str, FE::Utf16, dst_opts, FE::Utf16)
+                    }
+                    StringEncoding::CompactUtf16 => {
+                        self.string_compact_utf16_to_compact(src_str, dst_opts)
+                    }
+                };
+
+                self.instruction(Else);
+
+                // In the latin1 block the `src_len` local is already the number
+                // of code units, so the string transcoding is all that needs to
+                // happen.
+                let s2 = match dst_opts.string_encoding {
+                    StringEncoding::Utf16 => {
+                        self.string_copy(src_str, FE::Latin1, dst_opts, FE::Utf16)
+                    }
+                    StringEncoding::Utf8 => {
+                        self.string_deflate_to_utf8(src_str, FE::Latin1, dst_opts)
+                    }
+                    StringEncoding::CompactUtf16 => {
+                        self.string_copy(src_str, FE::Latin1, dst_opts, FE::Latin1)
+                    }
+                };
+                // Set our `s2` generated locals to the `s2` generated locals
+                // as the resulting pointer of this transcode.
+                self.instruction(LocalGet(s2.ptr));
+                self.instruction(LocalSet(s1.ptr));
+                self.instruction(LocalGet(s2.len));
+                self.instruction(LocalSet(s1.len));
+                self.instruction(End);
+                s1
+            }
+        };
+
+        // Store the ptr/length in the desired destination
+        match dst {
+            Destination::Stack(s, _) => {
+                self.instruction(LocalGet(dst_str.ptr));
+                self.stack_set(&s[..1], dst_opts.ptr());
+                self.instruction(LocalGet(dst_str.len));
+                self.stack_set(&s[1..], dst_opts.ptr());
+            }
+            Destination::Memory(mem) => {
+                self.instruction(LocalGet(mem.addr_local));
+                self.instruction(LocalGet(dst_str.ptr));
+                self.ptr_store(mem);
+                self.instruction(LocalGet(mem.addr_local));
+                self.instruction(LocalGet(dst_str.len));
+                self.ptr_store(&mem.bump(dst_opts.ptr_size().into()));
+            }
+        }
+    }
+
+    // Corresponding function for `store_string_copy` in the spec.
+    //
+    // This performs a transcoding of the string with a one-pass copy from
+    // the `src` encoding to the `dst` encoding. This is only possible for
+    // fixed encodings where the first allocation is guaranteed to be an
+    // appropriate fit so it's not suitable for all encodings.
+    //
+    // Imported host transcoding functions here take the src/dst pointers as
+    // well as the number of code units in the source (which always matches
+    // the number of code units in the destination). There is no return
+    // value from the transcode function since the encoding should always
+    // work on the first pass.
+    fn string_copy<'a>(
+        &mut self,
+        src: &WasmString<'_>,
+        src_enc: FE,
+        dst_opts: &'a Options,
+        dst_enc: FE,
+    ) -> WasmString<'a> {
+        assert!(dst_enc.width() >= src_enc.width());
+        self.validate_string_length(src, dst_enc);
+
+        // Calculate the source byte length given the size of each code
+        // unit. Note that this shouldn't overflow given
+        // `validate_string_length` above.
+        let src_byte_len = if src_enc.width() == 1 {
+            src.len
+        } else {
+            assert_eq!(src_enc.width(), 2);
+            let tmp = self.gen_local(src.opts.ptr());
+            self.instruction(LocalGet(src.len));
+            self.ptr_uconst(src.opts, 1);
+            self.ptr_shl(src.opts);
+            self.instruction(LocalSet(tmp));
+            tmp
+        };
+
+        // Convert the source code units length to the destination byte
+        // length type.
+        self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst_opts.ptr());
+        let dst_len = self.gen_local(dst_opts.ptr());
+        self.instruction(LocalTee(dst_len));
+        if dst_enc.width() > 1 {
+            assert_eq!(dst_enc.width(), 2);
+            self.ptr_uconst(dst_opts, 1);
+            self.ptr_shl(dst_opts);
+        }
+        let dst_byte_len = self.gen_local(dst_opts.ptr());
+        self.instruction(LocalSet(dst_byte_len));
+
+        // Allocate space in the destination using the calculated byte
+        // length.
+        let dst = {
+            let dst_mem = self.malloc(
+                dst_opts,
+                MallocSize::Local(dst_byte_len),
+                dst_enc.width().into(),
+            );
+            WasmString {
+                ptr: dst_mem.addr_local,
+                len: dst_len,
+                opts: dst_opts,
+            }
+        };
+
+        // Validate that `src_len + src_ptr` and
+        // `dst_mem.addr_local + dst_byte_len` are both in-bounds. This
+        // is done by loading the last byte of the string and if that
+        // doesn't trap then it's known valid.
+        self.validate_string_inbounds(src, src_byte_len);
+        self.validate_string_inbounds(&dst, dst_byte_len);
+
+        // If the validations pass then the host `transcode` intrinsic
+        // is invoked. This will either raise a trap or otherwise succeed
+        // in which case we're done.
+        let op = if src_enc == dst_enc {
+            Transcode::Copy(src_enc)
+        } else {
+            assert_eq!(src_enc, FE::Latin1);
+            assert_eq!(dst_enc, FE::Utf16);
+            Transcode::Latin1ToUtf16
+        };
+        let transcode = self.transcoder(src, &dst, op);
+        self.instruction(LocalGet(src.ptr));
+        self.instruction(LocalGet(src.len));
+        self.instruction(LocalGet(dst.ptr));
+        self.instruction(Call(transcode));
+
+        dst
+    }
+    // Corresponding function for `store_string_to_utf8` in the spec.
+    //
+    // This translation works by possibly performing a number of
+    // reallocations. First a buffer of size input-code-units is used to try
+    // to get the transcoding correct on the first try. If that fails the
+    // maximum worst-case size is used and then that is resized down if it's
+    // too large.
+    //
+    // The host transcoding function imported here will receive src ptr/len
+    // and dst ptr/len and return how many code units were consumed on both
+    // sides. The amount of code units consumed in the source dictates which
+    // branches are taken in this conversion.
+    fn string_deflate_to_utf8<'a>(
+        &mut self,
+        src: &WasmString<'_>,
+        src_enc: FE,
+        dst_opts: &'a Options,
+    ) -> WasmString<'a> {
+        self.validate_string_length(src, src_enc);
+
+        // Optimistically assume that the code unit length of the source is
+        // all that's needed in the destination. Perform that allocaiton
+        // here and proceed to transcoding below.
+        self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst_opts.ptr());
+        let dst_len = self.gen_local(dst_opts.ptr());
+        self.instruction(LocalTee(dst_len));
+        let dst_byte_len = self.gen_local(dst_opts.ptr());
+        self.instruction(LocalSet(dst_byte_len));
+
+        let dst = {
+            let dst_mem = self.malloc(dst_opts, MallocSize::Local(dst_byte_len), 1);
+            WasmString {
+                ptr: dst_mem.addr_local,
+                len: dst_len,
+                opts: dst_opts,
+            }
+        };
+
+        // Ensure buffers are all in-bounds
+        let src_byte_len = match src_enc {
+            FE::Latin1 => src.len,
+            FE::Utf16 => {
+                let tmp = self.gen_local(src.opts.ptr());
+                self.instruction(LocalGet(src.len));
+                self.ptr_uconst(src.opts, 1);
+                self.ptr_shl(src.opts);
+                self.instruction(LocalSet(tmp));
+                tmp
+            }
+            FE::Utf8 => unreachable!(),
+        };
+        self.validate_string_inbounds(src, src_byte_len);
+        self.validate_string_inbounds(&dst, dst_byte_len);
+
+        // Perform the initial transcode
+        let op = match src_enc {
+            FE::Latin1 => Transcode::Latin1ToUtf8,
+            FE::Utf16 => Transcode::Utf16ToUtf8,
+            FE::Utf8 => unreachable!(),
+        };
+        let transcode = self.transcoder(src, &dst, op);
+        self.instruction(LocalGet(src.ptr));
+        self.instruction(LocalGet(src.len));
+        self.instruction(LocalGet(dst.ptr));
+        self.instruction(LocalGet(dst_byte_len));
+        self.instruction(Call(transcode));
+        self.instruction(LocalSet(dst.len));
+        let src_len_tmp = self.gen_local(src.opts.ptr());
+        self.instruction(LocalSet(src_len_tmp));
+
+        // Test if the source was entirely transcoded by comparing
+        // `src_len_tmp`, the number of code units transcoded from the
+        // source, with `src_len`, the original number of code units.
+        self.instruction(LocalGet(src_len_tmp));
+        self.instruction(LocalGet(src.len));
+        self.ptr_ne(src.opts);
+        self.instruction(If(BlockType::Empty));
+
+        // Here a worst-case reallocation is performed to grow `dst_mem`.
+        // In-line a check is also performed that the worst-case byte size
+        // fits within the maximum size of strings.
+        self.instruction(LocalGet(dst.ptr)); // old_ptr
+        self.instruction(LocalGet(dst_byte_len)); // old_size
+        self.ptr_uconst(dst.opts, 1); // align
+        let factor = match src_enc {
+            FE::Latin1 => 2,
+            FE::Utf16 => 3,
+            _ => unreachable!(),
+        };
+        self.validate_string_length_u8(src, factor);
+        self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst_opts.ptr());
+        self.ptr_uconst(dst_opts, factor.into());
+        self.ptr_mul(dst_opts);
+        self.instruction(LocalTee(dst_byte_len));
+        self.instruction(Call(dst_opts.realloc.unwrap().as_u32()));
+        self.instruction(LocalSet(dst.ptr));
+
+        // Verify that the destination is still in-bounds
+        self.validate_string_inbounds(&dst, dst_byte_len);
+
+        // Perform another round of transcoding that should be guaranteed
+        // to succeed. Note that all the parameters here are offset by the
+        // results of the first transcoding to only perform the remaining
+        // transcode on the final units.
+        self.instruction(LocalGet(src.ptr));
+        self.instruction(LocalGet(src_len_tmp));
+        if let FE::Utf16 = src_enc {
+            self.ptr_uconst(src.opts, 1);
+            self.ptr_shl(src.opts);
+        }
+        self.ptr_add(src.opts);
+        self.instruction(LocalGet(src.len));
+        self.instruction(LocalGet(src_len_tmp));
+        self.ptr_sub(src.opts);
+        self.instruction(LocalGet(dst.ptr));
+        self.instruction(LocalGet(dst.len));
+        self.ptr_add(dst.opts);
+        self.instruction(LocalGet(dst_byte_len));
+        self.instruction(LocalGet(dst.len));
+        self.ptr_sub(dst.opts);
+        self.instruction(Call(transcode));
+
+        // Add the second result, the amount of destination units encoded,
+        // to `dst_len` so it's an accurate reflection of the final size of
+        // the destination buffer.
+        self.instruction(LocalGet(dst.len));
+        self.ptr_add(dst.opts);
+        self.instruction(LocalSet(dst.len));
+
+        // In debug mode verify the first result consumed the entire string,
+        // otherwise simply discard it.
+        if self.module.debug {
+            self.instruction(LocalGet(src.len));
+            self.instruction(LocalGet(src_len_tmp));
+            self.ptr_sub(src.opts);
+            self.ptr_ne(src.opts);
+            self.instruction(If(BlockType::Empty));
+            self.trap(Trap::AssertFailed("should have finished encoding"));
+            self.instruction(End);
+        } else {
+            self.instruction(Drop);
+        }
+
+        // Perform a downsizing if the worst-case size was too large
+        self.instruction(LocalGet(dst.len));
+        self.instruction(LocalGet(dst_byte_len));
+        self.ptr_ne(dst.opts);
+        self.instruction(If(BlockType::Empty));
+        self.instruction(LocalGet(dst.ptr)); // old_ptr
+        self.instruction(LocalGet(dst_byte_len)); // old_size
+        self.ptr_uconst(dst.opts, 1); // align
+        self.instruction(LocalGet(dst.len)); // new_size
+        self.instruction(Call(dst.opts.realloc.unwrap().as_u32()));
+        self.instruction(LocalSet(dst.ptr));
+        self.instruction(End);
+
+        // If the first transcode was enough then assert that the returned
+        // amount of destination items written equals the byte size.
+        if self.module.debug {
+            self.instruction(Else);
+
+            self.instruction(LocalGet(dst.len));
+            self.instruction(LocalGet(dst_byte_len));
+            self.ptr_ne(dst_opts);
+            self.instruction(If(BlockType::Empty));
+            self.trap(Trap::AssertFailed("should have finished encoding"));
+            self.instruction(End);
+        }
+
+        self.instruction(End); // end of "first transcode not enough"
+
+        dst
+    }
+
+    // Corresponds to the `store_utf8_to_utf16` function in the spec.
+    //
+    // When converting utf-8 to utf-16 a pessimistic allocation is
+    // done which is twice the byte length of the utf-8 string.
+    // The host then transcodes and returns how many code units were
+    // actually used during the transcoding and if it's beneath the
+    // pessimistic maximum then the buffer is reallocated down to
+    // a smaller amount.
+    //
+    // The host-imported transcoding function takes the src/dst pointer as
+    // well as the code unit size of both the source and destination. The
+    // destination should always be big enough to hold the result of the
+    // transcode and so the result of the host function is how many code
+    // units were written to the destination.
+    fn string_utf8_to_utf16<'a>(
+        &mut self,
+        src: &WasmString<'_>,
+        dst_opts: &'a Options,
+    ) -> WasmString<'a> {
+        self.validate_string_length(src, FE::Utf16);
+        self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst_opts.ptr());
+        let dst_len = self.gen_local(dst_opts.ptr());
+        self.instruction(LocalTee(dst_len));
+        self.ptr_uconst(dst_opts, 1);
+        self.ptr_shl(dst_opts);
+        let dst_byte_len = self.gen_local(dst_opts.ptr());
+        self.instruction(LocalSet(dst_byte_len));
+        let dst = {
+            let dst_mem = self.malloc(dst_opts, MallocSize::Local(dst_byte_len), 2);
+            WasmString {
+                ptr: dst_mem.addr_local,
+                len: dst_len,
+                opts: dst_opts,
+            }
+        };
+
+        self.validate_string_inbounds(src, src.len);
+        self.validate_string_inbounds(&dst, dst_byte_len);
+
+        let transcode = self.transcoder(src, &dst, Transcode::Utf8ToUtf16);
+        self.instruction(LocalGet(src.ptr));
+        self.instruction(LocalGet(src.len));
+        self.instruction(LocalGet(dst.ptr));
+        self.instruction(Call(transcode));
+        self.instruction(LocalSet(dst.len));
+
+        // If the number of code units returned by transcode is not
+        // equal to the original number of code units then
+        // the buffer must be shrunk.
+        //
+        // Note that the byte length of the final allocation we
+        // want is twice the code unit length returned by the
+        // transcoding function.
+        self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst.opts.ptr());
+        self.instruction(LocalGet(dst.len));
+        self.ptr_ne(dst_opts);
+        self.instruction(If(BlockType::Empty));
+        self.instruction(LocalGet(dst.ptr));
+        self.instruction(LocalGet(dst_byte_len));
+        self.ptr_uconst(dst.opts, 2);
+        self.instruction(LocalGet(dst.len));
+        self.ptr_uconst(dst.opts, 1);
+        self.ptr_shl(dst.opts);
+        self.instruction(Call(dst.opts.realloc.unwrap().as_u32()));
+        self.instruction(LocalSet(dst.ptr));
+        self.instruction(End); // end of shrink-to-fit
+
+        dst
+    }
+
+    // Corresponds to `store_probably_utf16_to_latin1_or_utf16` in the spec.
+    //
+    // This will try to transcode the input utf16 string to utf16 in the
+    // destination. If utf16 isn't needed though and latin1 could be used
+    // then that's used instead and a reallocation to downsize occurs
+    // afterwards.
+    //
+    // The host transcode function here will take the src/dst pointers as
+    // well as src length. The destination byte length is twice the src code
+    // unit length. The return value is the tagged length of the returned
+    // string. If the upper bit is set then utf16 was used and the
+    // conversion is done. If the upper bit is not set then latin1 was used
+    // and a downsizing needs to happen.
+    fn string_compact_utf16_to_compact<'a>(
+        &mut self,
+        src: &WasmString<'_>,
+        dst_opts: &'a Options,
+    ) -> WasmString<'a> {
+        self.validate_string_length(src, FE::Utf16);
+        self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst_opts.ptr());
+        let dst_len = self.gen_local(dst_opts.ptr());
+        self.instruction(LocalTee(dst_len));
+        self.ptr_uconst(dst_opts, 1);
+        self.ptr_shl(dst_opts);
+        let dst_byte_len = self.gen_local(dst_opts.ptr());
+        self.instruction(LocalSet(dst_byte_len));
+        let dst = {
+            let dst_mem = self.malloc(dst_opts, MallocSize::Local(dst_byte_len), 2);
+            WasmString {
+                ptr: dst_mem.addr_local,
+                len: dst_len,
+                opts: dst_opts,
+            }
+        };
+
+        self.validate_string_inbounds(src, dst_byte_len);
+        self.validate_string_inbounds(&dst, dst_byte_len);
+
+        let transcode = self.transcoder(src, &dst, Transcode::Utf16ToCompactProbablyUtf16);
+        self.instruction(LocalGet(src.ptr));
+        self.instruction(LocalGet(src.len));
+        self.instruction(LocalGet(dst.ptr));
+        self.instruction(Call(transcode));
+        self.instruction(LocalSet(dst.len));
+
+        // Assert that the untagged code unit length is the same as the
+        // source code unit length.
+        if self.module.debug {
+            self.instruction(LocalGet(dst.len));
+            self.ptr_uconst(dst.opts, !UTF16_TAG);
+            self.ptr_and(dst.opts);
+            self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst.opts.ptr());
+            self.ptr_ne(dst.opts);
+            self.instruction(If(BlockType::Empty));
+            self.trap(Trap::AssertFailed("expected equal code units"));
+            self.instruction(End);
+        }
+
+        // If the UTF16_TAG is set then utf16 was used and the destination
+        // should be appropriately sized. Bail out of the "is this string
+        // empty" block and fall through otherwise to resizing.
+        self.instruction(LocalGet(dst.len));
+        self.ptr_uconst(dst.opts, UTF16_TAG);
+        self.ptr_and(dst.opts);
+        self.ptr_br_if(dst.opts, 0);
+
+        // Here `realloc` is used to downsize the string
+        self.instruction(LocalGet(dst.ptr)); // old_ptr
+        self.instruction(LocalGet(dst_byte_len)); // old_size
+        self.ptr_uconst(dst.opts, 2); // align
+        self.instruction(LocalGet(dst.len)); // new_size
+        self.instruction(Call(dst.opts.realloc.unwrap().as_u32()));
+        self.instruction(LocalSet(dst.ptr));
+
+        dst
+    }
+
+    // Corresponds to `store_string_to_latin1_or_utf16` in the spec.
+    //
+    // This will attempt a first pass of transcoding to latin1 and on
+    // failure a larger buffer is allocated for utf16 and then utf16 is
+    // encoded in-place into the buffer. After either latin1 or utf16 the
+    // buffer is then resized to fit the final string allocation.
+    fn string_to_compact<'a>(
+        &mut self,
+        src: &WasmString<'_>,
+        src_enc: FE,
+        dst_opts: &'a Options,
+    ) -> WasmString<'a> {
+        self.validate_string_length(src, src_enc);
+        self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst_opts.ptr());
+        let dst_len = self.gen_local(dst_opts.ptr());
+        self.instruction(LocalTee(dst_len));
+        let dst_byte_len = self.gen_local(dst_opts.ptr());
+        self.instruction(LocalSet(dst_byte_len));
+        let dst = {
+            let dst_mem = self.malloc(dst_opts, MallocSize::Local(dst_byte_len), 2);
+            WasmString {
+                ptr: dst_mem.addr_local,
+                len: dst_len,
+                opts: dst_opts,
+            }
+        };
+
+        self.validate_string_inbounds(src, src.len);
+        self.validate_string_inbounds(&dst, dst_byte_len);
+
+        // Perform the initial latin1 transcode. This returns the number of
+        // source code units consumed and the number of destination code
+        // units (bytes) written.
+        let (latin1, utf16) = match src_enc {
+            FE::Utf8 => (Transcode::Utf8ToLatin1, Transcode::Utf8ToCompactUtf16),
+            FE::Utf16 => (Transcode::Utf16ToLatin1, Transcode::Utf16ToCompactUtf16),
+            FE::Latin1 => unreachable!(),
+        };
+        let transcode_latin1 = self.transcoder(src, &dst, latin1);
+        let transcode_utf16 = self.transcoder(src, &dst, utf16);
+        self.instruction(LocalGet(src.ptr));
+        self.instruction(LocalGet(src.len));
+        self.instruction(LocalGet(dst.ptr));
+        self.instruction(Call(transcode_latin1));
+        self.instruction(LocalSet(dst.len));
+        let src_len_tmp = self.gen_local(src.opts.ptr());
+        self.instruction(LocalSet(src_len_tmp));
+
+        // If the source was entirely consumed then the transcode completed
+        // and all that's necessary is to optionally shrink the buffer.
+        self.instruction(LocalGet(src_len_tmp));
+        self.instruction(LocalGet(src.len));
+        self.ptr_eq(src.opts);
+        self.instruction(If(BlockType::Empty)); // if latin1-or-utf16 block
+
+        // Test if the original byte length of the allocation is the same as
+        // the number of written bytes, and if not then shrink the buffer
+        // with a call to `realloc`.
+        self.instruction(LocalGet(dst_byte_len));
+        self.instruction(LocalGet(dst.len));
+        self.ptr_ne(dst.opts);
+        self.instruction(If(BlockType::Empty));
+        self.instruction(LocalGet(dst.ptr)); // old_ptr
+        self.instruction(LocalGet(dst_byte_len)); // old_size
+        self.ptr_uconst(dst.opts, 2); // align
+        self.instruction(LocalGet(dst.len)); // new_size
+        self.instruction(Call(dst.opts.realloc.unwrap().as_u32()));
+        self.instruction(LocalSet(dst.ptr));
+        self.instruction(End);
+
+        // In this block the latin1 encoding failed. The host transcode
+        // returned how many units were consumed from the source and how
+        // many bytes were written to the destination. Here the buffer is
+        // inflated and sized and the second utf16 intrinsic is invoked to
+        // perform the final inflation.
+        self.instruction(Else); // else latin1-or-utf16 block
+
+        // For utf8 validate that the inflated size is still within bounds.
+        if src_enc.width() == 1 {
+            self.validate_string_length_u8(src, 2);
+        }
+
+        // Reallocate the buffer with twice the source code units in byte
+        // size.
+        self.instruction(LocalGet(dst.ptr)); // old_ptr
+        self.instruction(LocalGet(dst_byte_len)); // old_size
+        self.ptr_uconst(dst.opts, 2); // align
+        self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst.opts.ptr());
+        self.ptr_uconst(dst.opts, 1);
+        self.ptr_shl(dst.opts);
+        self.instruction(LocalTee(dst_byte_len));
+        self.instruction(Call(dst.opts.realloc.unwrap().as_u32()));
+        self.instruction(LocalSet(dst.ptr));
+
+        // Call the host utf16 transcoding function. This will inflate the
+        // prior latin1 bytes and then encode the rest of the source string
+        // as utf16 into the remaining space in the destination buffer.
+        self.instruction(LocalGet(src.ptr));
+        self.instruction(LocalGet(src_len_tmp));
+        if let FE::Utf16 = src_enc {
+            self.ptr_uconst(src.opts, 1);
+            self.ptr_shl(src.opts);
+        }
+        self.ptr_add(src.opts);
+        self.instruction(LocalGet(src.len));
+        self.instruction(LocalGet(src_len_tmp));
+        self.ptr_sub(src.opts);
+        self.instruction(LocalGet(dst.ptr));
+        self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst.opts.ptr());
+        self.instruction(LocalGet(dst.len));
+        self.instruction(Call(transcode_utf16));
+        self.instruction(LocalSet(dst.len));
+
+        // If the returned number of code units written to the destination
+        // is not equal to the size of the allocation then the allocation is
+        // resized down to the appropriate size.
+        //
+        // Note that the byte size desired is `2*dst_len` and the current
+        // byte buffer size is `2*src_len` so the `2` factor isn't checked
+        // here, just the lengths.
+        self.instruction(LocalGet(dst.len));
+        self.convert_src_len_to_dst(src.len, src.opts.ptr(), dst.opts.ptr());
+        self.ptr_ne(dst.opts);
+        self.instruction(If(BlockType::Empty));
+        self.instruction(LocalGet(dst.ptr)); // old_ptr
+        self.instruction(LocalGet(dst_byte_len)); // old_size
+        self.ptr_uconst(dst.opts, 2); // align
+        self.instruction(LocalGet(dst.len));
+        self.ptr_uconst(dst.opts, 1);
+        self.ptr_shl(dst.opts);
+        self.instruction(Call(dst.opts.realloc.unwrap().as_u32()));
+        self.instruction(LocalSet(dst.ptr));
+        self.instruction(End);
+
+        // Tag the returned pointer as utf16
+        self.instruction(LocalGet(dst.len));
+        self.ptr_uconst(dst.opts, UTF16_TAG);
+        self.ptr_or(dst.opts);
+        self.instruction(LocalSet(dst.len));
+
+        self.instruction(End); // end latin1-or-utf16 block
+
+        dst
+    }
+
+    fn validate_string_length(&mut self, src: &WasmString<'_>, dst: FE) {
+        self.validate_string_length_u8(src, dst.width())
+    }
+
+    fn validate_string_length_u8(&mut self, s: &WasmString<'_>, dst: u8) {
+        // Check to see if the source byte length is out of bounds in
+        // which case a trap is generated.
+        self.instruction(LocalGet(s.len));
+        let max = MAX_STRING_BYTE_LENGTH / u32::from(dst);
+        self.ptr_uconst(s.opts, max);
+        self.ptr_ge_u(s.opts);
+        self.instruction(If(BlockType::Empty));
+        self.trap(Trap::StringLengthTooBig);
+        self.instruction(End);
+    }
+
+    fn transcoder(&mut self, src: &WasmString<'_>, dst: &WasmString<'_>, op: Transcode) -> u32 {
+        self.transcoders.import(
+            self.types,
+            Transcoder {
+                from_memory: src.opts.memory.unwrap(),
+                from_memory64: src.opts.memory64,
+                to_memory: dst.opts.memory.unwrap(),
+                to_memory64: dst.opts.memory64,
+                op,
+            },
+        )
+    }
+
+    fn validate_string_inbounds(&mut self, s: &WasmString<'_>, byte_len: u32) {
+        let extend_to_64 = |me: &mut Self| {
+            if !s.opts.memory64 {
+                me.instruction(I64ExtendI32U);
+            }
+        };
+
+        self.instruction(Block(BlockType::Empty));
+        self.instruction(Block(BlockType::Empty));
+
+        // Calculate the full byte size of memory with `memory.size`. Note that
+        // arithmetic here is done always in 64-bits to accomodate 4G memories.
+        // Additionally it's assumed that 64-bit memories never fill up
+        // entirely.
+        self.instruction(MemorySize(s.opts.memory.unwrap().as_u32()));
+        extend_to_64(self);
+        self.instruction(I64Const(16));
+        self.instruction(I64Shl);
+
+        // Calculate the end address of the string. This is done by adding the
+        // base pointer to the byte length. For 32-bit memories there's no need
+        // to check for overflow since everything is extended to 64-bit, but for
+        // 64-bit memories overflow is checked.
+        self.instruction(LocalGet(s.ptr));
+        extend_to_64(self);
+        self.instruction(LocalGet(byte_len));
+        extend_to_64(self);
+        self.instruction(I64Add);
+        if s.opts.memory64 {
+            let tmp = self.gen_local(ValType::I64);
+            self.instruction(LocalTee(tmp));
+            self.instruction(LocalGet(s.ptr));
+            self.ptr_lt_u(s.opts);
+            self.ptr_br_if(s.opts, 0);
+            self.instruction(LocalGet(tmp));
+        }
+
+        // If the byte size of memory is greater than the final address of the
+        // string then the string is invalid. Note that if it's precisely equal
+        // then that's ok.
+        self.instruction(I64GtU);
+        self.instruction(BrIf(1));
+
+        self.instruction(End);
+        self.trap(Trap::StringLengthOverflow);
+        self.instruction(End);
+    }
+
    fn translate_list(
        &mut self,
        src_ty: TypeInterfaceIndex,
@@ -1467,17 +2232,17 @@ impl Compiler<'_, '_> {
        self.instruction(GlobalSet(flags_global.as_u32()));
    }

-    fn verify_aligned(&mut self, memory: &Memory, align: usize) {
+    fn verify_aligned(&mut self, opts: &Options, addr_local: u32, align: usize) {
        // If the alignment is 1 then everything is trivially aligned and the
        // check can be omitted.
        if align == 1 {
            return;
        }
-        self.instruction(LocalGet(memory.addr_local));
+        self.instruction(LocalGet(addr_local));
        assert!(align.is_power_of_two());
-        self.ptr_uconst(memory.opts, u32::try_from(align - 1).unwrap());
-        self.ptr_and(memory.opts);
-        self.ptr_if(memory.opts, BlockType::Empty);
+        self.ptr_uconst(opts, u32::try_from(align - 1).unwrap());
+        self.ptr_and(opts);
+        self.ptr_if(opts, BlockType::Empty);
        self.trap(Trap::UnalignedPointer);
        self.instruction(End);
    }
@@ -1527,7 +2292,7 @@ impl Compiler<'_, '_> {
            offset: 0,
            opts,
        };
-        self.verify_aligned(&ret, align);
+        self.verify_aligned(opts, ret.addr_local, align);
        ret
    }

@@ -1711,6 +2476,46 @@ impl Compiler<'_, '_> {
        }
    }

+    fn ptr_sub(&mut self, opts: &Options) {
+        if opts.memory64 {
+            self.instruction(I64Sub);
+        } else {
+            self.instruction(I32Sub);
+        }
+    }
+
+    fn ptr_mul(&mut self, opts: &Options) {
+        if opts.memory64 {
+            self.instruction(I64Mul);
+        } else {
+            self.instruction(I32Mul);
+        }
+    }
+
+    fn ptr_ge_u(&mut self, opts: &Options) {
+        if opts.memory64 {
+            self.instruction(I64GeU);
+        } else {
+            self.instruction(I32GeU);
+        }
+    }
+
+    fn ptr_lt_u(&mut self, opts: &Options) {
+        if opts.memory64 {
+            self.instruction(I64LtU);
+        } else {
+            self.instruction(I32LtU);
+        }
+    }
+
+    fn ptr_shl(&mut self, opts: &Options) {
+        if opts.memory64 {
+            self.instruction(I64Shl);
+        } else {
+            self.instruction(I32Shl);
+        }
+    }
+
    fn ptr_eqz(&mut self, opts: &Options) {
        if opts.memory64 {
            self.instruction(I64Eqz);
@@ -1735,6 +2540,22 @@ impl Compiler<'_, '_> {
        }
    }

+    fn ptr_eq(&mut self, opts: &Options) {
+        if opts.memory64 {
+            self.instruction(I64Eq);
+        } else {
+            self.instruction(I32Eq);
+        }
+    }
+
+    fn ptr_ne(&mut self, opts: &Options) {
+        if opts.memory64 {
+            self.instruction(I64Ne);
+        } else {
+            self.instruction(I32Ne);
+        }
+    }
+
    fn ptr_and(&mut self, opts: &Options) {
        if opts.memory64 {
            self.instruction(I64And);
@@ -1743,6 +2564,22 @@ impl Compiler<'_, '_> {
        }
    }

+    fn ptr_or(&mut self, opts: &Options) {
+        if opts.memory64 {
+            self.instruction(I64Or);
+        } else {
+            self.instruction(I32Or);
+        }
+    }
+
+    fn ptr_xor(&mut self, opts: &Options) {
+        if opts.memory64 {
+            self.instruction(I64Xor);
+        } else {
+            self.instruction(I32Xor);
+        }
+    }
+
    fn ptr_if(&mut self, opts: &Options, ty: BlockType) {
        if opts.memory64 {
            self.instruction(I64Const(0));
@@ -1974,3 +2811,9 @@ enum MallocSize {
    Const(usize),
    Local(u32),
 }
+
+struct WasmString<'a> {
+    ptr: u32,
+    len: u32,
+    opts: &'a Options,
+}
--- a/crates/environ/src/fact/transcode.rs
+++ b/crates/environ/src/fact/transcode.rs
@@ -0,0 +1,178 @@
+use crate::fact::core_types::CoreTypes;
+use crate::MemoryIndex;
+use serde::{Deserialize, Serialize};
+use std::collections::HashMap;
+use wasm_encoder::{EntityType, ValType};
+
+pub struct Transcoders {
+    imported: HashMap<Transcoder, u32>,
+    prev_func_imports: u32,
+    imports: Vec<(String, EntityType, Transcoder)>,
+}
+
+#[derive(Copy, Clone, Hash, Eq, PartialEq)]
+pub struct Transcoder {
+    pub from_memory: MemoryIndex,
+    pub from_memory64: bool,
+    pub to_memory: MemoryIndex,
+    pub to_memory64: bool,
+    pub op: Transcode,
+}
+
+/// Possible transcoding operations that must be provided by the host.
+///
+/// Note that each transcoding operation may have a unique signature depending
+/// on the precise operation.
+#[allow(missing_docs)]
+#[derive(Debug, Copy, Clone, Hash, Eq, PartialEq, Serialize, Deserialize)]
+pub enum Transcode {
+    Copy(FixedEncoding),
+    Latin1ToUtf16,
+    Latin1ToUtf8,
+    Utf16ToCompactProbablyUtf16,
+    Utf16ToCompactUtf16,
+    Utf16ToLatin1,
+    Utf16ToUtf8,
+    Utf8ToCompactUtf16,
+    Utf8ToLatin1,
+    Utf8ToUtf16,
+}
+
+#[derive(Debug, Copy, Clone, Hash, Eq, PartialEq, Serialize, Deserialize)]
+#[allow(missing_docs)]
+pub enum FixedEncoding {
+    Utf8,
+    Utf16,
+    Latin1,
+}
+
+impl Transcoders {
+    pub fn new(prev_func_imports: u32) -> Transcoders {
+        Transcoders {
+            imported: HashMap::new(),
+            prev_func_imports,
+            imports: Vec::new(),
+        }
+    }
+
+    pub fn import(&mut self, types: &mut CoreTypes, transcoder: Transcoder) -> u32 {
+        *self.imported.entry(transcoder).or_insert_with(|| {
+            let idx = self.prev_func_imports + (self.imports.len() as u32);
+            self.imports
+                .push((transcoder.name(), transcoder.ty(types), transcoder));
+            idx
+        })
+    }
+
+    pub fn imports(&self) -> impl Iterator<Item = (&str, &str, EntityType, &Transcoder)> {
+        self.imports
+            .iter()
+            .map(|(name, ty, transcoder)| ("transcode", &name[..], *ty, transcoder))
+    }
+}
+
+impl Transcoder {
+    fn name(&self) -> String {
+        format!(
+            "{} (mem{} => mem{})",
+            self.op.desc(),
+            self.from_memory.as_u32(),
+            self.to_memory.as_u32(),
+        )
+    }
+
+    fn ty(&self, types: &mut CoreTypes) -> EntityType {
+        let from_ptr = if self.from_memory64 {
+            ValType::I64
+        } else {
+            ValType::I32
+        };
+        let to_ptr = if self.to_memory64 {
+            ValType::I64
+        } else {
+            ValType::I32
+        };
+
+        let ty = match self.op {
+            // These direct transcodings take the source pointer, the source
+            // code units, and the destination pointer.
+            //
+            // The memories being copied between are part of each intrinsic and
+            // the destination code units are the same as the source.
+            // Note that the pointers are dynamically guaranteed to be aligned
+            // and in-bounds for the code units length as defined by the string
+            // encoding.
+            Transcode::Copy(_) | Transcode::Latin1ToUtf16 => {
+                types.function(&[from_ptr, from_ptr, to_ptr], &[])
+            }
+
+            // Transcoding from utf8 to utf16 takes the from ptr/len as well as
+            // a destination. The destination is valid for len*2 bytes. The
+            // return value is how many code units were written to the
+            // destination.
+            Transcode::Utf8ToUtf16 => types.function(&[from_ptr, from_ptr, to_ptr], &[to_ptr]),
+
+            // Transcoding to utf8 as a smaller format takes all the parameters
+            // and returns the amount of space consumed in the src/destination
+            Transcode::Utf16ToUtf8 | Transcode::Latin1ToUtf8 => {
+                types.function(&[from_ptr, from_ptr, to_ptr, to_ptr], &[from_ptr, to_ptr])
+            }
+
+            // The return type is a tagged length which indicates which was
+            // used
+            Transcode::Utf16ToCompactProbablyUtf16 => {
+                types.function(&[from_ptr, from_ptr, to_ptr], &[to_ptr])
+            }
+
+            // The initial step of transcoding from a fixed format to a compact
+            // format. Takes the ptr/len of the source the the destination
+            // pointer. The destination length is implicitly the same. Returns
+            // how many code units were consumed in the source, which is also
+            // how many bytes were written to the destination.
+            Transcode::Utf8ToLatin1 | Transcode::Utf16ToLatin1 => {
+                types.function(&[from_ptr, from_ptr, to_ptr], &[from_ptr, to_ptr])
+            }
+
+            // The final step of transcoding to a compact format when the fixed
+            // transcode has failed. This takes the ptr/len of the source that's
+            // remaining to transcode. Then this takes the destination ptr/len
+            // as well as the destination bytes written so far with latin1.
+            // Finally this returns the number of code units written to the
+            // destination.
+            Transcode::Utf8ToCompactUtf16 | Transcode::Utf16ToCompactUtf16 => {
+                types.function(&[from_ptr, from_ptr, to_ptr, to_ptr, to_ptr], &[to_ptr])
+            }
+        };
+        EntityType::Function(ty)
+    }
+}
+
+impl Transcode {
+    /// Returns a human-readable description for this transcoding operation.
+    pub fn desc(&self) -> &'static str {
+        match self {
+            Transcode::Copy(FixedEncoding::Utf8) => "utf8-to-utf8",
+            Transcode::Copy(FixedEncoding::Utf16) => "utf16-to-utf16",
+            Transcode::Copy(FixedEncoding::Latin1) => "latin1-to-latin1",
+            Transcode::Latin1ToUtf16 => "latin1-to-utf16",
+            Transcode::Latin1ToUtf8 => "latin1-to-utf8",
+            Transcode::Utf16ToCompactProbablyUtf16 => "utf16-to-compact-probably-utf16",
+            Transcode::Utf16ToCompactUtf16 => "utf16-to-compact-utf16",
+            Transcode::Utf16ToLatin1 => "utf16-to-latin1",
+            Transcode::Utf16ToUtf8 => "utf16-to-utf8",
+            Transcode::Utf8ToCompactUtf16 => "utf8-to-compact-utf16",
+            Transcode::Utf8ToLatin1 => "utf8-to-latin1",
+            Transcode::Utf8ToUtf16 => "utf8-to-utf16",
+        }
+    }
+}
+
+impl FixedEncoding {
+    pub(crate) fn width(&self) -> u8 {
+        match self {
+            FixedEncoding::Utf8 => 1,
+            FixedEncoding::Utf16 => 2,
+            FixedEncoding::Latin1 => 1,
+        }
+    }
+}
--- a/crates/environ/src/fact/traps.rs
+++ b/crates/environ/src/fact/traps.rs
@@ -30,6 +30,8 @@ pub enum Trap {
    InvalidDiscriminant,
    InvalidChar,
    ListByteLengthOverflow,
+    StringLengthTooBig,
+    StringLengthOverflow,
    AssertFailed(&'static str),
 }

@@ -105,6 +107,8 @@ impl fmt::Display for Trap {
            Trap::InvalidDiscriminant => "invalid variant discriminant".fmt(f),
            Trap::InvalidChar => "invalid char value specified".fmt(f),
            Trap::ListByteLengthOverflow => "byte size of list too large for i32".fmt(f),
+            Trap::StringLengthTooBig => "string byte size exceeds maximum".fmt(f),
+            Trap::StringLengthOverflow => "string byte size overflows i32".fmt(f),
            Trap::AssertFailed(s) => write!(f, "assertion failure: {}", s),
        }
    }