Add shared memories (#4187)

* Add shared memories

This change adds the ability to use shared memories in Wasmtime when the
[threads proposal] is enabled. Shared memories are annotated as `shared`
in the WebAssembly syntax, e.g., `(memory 1 1 shared)`, and are
protected from concurrent access during `memory.size` and `memory.grow`.

[threads proposal]: https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md

To implement this in Wasmtime, there are two main cases to cover:
    - a program may simply create a shared memory and possibly export it;
    this means that Wasmtime itself must be able to create shared
    memories
    - a user may create a shared memory externally and pass it in as an
    import during instantiation; this is the case when the program
    contains code like `(import "env" "memory" (memory 1 1 shared))`.
    This case is handled by a new Wasmtime API type, `SharedMemory`
    (see the sketch below).
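
A minimal sketch of the second case using the runtime types touched by
this change (`Memory`, `MemoryPlan`, `Tunables`, and
`MemoryPlan::for_memory` are existing `wasmtime-environ` items; the exact
embedder-facing surface may differ):

    use wasmtime_environ::{Memory, MemoryPlan, Tunables};
    use wasmtime_runtime::SharedMemory;

    // Build the type for `(memory 1 1 shared)`: min = max = 1 page.
    fn make_shared_import() -> anyhow::Result<SharedMemory> {
        let ty = Memory {
            minimum: 1,
            maximum: Some(1), // shared memories must declare a maximum
            shared: true,
            memory64: false,
        };
        // Shared memories use a static plan so the base pointer never moves.
        let plan = MemoryPlan::for_memory(ty, &Tunables::default());
        SharedMemory::new(plan)
    }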

Because of the first case, this change keeps all of the current
memory-creation mechanisms working as-is. Wasmtime can still create
either static or dynamic memories in either on-demand or pooling modes,
and any of these memories can be considered shared. When shared, the
`Memory` runtime container locks appropriately during `memory.size`
and `memory.grow` operations; since all memories use this container, it
is the natural place to implement the locking exactly once.
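
In sketch form (simplified; the real proxying appears in the
`SharedMemory` implementation in the diff below), the locking pattern is:

    use std::sync::{Arc, RwLock};

    // Simplified stand-in for the real inner allocation.
    struct Inner {
        byte_size: usize,
    }

    // Cloning shares one allocation; the lock serializes size/grow.
    #[derive(Clone)]
    struct SharedMemorySketch(Arc<RwLock<Inner>>);

    impl SharedMemorySketch {
        // `memory.size`: a read lock suffices.
        fn byte_size(&self) -> usize {
            self.0.read().unwrap().byte_size
        }

        // `memory.grow`: the write lock makes growth an atomic
        // read-modify-write of the length, as the threads proposal requires.
        fn grow(&self, delta_bytes: usize) -> usize {
            let mut inner = self.0.write().unwrap();
            inner.byte_size += delta_bytes;
            inner.byte_size
        }
    }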

The second case is covered by the new `SharedMemory` structure. It uses
the same `Mmap` allocation under the hood as non-shared memories, but
allows the user to perform the allocation externally to Wasmtime and
share the memory across threads (via an `Arc`). The pointer to the
actual memory is carefully wired through and owned by the
`SharedMemory` structure itself. This means there are now two views of
where the `VMMemoryDefinition` lives: for owned memories (the default),
the `VMMemoryDefinition` is stored directly in the `VMContext`; in the
`SharedMemory` case, however, the `VMContext` must point to this
separate structure.
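
For example (a sketch assuming a `SharedMemory` named `mem`, e.g. from the
sketch above), clones share one allocation and the definition pointer
stays stable across clones and threads:

    use wasmtime_runtime::SharedMemory;

    fn inspect(mem: SharedMemory) {
        let mem2 = mem.clone(); // `Arc` clone: same underlying memory
        assert_eq!(mem.vmmemory_ptr(), mem2.vmmemory_ptr());
        std::thread::spawn(move || {
            // `SharedMemory` is `Send + Sync`, so clones may cross threads.
            let _len = unsafe { (*mem2.vmmemory_ptr()).current_length() };
        });
    }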

To ensure that the `VMContext` can always point to the correct
`VMMemoryDefinition`, this change alters the `VMContext` structure.
Since a `SharedMemory` owns its own `VMMemoryDefinition`, the
`defined_memories` table in the `VMContext` becomes a sequence of
pointers: in the shared case they point to the `VMMemoryDefinition`
owned by the `SharedMemory`, and in the owned case (i.e., not shared)
they point to `VMMemoryDefinition`s stored in a new table,
`owned_memories`.
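
A hedged sketch of the resulting lookup (names follow this change; the
real code computes offsets through `VMOffsets` rather than taking them as
parameters):

    use wasmtime_runtime::VMMemoryDefinition;

    // `defined_memories` now holds `*mut VMMemoryDefinition` entries, so
    // reading a definition always goes through one pointer load. The target
    // is either an `owned_memories` entry (owned, inside the vmctx) or the
    // `SharedMemory`'s long-term definition (shared, outside the vmctx).
    unsafe fn memory_ptr(
        vmctx: *mut u8,
        defined_memory_offset: usize, // i.e., `vmctx_vmmemory_pointer(index)`
    ) -> *mut VMMemoryDefinition {
        *vmctx.add(defined_memory_offset).cast::<*mut VMMemoryDefinition>()
    }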

This change adds an additional indirection (through the `*mut
VMMemoryDefinition` pointer) that could add overhead. Using an imported
memory as a proxy, we measured a 1-3% overhead of this approach on the
`pulldown-cmark` benchmark. To avoid this, Cranelift-generated code will
special-case the owned memory access (i.e., load a pointer directly to
the `owned_memories` entry) for `memory.size` so that only
shared memories (and imported memories, as before) incur the indirection
cost.
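
Written as Rust for illustration only (the actual special case is emitted
by Cranelift, not implemented in the runtime):

    use wasmtime_runtime::VMMemoryDefinition;

    // Owned memory: the `owned_memories` entry address is computed directly
    // from the vmctx, so `memory.size` is a single length load.
    unsafe fn owned_size(def: *const VMMemoryDefinition) -> usize {
        (*def).current_length()
    }

    // Shared (and imported) memory: load the `*mut VMMemoryDefinition` from
    // `defined_memories` first, then load the length; this extra load is the
    // 1-3% measured on `pulldown-cmark`.
    unsafe fn indirect_size(slot: *const *const VMMemoryDefinition) -> usize {
        (**slot).current_length()
    }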

* review: remove thread feature check

* review: swap wasmtime-types dependency for existing wasmtime-environ use

* review: remove unused VMMemoryUnion

* review: reword cross-engine error message

* review: improve tests

* review: refactor to prevent Memory <-> SharedMemory conversion

* review: into_shared_memory -> as_shared_memory

* review: remove commented out code

* review: limit shared min/max to 32 bits

* review: skip imported memories

* review: imported memories are not owned

* review: remove TODO

* review: document unsafe send + sync

* review: add limiter assertion

* review: remove TODO

* review: improve tests

* review: fix doc test

* fix: fixes based on discussion with Alex

This changes several key parts:
 - adds memory indexes to imports and exports
 - makes `VMMemoryDefinition::current_length` an atomic usize (see the
   sketch below)
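
A sketch of the second bullet, matching the orderings used in the diff
below:

    use std::sync::atomic::{AtomicUsize, Ordering};

    const WASM_PAGE_SIZE: usize = 65536;

    fn main() {
        let current_length = AtomicUsize::new(WASM_PAGE_SIZE);
        // Readers (`memory.size`, bounds checks) use a relaxed load and may
        // observe a stale, smaller length; memory only grows, so the
        // under-estimate is acceptable.
        let observed = current_length.load(Ordering::Relaxed);
        // `memory.grow` publishes the new size with a `SeqCst` store (under
        // the write lock, in the shared case).
        current_length.store(observed + WASM_PAGE_SIZE, Ordering::SeqCst);
    }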

* review: add `Extern::SharedMemory`

* review: remove TODO

* review: atomically load from VMMemoryDefinition in JIT-generated code

* review: add test probing the last available memory slot across threads

* fix: move assertion to new location due to rebase

* fix: doc link

* fix: add TODOs to c-api

* fix: broken doc link

* fix: modify pooling allocator messages in tests

* review: make owned_memory_index panic instead of returning an option

* review: clarify calculation of num_owned_memories

* review: move 'use' to top of file

* review: change '*const [u8]' to '*mut [u8]'

* review: remove TODO

* review: avoid hard-coding memory index

* review: remove 'preallocation' parameter from 'Memory::_new'

* fix: component model memory length

* review: check that shared memory plans are static

* review: ignore growth limits for shared memory

* review: improve atomic store comment

* review: add FIXME for memory growth failure

* review: add comment about absence of bounds-checked 'memory.size'

* review: make 'current_length()' doc comment more precise

* review: more comments related to memory.size non-determinism

* review: make 'vmmemory' unreachable for shared memory

* review: move code around

* review: thread plan through to 'wrap()'

* review: disallow shared memory allocation with the pooling allocator

commit 2b52f47b83 (parent ed9db962de)
Author: Andrew Brown
Date: 2022-06-08 10:13:40 -07:00
Committed by: GitHub

27 changed files with 1211 additions and 226 deletions


@@ -2,7 +2,7 @@ use crate::vmcontext::{
VMCallerCheckedAnyfunc, VMContext, VMGlobalDefinition, VMMemoryDefinition, VMTableDefinition,
};
use std::ptr::NonNull;
use wasmtime_environ::{Global, MemoryPlan, TablePlan};
use wasmtime_environ::{DefinedMemoryIndex, Global, MemoryPlan, TablePlan};
/// The value of an export passed from one instance to another.
pub enum Export {
@@ -71,6 +71,8 @@ pub struct ExportMemory {
pub vmctx: *mut VMContext,
/// The memory declaration, used for compatibility checking.
pub memory: MemoryPlan,
/// The index at which the memory is defined within the `vmctx`.
pub index: DefinedMemoryIndex,
}
// See docs on send/sync for `ExportFunction` above.


@@ -1043,6 +1043,7 @@ mod tests {
num_imported_globals: 0,
num_defined_tables: 0,
num_defined_memories: 0,
num_owned_memories: 0,
num_defined_globals: 0,
num_escaped_funcs: 0,
});
@@ -1069,6 +1070,7 @@ mod tests {
num_imported_globals: 0,
num_defined_tables: 0,
num_defined_memories: 0,
num_owned_memories: 0,
num_defined_globals: 0,
num_escaped_funcs: 0,
});
@@ -1095,6 +1097,7 @@ mod tests {
num_imported_globals: 0,
num_defined_tables: 0,
num_defined_memories: 0,
num_owned_memories: 0,
num_defined_globals: 0,
num_escaped_funcs: 0,
});


@@ -26,7 +26,7 @@ use std::ops::Range;
use std::ptr::NonNull;
use std::sync::atomic::AtomicU64;
use std::sync::Arc;
use std::{mem, ptr, slice};
use std::{mem, ptr};
use wasmtime_environ::{
packed_option::ReservedValue, DataIndex, DefinedGlobalIndex, DefinedMemoryIndex,
DefinedTableIndex, ElemIndex, EntityIndex, EntityRef, EntitySet, FuncIndex, GlobalIndex,
@@ -193,13 +193,13 @@ impl Instance {
self.memory(defined_index)
} else {
let import = self.imported_memory(index);
*unsafe { import.from.as_ref().unwrap() }
unsafe { VMMemoryDefinition::load(import.from) }
}
}
/// Return the indexed `VMMemoryDefinition`.
fn memory(&self, index: DefinedMemoryIndex) -> VMMemoryDefinition {
unsafe { *self.memory_ptr(index) }
unsafe { VMMemoryDefinition::load(self.memory_ptr(index)) }
}
/// Set the indexed memory to `VMMemoryDefinition`.
@@ -211,7 +211,7 @@ impl Instance {
/// Return the indexed `VMMemoryDefinition`.
fn memory_ptr(&self, index: DefinedMemoryIndex) -> *mut VMMemoryDefinition {
unsafe { self.vmctx_plus_offset(self.offsets.vmctx_vmmemory_definition(index)) }
unsafe { *self.vmctx_plus_offset(self.offsets.vmctx_vmmemory_pointer(index)) }
}
/// Return the indexed `VMGlobalDefinition`.
@@ -309,17 +309,18 @@ impl Instance {
}
fn get_exported_memory(&mut self, index: MemoryIndex) -> ExportMemory {
let (definition, vmctx) = if let Some(def_index) = self.module().defined_memory_index(index)
{
(self.memory_ptr(def_index), self.vmctx_ptr())
} else {
let import = self.imported_memory(index);
(import.from, import.vmctx)
};
let (definition, vmctx, def_index) =
if let Some(def_index) = self.module().defined_memory_index(index) {
(self.memory_ptr(def_index), self.vmctx_ptr(), def_index)
} else {
let import = self.imported_memory(index);
(import.from, import.vmctx, import.index)
};
ExportMemory {
definition,
vmctx,
memory: self.module().memory_plans[index].clone(),
index: def_index,
}
}
@@ -369,19 +370,6 @@ impl Instance {
index
}
/// Return the memory index for the given `VMMemoryDefinition`.
unsafe fn memory_index(&self, memory: &VMMemoryDefinition) -> DefinedMemoryIndex {
let index = DefinedMemoryIndex::new(
usize::try_from(
(memory as *const VMMemoryDefinition)
.offset_from(self.memory_ptr(DefinedMemoryIndex::new(0))),
)
.unwrap(),
);
assert_lt!(index.index(), self.memories.len());
index
}
/// Grow memory by the specified amount of pages.
///
/// Returns `None` if memory can't be grown by the specified amount
@@ -398,20 +386,20 @@ impl Instance {
let import = self.imported_memory(index);
unsafe {
let foreign_instance = (*import.vmctx).instance_mut();
let foreign_memory_def = &*import.from;
let foreign_memory_index = foreign_instance.memory_index(foreign_memory_def);
(foreign_memory_index, foreign_instance)
(import.index, foreign_instance)
}
};
let store = unsafe { &mut *instance.store() };
let memory = &mut instance.memories[idx];
let result = unsafe { memory.grow(delta, store) };
let vmmemory = memory.vmmemory();
let result = unsafe { memory.grow(delta, Some(store)) };
// Update the state used by wasm code in case the base pointer and/or
// the length changed.
instance.set_memory(idx, vmmemory);
// Update the state used by a non-shared Wasm memory in case the base
// pointer and/or the length changed.
if memory.as_shared_memory().is_none() {
let vmmemory = memory.vmmemory();
instance.set_memory(idx, vmmemory);
}
result
}
@@ -661,14 +649,16 @@ impl Instance {
let src_mem = self.get_memory(src_index);
let dst_mem = self.get_memory(dst_index);
let src = self.validate_inbounds(src_mem.current_length, src, len)?;
let dst = self.validate_inbounds(dst_mem.current_length, dst, len)?;
let src = self.validate_inbounds(src_mem.current_length(), src, len)?;
let dst = self.validate_inbounds(dst_mem.current_length(), dst, len)?;
// Bounds and casts are checked above, by this point we know that
// everything is safe.
unsafe {
let dst = dst_mem.base.add(dst);
let src = src_mem.base.add(src);
// FIXME audit whether this is safe in the presence of shared memory
// (https://github.com/bytecodealliance/wasmtime/issues/4203).
ptr::copy(src, dst, len as usize);
}
@@ -701,12 +691,14 @@ impl Instance {
len: u64,
) -> Result<(), Trap> {
let memory = self.get_memory(memory_index);
let dst = self.validate_inbounds(memory.current_length, dst, len)?;
let dst = self.validate_inbounds(memory.current_length(), dst, len)?;
// Bounds and casts are checked above, by this point we know that
// everything is safe.
unsafe {
let dst = memory.base.add(dst);
// FIXME audit whether this is safe in the presence of shared memory
// (https://github.com/bytecodealliance/wasmtime/issues/4203).
ptr::write_bytes(dst, val, len as usize);
}
@@ -751,16 +743,16 @@ impl Instance {
let memory = self.get_memory(memory_index);
let data = self.wasm_data(range);
let dst = self.validate_inbounds(memory.current_length, dst, len.into())?;
let dst = self.validate_inbounds(memory.current_length(), dst, len.into())?;
let src = self.validate_inbounds(data.len(), src.into(), len.into())?;
let len = len as usize;
let src_slice = &data[src..(src + len)];
unsafe {
let src_start = data.as_ptr().add(src);
let dst_start = memory.base.add(dst);
let dst_slice = slice::from_raw_parts_mut(dst_start, len);
dst_slice.copy_from_slice(src_slice);
// FIXME audit whether this is safe in the presence of shared memory
// (https://github.com/bytecodealliance/wasmtime/issues/4203).
ptr::copy_nonoverlapping(src_start, dst_start, len);
}
Ok(())
@@ -935,10 +927,27 @@ impl Instance {
ptr = ptr.add(1);
}
// Initialize the defined memories
// Initialize the defined memories. This fills in both the
// `defined_memories` table and the `owned_memories` table at the same
// time. Entries in `defined_memories` hold a pointer to a definition
// (all memories) whereas the `owned_memories` hold the actual
// definitions of memories owned (not shared) in the module.
let mut ptr = self.vmctx_plus_offset(self.offsets.vmctx_memories_begin());
let mut owned_ptr = self.vmctx_plus_offset(self.offsets.vmctx_owned_memories_begin());
for i in 0..module.memory_plans.len() - module.num_imported_memories {
ptr::write(ptr, self.memories[DefinedMemoryIndex::new(i)].vmmemory());
let defined_memory_index = DefinedMemoryIndex::new(i);
let memory_index = module.memory_index(defined_memory_index);
if module.memory_plans[memory_index].memory.shared {
let def_ptr = self.memories[defined_memory_index]
.as_shared_memory()
.unwrap()
.vmmemory_ptr_mut();
ptr::write(ptr, def_ptr);
} else {
ptr::write(owned_ptr, self.memories[defined_memory_index].vmmemory());
ptr::write(ptr, owned_ptr);
owned_ptr = owned_ptr.add(1);
}
ptr = ptr.add(1);
}
@@ -1104,11 +1113,6 @@ impl InstanceHandle {
self.instance().host_state()
}
/// Return the memory index for the given `VMMemoryDefinition` in this instance.
pub unsafe fn memory_index(&self, memory: &VMMemoryDefinition) -> DefinedMemoryIndex {
self.instance().memory_index(memory)
}
/// Get a memory defined locally within this module.
pub fn get_defined_memory(&mut self, index: DefinedMemoryIndex) -> *mut Memory {
self.instance_mut().get_defined_memory(index)


@@ -10,7 +10,6 @@ use std::alloc;
use std::any::Any;
use std::convert::TryFrom;
use std::ptr;
use std::slice;
use std::sync::Arc;
use thiserror::Error;
use wasmtime_environ::{
@@ -315,7 +314,7 @@ fn check_memory_init_bounds(
.and_then(|start| start.checked_add(init.data.len()));
match end {
Some(end) if end <= memory.current_length => {
Some(end) if end <= memory.current_length() => {
// Initializer is in bounds
}
_ => {
@@ -331,7 +330,7 @@ fn check_memory_init_bounds(
fn initialize_memories(instance: &mut Instance, module: &Module) -> Result<(), InstantiationError> {
let memory_size_in_pages =
&|memory| (instance.get_memory(memory).current_length as u64) / u64::from(WASM_PAGE_SIZE);
&|memory| (instance.get_memory(memory).current_length() as u64) / u64::from(WASM_PAGE_SIZE);
// Loads the `global` value and returns it as a `u64`, but sign-extends
// 32-bit globals which can be used as the base for 32-bit memories.
@@ -372,10 +371,15 @@ fn initialize_memories(instance: &mut Instance, module: &Module) -> Result<(), InstantiationError>
}
}
let memory = instance.get_memory(memory_index);
let dst_slice =
unsafe { slice::from_raw_parts_mut(memory.base, memory.current_length) };
let dst = &mut dst_slice[usize::try_from(init.offset).unwrap()..][..init.data.len()];
dst.copy_from_slice(instance.wasm_data(init.data.clone()));
unsafe {
let src = instance.wasm_data(init.data.clone());
let dst = memory.base.add(usize::try_from(init.offset).unwrap());
// FIXME audit whether this is safe in the presence of shared
// memory
// (https://github.com/bytecodealliance/wasmtime/issues/4203).
ptr::copy_nonoverlapping(src.as_ptr(), dst, src.len())
}
true
},
);
@@ -513,6 +517,36 @@ impl Default for OnDemandInstanceAllocator {
}
}
/// Allocate an instance containing a single memory.
///
/// In order to import a [`Memory`] into a WebAssembly instance, Wasmtime
/// requires that memory to exist in its own instance. Here we bring to life
/// such a "Frankenstein" instance whose sole purpose is to export a
/// [`Memory`].
pub unsafe fn allocate_single_memory_instance(
req: InstanceAllocationRequest,
memory: Memory,
) -> Result<InstanceHandle, InstantiationError> {
let mut memories = PrimaryMap::default();
memories.push(memory);
let tables = PrimaryMap::default();
let module = req.runtime_info.module();
let offsets = VMOffsets::new(HostPtr, module);
let layout = Instance::alloc_layout(&offsets);
let instance = alloc::alloc(layout) as *mut Instance;
Instance::new_at(instance, layout.size(), offsets, req, memories, tables);
Ok(InstanceHandle { instance })
}
/// Internal implementation of [`InstanceHandle`] deallocation.
///
/// See [`InstanceAllocator::deallocate()`] for more details.
pub unsafe fn deallocate(handle: &InstanceHandle) {
let layout = Instance::alloc_layout(&handle.instance().offsets);
ptr::drop_in_place(handle.instance);
alloc::dealloc(handle.instance.cast(), layout);
}
unsafe impl InstanceAllocator for OnDemandInstanceAllocator {
unsafe fn allocate(
&self,
@@ -542,9 +576,7 @@ unsafe impl InstanceAllocator for OnDemandInstanceAllocator {
}
unsafe fn deallocate(&self, handle: &InstanceHandle) {
let layout = Instance::alloc_layout(&handle.instance().offsets);
ptr::drop_in_place(handle.instance);
alloc::dealloc(handle.instance.cast(), layout);
deallocate(handle)
}
#[cfg(feature = "async")]


@@ -52,12 +52,14 @@ pub use crate::export::*;
pub use crate::externref::*;
pub use crate::imports::Imports;
pub use crate::instance::{
InstanceAllocationRequest, InstanceAllocator, InstanceHandle, InstantiationError, LinkError,
OnDemandInstanceAllocator, StorePtr,
allocate_single_memory_instance, InstanceAllocationRequest, InstanceAllocator, InstanceHandle,
InstantiationError, LinkError, OnDemandInstanceAllocator, StorePtr,
};
#[cfg(feature = "pooling-allocator")]
pub use crate::instance::{InstanceLimits, PoolingAllocationStrategy, PoolingInstanceAllocator};
pub use crate::memory::{DefaultMemoryCreator, Memory, RuntimeLinearMemory, RuntimeMemoryCreator};
pub use crate::memory::{
DefaultMemoryCreator, Memory, RuntimeLinearMemory, RuntimeMemoryCreator, SharedMemory,
};
pub use crate::mmap::Mmap;
pub use crate::mmap_vec::MmapVec;
pub use crate::table::{Table, TableElement};


@@ -586,7 +586,7 @@ unsafe fn validate_atomic_addr(
memory: MemoryIndex,
addr: usize,
) -> Result<(), Trap> {
if addr > instance.get_memory(memory).current_length {
if addr > instance.get_memory(memory).current_length() {
return Err(Trap::wasm(TrapCode::HeapOutOfBounds));
}
Ok(())


@@ -11,7 +11,8 @@ use anyhow::Error;
use anyhow::{bail, format_err, Result};
use more_asserts::{assert_ge, assert_le};
use std::convert::TryFrom;
use std::sync::Arc;
use std::sync::atomic::Ordering;
use std::sync::{Arc, RwLock};
use wasmtime_environ::{MemoryPlan, MemoryStyle, WASM32_MAX_PAGES, WASM64_MAX_PAGES};
const WASM_PAGE_SIZE: usize = wasmtime_environ::WASM_PAGE_SIZE as usize;
@@ -60,6 +61,81 @@ pub trait RuntimeLinearMemory: Send + Sync {
/// Returns `None` if the memory is unbounded.
fn maximum_byte_size(&self) -> Option<usize>;
/// Grows a memory by `delta_pages`.
///
/// This performs the necessary checks on the growth before delegating to
/// the underlying `grow_to` implementation. A default implementation of
/// this method is provided here since this is assumed to be the same for
/// most kinds of memory; one exception is shared memory, which must perform
/// all the steps of the default implementation *plus* the required locking.
///
/// The `store` is used only for error reporting.
fn grow(
&mut self,
delta_pages: u64,
mut store: Option<&mut dyn Store>,
) -> Result<Option<(usize, usize)>, Error> {
let old_byte_size = self.byte_size();
// Wasm spec: when growing by 0 pages, always return the current size.
if delta_pages == 0 {
return Ok(Some((old_byte_size, old_byte_size)));
}
// The largest wasm-page-aligned region of memory it is possible to
// represent in a `usize`. This will be impossible for the system to
// actually allocate.
let absolute_max = 0usize.wrapping_sub(WASM_PAGE_SIZE);
// Calculate the byte size of the new allocation. Let it overflow up to
// `usize::MAX`, then clamp it down to `absolute_max`.
let new_byte_size = usize::try_from(delta_pages)
.unwrap_or(usize::MAX)
.saturating_mul(WASM_PAGE_SIZE)
.saturating_add(old_byte_size);
let new_byte_size = if new_byte_size > absolute_max {
absolute_max
} else {
new_byte_size
};
let maximum = self.maximum_byte_size();
// Store limiter gets first chance to reject memory_growing.
if let Some(store) = &mut store {
if !store.memory_growing(old_byte_size, new_byte_size, maximum)? {
return Ok(None);
}
}
// Never exceed maximum, even if limiter permitted it.
if let Some(max) = maximum {
if new_byte_size > max {
if let Some(store) = store {
// FIXME: shared memories may not have an associated store
// to report the growth failure to but the error should not
// be dropped
// (https://github.com/bytecodealliance/wasmtime/issues/4240).
store.memory_grow_failed(&format_err!("Memory maximum size exceeded"));
}
return Ok(None);
}
}
match self.grow_to(new_byte_size) {
Ok(_) => Ok(Some((old_byte_size, new_byte_size))),
Err(e) => {
// FIXME: shared memories may not have an associated store to
// report the growth failure to but the error should not be
// dropped
// (https://github.com/bytecodealliance/wasmtime/issues/4240).
if let Some(store) = store {
store.memory_grow_failed(&e);
}
Ok(None)
}
}
}
/// Grow memory to the specified amount of bytes.
///
/// Returns an error if memory can't be grown by the specified amount
@@ -77,7 +153,6 @@ pub trait RuntimeLinearMemory: Send + Sync {
/// For the pooling allocator, we must be able to downcast this trait to its
/// underlying structure.
#[cfg(feature = "pooling-allocator")]
fn as_any_mut(&mut self) -> &mut dyn std::any::Any;
}
@@ -114,7 +189,8 @@ pub struct MmapMemory {
}
impl MmapMemory {
/// Create a new linear memory instance with specified minimum and maximum number of wasm pages.
/// Create a new linear memory instance with specified minimum and maximum
/// number of wasm pages.
pub fn new(
plan: &MemoryPlan,
minimum: usize,
@@ -145,13 +221,14 @@ impl MmapMemory {
(bound_bytes, 0)
}
};
let request_bytes = pre_guard_bytes
.checked_add(alloc_bytes)
.and_then(|i| i.checked_add(extra_to_reserve_on_growth))
.and_then(|i| i.checked_add(offset_guard_bytes))
.ok_or_else(|| format_err!("cannot allocate {} with guard regions", minimum))?;
let mut mmap = Mmap::accessible_reserved(0, request_bytes)?;
if minimum > 0 {
mmap.make_accessible(pre_guard_bytes, minimum)?;
}
@@ -250,7 +327,7 @@ impl RuntimeLinearMemory for MmapMemory {
fn vmmemory(&mut self) -> VMMemoryDefinition {
VMMemoryDefinition {
base: unsafe { self.mmap.as_mut_ptr().add(self.pre_guard_size) },
current_length: self.accessible,
current_length: self.accessible.into(),
}
}
@@ -260,7 +337,6 @@ impl RuntimeLinearMemory for MmapMemory {
self.memory_image.is_none()
}
#[cfg(feature = "pooling-allocator")]
fn as_any_mut(&mut self) -> &mut dyn std::any::Any {
self
}
@@ -268,7 +344,7 @@ impl RuntimeLinearMemory for MmapMemory {
/// A "static" memory where the lifetime of the backing memory is managed
/// elsewhere. Currently used with the pooling allocator.
struct ExternalMemory {
struct StaticMemory {
/// The memory in the host for this wasm memory. The length of this
/// slice is the maximum size of the memory that can be grown to.
base: &'static mut [u8],
@@ -286,7 +362,7 @@ struct ExternalMemory {
memory_image: Option<MemoryImageSlot>,
}
impl ExternalMemory {
impl StaticMemory {
fn new(
base: &'static mut [u8],
initial_size: usize,
@@ -324,7 +400,7 @@ impl ExternalMemory {
}
}
impl RuntimeLinearMemory for ExternalMemory {
impl RuntimeLinearMemory for StaticMemory {
fn byte_size(&self) -> usize {
self.size
}
@@ -362,7 +438,7 @@ impl RuntimeLinearMemory for ExternalMemory {
fn vmmemory(&mut self) -> VMMemoryDefinition {
VMMemoryDefinition {
base: self.base.as_mut_ptr().cast(),
current_length: self.size,
current_length: self.size.into(),
}
}
@@ -374,7 +450,150 @@ impl RuntimeLinearMemory for ExternalMemory {
}
}
#[cfg(feature = "pooling-allocator")]
fn as_any_mut(&mut self) -> &mut dyn std::any::Any {
self
}
}
/// For shared memory (and only for shared memory), this locked version restricts
/// access when growing the memory or checking its size. This is to conform with
/// the [thread proposal]: "When `IsSharedArrayBuffer(...)` is true, the return
/// value should be the result of an atomic read-modify-write of the new size to
/// the internal `length` slot."
///
/// [thread proposal]:
/// https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md#webassemblymemoryprototypegrow
#[derive(Clone)]
pub struct SharedMemory(Arc<RwLock<SharedMemoryInner>>);
impl SharedMemory {
/// Construct a new [`SharedMemory`].
pub fn new(plan: MemoryPlan) -> Result<Self> {
let (minimum_bytes, maximum_bytes) = Memory::limit_new(&plan, None)?;
let mmap_memory = MmapMemory::new(&plan, minimum_bytes, maximum_bytes, None)?;
Ok(Self::wrap(&plan, Box::new(mmap_memory), plan.memory))
}
/// Wrap an existing [Memory] with the locking provided by a [SharedMemory].
pub fn wrap(
plan: &MemoryPlan,
mut memory: Box<dyn RuntimeLinearMemory>,
ty: wasmtime_environ::Memory,
) -> Self {
assert!(ty.shared);
assert!(matches!(plan.style, MemoryStyle::Static { .. }));
assert!(
memory.as_any_mut().type_id() != std::any::TypeId::of::<SharedMemory>(),
"cannot re-wrap a shared memory"
);
let def = LongTermVMMemoryDefinition(memory.vmmemory());
Self(Arc::new(RwLock::new(SharedMemoryInner {
memory,
ty,
def,
})))
}
/// Return the memory type for this [`SharedMemory`].
pub fn ty(&self) -> wasmtime_environ::Memory {
self.0.read().unwrap().ty
}
/// Convert this shared memory into a [`Memory`].
pub fn as_memory(self) -> Memory {
Memory(Box::new(self))
}
/// Return a mutable pointer to the shared memory's [VMMemoryDefinition].
pub fn vmmemory_ptr_mut(&mut self) -> *mut VMMemoryDefinition {
&self.0.read().unwrap().def.0 as *const _ as *mut _
}
/// Return a pointer to the shared memory's [VMMemoryDefinition].
pub fn vmmemory_ptr(&self) -> *const VMMemoryDefinition {
&self.0.read().unwrap().def.0 as *const _
}
}
struct SharedMemoryInner {
memory: Box<dyn RuntimeLinearMemory>,
ty: wasmtime_environ::Memory,
def: LongTermVMMemoryDefinition,
}
/// Shared memory needs some representation of a `VMMemoryDefinition` for
/// JIT-generated code to access. This structure owns the base pointer and
/// length to the actual memory and we share this definition across threads by:
/// - never changing the base pointer; according to the specification, shared
/// memory must be created with a known maximum size so it can be allocated
/// once and never moved
/// - carefully changing the length, using atomic accesses in both the runtime
/// and JIT-generated code.
struct LongTermVMMemoryDefinition(VMMemoryDefinition);
unsafe impl Send for LongTermVMMemoryDefinition {}
unsafe impl Sync for LongTermVMMemoryDefinition {}
/// Proxy all calls through the [`RwLock`].
impl RuntimeLinearMemory for SharedMemory {
fn byte_size(&self) -> usize {
self.0.read().unwrap().memory.byte_size()
}
fn maximum_byte_size(&self) -> Option<usize> {
self.0.read().unwrap().memory.maximum_byte_size()
}
fn grow(
&mut self,
delta_pages: u64,
store: Option<&mut dyn Store>,
) -> Result<Option<(usize, usize)>, Error> {
let mut inner = self.0.write().unwrap();
let result = inner.memory.grow(delta_pages, store)?;
if let Some((_old_size_in_bytes, new_size_in_bytes)) = result {
// Store the new size to the `VMMemoryDefinition` for JIT-generated
// code (and runtime functions) to access. No other code can be
// growing this memory due to the write lock, but code in other
// threads could have access to this shared memory and we want them
// to see the most consistent version of the `current_length`; a
// weaker consistency is possible if we accept them seeing an older,
// smaller memory size (assumption: memory only grows) but presently
// we are aiming for accuracy.
//
// Note that it could be possible to access a memory address that is
// now-valid due to changes to the page flags in `grow` above but
// beyond the `memory.size` that we are about to assign to. In these
// and similar cases, discussion in the thread proposal concluded
// that: "multiple accesses in one thread racing with another
// thread's `memory.grow` that are in-bounds only after the grow
// commits may independently succeed or trap" (see
// https://github.com/WebAssembly/threads/issues/26#issuecomment-433930711).
// In other words, some non-determinism is acceptable when using
// `memory.size` on work being done by `memory.grow`.
inner
.def
.0
.current_length
.store(new_size_in_bytes, Ordering::SeqCst);
}
Ok(result)
}
fn grow_to(&mut self, size: usize) -> Result<()> {
self.0.write().unwrap().memory.grow_to(size)
}
fn vmmemory(&mut self) -> VMMemoryDefinition {
// `vmmemory()` is used for writing the `VMMemoryDefinition` of a memory
// into its `VMContext`; this should never be possible for a shared
// memory because the only `VMMemoryDefinition` for it should be stored
// in its own `def` field.
unreachable!()
}
fn needs_init(&self) -> bool {
self.0.read().unwrap().memory.needs_init()
}
fn as_any_mut(&mut self) -> &mut dyn std::any::Any {
self
}
@@ -391,13 +610,14 @@ impl Memory {
store: &mut dyn Store,
memory_image: Option<&Arc<MemoryImage>>,
) -> Result<Self> {
let (minimum, maximum) = Self::limit_new(plan, store)?;
Ok(Memory(creator.new_memory(
plan,
minimum,
maximum,
memory_image,
)?))
let (minimum, maximum) = Self::limit_new(plan, Some(store))?;
let allocation = creator.new_memory(plan, minimum, maximum, memory_image)?;
let allocation = if plan.memory.shared {
Box::new(SharedMemory::wrap(plan, allocation, plan.memory))
} else {
allocation
};
Ok(Memory(allocation))
}
/// Create a new static (immovable) memory instance for the specified plan.
@@ -408,17 +628,30 @@ impl Memory {
memory_image: Option<MemoryImageSlot>,
store: &mut dyn Store,
) -> Result<Self> {
let (minimum, maximum) = Self::limit_new(plan, store)?;
let (minimum, maximum) = Self::limit_new(plan, Some(store))?;
let pooled_memory =
ExternalMemory::new(base, minimum, maximum, make_accessible, memory_image)?;
Ok(Memory(Box::new(pooled_memory)))
StaticMemory::new(base, minimum, maximum, make_accessible, memory_image)?;
let allocation = Box::new(pooled_memory);
let allocation: Box<dyn RuntimeLinearMemory> = if plan.memory.shared {
// FIXME: since the pooling allocator owns the memory allocation
// (which is torn down with the instance), the current shared memory
// implementation will cause problems; see
// https://github.com/bytecodealliance/wasmtime/issues/4244.
todo!("using shared memory with the pooling allocator is a work in progress");
} else {
allocation
};
Ok(Memory(allocation))
}
/// Calls the `store`'s limiter to optionally prevent a memory from being allocated.
///
/// Returns the minimum size and optional maximum size of the memory, in
/// bytes.
fn limit_new(plan: &MemoryPlan, store: &mut dyn Store) -> Result<(usize, Option<usize>)> {
fn limit_new(
plan: &MemoryPlan,
store: Option<&mut dyn Store>,
) -> Result<(usize, Option<usize>)> {
// Sanity-check what should already be true from wasm module validation.
let absolute_max = if plan.memory.memory64 {
WASM64_MAX_PAGES
@@ -473,17 +706,24 @@ impl Memory {
maximum = usize::try_from(1u64 << 32).ok();
}
// Inform the store's limiter what's about to happen. This will let the limiter
// reject anything if necessary, and this also guarantees that we should
// call the limiter for all requested memories, even if our `minimum`
// calculation overflowed. This means that the `minimum` we're informing
// the limiter is lossy and may not be 100% accurate, but for now the
// expected uses of limiter means that's ok.
if !store.memory_growing(0, minimum.unwrap_or(absolute_max), maximum)? {
bail!(
"memory minimum size of {} pages exceeds memory limits",
plan.memory.minimum
);
// Inform the store's limiter what's about to happen. This will let the
// limiter reject anything if necessary, and this also guarantees that
// we should call the limiter for all requested memories, even if our
// `minimum` calculation overflowed. This means that the `minimum` we're
// informing the limiter is lossy and may not be 100% accurate, but for
// now the expected uses of limiter means that's ok.
if let Some(store) = store {
// We ignore the store limits for shared memories since they are
// technically not created within a store (though, trickily, they
// may be associated with one in order to get a `vmctx`).
if !plan.memory.shared {
if !store.memory_growing(0, minimum.unwrap_or(absolute_max), maximum)? {
bail!(
"memory minimum size of {} pages exceeds memory limits",
plan.memory.minimum
);
}
}
}
// At this point we need to actually handle overflows, so bail out with
@@ -539,52 +779,11 @@ impl Memory {
pub unsafe fn grow(
&mut self,
delta_pages: u64,
store: &mut dyn Store,
store: Option<&mut dyn Store>,
) -> Result<Option<usize>, Error> {
let old_byte_size = self.byte_size();
// Wasm spec: when growing by 0 pages, always return the current size.
if delta_pages == 0 {
return Ok(Some(old_byte_size));
}
// largest wasm-page-aligned region of memory it is possible to
// represent in a usize. This will be impossible for the system to
// actually allocate.
let absolute_max = 0usize.wrapping_sub(WASM_PAGE_SIZE);
// calculate byte size of the new allocation. Let it overflow up to
// usize::MAX, then clamp it down to absolute_max.
let new_byte_size = usize::try_from(delta_pages)
.unwrap_or(usize::MAX)
.saturating_mul(WASM_PAGE_SIZE)
.saturating_add(old_byte_size);
let new_byte_size = if new_byte_size > absolute_max {
absolute_max
} else {
new_byte_size
};
let maximum = self.maximum_byte_size();
// Store limiter gets first chance to reject memory_growing.
if !store.memory_growing(old_byte_size, new_byte_size, maximum)? {
return Ok(None);
}
// Never exceed maximum, even if limiter permitted it.
if let Some(max) = maximum {
if new_byte_size > max {
store.memory_grow_failed(&format_err!("Memory maximum size exceeded"));
return Ok(None);
}
}
match self.0.grow_to(new_byte_size) {
Ok(_) => Ok(Some(old_byte_size)),
Err(e) => {
store.memory_grow_failed(&e);
Ok(None)
}
}
self.0
.grow(delta_pages, store)
.map(|opt| opt.map(|(old, _new)| old))
}
/// Return a `VMMemoryDefinition` for exposing the memory to compiled wasm code.
@@ -597,7 +796,7 @@ impl Memory {
#[cfg(feature = "pooling-allocator")]
pub fn is_static(&mut self) -> bool {
let as_any = self.0.as_any_mut();
as_any.downcast_ref::<ExternalMemory>().is_some()
as_any.downcast_ref::<StaticMemory>().is_some()
}
/// Consume the memory, returning its [`MemoryImageSlot`] if any is present.
@@ -606,10 +805,21 @@ impl Memory {
#[cfg(feature = "pooling-allocator")]
pub fn unwrap_static_image(mut self) -> Option<MemoryImageSlot> {
let as_any = self.0.as_any_mut();
if let Some(m) = as_any.downcast_mut::<ExternalMemory>() {
if let Some(m) = as_any.downcast_mut::<StaticMemory>() {
std::mem::take(&mut m.memory_image)
} else {
None
}
}
/// If the [Memory] is a [SharedMemory], unwrap it and return a clone to
/// that shared memory.
pub fn as_shared_memory(&mut self) -> Option<SharedMemory> {
let as_any = self.0.as_any_mut();
if let Some(m) = as_any.downcast_mut::<SharedMemory>() {
Some(m.clone())
} else {
None
}
}
}


@@ -7,7 +7,9 @@ use std::any::Any;
use std::cell::UnsafeCell;
use std::marker;
use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::u32;
use wasmtime_environ::DefinedMemoryIndex;
pub const VMCONTEXT_MAGIC: u32 = u32::from_le_bytes(*b"core");
@@ -129,6 +131,9 @@ pub struct VMMemoryImport {
/// A pointer to the `VMContext` that owns the memory description.
pub vmctx: *mut VMContext,
/// The index of the memory in the containing `vmctx`.
pub index: DefinedMemoryIndex,
}
// Declare that this type is send/sync, it's the responsibility of users of
@@ -205,14 +210,41 @@ mod test_vmglobal_import {
/// The fields compiled code needs to access to utilize a WebAssembly linear
/// memory defined within the instance, namely the start address and the
/// size in bytes.
#[derive(Debug, Copy, Clone)]
#[derive(Debug)]
#[repr(C)]
pub struct VMMemoryDefinition {
/// The start address.
pub base: *mut u8,
/// The current logical size of this linear memory in bytes.
pub current_length: usize,
///
/// This is atomic because shared memories must be able to grow their length
/// atomically. For relaxed access, see
/// [`VMMemoryDefinition::current_length()`].
pub current_length: AtomicUsize,
}
impl VMMemoryDefinition {
/// Return the current length of the [`VMMemoryDefinition`] by performing a
/// relaxed load; do not use this function for situations in which a precise
/// length is needed. Owned memories (i.e., non-shared) will always return a
/// precise result (since no concurrent modification is possible) but shared
/// memories may see an imprecise value--a `current_length` potentially
/// smaller than what some other thread observes. Since Wasm memory only
/// grows, this under-estimation may be acceptable in certain cases.
pub fn current_length(&self) -> usize {
self.current_length.load(Ordering::Relaxed)
}
/// Return a copy of the [`VMMemoryDefinition`] using the relaxed value of
/// `current_length`; see [`VMMemoryDefinition::current_length()`].
pub unsafe fn load(ptr: *mut Self) -> Self {
let other = &*ptr;
VMMemoryDefinition {
base: other.base,
current_length: other.current_length().into(),
}
}
}
#[cfg(test)]