Implement support for dynamic memories in the pooling allocator (#5208)

* Implement support for dynamic memories in the pooling allocator

This is a continuation of the thrust in #5207 for reducing page faults
and lock contention when using the pooling allocator. To that end this
commit implements support for efficient memory management in the pooling
allocator when using wasm that is instrumented with bounds checks.

The `MemoryImageSlot` type now avoids unconditionally shrinking memory
back to its initial size during the `clear_and_remain_ready` operation,
instead deferring optional resizing of memory to the subsequent call to
`instantiate` when the slot is reused. The instantiation portion then
takes the "memory style" as an argument which dictates whether the
accessible memory must be precisely fit or whether it's allowed to
exceed the maximum. This in effect enables skipping a call to `mprotect`
to shrink the heap when dynamic memory checks are enabled.
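As a rough sketch of that decision (illustrative stand-in types, not the
actual `MemoryImageSlot` code shown in the diff below), the instantiate-time
sizing amounts to growing the accessible region when needed and only
shrinking it for static-style memories:

```rust
// Stand-ins for the real types in wasmtime-environ / wasmtime-runtime.
enum MemoryStyle {
    Static { bound: u64 },
    Dynamic { reserve: u64 },
}

struct Slot {
    /// Bytes currently mapped read/write in this slot.
    accessible: usize,
}

impl Slot {
    /// Grow the R/W region if the new instance needs more than is already
    /// accessible, but only shrink it back down for static-style memories,
    /// which rely on page faults to catch out-of-bounds accesses. Dynamic
    /// memories bounds-check in compiled code, so leftover accessible pages
    /// are harmless and the shrinking `mprotect` can be skipped.
    fn size_for_instantiation(&mut self, initial_size: usize, style: &MemoryStyle) {
        if self.accessible < initial_size {
            // mprotect(READ | WRITE) over `accessible..initial_size` here.
            self.accessible = initial_size;
        } else if initial_size < self.accessible {
            if let MemoryStyle::Static { .. } = style {
                // mprotect(PROT_NONE) over `initial_size..accessible` here.
                self.accessible = initial_size;
            }
            // MemoryStyle::Dynamic: keep the extra pages mapped; no syscall.
        }
    }
}

fn main() {
    let mut slot = Slot { accessible: 2 << 20 };
    // A dynamic-style instantiation asking for less memory keeps the larger
    // accessible region around instead of shrinking it with mprotect.
    slot.size_for_instantiation(64 << 10, &MemoryStyle::Dynamic { reserve: 0 });
    assert_eq!(slot.accessible, 2 << 20);
}
```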

In terms of page faults and contention this should improve the situation
by:

* Fewer calls to `mprotect` since once a heap grows it stays grown and
  never shrinks. This means that a write lock is taken within the
  kernel much more rarely than before (only asymptotically now, not
  N-times-per-instance); a sketch of this grow path follows this list.

* Memory accessed after a heap growth operation will not fault if it was
  previously paged in by a prior instance and set to zero with `memset`.
  Unlike #5207, which requires a 6.0 kernel to see this optimization, this
  commit enables the optimization for any kernel.
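To make the first point concrete, here is a minimal sketch of the grow path
(simplified types, not the code from this commit, which appears in the diff
below): a syscall is only needed when growth crosses the high-water mark of
memory already made accessible by a previous use of the slot.

```rust
struct HeapSlot {
    /// Bytes already mapped read/write, possibly left over from a prior
    /// instance that used this slot.
    accessible: usize,
    /// Upper bound of the slot's virtual memory reservation.
    static_size: usize,
}

impl HeapSlot {
    /// Grow the heap to `size_bytes`. Because dynamic-memory slots are never
    /// shrunk when recycled, the `mprotect` (and the kernel-side lock it
    /// implies) only happens when the slot's high-water mark rises.
    fn set_heap_limit(&mut self, size_bytes: usize) -> Result<(), &'static str> {
        if size_bytes > self.static_size {
            return Err("heap limit exceeds reservation");
        }
        if size_bytes <= self.accessible {
            // Already mapped R/W by a previous instance: no syscall, and the
            // pages may still be resident and zeroed, so no new fault either.
            return Ok(());
        }
        // An mprotect(READ | WRITE) over `accessible..size_bytes` goes here.
        self.accessible = size_bytes;
        Ok(())
    }
}
```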

The major cost of choosing this strategy is naturally the performance
hit of the wasm itself. This is being looked at in PRs such as #5190 to
improve Wasmtime's story here.

This commit does not implement any new configuration options for
Wasmtime but instead reinterprets existing configuration options. The
pooling allocator no longer unconditionally sets
`static_memory_bound_is_maximum` and instead implements the support
necessary for the resulting memory type. The other change in this commit is
that the `Tunables::static_memory_bound` configuration option no longer
gates the creation of a `MemoryPool`; the pool will now be sized to
`instance_limits.memory_pages` if the `static_memory_bound` is too small.
This is done to accommodate fuzzing more easily, where the
`static_memory_bound` will become small during fuzzing and otherwise the
configuration would be rejected and require manual handling. The spirit of
the `MemoryPool` is one of large virtual address space reservations anyway,
so it seemed reasonable to interpret the configuration this way.
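For illustration, here is a sketch of that sizing rule with a hypothetical
helper (guard regions and overflow checks omitted; the real computation is in
the `MemoryPool::new` changes below):

```rust
const WASM_PAGE_SIZE: u64 = 65536;

/// The per-slot virtual reservation covers whichever is larger: the
/// configured `static_memory_bound` or the pool's `memory_pages` limit, so a
/// small `static_memory_bound` (as fuzzing configurations tend to produce) no
/// longer causes the pooling allocator configuration to be rejected.
fn slot_reservation_bytes(memory_pages: u64, static_memory_bound: u64) -> u64 {
    memory_pages.max(static_memory_bound) * WASM_PAGE_SIZE
}

fn main() {
    // Mirrors the updated test below: memory_pages = 2 with
    // static_memory_bound = 1 now yields a 2-page reservation rather than an
    // error about exceeding the static memory limit.
    assert_eq!(slot_reservation_bytes(2, 1), 2 * 65536);
}
```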

* Skip zero memory_size cases

These are causing errors when fuzzing and otherwise, in theory, shouldn't be
too interesting to optimize for anyway since they likely aren't used in
practice.
Author: Alex Crichton
Date: 2022-11-08 14:43:08 -06:00
Committed by: GitHub
Parent: 70bca801ab
Commit: 50cffad0d3
6 changed files with 470 additions and 262 deletions

View File

@@ -9,7 +9,7 @@ use rustix::fd::AsRawFd;
use std::fs::File;
use std::sync::Arc;
use std::{convert::TryFrom, ops::Range};
use wasmtime_environ::{DefinedMemoryIndex, MemoryInitialization, Module, PrimaryMap};
use wasmtime_environ::{DefinedMemoryIndex, MemoryInitialization, MemoryStyle, Module, PrimaryMap};
/// Backing images for memories in a module.
///
@@ -250,45 +250,89 @@ impl ModuleMemoryImages {
}
}
/// A single slot handled by the copy-on-write memory initialization mechanism.
/// Slot management of a copy-on-write image which can be reused for the pooling
/// allocator.
///
/// The mmap scheme is:
/// This data structure manages a slot of linear memory, primarily in the
/// pooling allocator, which optionally has a contiguous memory image in the
/// middle of it. Pictorially this data structure manages a virtual memory
/// region that looks like:
///
/// base ==> (points here)
/// - (image.offset bytes) anonymous zero memory, pre-image
/// - (image.len bytes) CoW mapping of memory image
/// - (up to static_size) anonymous zero memory, post-image
/// ```ignore
/// +--------------------+-------------------+--------------+--------------+
/// |      anonymous     |      optional     |  anonymous   |  PROT_NONE   |
/// |        zero        |       memory      |     zero     |    memory    |
/// |       memory       |       image       |    memory    |              |
/// +--------------------+-------------------+--------------+--------------+
/// |                     <------+---------->
/// |<-----+------------>         \
/// |       \                   image.len
/// |        \
/// |     image.linear_memory_offset
/// |
/// \
///  self.base is this virtual address
///
/// The ordering of mmaps to set this up is:
/// <------------------+------------------------------------------------>
///                     \
///                  static_size
///
/// - once, when pooling allocator is created:
/// - one large mmap to create 8GiB * instances * memories slots
/// <------------------+---------------------------------->
///                     \
///                  accessible
/// ```
///
/// - per instantiation of new image in a slot:
/// - mmap of anonymous zero memory, from 0 to max heap size
/// (static_size)
/// - mmap of CoW'd image, from `image.offset` to
/// `image.offset + image.len`. This overwrites part of the
/// anonymous zero memory, potentially splitting it into a pre-
/// and post-region.
/// - mprotect(PROT_NONE) on the part of the heap beyond the initial
/// heap size; we re-mprotect it with R+W bits when the heap is
/// grown.
/// When a `MemoryImageSlot` is created it's told what the `static_size` and
/// `accessible` limits are. Initially there is assumed to be no image in linear
/// memory.
///
/// When [`MemoryImageSlot::instantiate`] is called then the method will perform
/// a "synchronization" to take the image from its prior state to the new state
/// for the image specified. The first instantiation for example will mmap the
/// heap image into place. Upon reuse of a slot nothing happens except possibly
/// shrinking `self.accessible`. When a new image is used then the old image is
/// mapped to anonymous zero memory and then the new image is mapped in place.
///
/// A `MemoryImageSlot` is either `dirty` or it isn't. When a `MemoryImageSlot`
/// is dirty then it is assumed that any memory beneath `self.accessible` could
/// have any value. Instantiation cannot happen into a `dirty` slot, however, so
/// the [`MemoryImageSlot::clear_and_remain_ready`] method returns this memory back to
/// its original state to mark `dirty = false`. This is done by resetting all
/// anonymous memory back to zero and the image itself back to its initial
/// contents.
///
/// On Linux this is achieved with the `madvise(MADV_DONTNEED)` syscall. This
/// syscall will release the physical pages back to the OS but retain the
/// original mappings, effectively resetting everything back to its initial
/// state. Non-linux platforms will replace all memory below `self.accessible`
/// with a fresh zero'd mmap, meaning that reuse is effectively not supported.
#[derive(Debug)]
pub struct MemoryImageSlot {
/// The base of the actual heap memory. Bytes at this address are
/// what is seen by the Wasm guest code.
/// The base address in virtual memory of the actual heap memory.
///
/// Bytes at this address are what is seen by the Wasm guest code.
///
/// Note that this is stored as `usize` instead of `*mut u8` to not deal
/// with `Send`/`Sync`.
base: usize,
/// The maximum static memory size, plus post-guard.
/// The maximum static memory size which `self.accessible` can grow to.
static_size: usize,
/// The image that backs this memory. May be `None`, in
/// which case the memory is all zeroes.
pub(crate) image: Option<Arc<MemoryImage>>,
/// The initial heap size.
initial_size: usize,
/// The current heap size. All memory above `base + cur_size`
/// should be PROT_NONE (mapped inaccessible).
cur_size: usize,
/// An optional image that is currently being used in this linear memory.
///
/// This can be `None` in which case memory is originally all zeros. When
/// `Some` the image describes where it's located within linear memory.
image: Option<Arc<MemoryImage>>,
/// The size of the heap that is readable and writable.
///
/// Note that this may extend beyond the actual linear memory heap size
/// when dynamic memories are in use. Accesses to memory below
/// `self.accessible` may still page fault as pages are lazily brought in,
/// but the faults will always be resolved by the kernel.
accessible: usize,
/// Whether this slot may have "dirty" pages (pages written by an
/// instantiation). Set by `instantiate()` and cleared by
/// `clear_and_remain_ready()`, and used in assertions to ensure
@@ -297,9 +341,11 @@ pub struct MemoryImageSlot {
/// Invariant: if !dirty, then this memory slot contains a clean
/// CoW mapping of `image`, if `Some(..)`, and anonymous-zero
/// memory beyond the image up to `static_size`. The addresses
/// from offset 0 to `initial_size` are accessible R+W and the
/// rest of the slot is inaccessible.
/// from offset 0 to `self.accessible` are R+W and set to zero or the
/// initial image content, as appropriate. Everything between
/// `self.accessible` and `self.static_size` is inaccessible.
dirty: bool,
/// Whether this MemoryImageSlot is responsible for mapping anonymous
/// memory (to hold the reservation while overwriting mappings
/// specific to this slot) in place when it is dropped. Default
@@ -310,13 +356,18 @@ pub struct MemoryImageSlot {
impl MemoryImageSlot {
/// Create a new MemoryImageSlot. Assumes that there is an anonymous
/// mmap backing in the given range to start.
pub(crate) fn create(base_addr: *mut c_void, initial_size: usize, static_size: usize) -> Self {
///
/// The `accessible` parameter describes how much of linear memory is
/// already mapped as R/W with all zero-bytes. The `static_size` value is
/// the maximum size of this image which `accessible` cannot grow beyond,
/// and all memory from `accessible` to `static_size` should be mapped as
/// `PROT_NONE` backed by zero-bytes.
pub(crate) fn create(base_addr: *mut c_void, accessible: usize, static_size: usize) -> Self {
let base = base_addr as usize;
MemoryImageSlot {
base,
static_size,
initial_size,
cur_size: initial_size,
accessible,
image: None,
dirty: false,
clear_on_drop: true,
@@ -332,135 +383,144 @@ impl MemoryImageSlot {
}
pub(crate) fn set_heap_limit(&mut self, size_bytes: usize) -> Result<()> {
// mprotect the relevant region.
assert!(size_bytes <= self.static_size);
// If the heap limit already addresses accessible bytes then no syscalls
// are necessary since the data is already mapped into the process and
// waiting to go.
//
// This is used for "dynamic" memories where memory is not always
// decommitted during recycling (but it's still always reset).
if size_bytes <= self.accessible {
return Ok(());
}
// Otherwise use `mprotect` to make the new pages read/write.
self.set_protection(
self.cur_size..size_bytes,
self.accessible..size_bytes,
rustix::mm::MprotectFlags::READ | rustix::mm::MprotectFlags::WRITE,
)?;
self.cur_size = size_bytes;
self.accessible = size_bytes;
Ok(())
}
/// Prepares this slot for the instantiation of a new instance with the
/// provided linear memory image.
///
/// The `initial_size_bytes` parameter indicates the required initial size
/// of the heap for the instance. The `maybe_image` is an optional initial
/// image for linear memory to contain. The `style` is the way compiled
/// code will be accessing this memory.
///
/// The purpose of this method is to take a previously pristine slot
/// (`!self.dirty`) and transform its prior state into state necessary for
/// the given parameters. This could include, for example:
///
/// * More memory may be made read/write if `initial_size_bytes` is larger
/// than `self.accessible`.
/// * For `MemoryStyle::Static` linear memory may be made `PROT_NONE` if
/// `self.accessible` is larger than `initial_size_bytes`.
/// * If no image was previously in place or if the wrong image was
/// previously in place then `mmap` may be used to setup the initial
/// image.
pub(crate) fn instantiate(
&mut self,
initial_size_bytes: usize,
maybe_image: Option<&Arc<MemoryImage>>,
style: &MemoryStyle,
) -> Result<(), InstantiationError> {
assert!(!self.dirty);
assert_eq!(self.cur_size, self.initial_size);
assert!(initial_size_bytes <= self.static_size);
// Fast-path: previously instantiated with the same image, or
// no image but the same initial size, so the mappings are
// already correct; there is no need to mmap anything. Given
// that we asserted not-dirty above, any dirty pages will have
// already been thrown away by madvise() during the previous
// termination. The `clear_and_remain_ready()` path also
// mprotects memory above the initial heap size back to
// PROT_NONE, so we don't need to do that here.
if self.image.as_ref() == maybe_image && self.initial_size == initial_size_bytes {
self.dirty = true;
return Ok(());
}
// Otherwise, we need to transition from the previous state to the
// state now requested. An attempt is made here to minimize syscalls to
// the kernel to ideally reduce the overhead of this as it's fairly
// performance sensitive with memories. Note that the "previous state"
// is assumed to be post-initialization (e.g. after an mmap on-demand
// memory was created) or after `clear_and_remain_ready` was called
// which notably means that `madvise` has reset all the memory back to
// its original state.
// First order of business is to blow away the previous linear memory
// image if it doesn't match the image specified here. If one is
// detected then it's reset with anonymous memory which means that all
// of memory up to `self.accessible` will now be read/write and zero.
//
// Security/audit note: we map all of these MAP_PRIVATE, so
// all instance data is local to the mapping, not propagated
// to the backing fd. We throw away this CoW overlay with
// madvise() below, from base up to static_size (which is the
// whole slot) when terminating the instance.
if self.image.is_some() {
// In this case the state of memory at this time is that the memory
// from `0..self.initial_size` is reset back to its original state,
// but this memory contains a CoW image that is different from the
// one specified here. To reset state we first reset the mapping
// of memory to anonymous PROT_NONE memory, and then afterwards the
// heap is made visible with an mprotect.
self.reset_with_anon_memory()
.map_err(|e| InstantiationError::Resource(e.into()))?;
self.set_protection(
0..initial_size_bytes,
rustix::mm::MprotectFlags::READ | rustix::mm::MprotectFlags::WRITE,
)
.map_err(|e| InstantiationError::Resource(e.into()))?;
} else if initial_size_bytes < self.initial_size {
// In this case the previous module had no CoW image which means
// that the memory at `0..self.initial_size` is all zeros and
// read-write, everything afterwards being PROT_NONE.
//
// Our requested heap size is smaller than the previous heap size
// so all that's needed now is to shrink the heap further to
// `initial_size_bytes`.
//
// So we come in with:
// - anon-zero memory, R+W, [0, self.initial_size)
// - anon-zero memory, none, [self.initial_size, self.static_size)
// and we want:
// - anon-zero memory, R+W, [0, initial_size_bytes)
// - anon-zero memory, none, [initial_size_bytes, self.static_size)
//
// so given initial_size_bytes < self.initial_size we
// mprotect(NONE) the zone from the first to the second.
self.set_protection(
initial_size_bytes..self.initial_size,
rustix::mm::MprotectFlags::empty(),
)
.map_err(|e| InstantiationError::Resource(e.into()))?;
} else if initial_size_bytes > self.initial_size {
// In this case, like the previous one, the previous module had no
// CoW image but had a smaller heap than desired for this module.
// That means that here `mprotect` is used to make the new pages
// read/write, and since they're all reset from before they'll be
// made visible as zeros.
self.set_protection(
self.initial_size..initial_size_bytes,
rustix::mm::MprotectFlags::READ | rustix::mm::MprotectFlags::WRITE,
)
.map_err(|e| InstantiationError::Resource(e.into()))?;
} else {
// The final case here is that the previous module has no CoW image
// so the previous heap is all zeros. The previous heap is the exact
// same size as the requested heap, so no syscalls are needed to do
// anything else.
}
// The memory image, at this point, should have `initial_size_bytes` of
// zeros starting at `self.base` followed by inaccessible memory to
// `self.static_size`. Update sizing fields to reflect this.
self.initial_size = initial_size_bytes;
self.cur_size = initial_size_bytes;
// The initial memory image, if given. If not, we just get a
// memory filled with zeroes.
if let Some(image) = maybe_image.as_ref() {
assert!(
image.linear_memory_offset.checked_add(image.len).unwrap() <= initial_size_bytes
);
if image.len > 0 {
// Note that this is intentionally a "small mmap" which only covers the
// extent of the prior initialization image in order to preserve
// resident memory that might come before or after the image.
if self.image.as_ref() != maybe_image {
if let Some(image) = &self.image {
unsafe {
let ptr = rustix::mm::mmap(
let ptr = rustix::mm::mmap_anonymous(
(self.base + image.linear_memory_offset) as *mut c_void,
image.len,
rustix::mm::ProtFlags::READ | rustix::mm::ProtFlags::WRITE,
rustix::mm::MapFlags::PRIVATE | rustix::mm::MapFlags::FIXED,
image.fd.as_file(),
image.fd_offset,
)
.map_err(|e| InstantiationError::Resource(e.into()))?;
assert_eq!(ptr as usize, self.base + image.linear_memory_offset);
}
self.image = None;
}
}
self.image = maybe_image.cloned();
// The next order of business is to ensure that `self.accessible` is
// appropriate. First up is to grow the read/write portion of memory if
// it's not large enough to accommodate `initial_size_bytes`.
if self.accessible < initial_size_bytes {
self.set_protection(
self.accessible..initial_size_bytes,
rustix::mm::MprotectFlags::READ | rustix::mm::MprotectFlags::WRITE,
)
.map_err(|e| InstantiationError::Resource(e.into()))?;
self.accessible = initial_size_bytes;
}
// Next, if the "static" style of memory is being used then that means
// that the addressable heap must be shrunk to match
// `initial_size_bytes`. This is because the "static" flavor of memory
// relies on page faults to indicate out-of-bounds accesses to memory.
//
// Note that "dynamic" memories do not shrink the heap here. A dynamic
// memory performs dynamic bounds checks so if the remaining heap is
// still addressable then that's ok since it still won't get accessed.
if initial_size_bytes < self.accessible {
match style {
MemoryStyle::Static { .. } => {
self.set_protection(
initial_size_bytes..self.accessible,
rustix::mm::MprotectFlags::empty(),
)
.map_err(|e| InstantiationError::Resource(e.into()))?;
self.accessible = initial_size_bytes;
}
MemoryStyle::Dynamic { .. } => {}
}
}
// Now that memory is sized appropriately the final operation is to
// place the new image into linear memory. Note that this operation is
// skipped if `self.image` matches `maybe_image`.
assert!(initial_size_bytes <= self.accessible);
if self.image.as_ref() != maybe_image {
if let Some(image) = maybe_image.as_ref() {
assert!(
image.linear_memory_offset.checked_add(image.len).unwrap()
<= initial_size_bytes
);
if image.len > 0 {
unsafe {
let ptr = rustix::mm::mmap(
(self.base + image.linear_memory_offset) as *mut c_void,
image.len,
rustix::mm::ProtFlags::READ | rustix::mm::ProtFlags::WRITE,
rustix::mm::MapFlags::PRIVATE | rustix::mm::MapFlags::FIXED,
image.fd.as_file(),
image.fd_offset,
)
.map_err(|e| InstantiationError::Resource(e.into()))?;
assert_eq!(ptr as usize, self.base + image.linear_memory_offset);
}
}
}
self.image = maybe_image.cloned();
}
// Flag ourselves as `dirty` which means that the next operation on this
// slot is required to be `clear_and_remain_ready`.
self.dirty = true;
Ok(())
@@ -481,13 +541,6 @@ impl MemoryImageSlot {
self.reset_all_memory_contents(keep_resident)?;
}
// mprotect the initial heap region beyond the initial heap size back to
// PROT_NONE.
self.set_protection(
self.initial_size..self.cur_size,
rustix::mm::MprotectFlags::empty(),
)?;
self.cur_size = self.initial_size;
self.dirty = false;
Ok(())
}
@@ -506,7 +559,7 @@ impl MemoryImageSlot {
match &self.image {
Some(image) => {
assert!(self.cur_size >= image.linear_memory_offset + image.len);
assert!(self.accessible >= image.linear_memory_offset + image.len);
if image.linear_memory_offset < keep_resident {
// If the image starts below the `keep_resident` then
// memory looks something like this:
@@ -518,7 +571,7 @@ impl MemoryImageSlot {
// <--------------> <------->
//
// image_end
// 0 linear_memory_offset | cur_size
// 0 linear_memory_offset | accessible
// | | | |
// +----------------+--------------+---------+--------+
// | dirty memory | image | dirty memory |
@@ -539,7 +592,7 @@ impl MemoryImageSlot {
// zero bytes large.
let image_end = image.linear_memory_offset + image.len;
let mem_after_image = self.cur_size - image_end;
let mem_after_image = self.accessible - image_end;
let remaining_memset =
(keep_resident - image.linear_memory_offset).min(mem_after_image);
@@ -566,7 +619,7 @@ impl MemoryImageSlot {
// then we memset the start of linear memory and then use
// madvise below for the rest of it, including the image.
//
// 0 keep_resident cur_size
// 0 keep_resident accessible
// | | |
// +----------------+---+----------+------------------+
// | dirty memory | image | dirty memory |
@@ -585,7 +638,7 @@ impl MemoryImageSlot {
std::ptr::write_bytes(self.base as *mut u8, 0u8, keep_resident);
// This is madvise (2)
self.madvise_reset(keep_resident, self.cur_size - keep_resident)?;
self.madvise_reset(keep_resident, self.accessible - keep_resident)?;
}
}
@@ -593,9 +646,9 @@ impl MemoryImageSlot {
// bytes in the memory back to zero while using `madvise` to purge
// the rest.
None => {
let size_to_memset = keep_resident.min(self.cur_size);
let size_to_memset = keep_resident.min(self.accessible);
std::ptr::write_bytes(self.base as *mut u8, 0u8, size_to_memset);
self.madvise_reset(size_to_memset, self.cur_size - size_to_memset)?;
self.madvise_reset(size_to_memset, self.accessible - size_to_memset)?;
}
}
@@ -604,7 +657,7 @@ impl MemoryImageSlot {
#[allow(dead_code)] // ignore warnings as this is only used in some cfgs
unsafe fn madvise_reset(&self, base: usize, len: usize) -> Result<()> {
assert!(base + len <= self.cur_size);
assert!(base + len <= self.accessible);
if len == 0 {
return Ok(());
}
@@ -658,8 +711,7 @@ impl MemoryImageSlot {
}
self.image = None;
self.cur_size = 0;
self.initial_size = 0;
self.accessible = 0;
Ok(())
}
@@ -708,7 +760,7 @@ impl Drop for MemoryImageSlot {
mod test {
use std::sync::Arc;
use super::{create_memfd, FdSource, MemoryImage, MemoryImageSlot};
use super::{create_memfd, FdSource, MemoryImage, MemoryImageSlot, MemoryStyle};
use crate::mmap::Mmap;
use anyhow::Result;
use std::io::Write;
@@ -734,6 +786,7 @@ mod test {
#[test]
fn instantiate_no_image() {
let style = MemoryStyle::Static { bound: 4 << 30 };
// 4 MiB mmap'd area, not accessible
let mut mmap = Mmap::accessible_reserved(0, 4 << 20).unwrap();
// Create a MemoryImageSlot on top of it
@@ -741,7 +794,7 @@ mod test {
memfd.no_clear_on_drop();
assert!(!memfd.is_dirty());
// instantiate with 64 KiB initial size
memfd.instantiate(64 << 10, None).unwrap();
memfd.instantiate(64 << 10, None, &style).unwrap();
assert!(memfd.is_dirty());
// We should be able to access this 64 KiB (try both ends) and
// it should consist of zeroes.
@@ -759,13 +812,14 @@ mod test {
// reuse-anon-mmap-opt kicks in
memfd.clear_and_remain_ready(0).unwrap();
assert!(!memfd.is_dirty());
memfd.instantiate(64 << 10, None).unwrap();
memfd.instantiate(64 << 10, None, &style).unwrap();
let slice = mmap.as_slice();
assert_eq!(0, slice[1024]);
}
#[test]
fn instantiate_image() {
let style = MemoryStyle::Static { bound: 4 << 30 };
// 4 MiB mmap'd area, not accessible
let mut mmap = Mmap::accessible_reserved(0, 4 << 20).unwrap();
// Create a MemoryImageSlot on top of it
@@ -774,38 +828,38 @@ mod test {
// Create an image with some data.
let image = Arc::new(create_memfd_with_data(4096, &[1, 2, 3, 4]).unwrap());
// Instantiate with this image
memfd.instantiate(64 << 10, Some(&image)).unwrap();
memfd.instantiate(64 << 10, Some(&image), &style).unwrap();
assert!(memfd.has_image());
let slice = mmap.as_mut_slice();
assert_eq!(&[1, 2, 3, 4], &slice[4096..4100]);
slice[4096] = 5;
// Clear and re-instantiate same image
memfd.clear_and_remain_ready(0).unwrap();
memfd.instantiate(64 << 10, Some(&image)).unwrap();
memfd.instantiate(64 << 10, Some(&image), &style).unwrap();
let slice = mmap.as_slice();
// Should not see mutation from above
assert_eq!(&[1, 2, 3, 4], &slice[4096..4100]);
// Clear and re-instantiate no image
memfd.clear_and_remain_ready(0).unwrap();
memfd.instantiate(64 << 10, None).unwrap();
memfd.instantiate(64 << 10, None, &style).unwrap();
assert!(!memfd.has_image());
let slice = mmap.as_slice();
assert_eq!(&[0, 0, 0, 0], &slice[4096..4100]);
// Clear and re-instantiate image again
memfd.clear_and_remain_ready(0).unwrap();
memfd.instantiate(64 << 10, Some(&image)).unwrap();
memfd.instantiate(64 << 10, Some(&image), &style).unwrap();
let slice = mmap.as_slice();
assert_eq!(&[1, 2, 3, 4], &slice[4096..4100]);
// Create another image with different data.
let image2 = Arc::new(create_memfd_with_data(4096, &[10, 11, 12, 13]).unwrap());
memfd.clear_and_remain_ready(0).unwrap();
memfd.instantiate(128 << 10, Some(&image2)).unwrap();
memfd.instantiate(128 << 10, Some(&image2), &style).unwrap();
let slice = mmap.as_slice();
assert_eq!(&[10, 11, 12, 13], &slice[4096..4100]);
// Instantiate the original image again; we should notice it's
// a different image and not reuse the mappings.
memfd.clear_and_remain_ready(0).unwrap();
memfd.instantiate(64 << 10, Some(&image)).unwrap();
memfd.instantiate(64 << 10, Some(&image), &style).unwrap();
let slice = mmap.as_slice();
assert_eq!(&[1, 2, 3, 4], &slice[4096..4100]);
}
@@ -813,6 +867,7 @@ mod test {
#[test]
#[cfg(target_os = "linux")]
fn memset_instead_of_madvise() {
let style = MemoryStyle::Static { bound: 100 };
let mut mmap = Mmap::accessible_reserved(0, 4 << 20).unwrap();
let mut memfd = MemoryImageSlot::create(mmap.as_mut_ptr() as *mut _, 0, 4 << 20);
memfd.no_clear_on_drop();
@@ -821,7 +876,7 @@ mod test {
for image_off in [0, 4096, 8 << 10] {
let image = Arc::new(create_memfd_with_data(image_off, &[1, 2, 3, 4]).unwrap());
for amt_to_memset in [0, 4096, 10 << 12, 1 << 20, 10 << 20] {
memfd.instantiate(64 << 10, Some(&image)).unwrap();
memfd.instantiate(64 << 10, Some(&image), &style).unwrap();
assert!(memfd.has_image());
let slice = mmap.as_mut_slice();
if image_off > 0 {
@@ -837,7 +892,7 @@ mod test {
// Test without an image
for amt_to_memset in [0, 4096, 10 << 12, 1 << 20, 10 << 20] {
memfd.instantiate(64 << 10, None).unwrap();
memfd.instantiate(64 << 10, None, &style).unwrap();
for chunk in mmap.as_mut_slice()[..64 << 10].chunks_mut(1024) {
assert_eq!(chunk[0], 0);
chunk[0] = 5;
@@ -845,4 +900,56 @@ mod test {
memfd.clear_and_remain_ready(amt_to_memset).unwrap();
}
}
#[test]
#[cfg(target_os = "linux")]
fn dynamic() {
let style = MemoryStyle::Dynamic { reserve: 200 };
let mut mmap = Mmap::accessible_reserved(0, 4 << 20).unwrap();
let mut memfd = MemoryImageSlot::create(mmap.as_mut_ptr() as *mut _, 0, 4 << 20);
memfd.no_clear_on_drop();
let image = Arc::new(create_memfd_with_data(4096, &[1, 2, 3, 4]).unwrap());
let initial = 64 << 10;
// Instantiate the image and test that memory remains accessible after
// it's cleared.
memfd.instantiate(initial, Some(&image), &style).unwrap();
assert!(memfd.has_image());
let slice = mmap.as_mut_slice();
assert_eq!(&[1, 2, 3, 4], &slice[4096..4100]);
slice[4096] = 5;
assert_eq!(&[5, 2, 3, 4], &slice[4096..4100]);
memfd.clear_and_remain_ready(0).unwrap();
assert_eq!(&[1, 2, 3, 4], &slice[4096..4100]);
// Re-instantiate make sure it preserves memory. Grow a bit and set data
// beyond the initial size.
memfd.instantiate(initial, Some(&image), &style).unwrap();
assert_eq!(&[1, 2, 3, 4], &slice[4096..4100]);
memfd.set_heap_limit(initial * 2).unwrap();
assert_eq!(&[0, 0], &slice[initial..initial + 2]);
slice[initial] = 100;
assert_eq!(&[100, 0], &slice[initial..initial + 2]);
memfd.clear_and_remain_ready(0).unwrap();
// Test that memory is still accessible, but it's been reset
assert_eq!(&[0, 0], &slice[initial..initial + 2]);
// Instantiate again, and again memory beyond the initial size should
// still be accessible. Grow into it again and make sure it works.
memfd.instantiate(initial, Some(&image), &style).unwrap();
assert_eq!(&[0, 0], &slice[initial..initial + 2]);
memfd.set_heap_limit(initial * 2).unwrap();
assert_eq!(&[0, 0], &slice[initial..initial + 2]);
slice[initial] = 100;
assert_eq!(&[100, 0], &slice[initial..initial + 2]);
memfd.clear_and_remain_ready(0).unwrap();
// Reset the image to none and double-check everything is back to zero
memfd.instantiate(64 << 10, None, &style).unwrap();
assert!(!memfd.has_image());
assert_eq!(&[0, 0, 0, 0], &slice[4096..4100]);
assert_eq!(&[0, 0], &slice[initial..initial + 2]);
}
}

View File

@@ -5,7 +5,7 @@
use crate::{InstantiationError, MmapVec};
use anyhow::Result;
use std::sync::Arc;
use wasmtime_environ::{DefinedMemoryIndex, Module};
use wasmtime_environ::{DefinedMemoryIndex, MemoryStyle, Module};
/// A shim for the memory image container when support is not included.
pub enum ModuleMemoryImages {}
@@ -49,6 +49,7 @@ impl MemoryImageSlot {
&mut self,
_: usize,
_: Option<&Arc<MemoryImage>>,
_: &MemoryStyle,
) -> Result<Self, InstantiationError> {
unreachable!();
}

View File

@@ -19,8 +19,8 @@ use std::convert::TryFrom;
use std::mem;
use std::sync::Mutex;
use wasmtime_environ::{
DefinedMemoryIndex, DefinedTableIndex, HostPtr, Module, PrimaryMap, Tunables, VMOffsets,
WASM_PAGE_SIZE,
DefinedMemoryIndex, DefinedTableIndex, HostPtr, MemoryStyle, Module, PrimaryMap, Tunables,
VMOffsets, WASM_PAGE_SIZE,
};
mod index_allocator;
@@ -312,7 +312,7 @@ impl InstancePool {
let memory = unsafe {
std::slice::from_raw_parts_mut(
self.memories.get_base(instance_index, defined_index),
self.memories.max_memory_size,
self.memories.max_accessible,
)
};
@@ -338,7 +338,7 @@ impl InstancePool {
// the process to continue, because we never perform a
// mmap that would leave an open space for someone
// else to come in and map something.
slot.instantiate(initial_size as usize, Some(image))
slot.instantiate(initial_size as usize, Some(image), &plan.style)
.map_err(|e| InstantiationError::Resource(e.into()))?;
memories.push(
@@ -496,7 +496,20 @@ impl InstancePool {
.iter()
.skip(module.num_imported_memories)
{
let max = self.memories.max_memory_size / (WASM_PAGE_SIZE as usize);
match &plan.style {
MemoryStyle::Static { bound } => {
let memory_size_pages =
(self.memories.memory_size as u64) / u64::from(WASM_PAGE_SIZE);
if memory_size_pages < *bound {
bail!(
"memory size allocated per-memory is too small to \
satisfy static bound of {bound:#x} pages"
);
}
}
MemoryStyle::Dynamic { .. } => {}
}
let max = self.memories.max_accessible / (WASM_PAGE_SIZE as usize);
if plan.memory.minimum > (max as u64) {
bail!(
"memory index {} has a minimum page size of {} which exceeds the limit of {}",
@@ -572,8 +585,28 @@ impl InstancePool {
///
/// A linear memory is divided into accessible pages and guard pages.
///
/// Each instance index into the pool returns an iterator over the base addresses
/// of the instance's linear memories.
/// Each instance index into the pool returns an iterator over the base
/// addresses of the instance's linear memories.
///
/// A diagram for this struct's fields is:
///
/// ```ignore
///                           memory_size
///                           /
///    max_accessible        /                    memory_and_guard_size
///                 |       /                               |
///              <--+--->  /                    <-----------+---------->
///              <--------+->
///
/// +-----------+--------+---+-----------+     +--------+---+-----------+
/// | PROT_NONE |            | PROT_NONE | ... |            | PROT_NONE |
/// +-----------+--------+---+-----------+     +--------+---+-----------+
/// |           |<------------------+---------------------------------->
/// \           |                    \
///  mapping    |               `max_instances * max_memories` memories
///             /
///   initial_memory_offset
/// ```
#[derive(Debug)]
struct MemoryPool {
mapping: Mmap,
@@ -581,12 +614,15 @@ struct MemoryPool {
// dynamically transfer ownership of a slot to a Memory when in
// use.
image_slots: Vec<Mutex<Option<MemoryImageSlot>>>,
// The size, in bytes, of each linear memory's reservation plus the guard
// region allocated for it.
memory_reservation_size: usize,
// The maximum size, in bytes, of each linear memory. Guaranteed to be a
// whole number of wasm pages.
max_memory_size: usize,
// The size, in bytes, of each linear memory's reservation, not including
// any guard region.
memory_size: usize,
// The size, in bytes, of each linear memory's reservation plus the trailing
// guard region allocated for it.
memory_and_guard_size: usize,
// The maximum size that can become accessible, in bytes, of each linear
// memory. Guaranteed to be a whole number of wasm pages.
max_accessible: usize,
// The size, in bytes, of the offset to the first linear memory in this
// pool. This is here to help account for the first region of guard pages,
// if desired, before the first linear memory.
@@ -605,29 +641,25 @@ impl MemoryPool {
);
}
// The maximum module memory page count cannot exceed the memory reservation size
if u64::from(instance_limits.memory_pages) > tunables.static_memory_bound {
bail!(
"module memory page limit of {} pages exceeds maximum static memory limit of {} pages",
instance_limits.memory_pages,
tunables.static_memory_bound,
);
}
// Interpret the larger of the maximal size of memory or the static
// memory bound as the size of the virtual address space reservation for
// memory itself. Typically `static_memory_bound` is 4G which helps
// elide most bounds checks in wasm. If `memory_pages` is larger,
// though, then this is a non-moving pooling allocator so create larger
// reservations to account for that.
let memory_size = instance_limits
.memory_pages
.max(tunables.static_memory_bound)
* u64::from(WASM_PAGE_SIZE);
let memory_size = if instance_limits.memory_pages > 0 {
usize::try_from(
u64::from(tunables.static_memory_bound) * u64::from(WASM_PAGE_SIZE)
+ tunables.static_memory_offset_guard_size,
)
.map_err(|_| anyhow!("memory reservation size exceeds addressable memory"))?
} else {
0
};
let memory_and_guard_size =
usize::try_from(memory_size + tunables.static_memory_offset_guard_size)
.map_err(|_| anyhow!("memory reservation size exceeds addressable memory"))?;
assert!(
memory_size % crate::page_size() == 0,
memory_and_guard_size % crate::page_size() == 0,
"memory size {} is not a multiple of system page size",
memory_size
memory_and_guard_size
);
let max_instances = instance_limits.count as usize;
@@ -651,7 +683,7 @@ impl MemoryPool {
// `initial_memory_offset` variable here. If guards aren't specified
// before linear memories this is set to `0`, otherwise it's set to
// the same size as guard regions for other memories.
let allocation_size = memory_size
let allocation_size = memory_and_guard_size
.checked_mul(max_memories)
.and_then(|c| c.checked_mul(max_instances))
.and_then(|c| c.checked_add(initial_memory_offset))
@@ -675,11 +707,12 @@ impl MemoryPool {
let pool = Self {
mapping,
image_slots,
memory_reservation_size: memory_size,
memory_size: memory_size.try_into().unwrap(),
memory_and_guard_size,
initial_memory_offset,
max_memories,
max_instances,
max_memory_size: (instance_limits.memory_pages as usize) * (WASM_PAGE_SIZE as usize),
max_accessible: (instance_limits.memory_pages as usize) * (WASM_PAGE_SIZE as usize),
};
Ok(pool)
@@ -690,7 +723,7 @@ impl MemoryPool {
let memory_index = memory_index.as_u32() as usize;
assert!(memory_index < self.max_memories);
let idx = instance_index * self.max_memories + memory_index;
let offset = self.initial_memory_offset + idx * self.memory_reservation_size;
let offset = self.initial_memory_offset + idx * self.memory_and_guard_size;
unsafe { self.mapping.as_mut_ptr().offset(offset as isize) }
}
@@ -713,7 +746,7 @@ impl MemoryPool {
MemoryImageSlot::create(
self.get_base(instance_index, memory_index) as *mut c_void,
0,
self.max_memory_size,
self.max_accessible,
)
})
}
@@ -1061,13 +1094,6 @@ unsafe impl InstanceAllocator for PoolingInstanceAllocator {
Ok(())
}
fn adjust_tunables(&self, tunables: &mut Tunables) {
// Treat the static memory bound as the maximum for unbounded Wasm memories
// Because we guarantee a module cannot compile unless it fits in the limits of
// the pool allocator, this ensures all memories are treated as static (i.e. immovable).
tunables.static_memory_bound_is_maximum = true;
}
unsafe fn allocate(
&self,
req: InstanceAllocationRequest,
@@ -1265,10 +1291,10 @@ mod test {
},
)?;
assert_eq!(pool.memory_reservation_size, WASM_PAGE_SIZE as usize);
assert_eq!(pool.memory_and_guard_size, WASM_PAGE_SIZE as usize);
assert_eq!(pool.max_memories, 3);
assert_eq!(pool.max_instances, 5);
assert_eq!(pool.max_memory_size, WASM_PAGE_SIZE as usize);
assert_eq!(pool.max_accessible, WASM_PAGE_SIZE as usize);
let base = pool.mapping.as_ptr() as usize;
@@ -1278,7 +1304,7 @@ mod test {
for j in 0..3 {
assert_eq!(
iter.next().unwrap() as usize - base,
((i * 3) + j) * pool.memory_reservation_size
((i * 3) + j) * pool.memory_and_guard_size
);
}
@@ -1454,19 +1480,16 @@ mod test {
},
..PoolingInstanceAllocatorConfig::default()
};
assert_eq!(
PoolingInstanceAllocator::new(
&config,
&Tunables {
static_memory_bound: 1,
static_memory_offset_guard_size: 0,
..Tunables::default()
},
)
.map_err(|e| e.to_string())
.expect_err("expected a failure constructing instance allocator"),
"module memory page limit of 2 pages exceeds maximum static memory limit of 1 pages"
);
let pool = PoolingInstanceAllocator::new(
&config,
&Tunables {
static_memory_bound: 1,
static_memory_offset_guard_size: 0,
..Tunables::default()
},
)
.unwrap();
assert_eq!(pool.instances.memories.memory_size, 2 * 65536);
}
#[cfg(all(unix, target_pointer_width = "64", feature = "async"))]

View File

@@ -242,7 +242,7 @@ impl MmapMemory {
minimum,
alloc_bytes + extra_to_reserve_on_growth,
);
slot.instantiate(minimum, Some(image))?;
slot.instantiate(minimum, Some(image), &plan.style)?;
// On drop, we will unmap our mmap'd range that this slot was
// mapped on top of, so there is no need for the slot to wipe
// it with an anonymous mapping first.