Add shared memories (#4187)

* Add shared memories

This change adds the ability to use shared memories in Wasmtime when the [threads proposal] is enabled. Shared memories are annotated as `shared` in the WebAssembly syntax, e.g., `(memory 1 1 shared)`, and are protected from concurrent access during `memory.size` and `memory.grow`.

[threads proposal]: https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md

In order to implement this in Wasmtime, there are two main cases to cover:
- a program may simply create a shared memory and possibly export it; this means that Wasmtime itself must be able to create shared memories
- a user may create a shared memory externally and pass it in as an import during instantiation; this is the case when the program contains code like `(import "env" "memory" (memory 1 1 shared))`--this case is handled by a new Wasmtime API type--`SharedMemory`

Because of the first case, this change allows any of the current memory-creation mechanisms to work as-is. Wasmtime can still create either static or dynamic memories in either on-demand or pooling modes, and any of these memories can be considered shared. When shared, the `Memory` runtime container will lock appropriately during `memory.size` and `memory.grow` operations; since all memories use this container, it is an ideal place for implementing the locking once and once only.

The second case is covered by the new `SharedMemory` structure. It uses the same `Mmap` allocation under the hood as non-shared memories, but allows the user to perform the allocation externally to Wasmtime and share the memory across threads (via an `Arc`). The pointer address to the actual memory is carefully wired through and owned by the `SharedMemory` structure itself.
This means that there are differing views of where to access the pointer (i.e., `VMMemoryDefinition`): for owned memories (the default), the `VMMemoryDefinition` is stored directly by the `VMContext`; in the `SharedMemory` case, however, this `VMContext` must point to this separate structure.

To ensure that the `VMContext` can always point to the correct `VMMemoryDefinition`, this change alters the `VMContext` structure. Since a `SharedMemory` owns its own `VMMemoryDefinition`, the `defined_memories` table in the `VMContext` becomes a sequence of pointers--in the shared memory case, they point to the `VMMemoryDefinition` owned by the `SharedMemory` and in the owned memory case (i.e., not shared) they point to `VMMemoryDefinition`s stored in a new table, `owned_memories`.

This change adds an additional indirection (through the `*mut VMMemoryDefinition` pointer) that could add overhead. Using an imported memory as a proxy, we measured a 1-3% overhead of this approach on the `pulldown-cmark` benchmark. To avoid this, Cranelift-generated code will special-case the owned memory access (i.e., load a pointer directly to the `owned_memories` entry) for `memory.size` so that only shared memories (and imported memories, as before) incur the indirection cost.
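
The split between `defined_memories` and `owned_memories` described above can be modeled roughly as follows. This is a simplified sketch, not Wasmtime's actual layout: `MemoryDefinition`, `Slot`, and `Context` are hypothetical stand-ins, an enum replaces the raw `*mut VMMemoryDefinition` pointers, and the free function `owned_memory_index` only imitates the idea of the method of the same name (it panics for shared memories, as one of the review items below requests).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for `VMMemoryDefinition` (base pointer omitted).
struct MemoryDefinition {
    current_length: AtomicUsize,
}

/// Where the definition of a defined memory lives (an assumption of this
/// sketch; the real table holds raw pointers in both cases).
enum Slot {
    /// Index into the context's own `owned_memories` table.
    Owned(usize),
    /// Definition owned by a shared memory that lives outside the context.
    Shared(Arc<MemoryDefinition>),
}

/// Hypothetical stand-in for the memory-related part of `VMContext`.
struct Context {
    owned_memories: Vec<MemoryDefinition>,
    defined_memories: Vec<Slot>,
}

impl Context {
    /// Read the current length in bytes of a defined memory.
    fn current_length(&self, defined_index: usize) -> usize {
        match &self.defined_memories[defined_index] {
            // Owned memories are read directly from the context's table.
            Slot::Owned(i) => self.owned_memories[*i]
                .current_length
                .load(Ordering::Relaxed),
            // Shared memories cost one extra indirection and an atomic load
            // that must observe growth performed by other threads.
            Slot::Shared(def) => def.current_length.load(Ordering::SeqCst),
        }
    }
}

/// Map a defined-memory index to its index among owned (non-shared)
/// memories by counting the non-shared memories that precede it.
fn owned_memory_index(shared_flags: &[bool], defined_index: usize) -> usize {
    assert!(
        !shared_flags[defined_index],
        "shared memories have no owned index"
    );
    shared_flags[..defined_index].iter().filter(|s| !**s).count()
}
```

In this model only the `Slot::Shared` arm pays for the extra pointer chase, which mirrors the special-casing of owned memories that the message describes for generated code.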
* review: remove thread feature check
* review: swap wasmtime-types dependency for existing wasmtime-environ use
* review: remove unused VMMemoryUnion
* review: reword cross-engine error message
* review: improve tests
* review: refactor to separate prevent Memory <-> SharedMemory conversion
* review: into_shared_memory -> as_shared_memory
* review: remove commented out code
* review: limit shared min/max to 32 bits
* review: skip imported memories
* review: imported memories are not owned
* review: remove TODO
* review: document unsafe send + sync
* review: add limiter assertion
* review: remove TODO
* review: improve tests
* review: fix doc test
* fix: fixes based on discussion with Alex

  This changes several key parts:
  - adds memory indexes to imports and exports
  - makes `VMMemoryDefinition::current_length` an atomic usize
* review: add `Extern::SharedMemory`
* review: remove TODO
* review: atomically load from VMMemoryDescription in JIT-generated code
* review: add test probing the last available memory slot across threads
* fix: move assertion to new location due to rebase
* fix: doc link
* fix: add TODOs to c-api
* fix: broken doc link
* fix: modify pooling allocator messages in tests
* review: make owned_memory_index panic instead of returning an option
* review: clarify calculation of num_owned_memories
* review: move 'use' to top of file
* review: change '*const [u8]' to '*mut [u8]'
* review: remove TODO
* review: avoid hard-coding memory index
* review: remove 'preallocation' parameter from 'Memory::_new'
* fix: component model memory length
* review: check that shared memory plans are static
* review: ignore growth limits for shared memory
* review: improve atomic store comment
* review: add FIXME for memory growth failure
* review: add comment about absence of bounds-checked 'memory.size'
* review: make 'current_length()' doc comment more precise
* review: more comments related to memory.size non-determinism
* review: make 'vmmemory' unreachable for shared memory
* review: move code around
* review: thread plan through to 'wrap()'
* review: disallow shared memory allocation with the pooling allocator
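
As a rough illustration of the locking and the atomic `current_length` mentioned in this message, the sketch below serializes growth behind a lock and publishes the new length with an atomic store, so that a concurrent size query never sees a torn value. It is a standalone model under stated assumptions, not Wasmtime's implementation: `SharedMemoryModel`, its fields, and the use of a plain `Mutex` are all hypothetical, and no actual memory is allocated.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Size of a WebAssembly page in bytes.
const PAGE_SIZE: usize = 65536;

/// Hypothetical model of a shared memory's size bookkeeping.
struct SharedMemoryModel {
    /// Serializes concurrent grow operations (assumed lock type).
    grow_lock: Mutex<()>,
    /// Shared memories always declare a maximum, so it is known up front.
    maximum_pages: usize,
    /// Current length in bytes, readable without taking the lock.
    current_length: AtomicUsize,
}

impl SharedMemoryModel {
    fn new(minimum_pages: usize, maximum_pages: usize) -> Arc<Self> {
        assert!(minimum_pages <= maximum_pages);
        Arc::new(SharedMemoryModel {
            grow_lock: Mutex::new(()),
            maximum_pages,
            current_length: AtomicUsize::new(minimum_pages * PAGE_SIZE),
        })
    }

    /// Model of `memory.size`: an atomic load, no lock required.
    fn size_in_pages(&self) -> usize {
        self.current_length.load(Ordering::SeqCst) / PAGE_SIZE
    }

    /// Model of `memory.grow`: returns the previous size in pages, or
    /// `None` if the request would exceed the declared maximum.
    fn grow(&self, delta_pages: usize) -> Option<usize> {
        // Hold the lock for the whole read-check-write sequence.
        let _guard = self.grow_lock.lock().unwrap();
        let old_pages = self.size_in_pages();
        let new_pages = old_pages.checked_add(delta_pages)?;
        if new_pages > self.maximum_pages {
            return None;
        }
        let new_length = new_pages.checked_mul(PAGE_SIZE)?;
        // Publish the new length so other threads can observe it.
        self.current_length.store(new_length, Ordering::SeqCst);
        Some(old_pages)
    }
}
```

A value returned by `size_in_pages` may already be stale by the time the caller uses it if another thread grows the memory concurrently; the review items above refer to this as `memory.size` non-determinism.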
2022-06-08 10:13:40 -07:00
parent ed9db962de
commit 2b52f47b83
27 changed files with 1211 additions and 226 deletions
--- a/crates/cranelift/src/func_environ.rs
+++ b/crates/cranelift/src/func_environ.rs
@@ -1368,18 +1368,37 @@ impl<'module_environment> cranelift_wasm::FuncEnvironment for FuncEnvironment<'m

    fn make_heap(&mut self, func: &mut ir::Function, index: MemoryIndex) -> WasmResult<ir::Heap> {
        let pointer_type = self.pointer_type();
-
+        let is_shared = self.module.memory_plans[index].memory.shared;
        let (ptr, base_offset, current_length_offset) = {
            let vmctx = self.vmctx(func);
            if let Some(def_index) = self.module.defined_memory_index(index) {
-                let base_offset =
-                    i32::try_from(self.offsets.vmctx_vmmemory_definition_base(def_index)).unwrap();
-                let current_length_offset = i32::try_from(
-                    self.offsets
-                        .vmctx_vmmemory_definition_current_length(def_index),
-                )
-                .unwrap();
-                (vmctx, base_offset, current_length_offset)
+                if is_shared {
+                    // As with imported memory, the `VMMemoryDefinition` for a
+                    // shared memory is stored elsewhere. We store a `*mut
+                    // VMMemoryDefinition` to it and dereference that when
+                    // atomically growing it.
+                    let from_offset = self.offsets.vmctx_vmmemory_pointer(def_index);
+                    let memory = func.create_global_value(ir::GlobalValueData::Load {
+                        base: vmctx,
+                        offset: Offset32::new(i32::try_from(from_offset).unwrap()),
+                        global_type: pointer_type,
+                        readonly: true,
+                    });
+                    let base_offset = i32::from(self.offsets.vmmemory_definition_base());
+                    let current_length_offset =
+                        i32::from(self.offsets.vmmemory_definition_current_length());
+                    (memory, base_offset, current_length_offset)
+                } else {
+                    let owned_index = self.module.owned_memory_index(def_index);
+                    let owned_base_offset =
+                        self.offsets.vmctx_vmmemory_definition_base(owned_index);
+                    let owned_length_offset = self
+                        .offsets
+                        .vmctx_vmmemory_definition_current_length(owned_index);
+                    let current_base_offset = i32::try_from(owned_base_offset).unwrap();
+                    let current_length_offset = i32::try_from(owned_length_offset).unwrap();
+                    (vmctx, current_base_offset, current_length_offset)
+                }
            } else {
                let from_offset = self.offsets.vmctx_vmmemory_import_from(index);
                let memory = func.create_global_value(ir::GlobalValueData::Load {
@@ -1693,28 +1712,65 @@ impl<'module_environment> cranelift_wasm::FuncEnvironment for FuncEnvironment<'m
    ) -> WasmResult<ir::Value> {
        let pointer_type = self.pointer_type();
        let vmctx = self.vmctx(&mut pos.func);
+        let is_shared = self.module.memory_plans[index].memory.shared;
        let base = pos.ins().global_value(pointer_type, vmctx);
        let current_length_in_bytes = match self.module.defined_memory_index(index) {
            Some(def_index) => {
-                let offset = i32::try_from(
-                    self.offsets
-                        .vmctx_vmmemory_definition_current_length(def_index),
-                )
-                .unwrap();
-                pos.ins()
-                    .load(pointer_type, ir::MemFlags::trusted(), base, offset)
+                if is_shared {
+                    let offset =
+                        i32::try_from(self.offsets.vmctx_vmmemory_pointer(def_index)).unwrap();
+                    let vmmemory_ptr =
+                        pos.ins()
+                            .load(pointer_type, ir::MemFlags::trusted(), base, offset);
+                    let vmmemory_definition_offset =
+                        i64::from(self.offsets.vmmemory_definition_current_length());
+                    let vmmemory_definition_ptr =
+                        pos.ins().iadd_imm(vmmemory_ptr, vmmemory_definition_offset);
+                    // This atomic access of the
+                    // `VMMemoryDefinition::current_length` is direct; no bounds
+                    // check is needed. This is possible because shared memory
+                    // has a static size (the maximum is always known). Shared
+                    // memory is thus built with a static memory plan and no
+                    // bounds-checked version of this is implemented.
+                    pos.ins().atomic_load(
+                        pointer_type,
+                        ir::MemFlags::trusted(),
+                        vmmemory_definition_ptr,
+                    )
+                } else {
+                    let owned_index = self.module.owned_memory_index(def_index);
+                    let offset = i32::try_from(
+                        self.offsets
+                            .vmctx_vmmemory_definition_current_length(owned_index),
+                    )
+                    .unwrap();
+                    pos.ins()
+                        .load(pointer_type, ir::MemFlags::trusted(), base, offset)
+                }
            }
            None => {
                let offset = i32::try_from(self.offsets.vmctx_vmmemory_import_from(index)).unwrap();
                let vmmemory_ptr =
                    pos.ins()
                        .load(pointer_type, ir::MemFlags::trusted(), base, offset);
-                pos.ins().load(
-                    pointer_type,
-                    ir::MemFlags::trusted(),
-                    vmmemory_ptr,
-                    i32::from(self.offsets.vmmemory_definition_current_length()),
-                )
+                if is_shared {
+                    let vmmemory_definition_offset =
+                        i64::from(self.offsets.vmmemory_definition_current_length());
+                    let vmmemory_definition_ptr =
+                        pos.ins().iadd_imm(vmmemory_ptr, vmmemory_definition_offset);
+                    pos.ins().atomic_load(
+                        pointer_type,
+                        ir::MemFlags::trusted(),
+                        vmmemory_definition_ptr,
+                    )
+                } else {
+                    pos.ins().load(
+                        pointer_type,
+                        ir::MemFlags::trusted(),
+                        vmmemory_ptr,
+                        i32::from(self.offsets.vmmemory_definition_current_length()),
+                    )
+                }
            }
        };
        let current_length_in_pages = pos