Add shared memories (#4187)

* Add shared memories

This change adds the ability to use shared memories in Wasmtime when the
[threads proposal] is enabled. Shared memories are annotated as `shared`
in the WebAssembly syntax, e.g., `(memory 1 1 shared)`, and are
protected from concurrent access during `memory.size` and `memory.grow`.

[threads proposal]: https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md

In order to implement this in Wasmtime, there are two main cases to
cover:
    - a program may simply create a shared memory and possibly export it;
      this means that Wasmtime itself must be able to create shared
      memories
    - a user may create a shared memory externally and pass it in as an
      import during instantiation; this is the case when the program
      contains code like `(import "env" "memory" (memory 1 1 shared))`.
      This case is handled by a new Wasmtime API type, `SharedMemory`

Because of the first case, this change allows any of the current
memory-creation mechanisms to work as-is. Wasmtime can still create
either static or dynamic memories in either on-demand or pooling modes,
and any of these memories can be considered shared. When shared, the
`Memory` runtime container will lock appropriately during `memory.size`
and `memory.grow` operations; since all memories use this container, it
is an ideal place for implementing the locking once and once only.
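Here is a minimal sketch of that design under a simplified model. `LinearMemory`, `VecMemory`, and `grow_impl` are hypothetical names, not Wasmtime's actual types, and a `Vec` stands in for the real `Mmap`-backed static or dynamic memories. Every backing goes through one `Memory` container, and only the shared variant takes a lock, so the locking logic is written in exactly one place.

```rust
use std::sync::{Arc, RwLock};

/// WebAssembly page size: 64 KiB.
const WASM_PAGE_SIZE: usize = 65536;

/// Hypothetical interface implemented by every memory backing
/// (static or dynamic, on-demand or pooled, in the real runtime).
trait LinearMemory: Send + Sync {
    fn byte_size(&self) -> usize;
    fn maximum_byte_size(&self) -> usize;
    /// Resizes to `new_bytes`; callers have already checked the maximum.
    fn set_byte_size(&mut self, new_bytes: usize);
}

/// Toy backing that keeps the memory contents in a `Vec`.
struct VecMemory {
    data: Vec<u8>,
    maximum: usize,
}

impl LinearMemory for VecMemory {
    fn byte_size(&self) -> usize {
        self.data.len()
    }
    fn maximum_byte_size(&self) -> usize {
        self.maximum
    }
    fn set_byte_size(&mut self, new_bytes: usize) {
        self.data.resize(new_bytes, 0);
    }
}

/// The single container: owned memories are used directly, while shared
/// memories sit behind a lock that every handle (thread) shares.
enum Memory {
    Owned(Box<dyn LinearMemory>),
    Shared(Arc<RwLock<Box<dyn LinearMemory>>>),
}

/// `memory.grow` semantics: returns the old size in pages, or `None`
/// (Wasm's -1) if the maximum would be exceeded.
fn grow_impl(mem: &mut dyn LinearMemory, delta_pages: usize) -> Option<usize> {
    let old = mem.byte_size();
    let new = old.checked_add(delta_pages.checked_mul(WASM_PAGE_SIZE)?)?;
    if new > mem.maximum_byte_size() {
        return None;
    }
    mem.set_byte_size(new);
    Some(old / WASM_PAGE_SIZE)
}

impl Memory {
    /// `memory.size`: the current size in pages.
    fn size_pages(&self) -> usize {
        match self {
            Memory::Owned(m) => m.byte_size() / WASM_PAGE_SIZE,
            // Read lock: concurrent `memory.size` calls do not block each other.
            Memory::Shared(m) => m.read().unwrap().byte_size() / WASM_PAGE_SIZE,
        }
    }

    fn grow(&mut self, delta_pages: usize) -> Option<usize> {
        match self {
            Memory::Owned(m) => grow_impl(&mut **m, delta_pages),
            Memory::Shared(m) => {
                // Write lock: the size check and the resize happen as one step.
                let mut guard = m.write().unwrap();
                grow_impl(&mut **guard, delta_pages)
            }
        }
    }
}
```

Because the lock lives in the container rather than in each backing, a new backing (for example, a pooled slot) gets correct shared-memory behavior without writing any locking code of its own.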

The second case is covered by the new `SharedMemory` structure. It uses
the same `Mmap` allocation under the hood as non-shared memories, but
allows the user to perform the allocation externally to Wasmtime and
share the memory across threads (via an `Arc`). The pointer address to
the actual memory is carefully wired through and owned by the
`SharedMemory` structure itself. This means that the pointer (i.e., the
`VMMemoryDefinition`) can live in one of two places: for owned memories
(the default), the `VMMemoryDefinition` is stored directly in the
`VMContext`; in the `SharedMemory` case, however, the `VMContext` must
point to the definition held by this separate structure.
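Here is a minimal sketch of this ownership arrangement, using simplified stand-ins. The `VMMemoryDefinition` below has only the two fields this paragraph relies on (a base pointer and a length). `SharedMemoryInner`, `vmmemory_ptr`, and `current_length` are hypothetical names, and a `Vec<u8>` replaces the real `Mmap`. The definition lives inside the `Arc` allocation, so its address stays fixed however many clones are sent to other threads. That fixed address is what lets a `VMContext` store a pointer to it.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical, simplified mirror of `VMMemoryDefinition`.
#[repr(C)]
struct VMMemoryDefinition {
    base: *mut u8,
    current_length: AtomicUsize,
}

struct SharedMemoryInner {
    /// Backing storage (an `Mmap` in the real runtime). It is never resized
    /// here, so `definition.base` stays valid.
    _storage: Vec<u8>,
    /// Owned by the shared memory itself, not by any `VMContext`.
    definition: VMMemoryDefinition,
}

// SAFETY (sketch only): `base` points into `_storage`, which lives exactly as
// long as this struct, and the length is only accessed atomically.
unsafe impl Send for SharedMemoryInner {}
unsafe impl Sync for SharedMemoryInner {}

/// Cheaply clonable handle; every clone refers to the same allocation.
#[derive(Clone)]
struct SharedMemory(Arc<SharedMemoryInner>);

impl SharedMemory {
    fn new(bytes: usize) -> Self {
        let mut storage = vec![0u8; bytes];
        let base = storage.as_mut_ptr();
        SharedMemory(Arc::new(SharedMemoryInner {
            _storage: storage,
            definition: VMMemoryDefinition {
                base,
                current_length: AtomicUsize::new(bytes),
            },
        }))
    }

    /// The address a `VMContext` would record for this memory.
    fn vmmemory_ptr(&self) -> *const VMMemoryDefinition {
        &self.0.definition
    }

    fn current_length(&self) -> usize {
        self.0.definition.current_length.load(Ordering::SeqCst)
    }
}
```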

To ensure that the `VMContext` can always point to the correct
`VMMemoryDefinition`, this change alters the `VMContext` structure.
Since a `SharedMemory` owns its own `VMMemoryDefinition`, the
`defined_memories` table in the `VMContext` becomes a sequence of
pointers: in the shared memory case, they point to the
`VMMemoryDefinition` owned by the `SharedMemory`, and in the owned memory
case (i.e., not shared), they point to `VMMemoryDefinition`s stored in a
new table, `owned_memories`.
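Here is a minimal sketch of the two tables, modeled with ordinary Rust collections rather than the real `VMContext` layout. `DefinedMemory`, `VMContext::new`, and `current_length` are hypothetical names, and the definition is reduced to its length. The owned index of a memory is its defined index minus the number of shared memories that precede it.

```rust
use std::marker::PhantomData;

/// Hypothetical, simplified definition (the base pointer is omitted).
struct VMMemoryDefinition {
    current_length: usize,
}

/// A defined memory as seen at instantiation (hypothetical).
enum DefinedMemory<'a> {
    /// Not shared: the definition will live in the instance's own table.
    Owned { current_length: usize },
    /// Shared: the definition belongs to a `SharedMemory` elsewhere.
    Shared(&'a VMMemoryDefinition),
}

/// Hypothetical model of the two `VMContext` tables.
struct VMContext<'a> {
    /// Definitions for owned memories only.
    owned_memories: Vec<VMMemoryDefinition>,
    /// One pointer per defined memory, owned or shared.
    defined_memories: Vec<*const VMMemoryDefinition>,
    /// Ties the context to the lifetime of any shared definitions.
    _shared: PhantomData<&'a VMMemoryDefinition>,
}

impl<'a> VMContext<'a> {
    fn new(memories: &[DefinedMemory<'a>]) -> Self {
        // Fill `owned_memories` completely first; it is never resized
        // afterwards, so pointers into its buffer stay valid.
        let owned_memories: Vec<VMMemoryDefinition> = memories
            .iter()
            .filter_map(|m| match m {
                DefinedMemory::Owned { current_length } => Some(VMMemoryDefinition {
                    current_length: *current_length,
                }),
                DefinedMemory::Shared(_) => None,
            })
            .collect();
        // Owned memories take the next slot in `owned_memories`; shared
        // memories point at their externally owned definition.
        let mut owned_index = 0;
        let defined_memories: Vec<*const VMMemoryDefinition> = memories
            .iter()
            .map(|m| match m {
                DefinedMemory::Owned { .. } => {
                    let ptr: *const VMMemoryDefinition = &owned_memories[owned_index];
                    owned_index += 1;
                    ptr
                }
                DefinedMemory::Shared(def) => *def as *const VMMemoryDefinition,
            })
            .collect();
        VMContext { owned_memories, defined_memories, _shared: PhantomData }
    }

    /// Every access through this table pays one extra indirection.
    fn current_length(&self, defined_index: usize) -> usize {
        // SAFETY: each pointer targets either `self.owned_memories` (never
        // resized after construction) or a shared definition that outlives 'a.
        unsafe { (*self.defined_memories[defined_index]).current_length }
    }
}
```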

This change introduces an extra indirection (through the `*mut
VMMemoryDefinition` pointer) that could add overhead. Using an imported
memory as a proxy, we measured a 1-3% overhead of this approach on the
`pulldown-cmark` benchmark. To avoid this, Cranelift-generated code will
special-case the owned memory access (i.e., load a pointer directly to
the `owned_memories` entry) for `memory.size` so that only
shared memories (and imported memories, as before) incur the indirection
cost.
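To make the trade-off concrete, here is a sketch of the two read paths for `memory.size`, written as ordinary Rust rather than Cranelift IR. `SizePath` and `memory_size_pages` are hypothetical names. A `&&VMMemoryDefinition` models the pointer slot stored in the `VMContext`, and a `Relaxed` load stands in for the plain (non-atomic) load that generated code uses for owned memories. The length is modeled as an `AtomicUsize`, matching the later fix in this change.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// WebAssembly page size: 64 KiB.
const WASM_PAGE_SIZE: usize = 65536;

/// Hypothetical, simplified definition; only the length matters here.
struct VMMemoryDefinition {
    current_length: AtomicUsize,
}

/// How generated code reaches a memory's definition (hypothetical model).
enum SizePath<'a> {
    /// Owned memory: the definition sits in the `VMContext` itself, so a
    /// single load at a fixed offset suffices.
    Owned(&'a VMMemoryDefinition),
    /// Shared or imported memory: the `VMContext` holds a pointer that must
    /// be loaded first (the extra indirection), then the length itself.
    Indirect {
        slot: &'a &'a VMMemoryDefinition,
        shared: bool,
    },
}

fn memory_size_pages(path: &SizePath<'_>) -> usize {
    let bytes = match path {
        // An owned memory cannot be grown by another thread concurrently.
        SizePath::Owned(def) => def.current_length.load(Ordering::Relaxed),
        SizePath::Indirect { slot, shared } => {
            // First load: read the `*mut VMMemoryDefinition` pointer.
            let def: &VMMemoryDefinition = **slot;
            if *shared {
                // Another thread may be running `memory.grow`: load atomically.
                def.current_length.load(Ordering::SeqCst)
            } else {
                def.current_length.load(Ordering::Relaxed)
            }
        }
    };
    bytes / WASM_PAGE_SIZE
}
```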

* review: remove thread feature check

* review: swap wasmtime-types dependency for existing wasmtime-environ use

* review: remove unused VMMemoryUnion

* review: reword cross-engine error message

* review: improve tests

* review: refactor to prevent Memory <-> SharedMemory conversion

* review: into_shared_memory -> as_shared_memory

* review: remove commented out code

* review: limit shared min/max to 32 bits

* review: skip imported memories

* review: imported memories are not owned

* review: remove TODO

* review: document unsafe send + sync

* review: add limiter assertion

* review: remove TODO

* review: improve tests

* review: fix doc test

* fix: fixes based on discussion with Alex

This changes several key parts:
 - adds memory indexes to imports and exports
 - makes `VMMemoryDefinition::current_length` an atomic usize

* review: add `Extern::SharedMemory`

* review: remove TODO

* review: atomically load from VMMemoryDefinition in JIT-generated code

* review: add test probing the last available memory slot across threads

* fix: move assertion to new location due to rebase

* fix: doc link

* fix: add TODOs to c-api

* fix: broken doc link

* fix: modify pooling allocator messages in tests

* review: make owned_memory_index panic instead of returning an option

* review: clarify calculation of num_owned_memories

* review: move 'use' to top of file

* review: change '*const [u8]' to '*mut [u8]'

* review: remove TODO

* review: avoid hard-coding memory index

* review: remove 'preallocation' parameter from 'Memory::_new'

* fix: component model memory length

* review: check that shared memory plans are static

* review: ignore growth limits for shared memory

* review: improve atomic store comment

* review: add FIXME for memory growth failure

* review: add comment about absence of bounds-checked 'memory.size'

* review: make 'current_length()' doc comment more precise

* review: more comments related to memory.size non-determinism

* review: make 'vmmemory' unreachable for shared memory

* review: move code around

* review: thread plan through to 'wrap()'

* review: disallow shared memory allocation with the pooling allocator
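Among the fixes above, `VMMemoryDefinition::current_length` becomes an atomic `usize`, and JIT-generated code loads it atomically. Here is a minimal sketch of why that helps, using the hypothetical names `SharedLength`, `size_pages`, and `grow_pages`. In this model, `memory.grow` is still serialized by a lock, but `memory.size` becomes a single atomic load that never waits on that lock. A real runtime must also make the new pages accessible before publishing the larger length; this sketch only tracks the number.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// WebAssembly page size: 64 KiB.
const WASM_PAGE_SIZE: usize = 65536;

/// Hypothetical: growth is serialized by a lock, while the length is
/// published through an atomic so that readers never block.
struct SharedLength {
    grow_lock: Mutex<()>,
    current_length: AtomicUsize,
    maximum_bytes: usize,
}

impl SharedLength {
    fn new(initial_bytes: usize, maximum_bytes: usize) -> Self {
        SharedLength {
            grow_lock: Mutex::new(()),
            current_length: AtomicUsize::new(initial_bytes),
            maximum_bytes,
        }
    }

    /// Models `memory.size`: one atomic load, no lock.
    fn size_pages(&self) -> usize {
        self.current_length.load(Ordering::SeqCst) / WASM_PAGE_SIZE
    }

    /// Models `memory.grow`: returns the old size in pages, or `None` if
    /// the maximum would be exceeded.
    fn grow_pages(&self, delta_pages: usize) -> Option<usize> {
        // Only one grower at a time, so the read-check-store below is not
        // interleaved with another grow.
        let _guard = self.grow_lock.lock().unwrap();
        let old = self.current_length.load(Ordering::SeqCst);
        let new = old.checked_add(delta_pages.checked_mul(WASM_PAGE_SIZE)?)?;
        if new > self.maximum_bytes {
            return None;
        }
        // Publish the new length; concurrent readers see either the old or
        // the new value, never a torn one.
        self.current_length.store(new, Ordering::SeqCst);
        Some(old / WASM_PAGE_SIZE)
    }
}
```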
Andrew Brown
2022-06-08 10:13:40 -07:00
committed by GitHub
parent ed9db962de
commit 2b52f47b83
27 changed files with 1211 additions and 226 deletions


@@ -1368,18 +1368,37 @@ impl<'module_environment> cranelift_wasm::FuncEnvironment for FuncEnvironment<'m
     fn make_heap(&mut self, func: &mut ir::Function, index: MemoryIndex) -> WasmResult<ir::Heap> {
         let pointer_type = self.pointer_type();
+        let is_shared = self.module.memory_plans[index].memory.shared;
         let (ptr, base_offset, current_length_offset) = {
             let vmctx = self.vmctx(func);
             if let Some(def_index) = self.module.defined_memory_index(index) {
-                let base_offset =
-                    i32::try_from(self.offsets.vmctx_vmmemory_definition_base(def_index)).unwrap();
-                let current_length_offset = i32::try_from(
-                    self.offsets
-                        .vmctx_vmmemory_definition_current_length(def_index),
-                )
-                .unwrap();
-                (vmctx, base_offset, current_length_offset)
+                if is_shared {
+                    // As with imported memory, the `VMMemoryDefinition` for a
+                    // shared memory is stored elsewhere. We store a `*mut
+                    // VMMemoryDefinition` to it and dereference that when
+                    // atomically growing it.
+                    let from_offset = self.offsets.vmctx_vmmemory_pointer(def_index);
+                    let memory = func.create_global_value(ir::GlobalValueData::Load {
+                        base: vmctx,
+                        offset: Offset32::new(i32::try_from(from_offset).unwrap()),
+                        global_type: pointer_type,
+                        readonly: true,
+                    });
+                    let base_offset = i32::from(self.offsets.vmmemory_definition_base());
+                    let current_length_offset =
+                        i32::from(self.offsets.vmmemory_definition_current_length());
+                    (memory, base_offset, current_length_offset)
+                } else {
+                    let owned_index = self.module.owned_memory_index(def_index);
+                    let owned_base_offset =
+                        self.offsets.vmctx_vmmemory_definition_base(owned_index);
+                    let owned_length_offset = self
+                        .offsets
+                        .vmctx_vmmemory_definition_current_length(owned_index);
+                    let current_base_offset = i32::try_from(owned_base_offset).unwrap();
+                    let current_length_offset = i32::try_from(owned_length_offset).unwrap();
+                    (vmctx, current_base_offset, current_length_offset)
+                }
             } else {
                 let from_offset = self.offsets.vmctx_vmmemory_import_from(index);
                 let memory = func.create_global_value(ir::GlobalValueData::Load {
@@ -1693,28 +1712,65 @@ impl<'module_environment> cranelift_wasm::FuncEnvironment for FuncEnvironment<'m
     ) -> WasmResult<ir::Value> {
         let pointer_type = self.pointer_type();
         let vmctx = self.vmctx(&mut pos.func);
+        let is_shared = self.module.memory_plans[index].memory.shared;
         let base = pos.ins().global_value(pointer_type, vmctx);
         let current_length_in_bytes = match self.module.defined_memory_index(index) {
             Some(def_index) => {
-                let offset = i32::try_from(
-                    self.offsets
-                        .vmctx_vmmemory_definition_current_length(def_index),
-                )
-                .unwrap();
-                pos.ins()
-                    .load(pointer_type, ir::MemFlags::trusted(), base, offset)
+                if is_shared {
+                    let offset =
+                        i32::try_from(self.offsets.vmctx_vmmemory_pointer(def_index)).unwrap();
+                    let vmmemory_ptr =
+                        pos.ins()
+                            .load(pointer_type, ir::MemFlags::trusted(), base, offset);
+                    let vmmemory_definition_offset =
+                        i64::from(self.offsets.vmmemory_definition_current_length());
+                    let vmmemory_definition_ptr =
+                        pos.ins().iadd_imm(vmmemory_ptr, vmmemory_definition_offset);
+                    // This atomic access of the
+                    // `VMMemoryDefinition::current_length` is direct; no bounds
+                    // check is needed. This is possible because shared memory
+                    // has a static size (the maximum is always known). Shared
+                    // memory is thus built with a static memory plan and no
+                    // bounds-checked version of this is implemented.
+                    pos.ins().atomic_load(
+                        pointer_type,
+                        ir::MemFlags::trusted(),
+                        vmmemory_definition_ptr,
+                    )
+                } else {
+                    let owned_index = self.module.owned_memory_index(def_index);
+                    let offset = i32::try_from(
+                        self.offsets
+                            .vmctx_vmmemory_definition_current_length(owned_index),
+                    )
+                    .unwrap();
+                    pos.ins()
+                        .load(pointer_type, ir::MemFlags::trusted(), base, offset)
+                }
             }
             None => {
                 let offset = i32::try_from(self.offsets.vmctx_vmmemory_import_from(index)).unwrap();
                 let vmmemory_ptr =
                     pos.ins()
                         .load(pointer_type, ir::MemFlags::trusted(), base, offset);
-                pos.ins().load(
-                    pointer_type,
-                    ir::MemFlags::trusted(),
-                    vmmemory_ptr,
-                    i32::from(self.offsets.vmmemory_definition_current_length()),
-                )
+                if is_shared {
+                    let vmmemory_definition_offset =
+                        i64::from(self.offsets.vmmemory_definition_current_length());
+                    let vmmemory_definition_ptr =
+                        pos.ins().iadd_imm(vmmemory_ptr, vmmemory_definition_offset);
+                    pos.ins().atomic_load(
+                        pointer_type,
+                        ir::MemFlags::trusted(),
+                        vmmemory_definition_ptr,
+                    )
+                } else {
+                    pos.ins().load(
+                        pointer_type,
+                        ir::MemFlags::trusted(),
+                        vmmemory_ptr,
+                        i32::from(self.offsets.vmmemory_definition_current_length()),
+                    )
+                }
             }
         };
         let current_length_in_pages = pos