Use an mmap-friendly serialization format (#3257)
* Use an mmap-friendly serialization format

This commit reimplements the main serialization format for Wasmtime's precompiled artifacts. Previously they were generally a binary blob of `bincode`-encoded metadata prefixed with some versioning information. The downside of this format, though, is that loading a precompiled artifact required pushing all information through `bincode`. This is inefficient when some data, such as trap/address tables, is rarely accessed.

The new format added in this commit is designed to be `mmap`-friendly. This means that the relevant parts of the precompiled artifact are already page-aligned, making it easy to update permissions of pieces here and there. Additionally the artifact is laid out so that rarely read data can be left unread until it is actually needed.

The new artifact format for serialized modules is an ELF file. This is not a public API guarantee, so it cannot be relied upon. In the meantime, though, this is quite useful for exploring precompiled modules with standard tooling like `objdump`. The ELF file is already constructed as part of module compilation, and this is the main contents of the serialized artifact. There is some extra information, though, not encoded in each module's individual ELF file, such as type information. This information continues to be `bincode`-encoded, but it's intended to be much smaller and much faster to deserialize. This extra information is appended to the end of the ELF file, which means the original ELF file is still a valid ELF file; we just get to have extra bits at the end. More information on the new format can be found in the module docs of the serialization module of Wasmtime.

Another refactoring implemented as part of this commit is to deserialize and store object files directly in `mmap`-backed storage. This avoids the need to copy bytes after the artifact is loaded into memory for each compiled module, and in a future commit it opens up the door to avoiding copying the text section into a `CodeMemory`. For now, though, the main change is that copies are not necessary when loading from a precompiled compilation artifact once the artifact is itself in mmap-based memory.

To assist with managing `mmap`-based memory a new `MmapVec` type was added to `wasmtime_jit` which acts as a form of `Vec<T>` backed by a `wasmtime_runtime::Mmap`. This type notably supports `drain(..N)` to slice the buffer into disjoint regions that are all separately owned, such as having a separately owned window into one artifact for all object files contained within.

Finally this commit implements a small refactoring in `wasmtime-cache` to use the standard artifact format for cache entries rather than a bincode-encoded version. This required some more hooks for serializing/deserializing, but otherwise the crate still performs as before.

* Review comments
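For context, the precompiled artifacts described above are the ones produced and consumed by the public `Module::serialize`/`Module::deserialize` APIs. A minimal round trip looks roughly like the sketch below; the `module.cwasm` path, the trivial WAT module, and the use of `anyhow` are illustrative, and `deserialize` is `unsafe` in recent Wasmtime versions because the bytes must come from a trusted `serialize` call:

use wasmtime::{Engine, Module};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();

    // Compile a module and write the precompiled artifact to disk. With this
    // change the bytes are an ELF image with a small bincode-encoded trailer.
    let module = Module::new(&engine, r#"(module (func (export "f")))"#)?;
    std::fs::write("module.cwasm", module.serialize()?)?;

    // Load the artifact back without recompiling. The relevant portions can
    // now be mapped and read lazily instead of fully decoded up front.
    let bytes = std::fs::read("module.cwasm")?;
    let _module = unsafe { Module::deserialize(&engine, &bytes)? };
    Ok(())
}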
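The ownership-splitting behavior that `MmapVec::drain` provides can be illustrated with a small conceptual sketch. Everything below is a stand-in, not the real `wasmtime_jit::MmapVec`: the backing store here is an `Arc<Vec<u8>>` rather than a `wasmtime_runtime::Mmap`, and `drain` takes a plain count instead of a range. The point is only that each drained prefix becomes a separately owned window into the same underlying storage:

use std::ops::Range;
use std::sync::Arc;

struct MmapVecSketch {
    storage: Arc<Vec<u8>>, // stands in for a shared `Mmap`
    range: Range<usize>,   // the window of `storage` this value owns
}

impl MmapVecSketch {
    fn new(bytes: Vec<u8>) -> Self {
        let len = bytes.len();
        MmapVecSketch { storage: Arc::new(bytes), range: 0..len }
    }

    /// Splits off the first `n` bytes as a separately owned window,
    /// leaving `self` covering the remainder of the buffer.
    fn drain(&mut self, n: usize) -> MmapVecSketch {
        assert!(n <= self.range.len());
        let split = self.range.start + n;
        let head = MmapVecSketch {
            storage: self.storage.clone(),
            range: self.range.start..split,
        };
        self.range.start = split;
        head
    }

    fn as_slice(&self) -> &[u8] {
        &self.storage[self.range.clone()]
    }
}

fn main() {
    let mut buf = MmapVecSketch::new(b"ELF-image...trailer".to_vec());
    let elf = buf.drain(12); // first 12 bytes: the "ELF image"
    let trailer = buf;       // remainder: the appended metadata
    assert_eq!(elf.as_slice().len(), 12);
    assert_eq!(trailer.as_slice(), b"trailer");
}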
crates/cache/src/lib.rs (65 lines changed)
@@ -42,13 +42,40 @@ impl<'config> ModuleCacheEntry<'config> {
         Self(Some(inner))
     }
 
-    /// Gets cached data if state matches, otherwise calls the `compute`.
-    // NOTE: This takes a function pointer instead of a closure so that it doesn't accidentally
-    // close over something not accounted in the cache.
-    pub fn get_data<T, U, E>(&self, state: T, compute: fn(T) -> Result<U, E>) -> Result<U, E>
+    /// Gets cached data if state matches, otherwise calls `compute`.
+    ///
+    /// Data is automatically serialized/deserialized with `bincode`.
+    pub fn get_data<T, U, E>(&self, state: T, compute: fn(&T) -> Result<U, E>) -> Result<U, E>
     where
         T: Hash,
         U: Serialize + for<'a> Deserialize<'a>,
     {
+        self.get_data_raw(
+            &state,
+            compute,
+            |_state, data| bincode::serialize(data).ok(),
+            |_state, data| bincode::deserialize(&data).ok(),
+        )
+    }
+
+    /// Gets cached data if state matches, otherwise calls `compute`.
+    ///
+    /// If the cache is disabled or no cached data is found then `compute` is
+    /// called to calculate the data. If the data was found in cache it is
+    /// passed to `deserialize`, which if successful will be the returned value.
+    /// When computed the `serialize` function is used to generate the bytes
+    /// from the returned value.
+    pub fn get_data_raw<T, U, E>(
+        &self,
+        state: &T,
+        // NOTE: These are function pointers instead of closures so that they
+        // don't accidentally close over something not accounted in the cache.
+        compute: fn(&T) -> Result<U, E>,
+        serialize: fn(&T, &U) -> Option<Vec<u8>>,
+        deserialize: fn(&T, Vec<u8>) -> Option<U>,
+    ) -> Result<U, E>
+    where
+        T: Hash,
+    {
         let inner = match &self.0 {
             Some(inner) => inner,
@@ -62,14 +89,18 @@ impl<'config> ModuleCacheEntry<'config> {
         let hash = base64::encode_config(&hash, base64::URL_SAFE_NO_PAD);
 
         if let Some(cached_val) = inner.get_data(&hash) {
-            let mod_cache_path = inner.root_path.join(&hash);
-            inner.cache_config.on_cache_get_async(&mod_cache_path); // call on success
-            return Ok(cached_val);
+            if let Some(val) = deserialize(state, cached_val) {
+                let mod_cache_path = inner.root_path.join(&hash);
+                inner.cache_config.on_cache_get_async(&mod_cache_path); // call on success
+                return Ok(val);
+            }
         }
         let val_to_cache = compute(state)?;
-        if inner.update_data(&hash, &val_to_cache).is_some() {
-            let mod_cache_path = inner.root_path.join(&hash);
-            inner.cache_config.on_cache_update_async(&mod_cache_path); // call on success
+        if let Some(bytes) = serialize(state, &val_to_cache) {
+            if inner.update_data(&hash, &bytes).is_some() {
+                let mod_cache_path = inner.root_path.join(&hash);
+                inner.cache_config.on_cache_update_async(&mod_cache_path); // call on success
+            }
         }
         Ok(val_to_cache)
     }
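To illustrate the hook-based API above: a caller that already has serialized artifact bytes (as the module cache now does with the ELF-based format) can pass pass-through hooks instead of the `bincode` ones used by `get_data`. Everything below other than `get_data_raw` itself is an illustrative stand-in (`compile_to_artifact` is hypothetical, and this is not an actual wasmtime-cache call site); it assumes access to the internal `wasmtime-cache` crate and `anyhow`:

use wasmtime_cache::ModuleCacheEntry;

// Hypothetical stand-in for compilation producing the ELF-based artifact.
fn compile_to_artifact(wasm: &[u8]) -> anyhow::Result<Vec<u8>> {
    Ok(wasm.to_vec())
}

fn compile_or_load_cached(
    entry: &ModuleCacheEntry<'_>,
    wasm: &[u8],
) -> anyhow::Result<Vec<u8>> {
    entry.get_data_raw(
        &wasm,
        // compute: run the (hypothetical) compilation on a cache miss
        |wasm| compile_to_artifact(wasm),
        // serialize: the artifact is already a byte buffer, store it as-is
        |_wasm, artifact| Some(artifact.clone()),
        // deserialize: likewise, the cached bytes are handed back directly
        |_wasm, artifact| Some(artifact),
    )
}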
@@ -118,27 +149,19 @@ impl<'config> ModuleCacheEntryInner<'config> {
         }
     }
 
-    fn get_data<T>(&self, hash: &str) -> Option<T>
-    where
-        T: for<'a> Deserialize<'a>,
-    {
+    fn get_data(&self, hash: &str) -> Option<Vec<u8>> {
         let mod_cache_path = self.root_path.join(hash);
         trace!("get_data() for path: {}", mod_cache_path.display());
         let compressed_cache_bytes = fs::read(&mod_cache_path).ok()?;
         let cache_bytes = zstd::decode_all(&compressed_cache_bytes[..])
             .map_err(|err| warn!("Failed to decompress cached code: {}", err))
             .ok()?;
-        bincode::deserialize(&cache_bytes[..])
-            .map_err(|err| warn!("Failed to deserialize cached code: {}", err))
-            .ok()
+        Some(cache_bytes)
     }
 
-    fn update_data<T: Serialize>(&self, hash: &str, data: &T) -> Option<()> {
+    fn update_data(&self, hash: &str, serialized_data: &[u8]) -> Option<()> {
         let mod_cache_path = self.root_path.join(hash);
         trace!("update_data() for path: {}", mod_cache_path.display());
-        let serialized_data = bincode::serialize(&data)
-            .map_err(|err| warn!("Failed to serialize cached code: {}", err))
-            .ok()?;
         let compressed_data = zstd::encode_all(
             &serialized_data[..],
             self.cache_config.baseline_compression_level(),
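As a note on the layering in this last hunk: after the change, `get_data`/`update_data` only see opaque bytes, and zstd compression remains the cache's own concern regardless of how the caller serialized the value. A minimal sketch of that round trip follows; the compression level is hard-coded here, whereas the real code takes it from the cache configuration's baseline compression level and logs failures with `warn!`:

fn main() -> std::io::Result<()> {
    // Whatever bytes the caller's `serialize` hook produced, e.g. an
    // ELF-based module artifact.
    let serialized: Vec<u8> = b"imagine an ELF-based artifact here".to_vec();

    // update_data(): compress before writing the cache file.
    let compressed = zstd::encode_all(&serialized[..], 3)?;

    // get_data(): read the cache file and decompress, handing the raw bytes
    // back to the caller's `deserialize` hook.
    let roundtripped = zstd::decode_all(&compressed[..])?;
    assert_eq!(roundtripped, serialized);
    Ok(())
}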