Move address maps to a section of the compiled image (#3240)
This commit moves the `address_map` field of `FunctionInfo` into a custom-encoded section of the executable. The goal, as with previous commits, is to push less data through `bincode`. The `address_map` field is extremely large, so not decoding it when a module is loaded is a significant win. This data is only used for traps and similar situations, so it does not need to be massaged into a precise form that the runtime can consume as quickly as possible.

The `FunctionInfo` type retains a small amount of information about the function itself (its start source location). Other than that, the `FunctionAddressMap` structure is moved from `wasmtime-environ` to `wasmtime-cranelift`, since it is no longer needed outside of that context.
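
As a rough illustration of the custom encoding this commit introduces (a count followed by two parallel arrays of 32-bit little-endian integers, per the doc comment on `ELF_WASMTIME_ADDRMAP` in the diff), here is a minimal sketch that uses only the standard library. The `encode`, `decode`, and `read_u32` helpers are hypothetical names chosen for illustration and are not part of Wasmtime; the actual change builds the section with the `object` crate's `U32Bytes` type and `Object::append_section_data`.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: encodes `(text_offset, wasm_offset)` pairs using the
/// layout described for the address map section: a little-endian `u32` count,
/// then `count` text offsets, then `count` wasm offsets.
///
/// Assumes `entries` is already sorted by text offset.
pub fn encode(entries: &[(u32, u32)]) -> Vec<u8> {
    let count = u32::try_from(entries.len()).expect("too many entries for a u32 count");
    let mut out = Vec::with_capacity(4 + entries.len() * 8);
    out.extend_from_slice(&count.to_le_bytes());
    for (text_offset, _) in entries {
        out.extend_from_slice(&text_offset.to_le_bytes());
    }
    for (_, wasm_offset) in entries {
        out.extend_from_slice(&wasm_offset.to_le_bytes());
    }
    out
}

/// Reads the little-endian `u32` at index `i` of a packed array, if in bounds.
pub fn read_u32(bytes: &[u8], i: usize) -> Option<u32> {
    let start = i.checked_mul(4)?;
    let end = start.checked_add(4)?;
    let chunk = bytes.get(start..end)?;
    Some(u32::from_le_bytes(<[u8; 4]>::try_from(chunk).ok()?))
}

/// Hypothetical helper: splits an encoded section into its two raw arrays
/// without copying. Returns `None` if the section is truncated.
pub fn decode(section: &[u8]) -> Option<(&[u8], &[u8])> {
    let count = usize::try_from(read_u32(section, 0)?).ok()?;
    let array_len = count.checked_mul(4)?;
    let rest = section.get(4..)?;
    let offsets = rest.get(..array_len)?;
    let positions = rest.get(array_len..array_len.checked_mul(2)?)?;
    Some((offsets, positions))
}
```

Because decoding only slices the input and reads individual integers on demand, nothing has to be deserialized up front, which is the property the commit message is after.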
@@ -1,7 +1,11 @@
 //! Data structures to provide transformation of the source
 // addresses of a WebAssembly module into the native code.

+use object::write::{Object, StandardSegment};
+use object::{Bytes, LittleEndian, SectionKind, U32Bytes};
 use serde::{Deserialize, Serialize};
+use std::convert::TryFrom;
+use std::ops::Range;

 /// Single source location to generated address mapping.
 #[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq)]
@@ -15,32 +19,6 @@ pub struct InstructionAddressMap {
     pub code_offset: u32,
 }

-/// Function and its instructions addresses mappings.
-#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, Eq, Default)]
-pub struct FunctionAddressMap {
-    /// An array of data for the instructions in this function, indicating where
-    /// each instruction maps back to in the original function.
-    ///
-    /// This array is sorted least-to-greatest by the `code_offset` field.
-    /// Additionally the span of each `InstructionAddressMap` is implicitly the
-    /// gap between it and the next item in the array.
-    pub instructions: Box<[InstructionAddressMap]>,
-
-    /// Function's initial offset in the source file, specified in bytes from
-    /// the front of the file.
-    pub start_srcloc: FilePos,
-
-    /// Function's end offset in the source file, specified in bytes from
-    /// the front of the file.
-    pub end_srcloc: FilePos,
-
-    /// Generated function body offset if applicable, otherwise 0.
-    pub body_offset: usize,
-
-    /// Generated function body length.
-    pub body_len: u32,
-}
-
 /// A position within an original source file,
 ///
 /// This structure is used as a newtype wrapper around a 32-bit integer which
@@ -74,3 +52,140 @@ impl Default for FilePos {
         FilePos(u32::MAX)
     }
 }
+
+/// Builder for the address map section of a wasmtime compilation image.
+///
+/// This builder is used to conveniently build the `ELF_WASMTIME_ADDRMAP`
+/// section by compilers, and provides utilities to directly insert the results
+/// into an `Object`.
+#[derive(Default)]
+pub struct AddressMapSection {
+    offsets: Vec<U32Bytes<LittleEndian>>,
+    positions: Vec<U32Bytes<LittleEndian>>,
+    last_offset: u32,
+}
+
+/// A custom Wasmtime-specific section of our compilation image which stores
+/// mapping data from offsets in the image to offsets in the original wasm
+/// binary.
+///
+/// This section has a custom binary encoding. Currently its encoding is:
+///
+/// * The section starts with a 32-bit little-endian integer. This integer is
+///   how many entries are in the following two arrays.
+/// * Next is an array with the previous count number of 32-bit little-endian
+///   integers. This array is a sorted list of relative offsets within the text
+///   section. This is intended to be a lookup array to perform a binary search
+///   on an offset within the text section on this array.
+/// * Finally there is another array, with the same count as before, also of
+///   32-bit little-endian integers. These integers map 1:1 with the previous
+///   array of offsets, and correspond to what the original offset was in the
+///   wasm file.
+///
+/// Decoding this section is intentionally simple: it only requires loading a
+/// 32-bit little-endian integer plus some bounds checks. Reading this section
+/// is done with the `lookup_file_pos` function below. Reading involves
+/// performing a binary search on the first array for the native code offset,
+/// then using the index found to index into the second array and find the
+/// wasm code offset.
+///
+/// At this time this section has an alignment of 1, which means all reads of it
+/// are unaligned. Additionally at this time the 32-bit encodings chosen here
+/// mean that >=4GB text sections are not supported.
+pub const ELF_WASMTIME_ADDRMAP: &str = ".wasmtime.addrmap";
+
+impl AddressMapSection {
+    /// Pushes a new set of instruction mapping information for a function added
+    /// in the executable.
+    ///
+    /// The `func` argument here is the range of the function, relative to the
+    /// start of the text section in the executable. The `instrs` provided are
+    /// the descriptors for instructions in the function and their various
+    /// mappings back to original source positions.
+    ///
+    /// This is required to be called for `func` values that are strictly
+    /// increasing in addresses (e.g. as the object is built). Additionally the
+    /// `instrs` map must be sorted based on code offset in the native text
+    /// section.
+    pub fn push(&mut self, func: Range<u64>, instrs: &[InstructionAddressMap]) {
+        // NB: for now this only supports <4GB text sections in object files.
+        // Alternative schemes will need to be created for >32-bit offsets to
+        // avoid making this section overly large.
+        let func_start = u32::try_from(func.start).unwrap();
+        let func_end = u32::try_from(func.end).unwrap();
+
+        self.offsets.reserve(instrs.len());
+        self.positions.reserve(instrs.len());
+        for map in instrs {
+            // Sanity-check to ensure that functions are pushed in-order, otherwise
+            // the `offsets` array won't be sorted which is our goal.
+            let pos = func_start + map.code_offset;
+            assert!(pos >= self.last_offset);
+            self.offsets.push(U32Bytes::new(LittleEndian, pos));
+            self.positions
+                .push(U32Bytes::new(LittleEndian, map.srcloc.0));
+            self.last_offset = pos;
+        }
+        self.last_offset = func_end;
+    }
+
+    /// Finishes encoding this section into the `Object` provided.
+    pub fn append_to(self, obj: &mut Object) {
+        let section = obj.add_section(
+            obj.segment_name(StandardSegment::Data).to_vec(),
+            ELF_WASMTIME_ADDRMAP.as_bytes().to_vec(),
+            SectionKind::ReadOnlyData,
+        );
+
+        // NB: this matches the encoding expected by `lookup_file_pos` below.
+        let amt = u32::try_from(self.offsets.len()).unwrap();
+        obj.append_section_data(section, &amt.to_le_bytes(), 1);
+        obj.append_section_data(section, object::bytes_of_slice(&self.offsets), 1);
+        obj.append_section_data(section, object::bytes_of_slice(&self.positions), 1);
+    }
+}
+
+/// Look up an `offset` within an encoded address map section, returning the
+/// original `FilePos` that corresponds to the offset, if found.
+///
+/// This function takes a `section` as its first argument which must have been
+/// created with `AddressMapSection` above. This is intended to be the raw
+/// `ELF_WASMTIME_ADDRMAP` section from the compilation artifact.
+///
+/// The `offset` provided is a relative offset from the start of the text
+/// section of the pc that is being looked up. If `offset` is out of range or
+/// doesn't correspond to anything in this file then `None` is returned.
+pub fn lookup_file_pos(section: &[u8], offset: usize) -> Option<FilePos> {
+    let mut section = Bytes(section);
+    // NB: this matches the encoding written by `append_to` above.
+    let count = section.read::<U32Bytes<LittleEndian>>().ok()?;
+    let count = usize::try_from(count.get(LittleEndian)).ok()?;
+    let (offsets, section) =
+        object::slice_from_bytes::<U32Bytes<LittleEndian>>(section.0, count).ok()?;
+    let (positions, section) =
+        object::slice_from_bytes::<U32Bytes<LittleEndian>>(section, count).ok()?;
+    debug_assert!(section.is_empty());
+
+    // First perform a binary search on the `offsets` array. This is a sorted
+    // array of offsets within the text section, which is conveniently what our
+    // `offset` also is. Note that we are somewhat unlikely to find a precise
+    // match on the element in the array, so we're largely interested in which
+    // "bucket" the `offset` falls into.
+    let offset = u32::try_from(offset).ok()?;
+    let index = match offsets.binary_search_by_key(&offset, |v| v.get(LittleEndian)) {
+        // Exact hit!
+        Ok(i) => i,
+
+        // This *would* be at the first slot in the array, so no
+        // instructions cover `offset`.
+        Err(0) => return None,
+
+        // This would be at the `nth` slot, so we're at the `n-1`th slot.
+        Err(n) => n - 1,
+    };
+
+    // Using the `index` we found of which bucket `offset` corresponds to we can
+    // look up the actual `FilePos` value in the `positions` array.
+    let pos = positions.get(index)?;
+    Some(FilePos(pos.get(LittleEndian)))
+}
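
To make the bucket semantics of `lookup_file_pos` concrete, the following is a minimal sketch of the same binary-search step over already-decoded arrays, using only the standard library. The `lookup` function and the plain `u32` slices are assumptions made for illustration; the function in the diff reads `U32Bytes<LittleEndian>` values directly out of the section bytes and returns a `FilePos`.

```rust
/// Hypothetical stand-in for the search step of `lookup_file_pos`.
///
/// `offsets` must be sorted ascending; `positions[i]` is the wasm offset for
/// the native code range starting at `offsets[i]`.
pub fn lookup(offsets: &[u32], positions: &[u32], offset: u32) -> Option<u32> {
    let index = match offsets.binary_search(&offset) {
        // `offset` is exactly the start of an instruction's range.
        Ok(i) => i,
        // `offset` precedes every recorded range, so nothing covers it.
        Err(0) => return None,
        // `offset` would be inserted at slot `n`, so it falls within the
        // range that starts at slot `n - 1`.
        Err(n) => n - 1,
    };
    positions.get(index).copied()
}
```

With `offsets = [4, 8, 20]`, an `offset` of 9 is not present, so the search reports insertion slot 2 and the lookup resolves to the entry at index 1. Since the arrays carry no explicit end marker, any offset at or beyond the last entry resolves to the last entry under this scheme.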
@@ -2,8 +2,8 @@
 //! module.

 use crate::{
-    DefinedFuncIndex, FunctionAddressMap, FunctionBodyData, ModuleTranslation, PrimaryMap,
-    StackMap, Tunables, TypeTables, WasmError, WasmFuncType,
+    DefinedFuncIndex, FilePos, FunctionBodyData, ModuleTranslation, PrimaryMap, StackMap, Tunables,
+    TypeTables, WasmError, WasmFuncType,
 };
 use anyhow::Result;
 use object::write::Object;
@@ -21,7 +21,7 @@ use thiserror::Error;
 #[allow(missing_docs)]
 pub struct FunctionInfo {
     pub traps: Vec<TrapInformation>,
-    pub address_map: FunctionAddressMap,
+    pub start_srcloc: FilePos,
     pub stack_maps: Vec<StackMapInformation>,
 }