Rework/simplify unwind infrastructure and implement Windows unwind.
Our previous implementation of unwind infrastructure was somewhat complex and brittle: it parsed generated instructions in order to reverse-engineer unwind info from prologues. It also relied on some fragile linkage to communicate instruction-layout information that VCode was not designed to provide.

A much simpler, more reliable, and easier-to-reason-about approach is to embed unwind directives as pseudo-instructions in the prologue as we generate it. That way, we can say what we mean and just emit it directly.

The usual reasoning that leads to the reverse-engineering approach is that metadata is hard to keep in sync across optimization passes; but here, (i) prologues are generated at the very end of the pipeline, and (ii) if we ever do a post-prologue-gen optimization, we can treat unwind directives as black boxes with unknown side-effects, just as we do for some other pseudo-instructions today.

It turns out that it was easier to just build this for both x64 and aarch64 (since they share a factored-out ABI implementation), and wire up the platform-specific unwind-info generation for Windows and SystemV. Now we have simpler unwind on all platforms and we can delete the old unwind infra as soon as we remove the old backend.

There were a few consequences to supporting Fastcall unwind in particular that led to a refactor of the common ABI. Windows only supports naming clobbered-register save locations within 240 bytes of the frame-pointer register, whatever one chooses that to be (RSP or RBP). We had previously saved clobbers below the fixed frame (and below nominal-SP). The 240-byte range has to include the old RBP too, so we're forced to place clobbers at the top of the frame, just below saved RBP/RIP. This is fine; we always keep a frame pointer anyway because we use it to refer to stack args. It does mean that offsets of fixed-frame slots (spillslots, stackslots) from RBP are no longer known before we do regalloc, so if we ever want to index these off of RBP rather than nominal-SP because we add support for `alloca` (dynamic frame growth), then we'll need a "nominal-BP" mode that is resolved after regalloc and clobber-save code is generated. I added a comment to this effect in `abi_impl.rs`.

The above refactor touched both x64 and aarch64 because of shared code. This had a further effect in that the old aarch64 prologue generation subtracted from `sp` once to allocate space, then used stores to `[sp, offset]` to save clobbers. Unfortunately the offset only has 7-bit range, so if there are enough clobbered registers (and there can be -- aarch64 has 384 bytes of registers; at least one unit test hits this) the stores/loads will be out-of-range. I really don't want to synthesize large-offset sequences here; better to go back to the simpler pre-index/post-index `stp r1, r2, [sp, #-16]!` form that works just like a "push". It's likely not much worse microarchitecturally (dependence chain on SP, but oh well) and it actually saves an instruction if there's no other frame to allocate. As a further advantage, it's much simpler to understand; simpler is usually better.

This PR adds the new backend on Windows to CI as well.
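To make the "unwind directives as pseudo-instructions" idea concrete, here is a minimal sketch. The names `UnwindDirective`, `PrologueInst`, `collect_unwind`, and `example_prologue` are hypothetical and much simpler than Cranelift's real types, and the byte lengths are only illustrative. The point is that a directive occupies zero bytes in the instruction stream, so a single walk over the prologue yields each directive paired with the code offset at which it takes effect, with no need to parse the emitted machine code.

```rust
/// Hypothetical, simplified unwind directives (not Cranelift's real `UnwindInst`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum UnwindDirective {
    /// The caller's frame pointer has just been pushed.
    PushFrameRegs,
    /// The frame pointer now points at the new frame.
    DefineNewFrame,
    /// A callee-saved register was stored at `clobber_offset` in the clobber area.
    SaveReg { reg: u8, clobber_offset: u32 },
}

/// Hypothetical prologue item: a real instruction or a zero-sized directive.
#[derive(Clone, Copy, Debug)]
pub enum PrologueInst {
    /// A real instruction that occupies `len` bytes once encoded.
    Machine { len: u32 },
    /// A pseudo-instruction: emits no bytes, only records unwind metadata.
    Unwind(UnwindDirective),
}

/// Walks the prologue once and returns its size in bytes together with every
/// directive paired with the code offset at which it takes effect.
pub fn collect_unwind(prologue: &[PrologueInst]) -> (u32, Vec<(u32, UnwindDirective)>) {
    let mut offset = 0u32;
    let mut directives = Vec::new();
    for inst in prologue {
        match *inst {
            PrologueInst::Machine { len } => offset += len,
            PrologueInst::Unwind(d) => directives.push((offset, d)),
        }
    }
    (offset, directives)
}

/// Illustrative prologue; each directive sits right after the instruction it describes.
pub fn example_prologue() -> Vec<PrologueInst> {
    vec![
        PrologueInst::Machine { len: 1 }, // push rbp
        PrologueInst::Unwind(UnwindDirective::PushFrameRegs),
        PrologueInst::Machine { len: 3 }, // mov rbp, rsp
        PrologueInst::Unwind(UnwindDirective::DefineNewFrame),
        PrologueInst::Machine { len: 4 }, // sub rsp, 32
        PrologueInst::Machine { len: 5 }, // mov [rsp + 16], rbx
        PrologueInst::Unwind(UnwindDirective::SaveReg { reg: 3, clobber_offset: 16 }),
    ]
}
```

Calling `collect_unwind(&example_prologue())` gives the list of `(offset, directive)` pairs that a platform-specific backend (Windows or SystemV) could then translate into its own unwind format.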
@@ -1,6 +1,6 @@
 //! Windows x64 ABI unwind information.
 
-use crate::isa::{unwind::input, RegUnit};
+use crate::isa::unwind::input;
 use crate::result::{CodegenError, CodegenResult};
 use alloc::vec::Vec;
 use byteorder::{ByteOrder, LittleEndian};
@@ -8,6 +8,11 @@ use log::warn;
 #[cfg(feature = "enable-serde")]
 use serde::{Deserialize, Serialize};
 
+#[cfg(feature = "x64")]
+use crate::binemit::CodeOffset;
+#[cfg(feature = "x64")]
+use crate::isa::unwind::UnwindInst;
+
 /// Maximum (inclusive) size of a "small" stack allocation
 const SMALL_ALLOC_MAX_SIZE: u32 = 128;
 /// Maximum (inclusive) size of a "large" stack allocation that can represented in 16-bits
@@ -44,22 +49,31 @@ impl<'a> Writer<'a> {
 /// See: https://docs.microsoft.com/en-us/cpp/build/exception-handling-x64
 /// Only what is needed to describe the prologues generated by the Cranelift x86 ISA are represented here.
 /// Note: the Cranelift x86 ISA RU enum matches the Windows unwind GPR encoding values.
+#[allow(dead_code)]
 #[derive(Clone, Debug, PartialEq, Eq)]
 #[cfg_attr(feature = "enable-serde", derive(Serialize, Deserialize))]
 pub(crate) enum UnwindCode {
     PushRegister {
-        offset: u8,
+        instruction_offset: u8,
         reg: u8,
     },
+    SaveReg {
+        instruction_offset: u8,
+        reg: u8,
+        stack_offset: u32,
+    },
     SaveXmm {
-        offset: u8,
+        instruction_offset: u8,
         reg: u8,
         stack_offset: u32,
     },
     StackAlloc {
-        offset: u8,
+        instruction_offset: u8,
         size: u32,
     },
+    SetFPReg {
+        instruction_offset: u8,
+    },
 }
 
 impl UnwindCode {
@@ -68,37 +82,63 @@ impl UnwindCode {
             PushNonvolatileRegister = 0,
             LargeStackAlloc = 1,
             SmallStackAlloc = 2,
+            SetFPReg = 3,
+            SaveNonVolatileRegister = 4,
+            SaveNonVolatileRegisterFar = 5,
             SaveXmm128 = 8,
             SaveXmm128Far = 9,
         }
 
         match self {
-            Self::PushRegister { offset, reg } => {
-                writer.write_u8(*offset);
+            Self::PushRegister {
+                instruction_offset,
+                reg,
+            } => {
+                writer.write_u8(*instruction_offset);
                 writer.write_u8((*reg << 4) | (UnwindOperation::PushNonvolatileRegister as u8));
             }
-            Self::SaveXmm {
-                offset,
+            Self::SaveReg {
+                instruction_offset,
                 reg,
                 stack_offset,
+            }
+            | Self::SaveXmm {
+                instruction_offset,
+                reg,
+                stack_offset,
             } => {
-                writer.write_u8(*offset);
+                let is_xmm = match self {
+                    Self::SaveXmm { .. } => true,
+                    _ => false,
+                };
+                let (op_small, op_large) = if is_xmm {
+                    (UnwindOperation::SaveXmm128, UnwindOperation::SaveXmm128Far)
+                } else {
+                    (
+                        UnwindOperation::SaveNonVolatileRegister,
+                        UnwindOperation::SaveNonVolatileRegisterFar,
+                    )
+                };
+                writer.write_u8(*instruction_offset);
                 let scaled_stack_offset = stack_offset / 16;
                 if scaled_stack_offset <= core::u16::MAX as u32 {
-                    writer.write_u8((*reg << 4) | (UnwindOperation::SaveXmm128 as u8));
+                    writer.write_u8((*reg << 4) | (op_small as u8));
                     writer.write_u16::<LittleEndian>(scaled_stack_offset as u16);
                 } else {
-                    writer.write_u8((*reg << 4) | (UnwindOperation::SaveXmm128Far as u8));
+                    writer.write_u8((*reg << 4) | (op_large as u8));
                     writer.write_u16::<LittleEndian>(*stack_offset as u16);
                     writer.write_u16::<LittleEndian>((stack_offset >> 16) as u16);
                 }
             }
-            Self::StackAlloc { offset, size } => {
+            Self::StackAlloc {
+                instruction_offset,
+                size,
+            } => {
                 // Stack allocations on Windows must be a multiple of 8 and be at least 1 slot
                 assert!(*size >= 8);
                 assert!((*size % 8) == 0);
 
-                writer.write_u8(*offset);
+                writer.write_u8(*instruction_offset);
                 if *size <= SMALL_ALLOC_MAX_SIZE {
                     writer.write_u8(
                         ((((*size - 8) / 8) as u8) << 4) | UnwindOperation::SmallStackAlloc as u8,
@@ -111,7 +151,11 @@ impl UnwindCode {
                     writer.write_u32::<LittleEndian>(*size);
                 }
             }
-        };
+            Self::SetFPReg { instruction_offset } => {
+                writer.write_u8(*instruction_offset);
+                writer.write_u8(UnwindOperation::SetFPReg as u8);
+            }
+        }
     }
 
     fn node_count(&self) -> usize {
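As an aside on the `emit` hunks above: each unwind code starts with a two-byte slot holding the prologue offset followed by a byte that packs a 4-bit operation-info field above a 4-bit operation code. The sketch below reproduces that packing for the three simplest operations. It is a standalone illustration, not the code from this commit; the helper names (`slot`, `encode_push`, `encode_small_alloc`, `encode_set_fp`) are made up, while the operation numbers are the ones listed in the `UnwindOperation` enum in the diff.

```rust
/// Operation codes, as listed in the `UnwindOperation` enum in the diff.
const PUSH_NONVOLATILE_REGISTER: u8 = 0;
const SMALL_STACK_ALLOC: u8 = 2;
const SET_FP_REG: u8 = 3;

/// Packs one two-byte slot: the prologue offset, then `info` in the high
/// nibble and `op` in the low nibble. (Hypothetical helper.)
fn slot(instruction_offset: u8, op: u8, info: u8) -> [u8; 2] {
    debug_assert!(op < 16 && info < 16);
    [instruction_offset, (info << 4) | op]
}

/// A register push: the info nibble is the Windows register number.
pub fn encode_push(instruction_offset: u8, reg: u8) -> [u8; 2] {
    slot(instruction_offset, PUSH_NONVOLATILE_REGISTER, reg)
}

/// A "small" allocation of 8..=128 bytes in multiples of 8: the info nibble
/// is `(size - 8) / 8`. Returns `None` for sizes that need the large forms.
pub fn encode_small_alloc(instruction_offset: u8, size: u32) -> Option<[u8; 2]> {
    if size < 8 || size > 128 || size % 8 != 0 {
        return None;
    }
    Some(slot(instruction_offset, SMALL_STACK_ALLOC, ((size - 8) / 8) as u8))
}

/// Establishing the frame pointer: the info nibble is unused and left as zero.
pub fn encode_set_fp(instruction_offset: u8) -> [u8; 2] {
    slot(instruction_offset, SET_FP_REG, 0)
}
```

For example, `encode_push(1, 5)` describes a push of RBP (register number 5, matching `UNWIND_RBP_REG` in this commit) that completes at prologue offset 1.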
@@ -125,7 +169,7 @@ impl UnwindCode {
                     3
                 }
             }
-            Self::SaveXmm { stack_offset, .. } => {
+            Self::SaveXmm { stack_offset, .. } | Self::SaveReg { stack_offset, .. } => {
                 if *stack_offset <= core::u16::MAX as u32 {
                     2
                 } else {
@@ -143,9 +187,9 @@ pub(crate) enum MappedRegister {
 }
 
 /// Maps UnwindInfo register to Windows x64 unwind data.
-pub(crate) trait RegisterMapper {
-    /// Maps RegUnit.
-    fn map(reg: RegUnit) -> MappedRegister;
+pub(crate) trait RegisterMapper<Reg> {
+    /// Maps a Reg to a Windows unwind register number.
+    fn map(reg: Reg) -> MappedRegister;
 }
 
 /// Represents Windows x64 unwind information.
@@ -219,8 +263,11 @@ impl UnwindInfo {
             .fold(0, |nodes, c| nodes + c.node_count())
     }
 
-    pub(crate) fn build<MR: RegisterMapper>(
-        unwind: input::UnwindInfo<RegUnit>,
+    // TODO: remove `build()` below when old backend is removed. The new backend uses
+    // a simpler approach in `create_unwind_info_from_insts()` below.
+
+    pub(crate) fn build<Reg: PartialEq + Copy + std::fmt::Debug, MR: RegisterMapper<Reg>>(
+        unwind: input::UnwindInfo<Reg>,
     ) -> CodegenResult<Self> {
         use crate::isa::unwind::input::UnwindCode as InputUnwindCode;
 
@@ -237,7 +284,7 @@ impl UnwindInfo {
                             // `StackAlloc { size = word_size }`, `SaveRegister { stack_offset: 0 }`
                             // to the shorter `UnwindCode::PushRegister`.
                             let push_reg_sequence = if let Some(UnwindCode::StackAlloc {
-                                offset: alloc_offset,
+                                instruction_offset: alloc_offset,
                                 size,
                             }) = unwind_codes.last()
                             {
@@ -246,19 +293,21 @@ impl UnwindInfo {
                                 false
                             };
                             if push_reg_sequence {
-                                *unwind_codes.last_mut().unwrap() =
-                                    UnwindCode::PushRegister { offset, reg };
+                                *unwind_codes.last_mut().unwrap() = UnwindCode::PushRegister {
+                                    instruction_offset: offset,
+                                    reg,
+                                };
                             } else {
-                                // TODO add `UnwindCode::SaveRegister` to handle multiple register
-                                // pushes with single `UnwindCode::StackAlloc`.
-                                return Err(CodegenError::Unsupported(
-                                    "Unsupported UnwindCode::PushRegister sequence".into(),
-                                ));
+                                unwind_codes.push(UnwindCode::SaveReg {
+                                    instruction_offset: offset,
+                                    reg,
+                                    stack_offset: *stack_offset,
+                                });
                             }
                         }
                         MappedRegister::Xmm(reg) => {
                             unwind_codes.push(UnwindCode::SaveXmm {
-                                offset,
+                                instruction_offset: offset,
                                 reg,
                                 stack_offset: *stack_offset,
                             });
@@ -267,7 +316,7 @@ impl UnwindInfo {
                 }
                 InputUnwindCode::StackAlloc { size } => {
                     unwind_codes.push(UnwindCode::StackAlloc {
-                        offset: ensure_unwind_offset(*offset)?,
+                        instruction_offset: ensure_unwind_offset(*offset)?,
                         size: *size,
                     });
                 }
@@ -285,6 +334,64 @@ impl UnwindInfo {
     }
 }
 
+#[cfg(feature = "x64")]
+const UNWIND_RBP_REG: u8 = 5;
+
+#[cfg(feature = "x64")]
+pub(crate) fn create_unwind_info_from_insts<MR: RegisterMapper<regalloc::Reg>>(
+    insts: &[(CodeOffset, UnwindInst)],
+) -> CodegenResult<UnwindInfo> {
+    let mut unwind_codes = vec![];
+    let mut frame_register_offset = 0;
+    let mut max_unwind_offset = 0;
+    for &(instruction_offset, ref inst) in insts {
+        let instruction_offset = ensure_unwind_offset(instruction_offset)?;
+        match inst {
+            &UnwindInst::PushFrameRegs { .. } => {
+                unwind_codes.push(UnwindCode::PushRegister {
+                    instruction_offset,
+                    reg: UNWIND_RBP_REG,
+                });
+            }
+            &UnwindInst::DefineNewFrame {
+                offset_downward_to_clobbers,
+                ..
+            } => {
+                frame_register_offset = ensure_unwind_offset(offset_downward_to_clobbers)?;
+                unwind_codes.push(UnwindCode::SetFPReg { instruction_offset });
+            }
+            &UnwindInst::SaveReg {
+                clobber_offset,
+                reg,
+            } => match MR::map(reg.to_reg()) {
+                MappedRegister::Int(reg) => {
+                    unwind_codes.push(UnwindCode::SaveReg {
+                        instruction_offset,
+                        reg,
+                        stack_offset: clobber_offset,
+                    });
+                }
+                MappedRegister::Xmm(reg) => {
+                    unwind_codes.push(UnwindCode::SaveXmm {
+                        instruction_offset,
+                        reg,
+                        stack_offset: clobber_offset,
+                    });
+                }
+            },
+        }
+        max_unwind_offset = instruction_offset;
+    }
+
+    Ok(UnwindInfo {
+        flags: 0,
+        prologue_size: max_unwind_offset,
+        frame_register: Some(UNWIND_RBP_REG),
+        frame_register_offset,
+        unwind_codes,
+    })
+}
+
 fn ensure_unwind_offset(offset: u32) -> CodegenResult<u8> {
     if offset > 255 {
         warn!("function prologues cannot exceed 255 bytes in size for Windows x64");
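The two size limits that shape this change can be checked with simple arithmetic. In the Windows x64 `UNWIND_INFO` layout, prologue offsets are stored in a single byte, and the frame-register offset is a 4-bit field counted in units of 16 bytes, which is where the 240-byte window in the commit message comes from (15 * 16 = 240). The sketch below is a standalone illustration with hypothetical function names; it is not the code from this commit and makes no claim about how Cranelift scales these values internally.

```rust
/// Prologue offsets are stored in one byte, so anything above 255 cannot be
/// described. (Hypothetical helper, similar in spirit to `ensure_unwind_offset`.)
pub fn prologue_offset(offset: u32) -> Option<u8> {
    if offset > 255 {
        None
    } else {
        Some(offset as u8)
    }
}

/// The frame-register offset is a 4-bit field in units of 16 bytes, so only
/// multiples of 16 from 0 through 240 are representable. Returns the scaled
/// field value, or `None` if the byte offset cannot be encoded.
pub fn scaled_frame_offset(bytes: u32) -> Option<u8> {
    if bytes % 16 != 0 || bytes > 240 {
        return None;
    }
    Some((bytes / 16) as u8)
}
```

Under these assumptions a clobber area that starts 240 bytes below the frame pointer is still describable (`scaled_frame_offset(240)` yields `Some(15)`), while one at 256 bytes is not, which is why the clobbers have to sit at the top of the frame.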