Our previous implementation of unwind infrastructure was somewhat complex and brittle: it parsed generated instructions in order to reverse-engineer unwind info from prologues. It also relied on some fragile linkage to communicate instruction-layout information that VCode was not designed to provide.

A much simpler, more reliable, and easier-to-reason-about approach is to embed unwind directives as pseudo-instructions in the prologue as we generate it. That way, we can say what we mean and just emit it directly.

The usual reasoning that leads to the reverse-engineering approach is that metadata is hard to keep in sync across optimization passes; but here, (i) prologues are generated at the very end of the pipeline, and (ii) if we ever do a post-prologue-gen optimization, we can treat unwind directives as black boxes with unknown side-effects, just as we do for some other pseudo-instructions today.

It turns out that it was easier to just build this for both x64 and aarch64 (since they share a factored-out ABI implementation), and wire up the platform-specific unwind-info generation for Windows and SystemV. Now we have simpler unwind on all platforms and we can delete the old unwind infra as soon as we remove the old backend.

There were a few consequences to supporting Fastcall unwind in particular that led to a refactor of the common ABI. Windows only supports naming clobbered-register save locations within 240 bytes of the frame-pointer register, whatever one chooses that to be (RSP or RBP). We had previously saved clobbers below the fixed frame (and below nominal-SP). The 240-byte range has to include the old RBP too, so we're forced to place clobbers at the top of the frame, just below saved RBP/RIP. This is fine; we always keep a frame pointer anyway because we use it to refer to stack args.
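To make the directive idea and the Fastcall constraint above a bit more concrete, here is a minimal sketch. It is not the code in this PR: `UnwindDirective`, `Inst`, `save_locations_below_fp`, and `example_prologue` are hypothetical names invented for illustration, real instructions are shown as plain strings, and the 240-byte check is simplified to the size of the clobber area alone. The point is only that an unwind-info generator can read save locations straight off the directives, without parsing any machine code.

```rust
/// Hypothetical unwind directives, carried in the prologue as pseudo-instructions.
/// (Illustrative names only; not the types used by this PR.)
#[derive(Debug, Clone, PartialEq)]
enum UnwindDirective {
    /// The frame pointer and return address have just been pushed.
    PushFrameRegs,
    /// The frame pointer is now set up; the clobber-save area is the
    /// `clobber_area_size` bytes immediately below it.
    DefineFrame { clobber_area_size: u32 },
    /// Register `reg` was saved `offset` bytes above the bottom of the clobber area.
    SaveReg { reg: u8, offset: u32 },
}

/// A prologue instruction: either a real machine instruction (shown as text
/// here) or an unwind directive that emits no machine code.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    Real(&'static str),
    Unwind(UnwindDirective),
}

/// The frame-pointer-relative limit described above for Windows.
const MAX_FP_DISTANCE: u32 = 240;

/// Read save locations straight off the directives: no instruction parsing.
/// Returns `(reg, distance below the frame pointer)` pairs.
fn save_locations_below_fp(prologue: &[Inst]) -> Result<Vec<(u8, u32)>, String> {
    let mut area: Option<u32> = None;
    let mut saves = Vec::new();
    for inst in prologue {
        let directive = match inst {
            Inst::Unwind(d) => d,
            Inst::Real(_) => continue, // machine code stays opaque here
        };
        match directive {
            UnwindDirective::PushFrameRegs => {}
            UnwindDirective::DefineFrame { clobber_area_size } => {
                if *clobber_area_size > MAX_FP_DISTANCE {
                    return Err(format!(
                        "clobber area of {} bytes exceeds {}",
                        clobber_area_size, MAX_FP_DISTANCE
                    ));
                }
                area = Some(*clobber_area_size);
            }
            UnwindDirective::SaveReg { reg, offset } => {
                let size = area.ok_or_else(|| "SaveReg before DefineFrame".to_string())?;
                let below_fp = size
                    .checked_sub(*offset)
                    .ok_or_else(|| "save offset outside clobber area".to_string())?;
                saves.push((*reg, below_fp));
            }
        }
    }
    Ok(saves)
}

/// An example x64-style prologue saving two clobbered registers.
fn example_prologue() -> Vec<Inst> {
    vec![
        Inst::Real("push rbp"),
        Inst::Unwind(UnwindDirective::PushFrameRegs),
        Inst::Real("mov rbp, rsp"),
        Inst::Unwind(UnwindDirective::DefineFrame { clobber_area_size: 16 }),
        Inst::Real("sub rsp, 16"),
        Inst::Real("mov [rsp], rbx"),
        Inst::Unwind(UnwindDirective::SaveReg { reg: 3, offset: 0 }),
        Inst::Real("mov [rsp + 8], r12"),
        Inst::Unwind(UnwindDirective::SaveReg { reg: 12, offset: 8 }),
    ]
}
```

In this sketch, `save_locations_below_fp(&example_prologue())` reports `rbx` (register 3) at 16 bytes below the frame pointer and `r12` at 8 bytes below it, and a clobber area larger than 240 bytes is rejected up front. Because the clobbers sit directly under the saved RBP/RIP, their distance from the frame pointer does not depend on how large the rest of the frame turns out to be.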
Placing clobbers at the top of the frame does mean that offsets of fixed-frame slots (spillslots, stackslots) from RBP are no longer known before we do regalloc, so if we ever want to index these off of RBP rather than nominal-SP because we add support for `alloca` (dynamic frame growth), then we'll need a "nominal-BP" mode that is resolved after regalloc and clobber-save code is generated. I added a comment to this effect in `abi_impl.rs`.

The above refactor touched both x64 and aarch64 because of shared code. This had a further effect in that the old aarch64 prologue generation subtracted from `sp` once to allocate space, then used stores to `[sp, offset]` to save clobbers. Unfortunately the offset only has a 7-bit range, so if there are enough clobbered registers (and there can be: aarch64 has 384 bytes of registers; at least one unit test hits this) the stores/loads will be out-of-range. I really don't want to synthesize large-offset sequences here; better to go back to the simpler pre-index/post-index `stp r1, r2, [sp, #-16]!` form that works just like a "push". It's likely not much worse microarchitecturally (dependence chain on SP, but oh well) and it actually saves an instruction if there's no other frame to allocate. As a further advantage, it's much simpler to understand; simpler is usually better.

This PR adds the new backend on Windows to CI as well.
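The range problem described above comes down to a little arithmetic. The sketch below is illustrative only: `fits_simm7_scaled` and `push_style_saves` are hypothetical helpers, not functions from this PR. It assumes the usual `stp`/`ldp` addressing rule, where the offset is a signed 7-bit immediate scaled by the register size (8 bytes for 64-bit registers), and it renders the push-style saves as assembly text rather than real instructions.

```rust
/// True if `offset` can be encoded as a signed 7-bit immediate scaled by
/// `scale` bytes (assumption: `scale` is the size of one register in the pair).
fn fits_simm7_scaled(offset: i64, scale: i64) -> bool {
    offset % scale == 0 && (-64..=63).contains(&(offset / scale))
}

/// Render one pre-indexed store per register pair. Every store uses the same
/// small offset, so the encodable range never depends on how many registers
/// are saved.
fn push_style_saves(pairs: &[(&str, &str)]) -> Vec<String> {
    pairs
        .iter()
        .map(|(a, b)| format!("stp {}, {}, [sp, #-16]!", a, b))
        .collect()
}
```

With an 8-byte scale, `fits_simm7_scaled(504, 8)` holds but `fits_simm7_scaled(512, 8)` does not, whereas the fixed `-16` used by each push-style store is always encodable. That is the trade being made: a serial dependence on `sp` in exchange for never needing a large-offset sequence.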
263 lines
10 KiB
Rust
//! ABI definitions.

use crate::binemit::StackMap;
use crate::ir::{Signature, StackSlot};
use crate::isa::CallConv;
use crate::machinst::*;
use crate::settings;
use regalloc::{Reg, Set, SpillSlot, Writable};
use smallvec::SmallVec;

/// A small vector of instructions (with some reasonable size); appropriate for
/// a small fixed sequence implementing one operation.
pub type SmallInstVec<I> = SmallVec<[I; 4]>;

/// Trait implemented by an object that tracks ABI-related state (e.g., stack
/// layout) and can generate code while emitting the *body* of a function.
pub trait ABICallee {
    /// The instruction type for the ISA associated with this ABI.
    type I: VCodeInst;

    /// Does the ABI-body code need a temp reg (and if so, of what type)? One
    /// will be provided to `init()` as the `maybe_tmp` arg if so.
    fn temp_needed(&self) -> Option<Type>;

    /// Initialize. This is called after the ABICallee is constructed because it
    /// may be provided with a temp vreg, which can only be allocated once the
    /// lowering context exists.
    fn init(&mut self, maybe_tmp: Option<Writable<Reg>>);

    /// Access the (possibly legalized) signature.
    fn signature(&self) -> &Signature;

    /// Get the settings controlling this function's compilation.
    fn flags(&self) -> &settings::Flags;

    /// Get the calling convention implemented by this ABI object.
    fn call_conv(&self) -> CallConv;

    /// Get the liveins of the function.
    fn liveins(&self) -> Set<RealReg>;

    /// Get the liveouts of the function.
    fn liveouts(&self) -> Set<RealReg>;

    /// Number of arguments.
    fn num_args(&self) -> usize;

    /// Number of return values.
    fn num_retvals(&self) -> usize;

    /// Number of stack slots (not spill slots).
    fn num_stackslots(&self) -> usize;

    /// The offsets of all stack slots (not spill slots) for debuginfo purposes.
    fn stackslot_offsets(&self) -> &PrimaryMap<StackSlot, u32>;

    /// Generate an instruction which copies an argument to a destination
    /// register.
    fn gen_copy_arg_to_regs(
        &self,
        idx: usize,
        into_reg: ValueRegs<Writable<Reg>>,
    ) -> SmallInstVec<Self::I>;

    /// Is the given argument needed in the body (as opposed to, e.g., serving
    /// only as a special ABI-specific placeholder)? This controls whether
    /// lowering will copy it to a virtual reg used by CLIF instructions.
    fn arg_is_needed_in_body(&self, idx: usize) -> bool;

    /// Generate any setup instruction needed to save values to the
    /// return-value area. This is usually used when there are multiple return
    /// values or an otherwise large return value that must be passed on the
    /// stack; typically the ABI specifies an extra hidden argument that is a
    /// pointer to that memory.
    fn gen_retval_area_setup(&self) -> Option<Self::I>;

    /// Generate an instruction which copies a source register to a return value slot.
    fn gen_copy_regs_to_retval(
        &self,
        idx: usize,
        from_reg: ValueRegs<Writable<Reg>>,
    ) -> SmallInstVec<Self::I>;

    /// Generate a return instruction.
    fn gen_ret(&self) -> Self::I;

    /// Generate an epilogue placeholder. The returned instruction should return `true` from
    /// `is_epilogue_placeholder()`; this is used to indicate to the lowering driver when
    /// the epilogue should be inserted.
    fn gen_epilogue_placeholder(&self) -> Self::I;

    // -----------------------------------------------------------------
    // Every function above this line may only be called pre-regalloc.
    // Every function below this line may only be called post-regalloc.
    // `spillslots()` must be called before any other post-regalloc
    // function.
    // ----------------------------------------------------------------

    /// Update with the number of spillslots, post-regalloc.
    fn set_num_spillslots(&mut self, slots: usize);

    /// Update with the clobbered registers, post-regalloc.
    fn set_clobbered(&mut self, clobbered: Set<Writable<RealReg>>);

    /// Get the address of a stackslot.
    fn stackslot_addr(&self, slot: StackSlot, offset: u32, into_reg: Writable<Reg>) -> Self::I;

    /// Load from a stackslot.
    fn load_stackslot(
        &self,
        slot: StackSlot,
        offset: u32,
        ty: Type,
        into_reg: ValueRegs<Writable<Reg>>,
    ) -> SmallInstVec<Self::I>;

    /// Store to a stackslot.
    fn store_stackslot(
        &self,
        slot: StackSlot,
        offset: u32,
        ty: Type,
        from_reg: ValueRegs<Reg>,
    ) -> SmallInstVec<Self::I>;

    /// Load from a spillslot.
    fn load_spillslot(
        &self,
        slot: SpillSlot,
        ty: Type,
        into_reg: ValueRegs<Writable<Reg>>,
    ) -> SmallInstVec<Self::I>;

    /// Store to a spillslot.
    fn store_spillslot(
        &self,
        slot: SpillSlot,
        ty: Type,
        from_reg: ValueRegs<Reg>,
    ) -> SmallInstVec<Self::I>;

    /// Generate a stack map, given a list of spillslots and the emission state
    /// at a given program point (prior to emission of the safepointing
    /// instruction).
    fn spillslots_to_stack_map(
        &self,
        slots: &[SpillSlot],
        state: &<Self::I as MachInstEmit>::State,
    ) -> StackMap;

    /// Generate a prologue, post-regalloc. This should include any stack
    /// frame or other setup necessary to use the other methods (`load_arg`,
    /// `store_retval`, and spillslot accesses.) `self` is mutable so that we
    /// can store information in it which will be useful when creating the
    /// epilogue.
    fn gen_prologue(&mut self) -> SmallInstVec<Self::I>;

    /// Generate an epilogue, post-regalloc. Note that this must generate the
    /// actual return instruction (rather than emitting this in the lowering
    /// logic), because the epilogue code comes before the return and the two are
    /// likely closely related.
    fn gen_epilogue(&self) -> SmallInstVec<Self::I>;

    /// Returns the full frame size for the given function, after prologue
    /// emission has run. This comprises the spill slots and stack-storage slots
    /// (but not storage for clobbered callee-save registers, arguments pushed
    /// at callsites within this function, or other ephemeral pushes). This is
    /// used for ABI variants where the client generates prologue/epilogue code,
    /// as in Baldrdash (SpiderMonkey integration).
    fn frame_size(&self) -> u32;

    /// Returns the size of arguments expected on the stack.
    fn stack_args_size(&self) -> u32;

    /// Get the spill-slot size.
    fn get_spillslot_size(&self, rc: RegClass, ty: Type) -> u32;

    /// Generate a spill. The type, if known, is given; this can be used to
    /// generate a store instruction optimized for the particular type rather
    /// than the RegClass (e.g., only F64 that resides in a V128 register). If
    /// no type is given, the implementation should spill the whole register.
    fn gen_spill(&self, to_slot: SpillSlot, from_reg: RealReg, ty: Option<Type>) -> Self::I;

    /// Generate a reload (fill). As for spills, the type may be given to allow
    /// a more optimized load instruction to be generated.
    fn gen_reload(
        &self,
        to_reg: Writable<RealReg>,
        from_slot: SpillSlot,
        ty: Option<Type>,
    ) -> Self::I;

    /// Desired unwind info type.
    fn unwind_info_kind(&self) -> UnwindInfoKind;
}

/// Trait implemented by an object that tracks ABI-related state and can
/// generate code while emitting a *call* to a function.
///
/// An instance of this trait returns information for a *particular*
/// callsite. It will usually be computed from the called function's
/// signature.
///
/// Unlike `ABICallee` above, methods on this trait are not invoked directly
/// by the machine-independent code. Rather, the machine-specific lowering
/// code will typically create an `ABICaller` when creating machine instructions
/// for an IR call instruction inside `lower()`, directly emit the arg and
/// retval copies, and attach the register use/def info to the call.
///
/// This trait is thus provided for convenience to the backends.
pub trait ABICaller {
    /// The instruction type for the ISA associated with this ABI.
    type I: VCodeInst;

    /// Get the number of arguments expected.
    fn num_args(&self) -> usize;

    /// Access the (possibly legalized) signature.
    fn signature(&self) -> &Signature;

    /// Emit a copy of an argument value from a source register, prior to the call.
    fn emit_copy_regs_to_arg<C: LowerCtx<I = Self::I>>(
        &self,
        ctx: &mut C,
        idx: usize,
        from_reg: ValueRegs<Reg>,
    );

    /// Specific order for copying into arguments at callsites. We must be
    /// careful to copy into StructArgs first, because we need to be able
    /// to invoke memcpy() before we've loaded other arg regs (see above).
    fn get_copy_to_arg_order(&self) -> SmallVec<[usize; 8]>;

    /// Emit a copy of a return value into a destination register, after the call returns.
    fn emit_copy_retval_to_regs<C: LowerCtx<I = Self::I>>(
        &self,
        ctx: &mut C,
        idx: usize,
        into_reg: ValueRegs<Writable<Reg>>,
    );

    /// Emit code to pre-adjust the stack, prior to argument copies and call.
    fn emit_stack_pre_adjust<C: LowerCtx<I = Self::I>>(&self, ctx: &mut C);

    /// Emit code to post-adjust the stack, after call return and return-value copies.
    fn emit_stack_post_adjust<C: LowerCtx<I = Self::I>>(&self, ctx: &mut C);

    /// Emit the call itself.
    ///
    /// The returned instruction should have proper use- and def-sets according
    /// to the argument registers, return-value registers, and clobbered
    /// registers for this function signature in this ABI.
    ///
    /// (Arg registers are uses, and retval registers are defs. Clobbered
    /// registers are also logically defs, but should never be read; their
    /// values are "defined" (to the regalloc) but "undefined" in every other
    /// sense.)
    ///
    /// This function should only be called once, as it is allowed to re-use
    /// parts of the ABICaller object in emitting instructions.
    fn emit_call<C: LowerCtx<I = Self::I>>(&mut self, ctx: &mut C);
}