//! In-memory representation of compiled machine code, with labels and fixups to
//! refer to those labels. Handles constant-pool island insertion and also
//! veneer insertion for out-of-range jumps.
//!
//! This code exists to solve three problems:
//!
//! - Branch targets for forward branches are not known until later, when we
//!   emit code in a single pass through the instruction structs.
//!
//! - On many architectures, address references or offsets have limited range.
//!   For example, on AArch64, conditional branches can only target code +/- 1MB
//!   from the branch itself.
//!
//! - The lowering of control flow from the CFG-with-edges produced by
//!   [BlockLoweringOrder](super::BlockLoweringOrder), combined with many empty
//!   edge blocks when the register allocator does not need to insert any
//!   spills/reloads/moves in edge blocks, results in many suboptimal branch
//!   patterns. The lowering also pays no attention to block order, and so
//!   two-target conditional forms (cond-br followed by uncond-br) can often be
//!   avoided because one of the targets is the fallthrough. There are several
//!   cases here where we can simplify to use fewer branches.
//!
//! This "buffer" implements a single-pass code emission strategy (with a later
//! "fixup" pass, but only through recorded fixups, not all instructions). The
//! basic idea is:
//!
//! - Emit branches as they are, including two-target (cond/uncond) compound
//!   forms, but with zero offsets and optimistically assuming the target will be
//!   in range. Record the "fixup" for later. Targets are denoted instead by
//!   symbolic "labels" that are then bound to certain offsets in the buffer as
//!   we emit code. (Nominally, there is a label at the start of every basic
//!   block.)
//!
//! - As we do this, track the offset in the buffer at which the first label
//!   reference "goes out of range". We call this the "deadline". If we reach the
//!   deadline and we still have not bound the label to which an unresolved branch
//!   refers, we have a problem!
//!
//! - To solve this problem, we emit "islands" full of "veneers". An island is
//!   simply a chunk of code inserted in the middle of the code actually produced
//!   by the emitter (e.g., vcode iterating over instruction structs). The emitter
//!   has some awareness of this: it either asks for an island between blocks, so
//!   it is not accidentally executed, or else it emits a branch around the island
//!   when all other options fail (see `Inst::EmitIsland` meta-instruction).
//!
//! - A "veneer" is an instruction (or sequence of instructions) in an "island"
//!   that implements a longer-range reference to a label. The idea is that, for
//!   example, a branch with a limited range can branch to a "veneer" instead,
//!   which is simply a branch in a form that can use a longer-range reference. On
//!   AArch64, for example, conditionals have a +/- 1 MB range, but a conditional
//!   can branch to an unconditional branch which has a +/- 128 MB range. Hence, a
//!   conditional branch's label reference can be fixed up with a "veneer" to
//!   achieve a longer range.
//!
//! - To implement all of this, we require the backend to provide a `LabelUse`
//!   type that implements a trait. This is nominally an enum that records one of
//!   several kinds of references to an offset in code -- basically, a relocation
//!   type -- and will usually correspond to different instruction formats. The
//!   `LabelUse` implementation specifies the maximum range, how to patch in the
//!   actual label location when known, and how to generate a veneer to extend the
//!   range.
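//!
//! As an illustration, here is a hedged sketch of what a backend's `LabelUse`
//! type might look like. This is a hypothetical single-kind backend; the name
//! `ToyLabelUse` and its encodings are invented for illustration, and only the
//! trait methods actually used by this module are shown:
//!
//! ```ignore
//! #[derive(Clone, Copy, Debug, PartialEq, Eq)]
//! enum ToyLabelUse {
//!     /// A hypothetical 19-bit PC-relative branch field, scaled by 4.
//!     Branch19,
//! }
//!
//! impl MachInstLabelUse for ToyLabelUse {
//!     const ALIGN: CodeOffset = 4;
//!     fn max_pos_range(self) -> CodeOffset { (1 << 20) - 1 } // ~ +1MB
//!     fn max_neg_range(self) -> CodeOffset { 1 << 20 }       // ~ -1MB
//!     fn patch_size(self) -> CodeOffset { 4 }
//!     fn patch(self, buffer: &mut [u8], use_offset: CodeOffset, label_offset: CodeOffset) {
//!         // Write the signed PC-relative delta into the instruction's
//!         // offset field (encoding details elided in this sketch).
//!     }
//!     fn supports_veneer(self) -> bool { true }
//!     fn veneer_size(self) -> CodeOffset { 4 }
//!     fn generate_veneer(
//!         self,
//!         buffer: &mut [u8],
//!         veneer_offset: CodeOffset,
//!     ) -> (CodeOffset, Self) {
//!         // Emit a longer-range unconditional branch into `buffer`, and
//!         // return the offset of the veneer's own label-use field together
//!         // with the label-use kind describing it.
//!         (veneer_offset, /* longer-range kind */ self)
//!     }
//! }
//! ```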
//!
//! That satisfies label references, but we still may have suboptimal branch
//! patterns. To clean up the branches, we do a simple "peephole"-style
//! optimization on the fly. To do so, the emitter (e.g., `Inst::emit()`)
//! informs the buffer of branches in the code and, in the case of conditionals,
//! the code that would have been emitted to invert this branch's condition. We
//! track the "latest branches": these are branches that are contiguous up to
//! the current offset. (If any code is emitted after a branch, that branch or
//! run of contiguous branches is no longer "latest".) The latest branches are
//! those that we can edit by simply truncating the buffer and doing something
//! else instead.
//!
//! To optimize branches, we implement several simple rules, and try to apply
//! them to the "latest branches" when possible (see the illustration after
//! this list):
//!
//! - A branch with a label target, when that label is bound to the ending
//!   offset of the branch (the fallthrough location), can be removed altogether,
//!   because the branch would have no effect.
//!
//! - An unconditional branch that starts at a label location, and branches to
//!   another label, results in a "label alias": all references to the label bound
//!   *to* this branch instruction are instead resolved to the *target* of the
//!   branch instruction. This effectively removes empty blocks that just
//!   unconditionally branch to the next block. We call this "branch threading".
//!
//! - A conditional followed by an unconditional, when the conditional branches
//!   to the unconditional's fallthrough, results in (i) the truncation of the
//!   unconditional, (ii) the inversion of the conditional's condition, and (iii)
//!   replacement of the conditional's target (using the original target of the
//!   unconditional). This is a fancy way of saying "we can flip a two-target
//!   conditional branch's taken/not-taken targets if it works better with our
//!   fallthrough". To make this work, the emitter actually gives the buffer
//!   *both* forms of every conditional branch: the true form is emitted into the
//!   buffer, and the "inverted" machine-code bytes are provided as part of the
//!   branch-fixup metadata.
//!
//! - An unconditional branch B preceded by another unconditional branch P, when
//!   B's label(s) have been redirected to target(B), can be removed entirely.
//!   This is an extension of the branch-threading optimization, and is valid
//!   because if we know there will be no fallthrough into this branch
//!   instruction (the prior instruction is an unconditional jump), and if we
//!   know we have successfully redirected all labels, then this branch
//!   instruction is unreachable. Note that this works because the redirection
//!   happens before the label is ever resolved (fixups happen at island
//!   emission time, at which point latest-branches are cleared, or at the end
//!   of emission), so we are sure to catch and redirect all possible paths to
//!   this instruction.
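//!
//! For illustration, here is how the first two rules (fallthrough removal and
//! branch threading) rewrite a common pattern. The pseudo-assembly is invented
//! for this example and does not correspond to any particular ISA:
//!
//! ```text
//! block0:
//!   ...
//!   b block1      ; removed: label block1 is bound at this branch's end
//! block1:
//!   b block2      ; removed: block1 becomes an alias for block2
//! block2:
//!   ...
//! ```
//!
//! After both rewrites, `block0` falls straight through to `block2`, and every
//! reference to `block1` resolves to `block2`'s offset.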
//!
//! # Branch-optimization Correctness
//!
//! The branch-optimization mechanism depends on a few data structures with
//! invariants, which are always held outside the scope of top-level public
//! methods:
//!
//! - The latest-branches list. Each entry describes a span of the buffer
//!   (start/end offsets), the label target, the corresponding fixup-list entry
//!   index, and the bytes (must be the same length) for the inverted form, if
//!   conditional. The list of labels that are bound to the start-offset of this
//!   branch is *complete* (if any label has a resolved offset equal to `start`
//!   and is not an alias, it must appear in this list) and *precise* (no label
//!   in this list can be bound to another offset). No label in this list should
//!   be an alias. No two branch ranges can overlap, and branches are in
//!   ascending-offset order.
//!
//! - The labels-at-tail list. This contains all MachLabels that have been bound
//!   to (whose resolved offsets are equal to) the tail offset of the buffer.
//!   No label in this list should be an alias.
//!
//! - The label_offsets array, containing the bound offset of a label or
//!   UNKNOWN. No label can be bound at an offset greater than the current
//!   buffer tail.
//!
//! - The label_aliases array, containing another label to which a label is
//!   bound or UNKNOWN. A label's resolved offset is the resolved offset
//!   of the label it is aliased to, if this is set.
//!
//! We argue below, at each method, how the invariants in these data structures
//! are maintained (grep for "Post-invariant").
//!
//! Given these invariants, we argue why each optimization preserves execution
//! semantics below (grep for "Preserves execution semantics").

use crate::binemit::{Addend, CodeOffset, CodeSink, Reloc};
use crate::ir::{ExternalName, Opcode, SourceLoc, TrapCode};
use crate::machinst::{BlockIndex, MachInstLabelUse, VCodeInst};
use log::trace;
use smallvec::SmallVec;
use std::mem;

/// A buffer of output to be produced, fixed up, and then emitted to a CodeSink
/// in bulk.
///
/// This struct uses `SmallVec`s to support small-ish function bodies without
/// any heap allocation. As such, it will be several kilobytes large. This is
/// likely fine as long as it is stack-allocated for function emission then
/// thrown away; but beware if many buffer objects are retained persistently.
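///
/// A minimal usage sketch (illustrative only; `I` stands for some backend's
/// `VCodeInst` type, and the 32-bit word written below happens to be the
/// AArch64 NOP encoding):
///
/// ```ignore
/// let mut buf = MachBuffer::<I>::new();
/// buf.reserve_labels_for_blocks(1);
/// buf.bind_label(MachLabel::from_block(0));
/// buf.put4(0xd503201f); // AArch64 NOP
/// let finalized = buf.finish();
/// assert_eq!(finalized.total_size(), 4);
/// ```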
pub struct MachBuffer<I: VCodeInst> {
    /// The buffer contents, as raw bytes.
    data: SmallVec<[u8; 1024]>,
    /// Any relocations referring to this code. Note that only *external*
    /// relocations are tracked here; references to labels within the buffer are
    /// resolved before emission.
    relocs: SmallVec<[MachReloc; 16]>,
    /// Any trap records referring to this code.
    traps: SmallVec<[MachTrap; 16]>,
    /// Any call site records referring to this code.
    call_sites: SmallVec<[MachCallSite; 16]>,
    /// Any source location mappings referring to this code.
    srclocs: SmallVec<[MachSrcLoc; 64]>,
    /// The current source location in progress (after `start_srcloc()` and
    /// before `end_srcloc()`). This is a (start_offset, src_loc) tuple.
    cur_srcloc: Option<(CodeOffset, SourceLoc)>,
    /// Known label offsets; `UNKNOWN_LABEL_OFFSET` if unknown.
    label_offsets: SmallVec<[CodeOffset; 16]>,
    /// Label aliases: when one label points to an unconditional jump, and that
    /// jump points to another label, we can redirect references to the first
    /// label immediately to the second. (We don't chase arbitrarily deep to
    /// avoid problems with cycles, but rather only one level, i.e. through one
    /// jump.)
    label_aliases: SmallVec<[MachLabel; 16]>,
    /// Constants that must be emitted at some point.
    pending_constants: SmallVec<[MachLabelConstant; 16]>,
    /// Fixups that must be performed after all code is emitted.
    fixup_records: SmallVec<[MachLabelFixup<I>; 16]>,
    /// Current deadline at which all constants are flushed and all code labels
    /// are extended by emitting long-range jumps in an island. This flush
    /// should be rare (e.g., on AArch64, the shortest-range PC-rel references
    /// are +/- 1MB for conditional jumps and load-literal instructions), so
    /// it's acceptable to track a minimum and flush-all rather than doing more
    /// detailed "current minimum" / sort-by-deadline trickery.
    island_deadline: CodeOffset,
    /// How many bytes are needed in the worst case for an island, given all
    /// pending constants and fixups.
    island_worst_case_size: CodeOffset,
    /// Latest branches, to facilitate in-place editing for better fallthrough
    /// behavior and empty-block removal.
    latest_branches: SmallVec<[MachBranch; 4]>,
    /// All labels at the current offset (emission tail). This is lazily
    /// cleared: it is actually accurate as long as the current offset is
    /// `labels_at_tail_off`, but if `cur_offset()` has grown larger, it should
    /// be considered as empty.
    ///
    /// For correctness, this *must* be complete (i.e., the vector must contain
    /// all labels whose offsets are resolved to the current tail), because we
    /// rely on it to update labels when we truncate branches.
    labels_at_tail: SmallVec<[MachLabel; 4]>,
    /// The last offset at which `labels_at_tail` is valid. It is conceptually
    /// always describing the tail of the buffer, but we do not clear
    /// `labels_at_tail` eagerly when the tail grows, rather we lazily clear it
    /// when the offset has grown past this (`labels_at_tail_off`) point.
    /// Always <= `cur_offset()`.
    labels_at_tail_off: CodeOffset,
}

/// A `MachBuffer` once emission is completed: holds generated code and records,
/// without fixups. This allows the type to be independent of the backend.
pub struct MachBufferFinalized {
    /// The buffer contents, as raw bytes.
    pub data: SmallVec<[u8; 1024]>,
    /// Any relocations referring to this code. Note that only *external*
    /// relocations are tracked here; references to labels within the buffer are
    /// resolved before emission.
    relocs: SmallVec<[MachReloc; 16]>,
    /// Any trap records referring to this code.
    traps: SmallVec<[MachTrap; 16]>,
    /// Any call site records referring to this code.
    call_sites: SmallVec<[MachCallSite; 16]>,
    /// Any source location mappings referring to this code.
    srclocs: SmallVec<[MachSrcLoc; 64]>,
}

static UNKNOWN_LABEL_OFFSET: CodeOffset = 0xffff_ffff;
static UNKNOWN_LABEL: MachLabel = MachLabel(0xffff_ffff);

/// A label refers to some offset in a `MachBuffer`. It may not be resolved at
/// the point at which it is used by emitted code; the buffer records "fixups"
/// for references to the label, and will come back and patch the code
/// appropriately when the label's location is eventually known.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct MachLabel(u32);

impl MachLabel {
    /// Get a label for a block. (The first N MachLabels are always reserved for
    /// the N blocks in the vcode.)
    pub fn from_block(bindex: BlockIndex) -> MachLabel {
        MachLabel(bindex)
    }

    /// Get the numeric label index.
    pub fn get(self) -> u32 {
        self.0
    }
}

impl<I: VCodeInst> MachBuffer<I> {
    /// Create a new, empty `MachBuffer`.
    pub fn new() -> MachBuffer<I> {
        MachBuffer {
            data: SmallVec::new(),
            relocs: SmallVec::new(),
            traps: SmallVec::new(),
            call_sites: SmallVec::new(),
            srclocs: SmallVec::new(),
            cur_srcloc: None,
            label_offsets: SmallVec::new(),
            label_aliases: SmallVec::new(),
            pending_constants: SmallVec::new(),
            fixup_records: SmallVec::new(),
            island_deadline: UNKNOWN_LABEL_OFFSET,
            island_worst_case_size: 0,
            latest_branches: SmallVec::new(),
            labels_at_tail: SmallVec::new(),
            labels_at_tail_off: 0,
        }
    }

    /// Debug-only: check invariants of labels and branch-records described
    /// under "Branch-optimization Correctness" above.
    #[cfg(debug_assertions)]
    fn check_label_branch_invariants(&self) {
        let cur_off = self.cur_offset();
        // Check that every entry in latest_branches has *correct*
        // labels_at_this_branch lists. We do not check completeness because
        // that would require building a reverse index, which is too slow even
        // for a debug invariant check.
        let mut last_end = 0;
        for b in &self.latest_branches {
            debug_assert!(b.start < b.end);
            debug_assert!(b.end <= cur_off);
            debug_assert!(b.start >= last_end);
            last_end = b.end;
            for &l in &b.labels_at_this_branch {
                debug_assert_eq!(self.resolve_label_offset(l), b.start);
                debug_assert_eq!(self.label_aliases[l.0 as usize], UNKNOWN_LABEL);
            }
        }

        // Check that every label is unresolved, or resolved at or before
        // cur_offset. If at cur_offset, it must be in `labels_at_tail`.
        for (i, &off) in self.label_offsets.iter().enumerate() {
            let label = MachLabel(i as u32);
            debug_assert!(off == UNKNOWN_LABEL_OFFSET || off <= cur_off);
            if off == cur_off {
                debug_assert!(
                    self.labels_at_tail_off == cur_off && self.labels_at_tail.contains(&label)
                );
            }
        }

        // Check that every label in `labels_at_tail` is precise, i.e.,
        // resolves to the current offset.
        debug_assert!(self.labels_at_tail_off <= cur_off);
        if self.labels_at_tail_off == cur_off {
            for &l in &self.labels_at_tail {
                debug_assert_eq!(self.resolve_label_offset(l), cur_off);
                debug_assert_eq!(self.label_aliases[l.0 as usize], UNKNOWN_LABEL);
            }
        }
    }

    #[cfg(not(debug_assertions))]
    fn check_label_branch_invariants(&self) {
        // Nothing.
    }

    /// Current offset from start of buffer.
    pub fn cur_offset(&self) -> CodeOffset {
        self.data.len() as CodeOffset
    }

    /// Add a byte.
    pub fn put1(&mut self, value: u8) {
        trace!("MachBuffer: put byte @ {}: {:x}", self.cur_offset(), value);
        self.data.push(value);

        // Post-invariant: conceptually, `labels_at_tail` contains a complete
        // and precise list of labels bound at `cur_offset()`. We have advanced
        // `cur_offset()`, hence if it had been equal to `labels_at_tail_off`
        // before, it is not anymore (and it cannot become equal, because
        // `labels_at_tail_off` is always <= `cur_offset()`). Thus the list is
        // conceptually empty (even though it is only lazily cleared). No labels
        // can be bound at this new offset (by invariant on `label_offsets`).
        // Hence the invariant holds.
    }

    /// Add 2 bytes.
    pub fn put2(&mut self, value: u16) {
        trace!(
            "MachBuffer: put 16-bit word @ {}: {:x}",
            self.cur_offset(),
            value
        );
        let bytes = value.to_le_bytes();
        self.data.extend_from_slice(&bytes[..]);

        // Post-invariant: as for `put1()`.
    }

    /// Add 4 bytes.
    pub fn put4(&mut self, value: u32) {
        trace!(
            "MachBuffer: put 32-bit word @ {}: {:x}",
            self.cur_offset(),
            value
        );
        let bytes = value.to_le_bytes();
        self.data.extend_from_slice(&bytes[..]);

        // Post-invariant: as for `put1()`.
    }
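
    // Note: the multi-byte `put*` helpers append in little-endian order; for
    // example, `put4(0x1234_5678)` appends the bytes [0x78, 0x56, 0x34, 0x12]
    // (via `u32::to_le_bytes`).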

    /// Add 8 bytes.
    pub fn put8(&mut self, value: u64) {
        trace!(
            "MachBuffer: put 64-bit word @ {}: {:x}",
            self.cur_offset(),
            value
        );
        let bytes = value.to_le_bytes();
        self.data.extend_from_slice(&bytes[..]);

        // Post-invariant: as for `put1()`.
    }

    /// Add a slice of bytes.
    pub fn put_data(&mut self, data: &[u8]) {
        trace!(
            "MachBuffer: put data @ {}: len {}",
            self.cur_offset(),
            data.len()
        );
        self.data.extend_from_slice(data);

        // Post-invariant: as for `put1()`.
    }

    /// Reserve appended space and return a mutable slice referring to it.
    pub fn get_appended_space(&mut self, len: usize) -> &mut [u8] {
        trace!(
            "MachBuffer: get_appended_space @ {}: len {}",
            self.cur_offset(),
            len
        );
        let off = self.data.len();
        let new_len = self.data.len() + len;
        self.data.resize(new_len, 0);
        &mut self.data[off..]

        // Post-invariant: as for `put1()`.
    }

    /// Align up to the given alignment.
    pub fn align_to(&mut self, align_to: CodeOffset) {
        trace!("MachBuffer: align to {}", align_to);
        assert!(align_to.is_power_of_two());
        while self.cur_offset() & (align_to - 1) != 0 {
            self.put1(0);
        }

        // Post-invariant: as for `put1()`.
    }

    /// Allocate a `Label` to refer to some offset. May not be bound to a fixed
    /// offset yet.
    pub fn get_label(&mut self) -> MachLabel {
        let l = self.label_offsets.len() as u32;
        self.label_offsets.push(UNKNOWN_LABEL_OFFSET);
        self.label_aliases.push(UNKNOWN_LABEL);
        trace!("MachBuffer: new label -> {:?}", MachLabel(l));
        MachLabel(l)

        // Post-invariant: the only mutation is to add a new label; it has no
        // bound offset yet, so it trivially satisfies all invariants.
    }

    /// Reserve the first N MachLabels for blocks.
    pub fn reserve_labels_for_blocks(&mut self, blocks: BlockIndex) {
        trace!("MachBuffer: first {} labels are for blocks", blocks);
        debug_assert!(self.label_offsets.is_empty());
        self.label_offsets
            .resize(blocks as usize, UNKNOWN_LABEL_OFFSET);
        self.label_aliases.resize(blocks as usize, UNKNOWN_LABEL);

        // Post-invariant: as for `get_label()`.
    }

    /// Bind a label to the current offset. A label can only be bound once.
    pub fn bind_label(&mut self, label: MachLabel) {
        trace!(
            "MachBuffer: bind label {:?} at offset {}",
            label,
            self.cur_offset()
        );
        debug_assert_eq!(self.label_offsets[label.0 as usize], UNKNOWN_LABEL_OFFSET);
        debug_assert_eq!(self.label_aliases[label.0 as usize], UNKNOWN_LABEL);
        let offset = self.cur_offset();
        self.label_offsets[label.0 as usize] = offset;
        self.lazily_clear_labels_at_tail();
        self.labels_at_tail.push(label);

        // Invariants hold: the bound offset of this label is <= cur_offset (in
        // fact it is equal). If the `labels_at_tail` list was complete and
        // precise before, it still is, because we have bound this label to the
        // current offset and added it to the list (which contains all labels
        // at the current offset).

        self.check_label_branch_invariants();
        self.optimize_branches();

        // Post-invariant: by `optimize_branches()` (see argument there).
        self.check_label_branch_invariants();
    }

    /// Lazily clear `labels_at_tail` if the tail offset has moved beyond the
    /// offset that it applies to.
    fn lazily_clear_labels_at_tail(&mut self) {
        let offset = self.cur_offset();
        if offset > self.labels_at_tail_off {
            self.labels_at_tail_off = offset;
            self.labels_at_tail.clear();
        }

        // Post-invariant: either `labels_at_tail_off` was at `cur_offset`, and
        // the state is untouched, or it was less than `cur_offset`, in which
        // case the `labels_at_tail` list was conceptually empty, and is now
        // actually empty.
    }

    /// Resolve a label to an offset, if known. May return `UNKNOWN_LABEL_OFFSET`.
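    ///
    /// Aliases are chased only one level deep, matching the one-jump rule
    /// described on the `label_aliases` field. A hedged sketch of the lookup
    /// (the offsets are illustrative):
    ///
    /// ```ignore
    /// // label_aliases[a] = b and label_offsets[b] = 0x40
    /// //   => resolve_label_offset(a) == 0x40
    /// // label_aliases[a] = UNKNOWN_LABEL
    /// //   => resolve_label_offset(a) == label_offsets[a]
    /// ```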
    fn resolve_label_offset(&self, label: MachLabel) -> CodeOffset {
        let alias = self.label_aliases[label.0 as usize];
        if alias != UNKNOWN_LABEL {
            self.label_offsets[alias.0 as usize]
        } else {
            self.label_offsets[label.0 as usize]
        }

        // Post-invariant: no mutations.
    }

    /// Emit a reference to the given label with the given reference type (i.e.,
    /// branch-instruction format) at the current offset. This is like a
    /// relocation, but handled internally.
    ///
    /// This can be called before the branch is actually emitted; fixups will
    /// not happen until an island is emitted or the buffer is finished.
    pub fn use_label_at_offset(&mut self, offset: CodeOffset, label: MachLabel, kind: I::LabelUse) {
        trace!(
            "MachBuffer: use_label_at_offset: offset {} label {:?} kind {:?}",
            offset,
            label,
            kind
        );

        // Add the fixup, and update the worst-case island size based on a
        // veneer for this label use.
        self.fixup_records.push(MachLabelFixup {
            label,
            offset,
            kind,
        });
        if kind.supports_veneer() {
            self.island_worst_case_size += kind.veneer_size();
            self.island_worst_case_size &= !(I::LabelUse::ALIGN - 1);
        }
        let deadline = offset + kind.max_pos_range();
        if deadline < self.island_deadline {
            self.island_deadline = deadline;
        }

        // Post-invariant: no mutations to branches/labels data structures.

        self.check_label_branch_invariants();
    }

    /// Inform the buffer of an unconditional branch at the given offset,
    /// targeting the given label. May be used to optimize branches.
    /// The last added label-use must correspond to this branch.
    /// This must be called when the current offset is equal to `start`; i.e.,
    /// before actually emitting the branch. This implies that for a branch that
    /// uses a label and is eligible for optimizations by the MachBuffer, the
    /// proper sequence is:
    ///
    /// - Call `use_label_at_offset()` to emit the fixup record.
    /// - Call `add_uncond_branch()` to make note of the branch.
    /// - Emit the bytes for the branch's machine code.
    ///
    /// Additional requirement: no labels may be bound between `start` and `end`
    /// (exclusive on both ends).
    pub fn add_uncond_branch(&mut self, start: CodeOffset, end: CodeOffset, target: MachLabel) {
        assert!(self.cur_offset() == start);
        debug_assert!(end > start);
        assert!(!self.fixup_records.is_empty());
        let fixup = self.fixup_records.len() - 1;
        self.lazily_clear_labels_at_tail();
        self.latest_branches.push(MachBranch {
            start,
            end,
            target,
            fixup,
            inverted: None,
            labels_at_this_branch: self.labels_at_tail.clone(),
        });

        // Post-invariant: we asserted that the branch start is the current
        // tail; the list of labels at the branch is cloned from the list of
        // labels at the current tail.

        self.check_label_branch_invariants();
    }
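
    // For reference, a hedged sketch of the emitter-side protocol documented
    // above. `LabelUse::Branch26` and `enc_jump26` are stand-ins for
    // backend-specific items, not definitions in this module:
    //
    //     let off = sink.cur_offset();
    //     sink.use_label_at_offset(off, target_label, LabelUse::Branch26);
    //     sink.add_uncond_branch(off, off + 4, target_label);
    //     sink.put4(enc_jump26(0)); // zero offset; patched via the fixup later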

    /// Inform the buffer of a conditional branch at the given offset,
    /// targeting the given label. May be used to optimize branches.
    /// The last added label-use must correspond to this branch.
    ///
    /// Additional requirement: no labels may be bound between `start` and `end`
    /// (exclusive on both ends).
    pub fn add_cond_branch(
        &mut self,
        start: CodeOffset,
        end: CodeOffset,
        target: MachLabel,
        inverted: &[u8],
    ) {
        assert!(self.cur_offset() == start);
        debug_assert!(end > start);
        assert!(!self.fixup_records.is_empty());
        debug_assert!(inverted.len() == (end - start) as usize);
        let fixup = self.fixup_records.len() - 1;
        let inverted = Some(SmallVec::from(inverted));
        self.lazily_clear_labels_at_tail();
        self.latest_branches.push(MachBranch {
            start,
            end,
            target,
            fixup,
            inverted,
            labels_at_this_branch: self.labels_at_tail.clone(),
        });

        // Post-invariant: we asserted that the branch start is the current
        // tail; the labels-at-this-branch list is cloned from the list of
        // labels at the current tail.

        self.check_label_branch_invariants();
    }

    fn truncate_last_branch(&mut self) {
        self.lazily_clear_labels_at_tail();

        // Invariants hold at this point.

        let b = self.latest_branches.pop().unwrap();
        assert!(b.end == self.cur_offset());

        // State:
        //    [PRE CODE]
        //  Offset b.start, b.labels_at_this_branch:
        //    [BRANCH CODE]
        //  cur_off, self.labels_at_tail -->
        //    (end of buffer)
        self.data.truncate(b.start as usize);
        self.fixup_records.truncate(b.fixup);
        // State:
        //    [PRE CODE]
        //  cur_off, Offset b.start, b.labels_at_this_branch:
        //    (end of buffer)
        //
        //  self.labels_at_tail --> (past end of buffer)
        let cur_off = self.cur_offset();
        self.labels_at_tail_off = cur_off;
        // State:
        //    [PRE CODE]
        //  cur_off, Offset b.start, b.labels_at_this_branch,
        //  self.labels_at_tail:
        //    (end of buffer)
        //
        //  resolve_label_offset(l) for l in labels_at_tail:
        //    (past end of buffer)

        trace!(
            "truncate_last_branch: truncated {:?}; off now {}",
            b,
            cur_off
        );

        // Fix up resolved label offsets for labels at tail.
        for &l in &self.labels_at_tail {
            self.label_offsets[l.0 as usize] = cur_off;
        }
        // Old labels_at_this_branch are now at cur_off.
        self.labels_at_tail
            .extend(b.labels_at_this_branch.into_iter());

        // Post-invariant: this operation is defined to truncate the buffer,
        // which moves cur_off backward, and to move labels at the end of the
        // buffer back to the start-of-branch offset.
        //
        // latest_branches satisfies all invariants:
        // - it has no branches past the end of the buffer (branches are in
        //   order, we removed the last one, and we truncated the buffer to just
        //   before the start of that branch)
        // - no labels were moved to lower offsets than the (new) cur_off, so
        //   the labels_at_this_branch list for any other branch need not change.
        //
        // labels_at_tail satisfies all invariants:
        // - all labels that were at the tail after the truncated branch are
        //   moved backward to just before the branch, which becomes the new
        //   tail; thus every element in the list should remain (ensured by
        //   `.extend()` above).
        // - all labels that refer to the new tail, which is the start-offset of
        //   the truncated branch, must be present. The `labels_at_this_branch`
        //   list in the truncated branch's record is a complete and precise
        //   list of exactly these labels; we append these to labels_at_tail.
        // - labels_at_tail_off is at cur_off after truncation occurs, so the
        //   list is valid (not to be lazily cleared).
        //
        // The stated operation was performed:
        // - For each label at the end of the buffer prior to this method, it
        //   now resolves to the new (truncated) end of the buffer: it must have
        //   been in `labels_at_tail` (this list is precise and complete, and
        //   the tail was at the end of the truncated branch on entry), and we
        //   iterate over this list and set `label_offsets` to the new tail.
        //   None of these labels could have been an alias (by invariant), so
        //   `label_offsets` is authoritative for each.
        // - No other labels will be past the end of the buffer, because of the
        //   requirement that no labels be bound to the middle of branch ranges
        //   (see comments to `add_{cond,uncond}_branch()`).
        // - The buffer is truncated to just before the last branch, and the
        //   fixup record referring to that last branch is removed.

        self.check_label_branch_invariants();
    }

    fn optimize_branches(&mut self) {
        self.lazily_clear_labels_at_tail();

        // Invariants are valid at this point.

        trace!(
            "enter optimize_branches:\n b = {:?}\n l = {:?}\n f = {:?}",
            self.latest_branches,
            self.labels_at_tail,
            self.fixup_records
        );

        // We continue to munch on branches at the tail of the buffer until no
        // more rules apply. Note that the loop only continues if a branch is
        // actually truncated (or if labels are redirected away from a branch),
        // so this always makes progress.
        while let Some(b) = self.latest_branches.last() {
            let cur_off = self.cur_offset();
            trace!("optimize_branches: last branch {:?} at off {}", b, cur_off);
            // If there has been any code emission since the end of the last
            // branch or label definition, then there's nothing we can edit
            // (because we don't move code once placed, only back up and
            // overwrite), so clear the records and finish.
            if b.end < cur_off {
                break;
            }

            // Invariant: we are looking at a branch that ends at the tail of
            // the buffer.

            // For any branch, conditional or unconditional:
            // - If the branch's target label resolves to the current offset,
            //   then remove the branch, and reset all labels that targeted the
            //   current offset (end of branch) to the truncated end-of-code.
            //
            // Preserves execution semantics: a branch to its own fallthrough
            // address is equivalent to a no-op; in both cases, nextPC is the
            // fallthrough.
            if self.resolve_label_offset(b.target) == cur_off {
                trace!("branch with target == cur off; truncating");
                self.truncate_last_branch();
                continue;
            }

            // If latest is an unconditional branch:
            //
            // - If the branch's target is not its own start address, then for
            //   each label at the start of the branch, make the label an alias
            //   of the branch target, and remove the label from the "labels at
            //   this branch" list.
            //
            //   - Preserves execution semantics: an unconditional branch's
            //     only effect is to set PC to a new PC; this change simply
            //     collapses one step in the step-semantics.
            //
            //   - Post-invariant: the labels that were bound to the start of
            //     this branch become aliases, so they must not be present in any
            //     labels-at-this-branch list or the labels-at-tail list. The
            //     labels are removed from the latest-branch record's
            //     labels-at-this-branch list, and are never placed in the
            //     labels-at-tail list. Furthermore, it is correct that they are
            //     not in either list, because they are now aliases, and labels
            //     that are aliases remain aliases forever.
            //
            // - If there is a prior unconditional branch that ends just before
            //   this one begins, and this branch has no labels bound to its
            //   start, then we can truncate this branch, because it is entirely
            //   unreachable (we have redirected all labels that make it
            //   reachable otherwise). Do so and continue around the loop.
            //
            //   - Preserves execution semantics: the branch is unreachable,
            //     because execution can only flow into an instruction from the
            //     prior instruction's fallthrough or from a branch bound to that
            //     instruction's start offset.
            //     Unconditional branches have no fallthrough, so if the prior
            //     instruction is an unconditional branch, no fallthrough entry
            //     can happen. The labels-at-this-branch list is complete (by
            //     invariant), so if it is empty, then the instruction is
            //     entirely unreachable. Thus, it can be removed.
            //
            //   - Post-invariant: ensured by truncate_last_branch().
            //
            // - If there is a prior conditional branch whose target label
            //   resolves to the current offset (branches around the
            //   unconditional branch), then remove the unconditional branch,
            //   and make the target of the unconditional the target of the
            //   conditional instead.
            //
            //   - Preserves execution semantics: previously we had:
            //
            //         L1:
            //            cond_br L2
            //            br L3
            //         L2:
            //            (end of buffer)
            //
            //     by removing the last branch, we have:
            //
            //         L1:
            //            cond_br L2
            //         L2:
            //            (end of buffer)
            //
            //     we then fix up the records for the conditional branch to
            //     have:
            //
            //         L1:
            //            cond_br.inverted L3
            //         L2:
            //
            //     In the original code, control flow reaches L2 when the
            //     conditional branch's predicate is true, and L3 otherwise. In
            //     the optimized code, the same is true.
            //
            //   - Post-invariant: all edits to latest_branches and
            //     labels_at_tail are performed by `truncate_last_branch()`,
            //     which maintains the invariants at each step.

            if b.is_uncond() {
                // Set any label equal to current branch's start as an alias of
                // the branch's target, if the target is not the branch itself
                // (i.e., an infinite loop).
                if self.resolve_label_offset(b.target) != b.start {
                    let redirected = b.labels_at_this_branch.len();
                    for &l in &b.labels_at_this_branch {
                        trace!(
                            " -> label at start of branch {:?} redirected to target {:?}",
                            l,
                            b.target
                        );
                        self.label_aliases[l.0 as usize] = b.target;
                        // NOTE: we continue to ensure the invariant that labels
                        // pointing to the tail of the buffer are in
                        // `labels_at_tail`, because we already ensured above
                        // that the last branch cannot have a target of
                        // `cur_off`; so we never have to put the label into
                        // `labels_at_tail` when moving it here.
                    }
                    // Maintain the invariant: all labels bound to the start of
                    // this branch have been redirected and no longer point at
                    // it.
                    let mut_b = self.latest_branches.last_mut().unwrap();
                    mut_b.labels_at_this_branch.clear();

                    if redirected > 0 {
                        trace!(" -> after label redirects, restarting loop");
                        continue;
                    }
                }

                let b = self.latest_branches.last().unwrap();

                // Examine any immediately preceding branch.
                if self.latest_branches.len() > 1 {
                    let prev_b = &self.latest_branches[self.latest_branches.len() - 2];
                    trace!(" -> more than one branch; prev_b = {:?}", prev_b);
                    // This uncond is immediately after another uncond; we
                    // should have already redirected labels to this uncond away
                    // (but check to be sure); so we can truncate this uncond.
                    if prev_b.is_uncond()
                        && prev_b.end == b.start
                        && b.labels_at_this_branch.is_empty()
                    {
                        trace!(" -> uncond follows another uncond; truncating");
                        self.truncate_last_branch();
                        continue;
                    }
                    // This uncond is immediately after a conditional, and the
                    // conditional's target is the end of this uncond, and we've
                    // already redirected labels to this uncond away; so we can
                    // truncate this uncond, flip the sense of the conditional,
                    // and set the conditional's target (in `latest_branches`
                    // and in `fixup_records`) to the uncond's target.
                    if prev_b.is_cond()
                        && prev_b.end == b.start
                        && self.resolve_label_offset(prev_b.target) == cur_off
                    {
                        trace!(" -> uncond follows a conditional, and conditional's target resolves to current offset");
                        // Save the target of the uncond (this becomes the
                        // target of the cond), and truncate the uncond.
                        let target = b.target;
                        let data = prev_b.inverted.clone().unwrap();
                        self.truncate_last_branch();

                        // Mutate the code and cond branch.
                        let off_before_edit = self.cur_offset();
                        let prev_b = self.latest_branches.last_mut().unwrap();
                        let not_inverted = SmallVec::from(
                            &self.data[(prev_b.start as usize)..(prev_b.end as usize)],
                        );

                        // Low-level edit: replaces bytes of branch with
                        // inverted form. cur_off remains the same afterward, so
                        // we do not need to modify label data structures.
                        self.data.truncate(prev_b.start as usize);
                        self.data.extend_from_slice(&data[..]);

                        // Save the original code as the inversion of the
                        // inverted branch, in case we later edit this branch
                        // again.
                        prev_b.inverted = Some(not_inverted);
                        self.fixup_records[prev_b.fixup].label = target;
                        trace!(" -> reassigning target of condbr to {:?}", target);
                        prev_b.target = target;
                        debug_assert_eq!(off_before_edit, self.cur_offset());
                        continue;
                    }
                }
            }

            // If we couldn't do anything with the last branch, then break.
            break;
        }

        self.purge_latest_branches();

        trace!(
            "leave optimize_branches:\n b = {:?}\n l = {:?}\n f = {:?}",
            self.latest_branches,
            self.labels_at_tail,
            self.fixup_records
        );
    }

    fn purge_latest_branches(&mut self) {
        // All of our branch simplification rules work only if a branch ends at
        // the tail of the buffer, with no following code; and branches are in
        // order in latest_branches; so if the last entry ends prior to
        // cur_offset, then clear all entries.
        let cur_off = self.cur_offset();
        if let Some(l) = self.latest_branches.last() {
            if l.end < cur_off {
                trace!("purge_latest_branches: removing branch {:?}", l);
                self.latest_branches.clear();
            }
        }

        // Post-invariant: no invariant requires any branch to appear in
        // `latest_branches`; it is always optional. The list-clear above thus
        // preserves all semantics.
    }

    /// Emit a constant at some point in the future, binding the given label to
    /// its offset. The constant will be placed at most `max_distance` from the
    /// current offset.
    pub fn defer_constant(
        &mut self,
        label: MachLabel,
        align: CodeOffset,
        data: &[u8],
        max_distance: CodeOffset,
    ) {
        let deadline = self.cur_offset() + max_distance;
        self.island_worst_case_size += data.len() as CodeOffset;
        self.island_worst_case_size &= !(I::LabelUse::ALIGN - 1);
        self.pending_constants.push(MachLabelConstant {
            label,
            align,
            data: SmallVec::from(data),
        });
        if deadline < self.island_deadline {
            self.island_deadline = deadline;
        }
    }

    /// Is an island needed within the next N bytes?
    pub fn island_needed(&self, distance: CodeOffset) -> bool {
        let worst_case_end_of_island = self.cur_offset() + distance + self.island_worst_case_size;
        worst_case_end_of_island > self.island_deadline
    }
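
    // A hedged sketch of how an emitter is expected to drive islands (compare
    // the `test_island` test below): poll at points where an island may safely
    // be placed, either between blocks or behind an explicit branch around the
    // island (see `Inst::EmitIsland` in the module comment).
    //
    //     if buf.island_needed(distance_to_next_possible_island_point) {
    //         buf.emit_island();
    //     }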

    /// Emit all pending constants and veneers. Should only be called if
    /// `island_needed()` returns true, i.e., if we actually reach a deadline:
    /// otherwise, unnecessary veneers may be inserted.
    pub fn emit_island(&mut self) {
        // We're going to purge fixups, so no latest-branch editing can happen
        // anymore.
        self.latest_branches.clear();

        let pending_constants = mem::replace(&mut self.pending_constants, SmallVec::new());
        for MachLabelConstant { label, align, data } in pending_constants.into_iter() {
            self.align_to(align);
            self.bind_label(label);
            self.put_data(&data[..]);
        }

        let fixup_records = mem::replace(&mut self.fixup_records, SmallVec::new());
        let mut new_fixups = SmallVec::new();
        for MachLabelFixup {
            label,
            offset,
            kind,
        } in fixup_records.into_iter()
        {
            trace!(
                "emit_island: fixup for label {:?} at offset {} kind {:?}",
                label,
                offset,
                kind
            );

            // We eagerly perform fixups whose label targets are known, if not
            // out of range, to avoid unnecessary veneers.
            let label_offset = self.resolve_label_offset(label);
            let known = label_offset != UNKNOWN_LABEL_OFFSET;
            let in_range = if known {
                if label_offset >= offset {
                    (label_offset - offset) <= kind.max_pos_range()
                } else {
                    (offset - label_offset) <= kind.max_neg_range()
                }
            } else {
                false
            };

            trace!(
                " -> label_offset = {}, known = {}, in_range = {} (pos {} neg {})",
                label_offset,
                known,
                in_range,
                kind.max_pos_range(),
                kind.max_neg_range()
            );

            let start = offset as usize;
            let end = (offset + kind.patch_size()) as usize;
            if in_range {
                debug_assert!(known); // implied by in_range.
                let slice = &mut self.data[start..end];
                trace!("patching in-range!");
                kind.patch(slice, offset, label_offset);
            } else if !known && !kind.supports_veneer() {
                // Nothing for now. Keep the fixup for the next round.
                new_fixups.push(MachLabelFixup {
                    label,
                    offset,
                    kind,
                });
            } else if !in_range && kind.supports_veneer() {
                // Allocate space for a veneer in the island.
                self.align_to(I::LabelUse::ALIGN);
                let veneer_offset = self.cur_offset();
                trace!("making a veneer at {}", veneer_offset);
                let slice = &mut self.data[start..end];
                // Patch the original label use to refer to the veneer.
                trace!(
                    "patching original at offset {} to veneer offset {}",
                    offset,
                    veneer_offset
                );
                kind.patch(slice, offset, veneer_offset);
                // Generate the veneer.
                let veneer_slice = self.get_appended_space(kind.veneer_size() as usize);
                let (veneer_fixup_off, veneer_label_use) =
                    kind.generate_veneer(veneer_slice, veneer_offset);
                trace!(
                    "generated veneer; fixup offset {}, label_use {:?}",
                    veneer_fixup_off,
                    veneer_label_use
                );
                // If the label is known (but was just out of range), do the
                // veneer label-use fixup now too; otherwise, save it for later.
                if known {
                    let start = veneer_fixup_off as usize;
                    let end = (veneer_fixup_off + veneer_label_use.patch_size()) as usize;
                    let veneer_slice = &mut self.data[start..end];
                    trace!("doing veneer fixup right away too");
                    veneer_label_use.patch(veneer_slice, veneer_fixup_off, label_offset);
                } else {
                    new_fixups.push(MachLabelFixup {
                        label,
                        offset: veneer_fixup_off,
                        kind: veneer_label_use,
                    });
                }
            } else {
                panic!(
                    "Cannot support label-use {:?} (known = {}, in-range = {})",
                    kind, known, in_range
                );
            }
        }

        self.fixup_records = new_fixups;
        self.island_deadline = UNKNOWN_LABEL_OFFSET;
    }
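
    // A worked example of the in-range check in `emit_island()`, with
    // illustrative numbers: for a fixup at `offset = 0x1000` whose kind has
    // `max_pos_range() = 0xf_ffff` (about +1MB), a label bound at 0x8_0000
    // yields `label_offset - offset = 0x7_f000 <= 0xf_ffff`, so the use is
    // patched in place; a label bound at 0x20_0000 yields 0x1f_f000 > 0xf_ffff,
    // so the reference is redirected to a fresh veneer in the island instead
    // (or, if the kind supports no veneer, the `panic!` arm is reached).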

    /// Finish any deferred emissions and/or fixups.
    pub fn finish(mut self) -> MachBufferFinalized {
        // Ensure that all labels have been defined. This is a full
        // (release-mode) assert because we must avoid looping indefinitely
        // below; an unresolved label will prevent the fixup_records vec from
        // emptying.
        assert!(self
            .label_offsets
            .iter()
            .all(|&off| off != UNKNOWN_LABEL_OFFSET));

        while !self.pending_constants.is_empty() || !self.fixup_records.is_empty() {
            // `emit_island()` will emit any pending veneers and constants, and
            // as a side-effect, will also take care of any fixups with resolved
            // labels eagerly.
            self.emit_island();
        }

        MachBufferFinalized {
            data: self.data,
            relocs: self.relocs,
            traps: self.traps,
            call_sites: self.call_sites,
            srclocs: self.srclocs,
        }
    }

    /// Add an external relocation at the current offset.
    pub fn add_reloc(
        &mut self,
        srcloc: SourceLoc,
        kind: Reloc,
        name: &ExternalName,
        addend: Addend,
    ) {
        let name = name.clone();
        self.relocs.push(MachReloc {
            offset: self.data.len() as CodeOffset,
            srcloc,
            kind,
            name,
            addend,
        });
    }

    /// Add a trap record at the current offset.
    pub fn add_trap(&mut self, srcloc: SourceLoc, code: TrapCode) {
        self.traps.push(MachTrap {
            offset: self.data.len() as CodeOffset,
            srcloc,
            code,
        });
    }

    /// Add a call-site record at the current offset.
    pub fn add_call_site(&mut self, srcloc: SourceLoc, opcode: Opcode) {
        self.call_sites.push(MachCallSite {
            ret_addr: self.data.len() as CodeOffset,
            srcloc,
            opcode,
        });
    }

    /// Set the `SourceLoc` for code from this offset until the offset at the
    /// next call to `end_srcloc()`.
    pub fn start_srcloc(&mut self, loc: SourceLoc) {
        self.cur_srcloc = Some((self.cur_offset(), loc));
    }

    /// Mark the end of the `SourceLoc` segment started at the last
    /// `start_srcloc()` call.
    pub fn end_srcloc(&mut self) {
        let (start, loc) = self
            .cur_srcloc
            .take()
            .expect("end_srcloc() called without start_srcloc()");
        let end = self.cur_offset();
        // Skip zero-length extends.
        debug_assert!(end >= start);
        if end > start {
            self.srclocs.push(MachSrcLoc { start, end, loc });
        }
    }
}
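
// A hedged sketch of the source-location bracketing protocol above: the
// emitter surrounds each instruction's bytes with start/end calls, and the
// buffer records one `MachSrcLoc` per non-empty range.
//
//     buf.start_srcloc(srcloc);
//     inst.emit(&mut buf, flags, state); // these bytes map to `srcloc`
//     buf.end_srcloc();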

impl MachBufferFinalized {
    /// Get a list of source location mapping tuples in sorted-by-start-offset order.
    pub fn get_srclocs_sorted(&self) -> &[MachSrcLoc] {
        &self.srclocs[..]
    }

    /// Get the total required size for the code.
    pub fn total_size(&self) -> CodeOffset {
        self.data.len() as CodeOffset
    }

    /// Emit this buffer to the given CodeSink.
    pub fn emit<CS: CodeSink>(&self, sink: &mut CS) {
        // N.B.: we emit every section into the .text section as far as
        // the `CodeSink` is concerned; we do not bother to segregate
        // the contents into the actual program text, the jumptable and the
        // rodata (constant pool). This allows us to generate code assuming
        // that these will not be relocated relative to each other, and avoids
        // having to designate each section as belonging in one of the three
        // fixed categories defined by `CodeSink`. If this becomes a problem
        // later (e.g. because of memory permissions or similar), we can
        // add this designation and segregate the output; take care, however,
        // to add the appropriate relocations in this case.

        let mut next_reloc = 0;
        let mut next_trap = 0;
        let mut next_call_site = 0;
        for (idx, byte) in self.data.iter().enumerate() {
            if next_reloc < self.relocs.len() {
                let reloc = &self.relocs[next_reloc];
                if reloc.offset == idx as CodeOffset {
                    sink.reloc_external(reloc.srcloc, reloc.kind, &reloc.name, reloc.addend);
                    next_reloc += 1;
                }
            }
            if next_trap < self.traps.len() {
                let trap = &self.traps[next_trap];
                if trap.offset == idx as CodeOffset {
                    sink.trap(trap.code, trap.srcloc);
                    next_trap += 1;
                }
            }
            if next_call_site < self.call_sites.len() {
                let call_site = &self.call_sites[next_call_site];
                if call_site.ret_addr == idx as CodeOffset {
                    sink.add_call_site(call_site.opcode, call_site.srcloc);
                    next_call_site += 1;
                }
            }
            sink.put1(*byte);
        }

        sink.begin_jumptables();
        sink.begin_rodata();
        sink.end_codegen();
    }
}

/// A constant that is deferred to the next constant-pool opportunity.
struct MachLabelConstant {
    /// This label will refer to the constant's offset.
    label: MachLabel,
    /// Required alignment.
    align: CodeOffset,
    /// This data will be emitted when able.
    data: SmallVec<[u8; 16]>,
}

/// A fixup to perform on the buffer once code is emitted. Fixups always refer
/// to labels and patch the code based on label offsets. Hence, they are like
/// relocations, but internal to one buffer.
#[derive(Debug)]
struct MachLabelFixup<I: VCodeInst> {
    /// The label whose offset controls this fixup.
    label: MachLabel,
    /// The offset to fix up / patch to refer to this label.
    offset: CodeOffset,
    /// The kind of fixup. This is architecture-specific; each architecture may
    /// have, e.g., several types of branch instructions, each with
    /// differently-sized offset fields and different places within the
    /// instruction to place the bits.
    kind: I::LabelUse,
}

/// A relocation resulting from a compilation.
struct MachReloc {
    /// The offset at which the relocation applies, *relative to the
    /// containing section*.
    offset: CodeOffset,
    /// The original source location.
    srcloc: SourceLoc,
    /// The kind of relocation.
    kind: Reloc,
    /// The external symbol / name to which this relocation refers.
    name: ExternalName,
    /// The addend to add to the symbol value.
    addend: i64,
}

/// A trap record resulting from a compilation.
struct MachTrap {
    /// The offset at which the trap instruction occurs, *relative to the
    /// containing section*.
    offset: CodeOffset,
    /// The original source location.
    srcloc: SourceLoc,
    /// The trap code.
    code: TrapCode,
}

/// A call site record resulting from a compilation.
struct MachCallSite {
    /// The offset of the call's return address, *relative to the containing
    /// section*.
    ret_addr: CodeOffset,
    /// The original source location.
    srcloc: SourceLoc,
    /// The call's opcode.
    opcode: Opcode,
}

/// A source-location mapping resulting from a compilation.
#[derive(Clone, Debug)]
pub struct MachSrcLoc {
    /// The start of the region of code corresponding to a source location.
    /// This is relative to the start of the function, not to the start of the
    /// section.
    pub start: CodeOffset,
    /// The end of the region of code corresponding to a source location.
    /// This is relative to the start of the function, not to the start of the
    /// section.
    pub end: CodeOffset,
    /// The source location.
    pub loc: SourceLoc,
}

/// Record of a branch instruction in the buffer, to facilitate editing.
#[derive(Clone, Debug)]
struct MachBranch {
    start: CodeOffset,
    end: CodeOffset,
    target: MachLabel,
    fixup: usize,
    inverted: Option<SmallVec<[u8; 8]>>,
    /// All labels pointing to the start of this branch. For correctness, this
    /// *must* be complete (i.e., must contain all labels whose resolved offsets
    /// are at the start of this branch): we rely on being able to redirect all
    /// labels that could jump to this branch before removing it, if it is
    /// otherwise unreachable.
    labels_at_this_branch: SmallVec<[MachLabel; 4]>,
}

impl MachBranch {
    fn is_cond(&self) -> bool {
        self.inverted.is_some()
    }
    fn is_uncond(&self) -> bool {
        self.inverted.is_none()
    }
}

// We use an actual instruction definition to do tests, so we depend on the
// `arm64` feature here.
#[cfg(all(test, feature = "arm64"))]
mod test {
    use super::*;
    use crate::isa::aarch64::inst::xreg;
    use crate::isa::aarch64::inst::{BranchTarget, CondBrKind, Inst};
    use crate::machinst::MachInstEmit;
    use crate::settings;
    use std::default::Default;

    fn label(n: u32) -> MachLabel {
        MachLabel::from_block(n)
    }
    fn target(n: u32) -> BranchTarget {
        BranchTarget::Label(label(n))
    }

    #[test]
    fn test_elide_jump_to_next() {
        let flags = settings::Flags::new(settings::builder());
        let mut buf = MachBuffer::new();
        let mut state = Default::default();

        buf.reserve_labels_for_blocks(2);
        buf.bind_label(label(0));
        let inst = Inst::Jump { dest: target(1) };
        inst.emit(&mut buf, &flags, &mut state);
        buf.bind_label(label(1));
        let buf = buf.finish();
        assert_eq!(0, buf.total_size());
    }

    #[test]
    fn test_elide_trivial_jump_blocks() {
        let flags = settings::Flags::new(settings::builder());
        let mut buf = MachBuffer::new();
        let mut state = Default::default();

        buf.reserve_labels_for_blocks(4);

        buf.bind_label(label(0));
        let inst = Inst::CondBr {
            kind: CondBrKind::NotZero(xreg(0)),
            taken: target(1),
            not_taken: target(2),
        };
        inst.emit(&mut buf, &flags, &mut state);

        buf.bind_label(label(1));
        let inst = Inst::Jump { dest: target(3) };
        inst.emit(&mut buf, &flags, &mut state);

        buf.bind_label(label(2));
        let inst = Inst::Jump { dest: target(3) };
        inst.emit(&mut buf, &flags, &mut state);

        buf.bind_label(label(3));

        let buf = buf.finish();
        assert_eq!(0, buf.total_size());
    }

    #[test]
    fn test_flip_cond() {
        let flags = settings::Flags::new(settings::builder());
        let mut buf = MachBuffer::new();
        let mut state = Default::default();

        buf.reserve_labels_for_blocks(4);

        buf.bind_label(label(0));
        let inst = Inst::CondBr {
            kind: CondBrKind::NotZero(xreg(0)),
            taken: target(1),
            not_taken: target(2),
        };
        inst.emit(&mut buf, &flags, &mut state);

        buf.bind_label(label(1));
        let inst = Inst::Nop4;
        inst.emit(&mut buf, &flags, &mut state);

        buf.bind_label(label(2));
        let inst = Inst::Nop4;
        inst.emit(&mut buf, &flags, &mut state);

        buf.bind_label(label(3));

        let buf = buf.finish();

        let mut buf2 = MachBuffer::new();
        let mut state = Default::default();
        let inst = Inst::OneWayCondBr {
            kind: CondBrKind::Zero(xreg(0)),
            target: BranchTarget::ResolvedOffset(8),
        };
        inst.emit(&mut buf2, &flags, &mut state);
        let inst = Inst::Nop4;
        inst.emit(&mut buf2, &flags, &mut state);
        inst.emit(&mut buf2, &flags, &mut state);

        let buf2 = buf2.finish();

        assert_eq!(buf.data, buf2.data);
    }

    #[test]
    fn test_island() {
        let flags = settings::Flags::new(settings::builder());
        let mut buf = MachBuffer::new();
        let mut state = Default::default();

        buf.reserve_labels_for_blocks(4);

        buf.bind_label(label(0));
        let inst = Inst::CondBr {
            kind: CondBrKind::NotZero(xreg(0)),
            taken: target(2),
            not_taken: target(3),
        };
        inst.emit(&mut buf, &flags, &mut state);

        buf.bind_label(label(1));
        while buf.cur_offset() < 2000000 {
            if buf.island_needed(0) {
                buf.emit_island();
            }
            let inst = Inst::Nop4;
            inst.emit(&mut buf, &flags, &mut state);
        }

        buf.bind_label(label(2));
        let inst = Inst::Nop4;
        inst.emit(&mut buf, &flags, &mut state);

        buf.bind_label(label(3));
        let inst = Inst::Nop4;
        inst.emit(&mut buf, &flags, &mut state);

        let buf = buf.finish();

        assert_eq!(2000000 + 8, buf.total_size());

        let mut buf2 = MachBuffer::new();
        let mut state = Default::default();
        let inst = Inst::CondBr {
            kind: CondBrKind::NotZero(xreg(0)),
            taken: BranchTarget::ResolvedOffset(1048576 - 4),
            not_taken: BranchTarget::ResolvedOffset(2000000 + 4 - 4),
        };
        inst.emit(&mut buf2, &flags, &mut state);

        let buf2 = buf2.finish();

        assert_eq!(&buf.data[0..8], &buf2.data[..]);
    }

    #[test]
    fn test_island_backward() {
        let flags = settings::Flags::new(settings::builder());
        let mut buf = MachBuffer::new();
        let mut state = Default::default();

        buf.reserve_labels_for_blocks(4);

        buf.bind_label(label(0));
        let inst = Inst::Nop4;
        inst.emit(&mut buf, &flags, &mut state);

        buf.bind_label(label(1));
        let inst = Inst::Nop4;
        inst.emit(&mut buf, &flags, &mut state);

        buf.bind_label(label(2));
        while buf.cur_offset() < 2000000 {
            let inst = Inst::Nop4;
            inst.emit(&mut buf, &flags, &mut state);
        }

        buf.bind_label(label(3));
        let inst = Inst::CondBr {
            kind: CondBrKind::NotZero(xreg(0)),
            taken: target(0),
            not_taken: target(1),
        };
        inst.emit(&mut buf, &flags, &mut state);

        let buf = buf.finish();

        assert_eq!(2000000 + 12, buf.total_size());

        let mut buf2 = MachBuffer::new();
        let mut state = Default::default();
        let inst = Inst::CondBr {
            kind: CondBrKind::NotZero(xreg(0)),
            taken: BranchTarget::ResolvedOffset(8),
            not_taken: BranchTarget::ResolvedOffset(4 - (2000000 + 4)),
        };
        inst.emit(&mut buf2, &flags, &mut state);
        let inst = Inst::Jump {
            dest: BranchTarget::ResolvedOffset(-(2000000 + 8)),
        };
        inst.emit(&mut buf2, &flags, &mut state);

        let buf2 = buf2.finish();

        assert_eq!(&buf.data[2000000..], &buf2.data[..]);
    }
}