Cranelift: Fix cold-blocks-related lowering bug.
If a block is marked cold but has side-effect-free code that is only used by side-effectful code in non-cold blocks, we will erroneously fail to emit it, causing a regalloc failure. This is due to the interaction of block ordering and lowering: we rely on block ordering to visit uses before defs (except for backedges) so that we can effectively do an inline liveness analysis and skip lowering operations that are not used anywhere. This "inline DCE" is needed because instruction lowering can pattern-match and merge one instruction into another, removing the need to generate the source instruction. Unfortunately, the way that I added cold-block support in #3698 was oblivious to this -- it just changed the block sort order. For efficiency reasons, we generate code in its final order directly, so it would not be tenable to generate it in e.g. RPO first and then reorder cold blocks to the bottom; we really do want to visit in the same order as the final code. This PR fixes the bug by moving the point at which cold blocks are sunk to emission-time instead. This is cheaper than either trying to visit blocks during lowering in RPO but add to VCode out-of-order, or trying to do some expensive analysis to recover proper liveness. It's not clear that the latter would be possible anyway -- the need to lower some instructions depends on other instructions' isel results/merging success, so we really do need to visit in RPO, and we can't simply lower all instructions as side-effecting roots (some can't be toplevel nodes). The one downside of this approach is that the VCode itself still has cold blocks inline; so in the text format (and hence compile-tests) it's not possible to see the sinking. This PR adds a test for cold-block sinking that actually verifies the machine code. (The test also includes an add-instruction in the cold path that would have been incorrectly skipped prior to this fix.) Fortunately this bug would not have been triggered by the one current use of cold blocks in #3699, because there the only operation in the cold block was an (always effectful) call instruction. The worst-case effect of the bug in other code would be a regalloc panic; no silent miscompilations could result.
This commit is contained in:
@@ -470,11 +470,24 @@ impl<I: VCodeInst> VCode<I> {
|
||||
let mut inst_ends = vec![0; self.insts.len()];
|
||||
let mut label_insn_iix = vec![0; self.num_blocks()];
|
||||
|
||||
// Construct the final order we emit code in: cold blocks at the end.
|
||||
let mut final_order: SmallVec<[BlockIndex; 16]> = smallvec![];
|
||||
let mut cold_blocks: SmallVec<[BlockIndex; 16]> = smallvec![];
|
||||
for block in 0..self.num_blocks() {
|
||||
let block = block as BlockIndex;
|
||||
if self.block_order.is_cold(block) {
|
||||
cold_blocks.push(block);
|
||||
} else {
|
||||
final_order.push(block);
|
||||
}
|
||||
}
|
||||
final_order.extend(cold_blocks);
|
||||
|
||||
// Emit blocks.
|
||||
let mut safepoint_idx = 0;
|
||||
let mut cur_srcloc = None;
|
||||
let mut last_offset = None;
|
||||
for block in 0..self.num_blocks() {
|
||||
let block = block as BlockIndex;
|
||||
for block in final_order {
|
||||
let new_offset = I::align_basic_block(buffer.cur_offset());
|
||||
while new_offset > buffer.cur_offset() {
|
||||
// Pad with NOPs up to the aligned block offset.
|
||||
|
||||
Reference in New Issue
Block a user