Cranelift: Fix cold-blocks-related lowering bug.

If a block is marked cold but has side-effect-free code that is only
used by side-effectful code in non-cold blocks, we will erroneously fail
to emit it, causing a regalloc failure.

This is due to the interaction of block ordering and lowering: we rely
on block ordering to visit uses before defs (except for backedges) so
that we can effectively do an inline liveness analysis and skip lowering
operations that are not used anywhere. This "inline DCE" is needed
because instruction lowering can pattern-match and merge one instruction
into another, removing the need to generate the source instruction.

Unfortunately, the way that I added cold-block support in #3698 was
oblivious to this -- it just changed the block sort order. For
efficiency reasons, we generate code in its final order directly, so it
would not be tenable to generate it in e.g. RPO first and then reorder
cold blocks to the bottom; we really do want to visit in the same order
as the final code.

This PR fixes the bug by moving the point at which cold blocks are sunk
to emission-time instead. This is cheaper than either trying to visit
blocks during lowering in RPO but add to VCode out-of-order, or trying
to do some expensive analysis to recover proper liveness. It's not clear
that the latter would be possible anyway -- the need to lower some
instructions depends on other instructions' isel results/merging
success, so we really do need to visit in RPO, and we can't simply lower
all instructions as side-effecting roots (some can't be toplevel nodes).

The one downside of this approach is that the VCode itself still has
cold blocks inline; so in the text format (and hence compile-tests) it's
not possible to see the sinking. This PR adds a test for cold-block
sinking that actually verifies the machine code. (The test also includes
an add-instruction in the cold path that would have been incorrectly
skipped prior to this fix.)

Fortunately this bug would not have been triggered by the one current
use of cold blocks in #3699, because there the only operation in the
cold block was an (always effectful) call instruction. The worst-case
effect of the bug in other code would be a regalloc panic; no silent
miscompilations could result.
This commit is contained in:
Chris Fallin
2022-01-20 15:40:56 -08:00
parent 90e7cef56c
commit ef1b2d2fa8
3 changed files with 143 additions and 17 deletions

View File

@@ -470,11 +470,24 @@ impl<I: VCodeInst> VCode<I> {
let mut inst_ends = vec![0; self.insts.len()];
let mut label_insn_iix = vec![0; self.num_blocks()];
// Construct the final order we emit code in: cold blocks at the end.
let mut final_order: SmallVec<[BlockIndex; 16]> = smallvec![];
let mut cold_blocks: SmallVec<[BlockIndex; 16]> = smallvec![];
for block in 0..self.num_blocks() {
let block = block as BlockIndex;
if self.block_order.is_cold(block) {
cold_blocks.push(block);
} else {
final_order.push(block);
}
}
final_order.extend(cold_blocks);
// Emit blocks.
let mut safepoint_idx = 0;
let mut cur_srcloc = None;
let mut last_offset = None;
for block in 0..self.num_blocks() {
let block = block as BlockIndex;
for block in final_order {
let new_offset = I::align_basic_block(buffer.cur_offset());
while new_offset > buffer.cur_offset() {
// Pad with NOPs up to the aligned block offset.