cranelift-frontend: Replace Vecs with ListPools (#5001)
* Elide redundant sentinel values

  The `undef_variables` lists were a binding from Variable to Value, but the Values were always equal to a suffix of the block's parameters. So instead of storing another copy, we can just get them back from the block parameters.

  According to DHAT, this decreases total memory allocated and number of bytes written, and increases number of bytes read and instructions retired, but all by small fractions of a percent. According to hyperfine, main is "1.00 ± 0.01 times faster".

* Use entity_impl for cranelift_frontend::Variable

  Instead of hand-coding essentially the same thing.

* Keep undefined variables in a ListPool

  According to DHAT, this improves every measure of performance (instructions retired, total memory allocated, max heap size, bytes read, and bytes written), although by fractions of a percent. According to hyperfine the difference is nearly zero, but on Spidermonkey this branch is "1.01 ± 0.00 times faster" than main.

* Elide redundant block IDs

  In a list of predecessors, we previously kept both the jump instruction that points to the current block, and the block where that instruction resides. But we can look up the block from the instruction as long as we have access to the current Layout, which we do everywhere that it was necessary. So don't store the block, just store the instruction.

* Keep predecessor definitions in a ListPool

* Make append_jump_argument independent of self

  This makes it easier to reason about borrow-checking issues.

* Reuse `results` instead of re-doing variable lookup

  This eliminates three array lookups per predecessor by hanging on to the results of earlier steps a little longer. This only works now because I previously removed the need to borrow all of `self`, which otherwise prevented keeping a borrow of self.results alive.

  I had experimented with using `Vec::split_off` to copy the relevant chunk of results to a temporary heap allocation, but the extra allocation and copy was measurably slower. So it's important that this is just a borrow.

* Cache single-predecessor block ID when sealing

  Of the code in cranelift_frontend, `use_var` is the second-hottest path, sitting close behind the `build` function that's used when inserting every new instruction. This makes sense given that the operands of a new instruction usually need to be looked up immediately before building the instruction.

  So making the single-predecessor loops in `find_var` and `use_var_local` do fewer memory accesses and execute fewer instructions turns out to have a measurable effect. It's still only a small fraction of a percent overall since cranelift-frontend is only a few percent of total runtime.

  This patch keeps a block ID in the SSABlockData, which is None unless both the block is sealed and it has exactly one predecessor. Doing so avoids two array lookups on each iteration of the two loops.

  According to DHAT, compared with main, at this point this PR uses 0.3% less memory at max heap, reads 0.6% fewer bytes, and writes 0.2% fewer bytes.

  According to Hyperfine, this PR is "1.01 ± 0.01 times faster" than main when compiling Spidermonkey. On the other hand, Sightglass says main is 1.01x faster than this PR on the same benchmark by CPU cycles. In short, actual effects are too small to measure reliably.
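The commit message describes two of these changes closely enough to sketch: storing only the branch instruction for each predecessor (and recovering its block from the layout), and caching the predecessor block when a sealed block has exactly one. The following is a minimal, std-only illustration of that idea, not the code from this commit. `Block`, `Inst`, `Layout`, `BlockData`, and `walk_single_predecessors` are hypothetical stand-ins, and plain `Vec`/`HashMap` are used in place of Cranelift's entity maps and `ListPool`.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for Cranelift's `Block` and `Inst` entity references.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Block(u32);
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Inst(u32);

/// Toy layout: records which block each instruction lives in.
#[derive(Default)]
struct Layout {
    inst_block: HashMap<Inst, Block>,
}

impl Layout {
    /// Look up the block containing `inst`, if it has been inserted.
    fn inst_block(&self, inst: Inst) -> Option<Block> {
        self.inst_block.get(&inst).copied()
    }
}

/// Per-block bookkeeping, loosely modeled on the description above.
#[derive(Default)]
struct BlockData {
    /// Each predecessor is recorded only as its branch instruction;
    /// the containing block is recovered from the layout on demand.
    predecessors: Vec<Inst>,
    sealed: bool,
    /// `Some` only when the block is sealed and has exactly one predecessor.
    single_predecessor: Option<Block>,
}

impl BlockData {
    /// Seal the block and fill in the single-predecessor cache.
    fn seal(&mut self, layout: &Layout) {
        self.sealed = true;
        self.single_predecessor = match self.predecessors.as_slice() {
            [only] => layout.inst_block(*only),
            _ => None,
        };
    }
}

/// Follow cached single-predecessor links upward from `start`.
fn walk_single_predecessors(blocks: &HashMap<Block, BlockData>, start: Block) -> Block {
    let mut current = start;
    // Bounded by the number of blocks, so a cycle of single-predecessor
    // blocks cannot make this loop spin forever.
    for _ in 0..blocks.len() {
        match blocks.get(&current).and_then(|data| data.single_predecessor) {
            Some(pred) => current = pred,
            None => break,
        }
    }
    current
}
```

In this sketch each step of the walk reads one cached field, rather than checking the sealed flag, reading the predecessor list, and then mapping the instruction back to its block. That is the kind of saving the message attributes to the cache, though the real data structures differ.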
@@ -104,6 +104,12 @@ impl<T: core::hash::Hash + EntityRef + ReservedValue> core::hash::Hash for ListP
     }
 }
 
+impl<T: EntityRef + ReservedValue> Default for ListPool<T> {
+    fn default() -> Self {
+        Self::new()
+    }
+}
+
 /// Lists are allocated in sizes that are powers of two, starting from 4.
 /// Each power of two is assigned a size class number, so the size is `4 << SizeClass`.
 type SizeClass = u8;
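The doc comment in this hunk gives the block size for a size class as `4 << SizeClass`. As a small sketch of that arithmetic, the helpers below compute the size of a class and the smallest class that holds a given number of slots. The function names `size_of_class` and `class_for_slots` are hypothetical and are not taken from `cranelift-entity`; the real pool may also reserve space inside each block that this sketch ignores.

```rust
/// Size class number, mirroring the `SizeClass` alias in the hunk above.
type SizeClass = u8;

/// Number of slots in a block of the given size class: `4 << sclass`.
/// Class 0 holds 4 slots, class 1 holds 8, class 2 holds 16, and so on.
fn size_of_class(sclass: SizeClass) -> usize {
    4usize << sclass
}

/// Hypothetical helper: the smallest size class whose block has at least
/// `slots` slots. Intended for small inputs; it does not guard against
/// shift overflow for absurdly large requests.
fn class_for_slots(slots: usize) -> SizeClass {
    let mut sclass: SizeClass = 0;
    while size_of_class(sclass) < slots {
        sclass += 1;
    }
    sclass
}
```

Because sizes double from one class to the next, a list that outgrows its block moves to the next class, and freed blocks can be grouped by class for reuse.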