wasmtime


Jamey Sharp d35c508436 cranelift-frontend: Replace Vecs with ListPools (#5001)

* Elide redundant sentinel values

The `undef_variables` lists were a binding from Variable to Value, but
the Values were always equal to a suffix of the block's parameters. So
instead of storing another copy, we can just get them back from the
block parameters.

According to DHAT, this decreases total memory allocated and number of
bytes written, and increases number of bytes read and instructions
retired, but all by small fractions of a percent. According to
hyperfine, main is "1.00 ± 0.01 times faster".

* Use entity_impl for cranelift_frontend::Variable

Instead of hand-coding essentially the same thing.

* Keep undefined variables in a ListPool

According to DHAT, this improves every measure of performance
(instructions retired, total memory allocated, max heap size, bytes
read, and bytes written), although by fractions of a percent. According
to hyperfine the difference is nearly zero, but on Spidermonkey this
branch is "1.01 ± 0.00 times faster" than main.

* Elide redundant block IDs

In a list of predecessors, we previously kept both the jump instruction
that points to the current block, and the block where that instruction
resides. But we can look up the block from the instruction as long as we
have access to the current Layout, which we do everywhere that it was
necessary. So don't store the block, just store the instruction.
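
As a rough illustration of the idea above, the following self-contained sketch stores only the jump instruction for each predecessor and recovers the containing block on demand. The `Layout` type here is a hypothetical stand-in with a single lookup table, not Cranelift's actual layout structure, and `predecessor_blocks` is a name invented for this example.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Block(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Inst(u32);

/// Hypothetical stand-in for a layout: records which block holds each instruction.
pub struct Layout {
    block_of: Vec<Option<Block>>,
}

impl Layout {
    /// Look up the block containing `inst`, if the instruction is inserted anywhere.
    pub fn inst_block(&self, inst: Inst) -> Option<Block> {
        self.block_of.get(inst.0 as usize).copied().flatten()
    }
}

/// The predecessor list keeps only jump instructions; the blocks are derived
/// from the layout instead of being stored a second time.
pub fn predecessor_blocks(layout: &Layout, preds: &[Inst]) -> Vec<Block> {
    preds.iter().filter_map(|&inst| layout.inst_block(inst)).collect()
}

pub fn demo_predecessor_blocks() -> Vec<Block> {
    let layout = Layout {
        block_of: vec![Some(Block(0)), Some(Block(0)), Some(Block(1))],
    };
    // Two predecessors: a jump in block 0 and a jump in block 1.
    predecessor_blocks(&layout, &[Inst(1), Inst(2)])
}
```

The trade-off is one extra table lookup per predecessor in exchange for a smaller predecessor entry.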

* Keep predecessor definitions in a ListPool

* Make append_jump_argument independent of self

This makes it easier to reason about borrow-checking issues.

* Reuse `results` instead of re-doing variable lookup

This eliminates three array lookups per predecessor by hanging on to the
results of earlier steps a little longer. This only works now because I
previously removed the need to borrow all of `self`, which otherwise
prevented keeping a borrow of self.results alive.

I had experimented with using `Vec::split_off` to copy the relevant
chunk of results to a temporary heap allocation, but the extra
allocation and copy was measurably slower. So it's important that this
is just a borrow.

* Cache single-predecessor block ID when sealing

Of the code in cranelift_frontend, `use_var` is the second-hottest path,
sitting close behind the `build` function that's used when inserting
every new instruction. This makes sense given that the operands of a new
instruction usually need to be looked up immediately before building the
instruction.

So making the single-predecessor loops in `find_var` and `use_var_local`
do fewer memory accesses and execute fewer instructions turns out to
have a measurable effect. It's still only a small fraction of a percent
overall since cranelift-frontend is only a few percent of total runtime.

This patch keeps a block ID in the SSABlockData, which is None unless
both the block is sealed and it has exactly one predecessor. Doing so
avoids two array lookups on each iteration of the two loops.
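
The caching described above might look roughly like the sketch below. It is a simplified model, not the patch itself: predecessors are stored as blocks rather than jump instructions, variables are plain string names, and `BlockData`, `Ssa`, and `find_def` are hypothetical names chosen for this example.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Block(u32);

#[derive(Default)]
pub struct BlockData {
    predecessors: Vec<Block>,
    /// Set only when the block has been sealed and has exactly one predecessor.
    single_predecessor: Option<Block>,
    /// Local variable definitions (simplified to name -> constant).
    defs: HashMap<&'static str, i64>,
}

pub struct Ssa {
    blocks: Vec<BlockData>,
}

impl Ssa {
    /// Sealing fixes the predecessor set, so this is where the cache is filled.
    pub fn seal(&mut self, block: Block) {
        let data = &mut self.blocks[block.0 as usize];
        data.single_predecessor = match data.predecessors.as_slice() {
            [only] => Some(*only),
            _ => None,
        };
    }

    /// Walk up single-predecessor chains, reading one cached field per step
    /// instead of re-deriving the predecessor from the list each time.
    pub fn find_def(&self, var: &str, mut block: Block) -> Option<(Block, i64)> {
        loop {
            let data = &self.blocks[block.0 as usize];
            if let Some(&value) = data.defs.get(var) {
                return Some((block, value));
            }
            block = data.single_predecessor?;
        }
    }
}

pub fn demo_find_def() -> (Option<(Block, i64)>, Option<(Block, i64)>) {
    let mut ssa = Ssa {
        blocks: vec![BlockData::default(), BlockData::default(), BlockData::default()],
    };
    ssa.blocks[0].defs.insert("x", 7);
    ssa.blocks[1].predecessors.push(Block(0));
    ssa.blocks[2].predecessors.push(Block(1));
    ssa.seal(Block(1));
    ssa.seal(Block(2));
    (ssa.find_def("x", Block(2)), ssa.find_def("y", Block(2)))
}
```

In this sketch the walk stops at the first block whose cache is `None`; a real implementation would fall back to the general multi-predecessor path at that point.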

According to DHAT, compared with main, at this point this PR uses 0.3%
less memory at max heap, reads 0.6% fewer bytes, and writes 0.2% fewer
bytes.

According to hyperfine, this PR is "1.01 ± 0.01 times faster" than main
when compiling Spidermonkey. On the other hand, Sightglass says main is
1.01x faster than this PR on the same benchmark by CPU cycles. In short,
actual effects are too small to measure reliably.

2022-10-03 14:29:12 -07:00

src

cranelift-frontend: Replace Vecs with ListPools (#5001)

2022-10-03 14:29:12 -07:00

Cargo.toml

Leverage Cargo's workspace inheritance feature (#4905)

2022-09-26 11:30:01 -05:00

LICENSE

moved crates in lib/ to src/, renamed crates, modified some files' text (#660)

2019-01-28 15:56:54 -08:00

README.md

moved crates in lib/ to src/, renamed crates, modified some files' text (#660)

2019-01-28 15:56:54 -08:00

README.md

This crate contains array-based data structures used by the core Cranelift code generator which use densely numbered entity references as mapping keys.

One major difference between this crate and crates like slotmap, slab, and generational-arena is that this crate currently provides no way to delete entities. This limits its use to situations where deletion isn't important, but it also makes the crate more efficient: it needs no extra bookkeeping state to reuse the storage of deleted objects or to ensure that new objects always have unique keys (e.g. slotmap's and generational-arena's versioning).

Another major difference is that this crate protects against using a key from one map to access an element in another. Where SlotMap, Slab, and Arena have a value type parameter, PrimaryMap has a key type parameter and a value type parameter. The crate also provides the entity_impl macro which makes it easy to declare new unique types for use as keys. Any attempt to use a key in a map it's not intended for is diagnosed with a type error.
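
The type-safety property described above can be illustrated with a small self-contained sketch. It does not use this crate: `TypedMap` and the `Key` trait are hypothetical stand-ins, and the two key types are written by hand where the crate would use the entity_impl macro.

```rust
use std::marker::PhantomData;

/// Hypothetical trait for dense index keys (a stand-in for the crate's own).
pub trait Key: Copy {
    fn new(index: usize) -> Self;
    fn index(self) -> usize;
}

/// Two distinct key types, hand-written here for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Block(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Inst(u32);

impl Key for Block {
    fn new(index: usize) -> Self { Block(index as u32) }
    fn index(self) -> usize { self.0 as usize }
}

impl Key for Inst {
    fn new(index: usize) -> Self { Inst(index as u32) }
    fn index(self) -> usize { self.0 as usize }
}

/// Simplified map with both a key and a value type parameter.
pub struct TypedMap<K: Key, V> {
    elems: Vec<V>,
    _key: PhantomData<K>,
}

impl<K: Key, V> TypedMap<K, V> {
    pub fn new() -> Self {
        TypedMap { elems: Vec::new(), _key: PhantomData }
    }

    /// Append a value and return the freshly created key for it.
    pub fn push(&mut self, value: V) -> K {
        let key = K::new(self.elems.len());
        self.elems.push(value);
        key
    }

    pub fn get(&self, key: K) -> Option<&V> {
        self.elems.get(key.index())
    }
}

pub fn demo_typed_keys() -> (usize, &'static str) {
    let mut blocks: TypedMap<Block, &'static str> = TypedMap::new();
    let mut insts: TypedMap<Inst, &'static str> = TypedMap::new();
    let b0 = blocks.push("entry");
    let i0 = insts.push("iadd");
    // `blocks.get(i0)` would be rejected at compile time: expected `Block`, found `Inst`.
    let _ = insts.get(i0);
    (b0.index(), *blocks.get(b0).unwrap())
}
```

Because the key type is part of the map's type, mixing up index spaces becomes a compile error with no run-time cost.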

A third difference is that this crate has two core map types, PrimaryMap and SecondaryMap, which serve complementary purposes. A PrimaryMap creates its own keys when elements are inserted, while a SecondaryMap reuses the keys of a PrimaryMap, conceptually storing additional data in the same index space. SecondaryMap's values must implement Default, and all elements in a SecondaryMap initially have the value of default().
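
To make the division of labour concrete, here is a minimal sketch of the two roles. The `Primary` and `Secondary` types below are simplified, hypothetical models written for this example; they are not the crate's PrimaryMap and SecondaryMap, and their method names are assumptions.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Value(u32);

/// Primary role: owns the index space and hands out keys on insertion.
pub struct Primary<V> {
    elems: Vec<V>,
}

impl<V> Primary<V> {
    pub fn new() -> Self {
        Primary { elems: Vec::new() }
    }

    pub fn push(&mut self, value: V) -> Value {
        let key = Value(self.elems.len() as u32);
        self.elems.push(value);
        key
    }
}

/// Secondary role: attaches extra data to existing keys; unset entries read as the default.
pub struct Secondary<V: Clone + Default> {
    elems: Vec<V>,
    default: V,
}

impl<V: Clone + Default> Secondary<V> {
    pub fn new() -> Self {
        Secondary { elems: Vec::new(), default: V::default() }
    }

    pub fn get(&self, key: Value) -> &V {
        self.elems.get(key.0 as usize).unwrap_or(&self.default)
    }

    pub fn set(&mut self, key: Value, value: V) {
        let index = key.0 as usize;
        if index >= self.elems.len() {
            // Grow lazily, filling the gap with default values.
            self.elems.resize(index + 1, self.default.clone());
        }
        self.elems[index] = value;
    }
}

pub fn demo_secondary() -> (u32, u32) {
    let mut types: Primary<&'static str> = Primary::new();
    let v0 = types.push("i32");
    let v1 = types.push("i64");

    // Side table in the same index space: only v1 is ever written.
    let mut use_counts: Secondary<u32> = Secondary::new();
    use_counts.set(v1, 3);
    (*use_counts.get(v0), *use_counts.get(v1))
}
```

Reading a key that was never written yields the default value, which is why the value type must implement Default.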

A common way to implement Default is to wrap a type in Option; however, this crate also provides the PackedOption utility, which can use less memory in some cases.
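
The memory saving comes from reserving one key value to mean "none" instead of storing a separate discriminant. The sketch below shows that general technique with a hypothetical `PackedBlock` type; it is not the crate's PackedOption, and the choice of `u32::MAX` as the reserved value is an assumption made for this example.

```rust
use std::mem::size_of;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Block(u32);

/// Assumed sentinel: the largest index is given up to represent "none".
const RESERVED: u32 = u32::MAX;

/// Hypothetical packed optional: same size as the key itself.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PackedBlock(u32);

impl PackedBlock {
    pub fn none() -> Self {
        PackedBlock(RESERVED)
    }

    pub fn some(block: Block) -> Self {
        assert!(block.0 != RESERVED, "the reserved value cannot be stored");
        PackedBlock(block.0)
    }

    /// Convert back to an ordinary `Option` for pattern matching.
    pub fn expand(self) -> Option<Block> {
        if self.0 == RESERVED { None } else { Some(Block(self.0)) }
    }
}

impl Default for PackedBlock {
    fn default() -> Self {
        Self::none()
    }
}

/// Sizes in bytes of the ordinary and packed representations.
pub fn sizes() -> (usize, usize) {
    (size_of::<Option<Block>>(), size_of::<PackedBlock>())
}
```

A plain `u32` has no spare bit pattern for the compiler to use, so `Option<Block>` needs extra space for its tag, while the packed form stays at four bytes.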

Additional utilities provided by this crate include:

EntityList: for allocating many small arrays (such as instruction operand lists in a compiler code generator).
SparseMap: an alternative to SecondaryMap which can use less memory in some situations.
EntitySet: a specialized form of SecondaryMap using a bitvector to record which entities are members of the set.
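
The idea behind keeping many small arrays in one pool can be sketched as follows. This is a deliberately simplified model: `Pool` and `ListRef` are hypothetical names, each list is just an offset and length into one shared vector, and there is no growth or reuse of freed storage, which a real list pool would need to handle.

```rust
/// Small handle to a list: an offset and length into the shared pool.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct ListRef {
    start: u32,
    len: u32,
}

/// One backing allocation shared by all lists.
pub struct Pool<T> {
    data: Vec<T>,
}

impl<T: Copy> Pool<T> {
    pub fn new() -> Self {
        Pool { data: Vec::new() }
    }

    /// Copy `items` to the end of the pool and return a handle to them.
    pub fn alloc(&mut self, items: &[T]) -> ListRef {
        let start = self.data.len() as u32;
        self.data.extend_from_slice(items);
        ListRef { start, len: items.len() as u32 }
    }

    /// Borrow the elements of a list; the handle alone carries no data.
    pub fn as_slice(&self, list: ListRef) -> &[T] {
        let start = list.start as usize;
        &self.data[start..start + list.len as usize]
    }
}

pub fn demo_pool() -> (Vec<u32>, Vec<u32>, usize) {
    let mut pool: Pool<u32> = Pool::new();
    let operands_a = pool.alloc(&[1, 2, 3]);
    let operands_b = pool.alloc(&[10, 20]);
    (
        pool.as_slice(operands_a).to_vec(),
        pool.as_slice(operands_b).to_vec(),
        pool.data.len(),
    )
}
```

Compared with one `Vec` per list, the handles are small and copyable, and all elements share a single heap allocation.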