If a block is marked cold but has side-effect-free code that is only used by side-effectful code in non-cold blocks, we will erroneously fail to emit it, causing a regalloc failure.
This is due to the interaction of block ordering and lowering: we rely on block ordering to visit uses before defs (except for backedges) so that we can effectively do an inline liveness analysis and skip lowering operations that are not used anywhere. This "inline DCE" is needed because instruction lowering can pattern-match and merge one instruction into another, removing the need to generate the source instruction.
Unfortunately, the way that I added cold-block support in #3698 was oblivious to this -- it just changed the block sort order. For efficiency reasons, we generate code in its final order directly, so it would not be tenable to generate it in e.g. RPO first and then reorder cold blocks to the bottom; we really do want to visit in the same order as the final code.
This PR fixes the bug by moving the point at which cold blocks are sunk to emission-time instead. This is cheaper than either trying to visit blocks during lowering in RPO but add to VCode out-of-order, or trying to do some expensive analysis to recover proper liveness. It's not clear that the latter would be possible anyway -- the need to lower some instructions depends on other instructions' isel results/merging success, so we really do need to visit in RPO, and we can't simply lower all instructions as side-effecting roots (some can't be toplevel nodes).
The one downside of this approach is that the VCode itself still has cold blocks inline; so in the text format (and hence compile-tests) it's not possible to see the sinking. This PR adds a test for cold-block sinking that actually verifies the machine code. (The test also includes an add-instruction in the cold path that would have been incorrectly skipped prior to this fix.)
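To make the failure mode concrete, here is a minimal sketch of the "inline DCE" idea, assuming a toy instruction type. The names `Inst`, `pure_op`, `effect`, `lower`, and `example` are hypothetical and are not Cranelift's actual types or functions. Blocks are walked in reverse of the given order, and a side-effect-free instruction is lowered only if an already-visited instruction used its result.

```rust
use std::collections::HashSet;

/// Hypothetical, heavily simplified instruction (not Cranelift's real IR).
struct Inst {
    def: Option<u32>,  // value defined by this instruction, if any
    uses: Vec<u32>,    // values read by this instruction
    side_effect: bool, // side-effecting instructions are always lowered
}

fn pure_op(def: u32, uses: &[u32]) -> Inst {
    Inst { def: Some(def), uses: uses.to_vec(), side_effect: false }
}

fn effect(uses: &[u32]) -> Inst {
    Inst { def: None, uses: uses.to_vec(), side_effect: true }
}

/// Walk blocks in reverse of `order`, and instructions in reverse within each
/// block, so that uses are seen before defs. Returns the (block, index) pairs
/// of the instructions that get lowered.
fn lower(blocks: &[Vec<Inst>], order: &[usize]) -> Vec<(usize, usize)> {
    let mut used: HashSet<u32> = HashSet::new();
    let mut lowered = Vec::new();
    for &b in order.iter().rev() {
        for (i, inst) in blocks[b].iter().enumerate().rev() {
            let needed =
                inst.side_effect || inst.def.map_or(false, |v| used.contains(&v));
            if needed {
                // Only the operands of lowered instructions become live.
                used.extend(inst.uses.iter().copied());
                lowered.push((b, i));
            }
        }
    }
    lowered
}

/// Three blocks in a straight line; block 1 plays the role of the cold block.
fn example() -> Vec<Vec<Inst>> {
    vec![
        vec![pure_op(0, &[]), effect(&[])],     // block 0: v0 = const; jump
        vec![pure_op(1, &[0, 0]), effect(&[])], // block 1 (cold): v1 = add v0, v0; jump
        vec![effect(&[1])],                     // block 2: store v1
    ]
}
```

In this sketch, `lower(&example(), &[0, 1, 2])` lowers the add in block 1, because the store in block 2 is visited first and marks `v1` as used. With the cold block moved to the end of the visit order, `lower(&example(), &[0, 2, 1])` reaches the add before any of its users and skips it, which is the situation the PR description says leads to a regalloc failure.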
Fortunately this bug would not have been triggered by the one current use of cold blocks in #3699, because there the only operation in the cold block was an (always effectful) call instruction. The worst-case effect of the bug in other code would be a regalloc panic; no silent miscompilations could result.
Cranelift Code Generator
A Bytecode Alliance project
Cranelift is a low-level retargetable code generator. It translates a target-independent intermediate representation into executable machine code.
For more information, see the documentation.
For an example of how to use the JIT, see the JIT Demo, which implements a toy language.
For an example of how to use Cranelift to run WebAssembly code, see Wasmtime, which implements a standalone, embeddable VM using Cranelift.
Status
Cranelift currently supports enough functionality to run a wide variety of programs, including all the functionality needed to execute WebAssembly MVP functions, although it needs to be used within an external WebAssembly embedding to be part of a complete WebAssembly implementation.
The x86-64 backend is currently the most complete and stable; other architectures are in various stages of development. Cranelift currently supports both the System V AMD64 ABI calling convention used on many platforms and the Windows x64 calling convention. The performance of code produced by Cranelift is not yet impressive, though we have plans to fix that.
The core codegen crates have minimal dependencies, support no_std mode (see below), do not require any host floating-point support, and do not use callstack recursion.
Cranelift does not yet perform mitigations for Spectre or related security issues, though it may do so in the future. It does not currently make any security-relevant instruction timing guarantees. It has seen a fair amount of testing and fuzzing, although more work is needed before it would be ready for a production use case.
Cranelift's APIs are not yet stable.
Cranelift currently requires Rust 1.37 or later to build.
Contributing
If you're interested in contributing to Cranelift: thank you! We have a contributing guide which will help you get involved in the Cranelift project.
Planned uses
Cranelift is designed to be a code generator for WebAssembly, but it is general enough to be useful elsewhere too. The initial planned uses that affected its design are:
- WebAssembly compiler for the SpiderMonkey engine in Firefox.
- Backend for the IonMonkey JavaScript JIT compiler in Firefox.
- Debug build backend for the Rust compiler.
- Wasmtime non-Web wasm engine.
Building Cranelift
Cranelift uses a conventional Cargo build process.
Cranelift consists of a collection of crates, and uses a Cargo Workspace,
so for some cargo commands, such as cargo test, the --all flag is needed
to tell cargo to visit all of the crates.
test-all.sh at the top level is a script which runs all the cargo
tests and also performs code format, lint, and documentation checks.
Building with no_std
The following crates support `no_std`, although they do depend on liballoc:
- cranelift-entity
- cranelift-bforest
- cranelift-codegen
- cranelift-frontend
- cranelift-native
- cranelift-wasm
- cranelift-module
- cranelift-preopt
- cranelift
To use no_std mode, disable the std feature and enable the core feature. This currently requires nightly Rust.
For example, to build `cranelift-codegen`:
cd cranelift-codegen
cargo build --no-default-features --features core
Or, when using cranelift-codegen as a dependency (in Cargo.toml):
[dependencies.cranelift-codegen]
...
default-features = false
features = ["core"]
no_std support is currently "best effort". We won't try to break it, and we'll accept patches fixing problems; however, we don't expect all developers to build and test no_std when submitting patches. Accordingly, the ./test-all.sh script does not test no_std.
There is a separate ./test-no_std.sh script that tests the no_std support in packages which support it.
It's important to note that cranelift still needs liballoc to compile. Thus, whatever environment is used must implement an allocator.
Also, to allow the use of HashMaps with no_std, an external crate called hashmap_core is pulled in (via the core feature). This is mostly the same as std::collections::HashMap, except that it doesn't have DoS protection. Just something to think about.
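To illustrate what "no DoS protection" means in practice, here is a std-only sketch; it does not use hashmap_core, and the names `FixedState`, `hash_with`, and `fixed_seed_map` are made up for this example. The standard `HashMap` defaults to `RandomState`, which seeds its hasher with random keys, whereas a hasher built with fixed keys produces the same hash for the same input on every run, so an attacker who controls the keys could precompute colliding inputs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

/// A hasher factory with fixed keys: every hasher it builds starts from the
/// same state, unlike the randomly seeded `RandomState` used by default.
type FixedState = BuildHasherDefault<DefaultHasher>;

/// Hash a key with a hasher produced by the given factory.
fn hash_with<S: BuildHasher>(state: &S, key: &str) -> u64 {
    let mut hasher = state.build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
}

/// A map that uses the fixed-key hasher instead of the default one.
fn fixed_seed_map() -> HashMap<&'static str, u32, FixedState> {
    let mut map = HashMap::with_hasher(FixedState::default());
    map.insert("iadd", 1);
    map
}
```

Two independently created `FixedState` values hash a given key identically, which is what makes the layout of such a map predictable. Whether that matters depends on whether untrusted input can end up as map keys in the embedding.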
Log configuration
Cranelift uses the log crate to log messages at various levels. It doesn't
specify any maximal logging level, so embedders can choose what it should be;
however, this can have an impact on Cranelift's code size. You can use log
features to reduce the maximum logging level. For instance if you want to limit
the level of logging to warn messages and above in release mode:
[dependencies.log]
...
features = ["release_max_level_warn"]
Editor Support
Editor support for working with Cranelift IR (clif) files: