For wasm programs using SIMD vector types, the type known at function entry or exit may not match the type used within the body of the function, so we have to bitcast them. We have to check all calls and returns for this condition, which involves comparing a subset of a function's signature with the CLIF types we're trying to use.

Currently, this check heap-allocates a short-lived Vec for the appropriate subset of the signature. But most of the time none of the values need a bitcast. So this patch avoids allocating unless at least one bitcast is actually required, and only saves the pointers to values that need fixing up. I leaned heavily on iterators to keep space usage constant until we discover allocation is necessary after all.

I don't think it's possible to eliminate the allocation entirely, because the function signature we're examining is borrowed from the FuncBuilder, but we need to mutably borrow the FuncBuilder to insert the bitcast instructions. Fortunately, the FromIterator implementation for Vec doesn't reserve any heap space if the iterator is empty, so we can unconditionally collect into a Vec and still avoid unnecessary allocations. (A sketch of this two-phase pattern follows the benchmark results below.)

Since the relationship between a function signature and a list of CLIF values is somewhat complicated, I refactored all the uses of `bitcast_arguments` and `wasm_param_types`. Instead there's `bitcast_wasm_params` and `bitcast_wasm_returns`, which share a helper that combines the previous pair of functions into one.

According to DHAT, when compiling the Sightglass Spidermonkey benchmark, this avoids 52k allocations averaging about 9 bytes each, which are freed on average within 2k instructions.

Most Sightglass benchmarks, including Spidermonkey, show no performance difference with this change. Only one has a slowdown, and it's small:

compilation :: nanoseconds :: benchmarks/shootout-matrix/benchmark.wasm

  Δ = 689373.34 ± 593720.78 (confidence = 99%)

  lazy-bitcast.so is 0.94x to 1.00x faster than main-e121c209f.so!
  main-e121c209f.so is 1.00x to 1.06x faster than lazy-bitcast.so!

  [19741713 21375844.46 32976047] lazy-bitcast.so
  [19345471 20686471.12 30872267] main-e121c209f.so

But several Sightglass benchmarks have notable speed-ups, with smaller improvements for shootout-ed25519, meshoptimizer, and pulldown-cmark, and larger ones as follows:

compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm

  Δ = 20071545.47 ± 2950014.92 (confidence = 99%)

  lazy-bitcast.so is 1.26x to 1.36x faster than main-e121c209f.so!
  main-e121c209f.so is 0.73x to 0.80x faster than lazy-bitcast.so!

  [55995164 64849257.68 89083031] lazy-bitcast.so
  [79382460 84920803.15 98016185] main-e121c209f.so

compilation :: nanoseconds :: benchmarks/blake3-scalar/benchmark.wasm

  Δ = 16620780.61 ± 5395162.13 (confidence = 99%)

  lazy-bitcast.so is 1.14x to 1.28x faster than main-e121c209f.so!
  main-e121c209f.so is 0.77x to 0.88x faster than lazy-bitcast.so!

  [54604352 79877776.35 103666598] lazy-bitcast.so
  [94011835 96498556.96 106200091] main-e121c209f.so

compilation :: nanoseconds :: benchmarks/intgemm-simd/benchmark.wasm

  Δ = 36891254.34 ± 12403663.50 (confidence = 99%)

  lazy-bitcast.so is 1.12x to 1.24x faster than main-e121c209f.so!
  main-e121c209f.so is 0.79x to 0.90x faster than lazy-bitcast.so!

  [131610845 201289587.88 247341883] lazy-bitcast.so
  [232065032 238180842.22 250957563] main-e121c209f.so
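As a rough illustration of the allocation-avoidance pattern described above, here is a minimal sketch in Rust. It uses simplified stand-in types rather than the real Cranelift APIs (Value, Type, Builder, needs_bitcast, and insert_bitcast here are hypothetical placeholders); the point is only the two-phase borrow: collect the values that need fixing up while the signature is immutably borrowed, then mutably borrow the builder to insert the casts.

    // Hypothetical stand-ins; this illustrates the pattern described above,
    // not the actual cranelift-frontend API.
    #[derive(Clone, Copy)]
    struct Value(u32);
    #[derive(Clone, Copy, PartialEq)]
    struct Type(u8);

    struct Builder {
        // The signature's parameter types are owned by the builder, so they
        // must be read before the builder is mutably borrowed below.
        param_types: Vec<Type>,
    }

    impl Builder {
        fn insert_bitcast(&mut self, _val: Value, _to: Type) {
            // In the real code this would append a bitcast instruction.
        }
    }

    fn needs_bitcast(declared: Type, actual: Type) -> bool {
        declared != actual
    }

    fn bitcast_params(builder: &mut Builder, params: &[(Value, Type)]) {
        // Phase 1: immutably borrow the signature and keep only the values
        // that actually need a bitcast. Collecting an empty iterator into a
        // Vec performs no heap allocation, so the common case (no bitcasts)
        // stays allocation-free.
        let casts: Vec<(Value, Type)> = builder
            .param_types
            .iter()
            .zip(params)
            .filter(|(declared, (_, actual))| needs_bitcast(**declared, *actual))
            .map(|(declared, (val, _))| (*val, *declared))
            .collect();

        // Phase 2: the immutable borrow has ended, so the builder can now be
        // mutably borrowed to insert the bitcast instructions.
        for (val, ty) in casts {
            builder.insert_bitcast(val, ty);
        }
    }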
Cranelift Code Generator
A Bytecode Alliance project
Cranelift is a low-level retargetable code generator. It translates a target-independent intermediate representation into executable machine code.
For more information, see the documentation.
For an example of how to use the JIT, see the JIT Demo, which implements a toy language.
For an example of how to use Cranelift to run WebAssembly code, see Wasmtime, which implements a standalone, embeddable VM using Cranelift.
Status
Cranelift currently supports enough functionality to run a wide variety of programs, including all the functionality needed to execute WebAssembly (MVP and various extensions like SIMD), although it needs to be used within an external WebAssembly embedding such as Wasmtime to be part of a complete WebAssembly implementation. It is also usable as a backend for non-WebAssembly use cases: for example, there is an effort to build a Rust compiler backend using Cranelift.
Cranelift is production-ready, and is used in production in several places, all within the context of Wasmtime. It is carefully fuzzed as part of Wasmtime with differential comparison against V8 and the executable Wasm spec, and the register allocator is separately fuzzed with symbolic verification. There is an active effort to formally verify Cranelift's instruction-selection backends. We take security seriously and have a security policy as part of the Bytecode Alliance.
Cranelift has three backends: x86-64, aarch64 (aka ARM64), and s390x (aka IBM Z). All three backends fully support enough functionality for Wasm MVP, and x86-64 and aarch64 fully support SIMD as well. On x86-64, Cranelift supports both the System V AMD64 ABI calling convention used on many platforms and the Windows x64 calling convention. On aarch64, Cranelift supports the standard Linux calling convention and also has specific support for macOS (i.e., M1 / Apple Silicon).
Cranelift's code quality is competitive with browser JIT engines' optimizing tiers. A recent paper includes third-party benchmarks of Cranelift, driven by Wasmtime, against V8 and an LLVM-based Wasm engine, WAVM (Fig. 22). Cranelift's generated code is ~2% slower than V8 (TurboFan) and ~14% slower than WAVM (LLVM). Its compilation speed, in the same paper, is measured as approximately an order of magnitude faster than WAVM (LLVM). We continue to work to improve both measures.
The core codegen crates have minimal dependencies and are carefully written to handle malicious or arbitrary compiler input: in particular, they do not use callstack recursion.
Cranelift performs some basic mitigations for Spectre attacks on heap bounds checks, table bounds checks, and indirect branch bounds checks; see #1032 for more.
Cranelift's APIs are not yet considered stable, though we do follow semantic-versioning (semver) with minor-version patch releases.
As a policy, Cranelift generally requires the latest stable Rust to build, and is tested as such, but we can incorporate fixes for compilation with older Rust versions on a best-effort basis.
Contributing
If you're interested in contributing to Cranelift: thank you! We have a contributing guide which will help you get involved in the Cranelift project.
Planned uses
Cranelift is designed to be a code generator for WebAssembly, but it is general enough to be useful elsewhere too. The initial planned uses that affected its design are:
- WebAssembly compiler for the SpiderMonkey engine in Firefox.
- Backend for the IonMonkey JavaScript JIT compiler in Firefox.
- Debug build backend for the Rust compiler.
- Wasmtime non-Web wasm engine.
Building Cranelift
Cranelift uses a conventional Cargo build process.
Cranelift consists of a collection of crates organized as a Cargo workspace, so for some cargo commands, such as cargo test, the --all flag is needed to tell cargo to visit all of the crates.
test-all.sh at the top level is a script which runs all the cargo
tests and also performs code format, lint, and documentation checks.
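For example, from the top level of the repository:

    cargo build --all     # build every crate in the workspace
    cargo test --all      # run every crate's tests
    ./test-all.sh         # tests plus format, lint, and documentation checks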
Log configuration
Cranelift uses the log crate to log messages at various levels. It doesn't specify any maximum logging level, so embedders can choose what it should be; however, this can have an impact on Cranelift's code size. You can use log features to reduce the maximum logging level. For instance, if you want to limit the level of logging to warn messages and above in release mode:
[dependencies.log]
...
features = ["release_max_level_warn"]
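Note that the log crate is only a facade; whether messages are actually printed is up to the embedder. As a usage sketch, assuming the embedder adds the env_logger crate as a dependency (not something Cranelift itself requires):

    // Hypothetical embedder setup using the `env_logger` crate; Cranelift only
    // emits records through the `log` facade, and the embedder decides how
    // (and whether) they are displayed.
    fn main() {
        // Reads the RUST_LOG environment variable to configure log filtering.
        env_logger::init();
        log::info!("logger installed; Cranelift log output will now be visible");
    }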
Editor Support
Editor support for working with Cranelift IR (clif) files: