Files

Alex Crichton 453330b2db x64: Add rudimentary support for some AVX instructions (#5795 )

* x64: Add rudimentary support for some AVX instructions

I was poking around Spidermonkey's wasm backend and saw that the various
assembler functions used are all `v*`-prefixed which look like they're
intended for use with AVX instructions. I looked at Cranelift and it
currently doesn't have support for many AVX-based instructions, so I
figured I'd take a crack at it!

The support added here is a bit of a mishmash when viewed alone, but my
general goal was to take a single instruction from the SIMD proposal for
WebAssembly and migrate all of its component instructions to AVX. I, by
random chance, picked a pretty complicated instruction of `f32x4.min`.
This wasm instruction is implemented on x64 with 4 unique SSE
instructions and ended up being a pretty good candidate.

Further digging about AVX-vs-SSE shows that there should be two major
benefits to using AVX over SSE:

* Primarily AVX instructions largely use a three-operand form where two
  input registers are operated with and an output register is also
  specified. This is in contrast to SSE's predominant
  one-register-is-input-but-also-output pattern. This should help free
  up the register allocator a bit and additionally remove the need for
  movement between registers.

* As #4767 notes the memory-based operations of VEX-encoded instructions
  (aka AVX instructions) do not have strict alignment requirements which
  means we would be able to sink loads and stores into individual
  instructions instead of having separate instructions.

So I set out on my journey to implement the instructions used by
`f32x4.min`. The first few were fairly easy. The machinst backends are
already of the shape "take these inputs and compute the output" where
the x86 requirement of a register being both input and output is
postprocessed in. This means that the `inst.isle` creation helpers for
SSE instructions were already of the correct form to use AVX. I chose to
add new `rule` branches for the instruction creation helpers, for
example `x64_andnps`. The new `rule` conditionally only runs if AVX is
enabled and emits an AVX instruction instead of an SSE instruction for
achieving the same goal. This means that no lowerings of clif
instructions were modified, instead just new instructions are being
generated.

The VEX encoding was previously not heavily used in Cranelift. The only
current user are the FMA-style instructions that Cranelift has at this
time. These FMA instructions have one extra operand than `vandnps`, for
example, so I split the existing `XmmRmRVex` into a few more variants to
fit the shape of the instructions that needed generating for
`f32x4.min`. This was accompanied then with more AVX opcode definitions,
more emission support, etc.

Upon implementing all of this it turned out that the test suite was
failing on my machine due to the memory-operand encodings of VEX
instructions not being supported. I didn't explicitly add those in
myself but some preexisting RIP-relative addressing was leaking into the
new instructions with existing tests. I opted to go ahead and fill out
the memory addressing modes of VEX encoding to get the tests passing
again.

All-in-all this PR adds new instructions to the x64 backend for a number
of AVX instructions, updates 5 existing instruction producers to use AVX
instructions conditionally, implements VEX memory operands, and adds
some simple tests for the new output of `f32x4.min`. The existing
runtest for `f32x.min` caught a few intermediate bugs along the way and
I additionally added a plain `target x86_64` to that runtest to ensure
that it executes with and without AVX to test the various lowerings.
I'll also note that this, and future support, should be well-fuzzed
through Wasmtime's fuzzing which may explicitly disable AVX support
despite the machine having access to AVX, so non-AVX lowerings should be
well-tested into the future.

It's also worth mentioning that I am not an AVX or VEX or x64 expert.
Implementing the memory operand part for VEX was the hardest part of
this PR and while I think it should be good someone else should
definitely double-check me. Additionally I haven't added many
instructions to the x64 backend yet so I may have missed obvious places
to tests or such, so am happy to follow-up with anything to be more
thorough if necessary.

Finally I should note that this is just the tip of the iceberg when it
comes to AVX. My hope is to get some of the idioms sorted out to make it
easier for future PRs to add one-off instruction lowerings or such.

* Review feedback

2023-02-17 01:29:55 +00:00

bforest

Bump Wasmtime to 7.0.0 (#5712 )

2023-02-06 09:10:19 -06:00

codegen

x64: Add rudimentary support for some AVX instructions (#5795 )

2023-02-17 01:29:55 +00:00

docs

Remove the unused sigid argument purpose (#5753 )

2023-02-09 09:18:39 -08:00

entity

Bump Wasmtime to 7.0.0 (#5712 )

2023-02-06 09:10:19 -06:00

filetests

x64: Add rudimentary support for some AVX instructions (#5795 )

2023-02-17 01:29:55 +00:00

frontend

Rework br_table to use BlockCall (#5731 )

2023-02-16 09:23:27 -08:00

fuzzgen

Rework br_table to use BlockCall (#5731 )

2023-02-16 09:23:27 -08:00

interpreter

Rework br_table to use BlockCall (#5731 )

2023-02-16 09:23:27 -08:00

isle

Some refactorings to the ISLE parser (#5693 )

2023-02-06 15:11:25 -08:00

jit

Add JITBuilder::with_flags constructor (#5751 )

2023-02-09 02:49:17 +00:00

media

Check in the Crane and Ferris drawing so that people can remix it :-).

2018-09-13 15:30:39 -07:00

module

Add JITBuilder::with_flags constructor (#5751 )

2023-02-09 02:49:17 +00:00

native

Bump Wasmtime to 7.0.0 (#5712 )

2023-02-06 09:10:19 -06:00

object

aarch64: Support GOT Relative relocations in PIC mode (#5550 )

2023-02-15 15:19:18 -08:00

reader

Rework br_table to use BlockCall (#5731 )

2023-02-16 09:23:27 -08:00

serde

Bump Wasmtime to 7.0.0 (#5712 )

2023-02-06 09:10:19 -06:00

src

Rework br_table to use BlockCall (#5731 )

2023-02-16 09:23:27 -08:00

tests

cranelift: Remove brz and brnz (#5630 )

2023-01-30 20:34:56 +00:00

umbrella

Bump Wasmtime to 7.0.0 (#5712 )

2023-02-06 09:10:19 -06:00

wasm

Rework br_table to use BlockCall (#5731 )

2023-02-16 09:23:27 -08:00

Cargo.toml

Use capstone to validate precise-output tests (#5780 )

2023-02-15 16:35:10 -08:00

README.md

Cranellift: remove Baldrdash support and related features. (#4571 )

2022-08-02 19:37:56 +00:00

rustc.md

Update outdated references to the Cranelift repository

2020-03-09 14:06:24 +01:00

README.md

Cranelift Code Generator

A Bytecode Alliance project

Cranelift is a low-level retargetable code generator. It translates a target-independent intermediate representation into executable machine code.

For more information, see the documentation.

For an example of how to use the JIT, see the JIT Demo, which implements a toy language.

For an example of how to use Cranelift to run WebAssembly code, see Wasmtime, which implements a standalone, embeddable, VM using Cranelift.

Status

Cranelift currently supports enough functionality to run a wide variety of programs, including all the functionality needed to execute WebAssembly (MVP and various extensions like SIMD), although it needs to be used within an external WebAssembly embedding such as Wasmtime to be part of a complete WebAssembly implementation. It is also usable as a backend for non-WebAssembly use cases: for example, there is an effort to build a Rust compiler backend using Cranelift.

Cranelift is production-ready, and is used in production in several places, all within the context of Wasmtime. It is carefully fuzzed as part of Wasmtime with differential comparison against V8 and the executable Wasm spec, and the register allocator is separately fuzzed with symbolic verification. There is an active effort to formally verify Cranelift's instruction-selection backends. We take security seriously and have a security policy as a part of Bytecode Alliance.

Cranelift has three backends: x86-64, aarch64 (aka ARM64), and s390x (aka IBM Z). All three backends fully support enough functionality for Wasm MVP, and x86-64 and aarch64 fully support SIMD as well. On x86-64, Cranelift supports both the System V AMD64 ABI calling convention used on many platforms and the Windows x64 calling convention. On aarch64, Cranelift supports the standard Linux calling convention and also has specific support for macOS (i.e., M1 / Apple Silicon).

Cranelift's code quality is within range of competitiveness to browser JIT engines' optimizing tiers. A recent paper includes third-party benchmarks of Cranelift, driven by Wasmtime, against V8 and an LLVM-based Wasm engine, WAVM (Fig 22). The speed of Cranelift's generated code is ~2% slower than that of V8 (TurboFan), and ~14% slower than WAVM (LLVM). Its compilation speed, in the same paper, is measured as approximately an order of magnitude faster than WAVM (LLVM). We continue to work to improve both measures.

The core codegen crates have minimal dependencies and are carefully written to handle malicious or arbitrary compiler input: in particular, they do not use callstack recursion.

Cranelift performs some basic mitigations for Spectre attacks on heap bounds checks, table bounds checks, and indirect branch bounds checks; see #1032 for more.

Cranelift's APIs are not yet considered stable, though we do follow semantic-versioning (semver) with minor-version patch releases.

Cranelift generally requires the latest stable Rust to build as a policy, and is tested as such, but we can incorporate fixes for compilation with older Rust versions on a best-effort basis.

Contributing

If you're interested in contributing to Cranelift: thank you! We have a contributing guide which will help you getting involved in the Cranelift project.

Planned uses

Cranelift is designed to be a code generator for WebAssembly, but it is general enough to be useful elsewhere too. The initial planned uses that affected its design were:

Wasmtime non-Web wasm engine.
Debug build backend for the Rust compiler.
WebAssembly compiler for the SpiderMonkey engine in Firefox (currently not planned anymore; SpiderMonkey team may re-assess in the future).
Backend for the IonMonkey JavaScript JIT compiler in Firefox (currently not planned anymore; SpiderMonkey team may re-assess in the future).

Building Cranelift

Cranelift uses a conventional Cargo build process.

Cranelift consists of a collection of crates, and uses a Cargo Workspace, so for some cargo commands, such as cargo test, the --all is needed to tell cargo to visit all of the crates.

test-all.sh at the top level is a script which runs all the cargo tests and also performs code format, lint, and documentation checks.

Log configuration

Cranelift uses the log crate to log messages at various levels. It doesn't specify any maximal logging level, so embedders can choose what it should be; however, this can have an impact of Cranelift's code size. You can use log features to reduce the maximum logging level. For instance if you want to limit the level of logging to warn messages and above in release mode:

[dependency.log]
...
features = ["release_max_level_warn"]

Editor Support

Editor support for working with Cranelift IR (clif) files:

Vim: https://github.com/bytecodealliance/cranelift.vim