Add a new type BlockCall that represents the pair of a block name with arguments to be passed to it. (The mnemonic here is that it looks a bit like a function call.) Rework the implementation of jump, brz, and brnz to use BlockCall instead of storing the block arguments as varargs in the instruction's ValueList.
To ensure that we're processing block arguments from BlockCall values in instructions, three new functions have been introduced on DataFlowGraph that both sets of arguments:
inst_values - returns an iterator that traverses values in the instruction and block arguments
map_inst_values - applies a function to each value in the instruction and block arguments
overwrite_inst_values - overwrite all values in an instruction and block arguments with values from the iterator
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* add clif-util compile option to output object file
* switch from a box to a borrow
* update objectmodule tests to use borrowed isa
* put targetisa into an arc
We have some operations defined on DataFlowGraph purely to work around borrow-checker issues with InstructionData and other data on DataFlowGraph. Part of the problem is that indexing the DFG directly hides the fact that we're only indexing the insts field of the DFG.
This PR makes the insts field of the DFG public, but wraps it in a newtype that only allows indexing. This means that the borrow checker is better able to tell when operations on memory held by the DFG won't conflict, which comes up frequently when mutating ValueLists held by InstructionData.
This adds support for `.wat` tests in `cranelift-filetest`. The test runner
translates the WAT to Wasm and then uses `cranelift-wasm` to translate the Wasm
to CLIF.
These tests are always precise output tests. The test expectations can be
updated by running tests with the `CRANELIFT_TEST_BLESS=1` environment variable
set, similar to our compile precise output tests. The test's expected output is
contained in the last comment in the test file.
The tests allow for configuring the kinds of heaps used to implement Wasm linear
memory via TOML in a `;;!` comment at the start of the test.
To get ISA and Cranelift flags parsing available in the filetests crate, I had
to move the `parse_sets_and_triple` helper from the `cranelift-tools` binary
crate to the `cranelift-reader` crate, where I think it logically
fits.
Additionally, I had to make some more bits of `cranelift-wasm`'s dummy
environment `pub` so that I could properly wrap and compose it with the
environment used for the `.wat` tests. I don't think this is a big deal, but if
we eventually want to clean this stuff up, we can probably remove the dummy
environments completely, remove `translate_module`, and fold them into these new
test environments and test runner (since Wasmtime isn't using those things
anyways).
* Cranelift: Make `heap_addr` return calculated `base + index + offset`
Rather than return just the `base + index`.
(Note: I've chosen to use the nomenclature "index" for the dynamic operand and
"offset" for the static immediate.)
This move the addition of the `offset` into `heap_addr`, instead of leaving it
for the subsequent memory operation, so that we can Spectre-guard the full
address, and not allow speculative execution to read the first 4GiB of memory.
Before this commit, we were effectively doing
load(spectre_guard(base + index) + offset)
Now we are effectively doing
load(spectre_guard(base + index + offset))
Finally, this also corrects `heap_addr`'s documented semantics to say that it
returns an address that will trap on access if `index + offset + access_size` is
out of bounds for the given heap, rather than saying that the `heap_addr` itself
will trap. This matches the implemented behavior for static memories, and after
https://github.com/bytecodealliance/wasmtime/pull/5190 lands (which is blocked
on this commit) will also match the implemented behavior for dynamic memories.
* Update heap_addr docs
* Factor out `offset + size` to a helper
* cranelift: Remove iconst.i128
* bugpoint: Report Changed when only one instruction is mutated
* cranelift: Fix egraph bxor rule
* cranelift: Remove some simple_preopt opts for i128
Remove the boolean types from cranelift, and the associated instructions breduce, bextend, bconst, and bint. Standardize on using 1/0 for the return value from instructions that produce scalar boolean results, and -1/0 for boolean vector elements.
Fixes#3205
Co-authored-by: Afonso Bordado <afonso360@users.noreply.github.com>
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
* cranelift: Change test runner order
Changes the ordering of runtests to run per target and then per function.
This change doesn't do a lot by itself, but helps future refactorings of runtests.
* cranelift: Rename SingleFunctionCompiler to TestCaseCompiler
* cranelift: Skip runtests per target instead of per run
* cranelift: Deduplicate test names
With the upcoming changes to the runtest infrastructure we require unique ExtNames for all tests.
Note that for test names we have a 16 character limit on test names, and must be unique within those 16 characters.
* cranelift: Add TestFileCompiler to runtests
TestFileCompiler allows us to compile the entire file once, and then call the trampolines for each test.
The previous code was compiling the function for each invocation of a test.
* cranelift: Deduplicate ExtName for avg_round tests
* cranelift: Rename functions as they are defined.
The JIT internally only deals with User functions, and cannot link test name funcs.
This also caches trampolines by signature.
* cranelift: Preserve original name when reporting errors.
* cranelift: Rename aarch64 test functions
* cranelift: Add `call` and `call_indirect` tests!
* cranelift: Add pauth runtests for aarch64
* cranelift: Rename duplicate s390x tests
* cranelift: Delete `i128_bricmp_of` function from i128-bricmp
It looks like we forgot to delete it when it was moved to
`i128-bricmp-overflow`, and since it didn't have a run invocation
it was never compiled.
However, s390x does not support this, and panics when lowering.
* cranelift: Add `colocated` call tests
* cranelift: Rename *more* `s390x` tests
* cranelift: Add pauth + sign_return_address call tests
* cranelift: Undeduplicate test names
With the latest main changes we now support *unlimited* length test names.
This commit reverts:
52274676ff631c630f9879dd32e756566d3e700f
7989edc172493547cdf63e180bb58365e8a43a42
25c8a8395527d98976be6a34baa3b0b214776739
792e8cfa8f748077f9d80fe7ee5e958b7124e83b
* cranelift: Add LibCall tests
* cranelift: Revert more test names
These weren't auto reverted by the previous revert.
* cranelift: Disable libcall tests for aarch64
* cranelift: Runtest fibonacci tests
* cranelift: Misc cleanup
When trying to read generated CLIF, it's nice to be able to see at a
glance that some of the operands are defined by `iconst` and similar
instructions, without having to go find each operand's definition
manually.
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime.
After the suggestion of Chris, `Function` has been split into mostly two parts:
- on the one hand, `FunctionStencil` contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`.
- on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on.
Most changes are here to accomodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache:
- most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location, it's automatically detected instead the first time a non-default `SourceLoc` is set.
- user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`.
- some refactorings have been made for function names:
- `ExternalName` was used as the type for a `Function`'s name; while it thus allowed `ExternalName::Libcall` in this place, this would have been quite confusing to use it there. Instead, a new enum `UserFuncName` is introduced for this name, that's either a user-defined function name (the above `UserExternalName`) or a test case name.
- The future of `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with.
The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.
A basic fuzz target has been introduced that tries to do the bare minimum:
- check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function.
- check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache.
- This last check is less efficient and less likely to happen, so probably should be rethought a bit.
Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.
Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement.
Fixes#4155.
* cranelift: Use JIT in runtests
Using `cranelift-jit` in run tests allows us to preform relocations and
libcalls. This is important since some instruction lowerings fallback
to libcall's when an extension is missing, or when it's too complicated
to implement manually.
This is also a first step to being able to test `call`'s between functions
in the runtest suite. It should also make it easier to eventually test
TLS relocations, symbol resolution and ABI's.
Another benefit of this is that we also get to test the JIT more, since
it now runs the runtests, and gets some fuzzing via `fuzzgen` (which
uses the `SingleFunctionCompiler`).
This change causes regressions in terms of runtime for the filetests.
I haven't done any serious benchmarking but what I've been seeing is
that it now takes about ~3 seconds to run the testsuite while it
previously took around 2 seconds.
* Add FMA tests for X86
* Cranellift: remove Baldrdash support and related features.
As noted in Mozilla's bugzilla bug 1781425 [1], the SpiderMonkey team
has recently determined that their current form of integration with
Cranelift is too hard to maintain, and they have chosen to remove it
from their codebase. If and when they decide to build updated support
for Cranelift, they will adopt different approaches to several details
of the integration.
In the meantime, after discussion with the SpiderMonkey folks, they
agree that it makes sense to remove the bits of Cranelift that exist
to support the integration ("Baldrdash"), as they will not need
them. Many of these bits are difficult-to-maintain special cases that
are not actually tested in Cranelift proper: for example, the
Baldrdash integration required Cranelift to emit function bodies
without prologues/epilogues, and instead communicate very precise
information about the expected frame size and layout, then stitched
together something post-facto. This was brittle and caused a lot of
incidental complexity ("fallthrough returns", the resulting special
logic in block-ordering); this is just one example. As another
example, one particular Baldrdash ABI variant processed stack args in
reverse order, so our ABI code had to support both traversal
orders. We had a number of other Baldrdash-specific settings as well
that did various special things.
This PR removes Baldrdash ABI support, the `fallthrough_return`
instruction, and pulls some threads to remove now-unused bits as a
result of those two, with the understanding that the SpiderMonkey folks
will build new functionality as needed in the future and we can perhaps
find cleaner abstractions to make it all work.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1781425
* Review feedback.
* Fix (?) DWARF debug tests: add `--disable-cache` to wasmtime invocations.
The debugger tests invoke `wasmtime` from within each test case under
the control of a debugger (gdb or lldb). Some of these tests started to
inexplicably fail in CI with unrelated changes, and the failures were
only inconsistently reproducible locally. It seems to be cache related:
if we disable cached compilation on the nested `wasmtime` invocations,
the tests consistently pass.
* Review feedback.
* Move `emit_to_memory` to `MachCompileResult`
This small refactoring makes it clearer to me that emitting to memory
doesn't require anything else from the compilation `Context`. While it's
a trivial change, it's a small public API change that shouldn't cause
too much trouble, and doesn't seem RFC-worthy. Happy to hear different
opinions about this, though!
* hide the MachCompileResult behind a method
* Add a `CompileError` wrapper type that references a `Function`
* Rename MachCompileResult to CompiledCode
* Additionally remove the last unsafe API in cranelift-codegen
* Improve cranelift disassembly of stack maps
Print out extra information about stack maps such as their contents and
other related metadata available. Additionally also print out addresses
in hex to line up with the disassembly otherwise printed as well.
* Improve the `table_ops` fuzzer
* Generate more instructions by default
* Fix negative indices appearing in `table.{get,set}`
* Assert that the traps generated are expected to prevent accidental
other errors reporting a fuzzing success.
* Fix `reftype_vregs` reported to `regalloc2`
This fixes a mistake in the register allocation of Cranelift functions
where functions using reference-typed arguments incorrectly report which
virtual registers are reference-typed values if there are vreg aliases
in play. The fix here is to apply the vreg aliases to the final list of
reftyped regs which is eventually passed to `regalloc2`.
The main consequence of this fix is that functions which previously
accidentally didn't have correct stack maps should now have the missing
stack maps.
* Add a test that `table_ops` gc's eventually
* Add a comment about new alias resolution
* Update crates/fuzzing/src/oracles.rs
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* Add some comments
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
When parsing isa specific values we were accidentally discarding the
value of the flag, and treating it always as a boolean flag.
This would cause a `clif-util` invocation such as
`cargo run -- compile -D --set has_sse41=false --target x86_64 test.clif`
to be interpreted as `--set has_sse41` and enable that feature instead
of disabling it.
Rather than sometimes using `file-per-thread-logger`.
Also remove the debug CLI flags, so that we can always just define
`RUST_LOG=...` to get logging and don't need to also do other things.
Introduce a new concept in the IR that allows a producer to create
dynamic vector types. An IR function can now contain global value(s)
that represent a dynamic scaling factor, for a given fixed-width
vector type. A dynamic type is then created by 'multiplying' the
corresponding global value with a fixed-width type. These new types
can be used just like the existing types and the type system has a
set of hard-coded dynamic types, such as I32X4XN, which the user
defined types map onto. The dynamic types are also used explicitly
to create dynamic stack slots, which have no set size like their
existing counterparts. New IR instructions are added to access these
new stack entities.
Currently, during codegen, the dynamic scaling factor has to be
lowered to a constant so the dynamic slots do eventually have a
compile-time known size, as do spill slots.
The current lowering for aarch64 just targets Neon, using a dynamic
scale of 1.
Copyright (c) 2022, Arm Limited.
* Remove unused srcloc in MachReloc
* Remove unused srcloc in MachTrap
* Use `into_iter` on array in bench code to suppress a warning
* Remove unused srcloc in MachCallSite
* Update to clap 3.0
This commit migrates all CLI commands internally used in this project
from structopt/clap2 to clap 3. The intent here is to ensure that we're
using maintained versions of the dependencies as structopt and clap 2
are less maintained nowadays. Most transitions were pretty
straightforward and mostly dealing with structopt/clap3 differences.
* Fix a number of `cargo deny` errors
This commit fixes a few errors around duplicate dependencies which
arose from the prior update to clap3. This also uses a new feature in
`deny.toml`, `skip-tree`, which allows having a bit more targeted
ignores for skips of duplicate version checks. This showed a few more
locations in Wasmtime itself where we could update some dependencies.
Addresses #3809: when we are asked to create a Cranelift backend with
shared flags that indicate support for SIMD, we should check that the
ISA level needed for our SIMD lowerings is present.
Peepmatic was an early attempt at a DSL for peephole optimizations, with the
idea that maybe sometime in the future we could user it for instruction
selection as well. It didn't really pan out, however:
* Peepmatic wasn't quite flexible enough, and adding new operators or snippets
of code implemented externally in Rust was a bit of a pain.
* The performance was never competitive with the hand-written peephole
optimizers. It was *very* size efficient, but that came at the cost of
run-time efficiency. Everything was table-based and interpreted, rather than
generating any Rust code.
Ultimately, because of these reasons, we never turned Peepmatic on by default.
These days, we just landed the ISLE domain-specific language, and it is better
suited than Peepmatic for all the things that Peepmatic was originally designed
to do. It is more flexible and easy to integrate with external Rust code. It is
has better time efficiency, meeting or even beating hand-written code. I think a
small part of the reason why ISLE excels in these things is because its design
was informed by Peepmatic's failures. I still plan on continuing Peepmatic's
mission to make Cranelift's peephole optimizer passes generated from DSL rewrite
rules, but using ISLE instead of Peepmatic.
Thank you Peepmatic, rest in peace!
This also paves the way for unifying TargetIsa and MachBackend, since now they map one to one. In theory the two traits could be merged, which would be nice to limit the number of total concepts. Also they have quite different responsibilities, so it might be fine to keep them separate.
Interestingly, this PR started as removing RegInfo from the TargetIsa trait since the adapter returned a dummy value there. From the fallout, noticed that all Display implementations didn't needed an ISA anymore (since these were only used to render ISA specific registers). Also the whole family of RegInfo / ValueLoc / RegUnit was exclusively used for the old backend, and these could be removed. Notably, some IR instructions needed to be removed, because they were using RegUnit too: this was the oddball of regfill / regmove / regspill / copy_special, which were IR instructions inserted by the old regalloc. Fare thee well!
Cranelift crates have historically been much more verbose with debug-level
logging than most other crates in the Rust ecosystem. We log things like how
many parameters a basic block has, the color of virtual registers during
regalloc, etc. Even for Cranelift hackers, these things are largely only useful
when hacking specifically on Cranelift and looking at a particular test case,
not even when using some Cranelift embedding (such as Wasmtime).
Most of the time, when people want logging for their Rust programs, they do
something like:
RUST_LOG=debug cargo run
This means that they get all that mostly not useful debug logging out of
Cranelift. So they might want to disable logging for Cranelift, or change it to
a higher log level:
RUST_LOG=debug,cranelift=info cargo run
The problem is that this is already more annoying to type that `RUST_LOG=debug`,
and that Cranelift isn't one single crate, so you actually have to play
whack-a-mole with naming all the Cranelift crates off the top of your head,
something more like this:
RUST_LOG=debug,cranelift=info,cranelift_codegen=info,cranelift_wasm=info,...
Therefore, we're changing most of the `debug!` logs into `trace!` logs: anything
that is very Cranelift-internal, unlikely to be useful/meaningful to the
"average" Cranelift embedder, or prints a message for each instruction visited
during a pass. On the other hand, things that just report a one line statistic
for a whole pass, for example, are left as `debug!`. The more verbose the log
messages are, the higher the bar they must clear to be `debug!` rather than
`trace!`.
* Consume fuel during function execution
This commit adds codegen infrastructure necessary to instrument wasm
code to consume fuel as it executes. Currently nothing is really done
with the fuel, but that'll come in later commits.
The focus of this commit is to implement the codegen infrastructure
necessary to consume fuel and account for fuel consumed correctly.
* Periodically check remaining fuel in wasm JIT code
This commit enables wasm code to periodically check to see if fuel has
run out. When fuel runs out an intrinsic is called which can do what it
needs to do in the result of fuel running out. For now a trap is thrown
to have at least some semantics in synchronous stores, but another
planned use for this feature is for asynchronous stores to periodically
yield back to the host based on fuel running out.
Checks for remaining fuel happen in the same locations as interrupt
checks, which is to say the start of the function as well as loop
headers.
* Improve codegen by caching `*const VMInterrupts`
The location of the shared interrupt value and fuel value is through a
double-indirection on the vmctx (load through the vmctx and then load
through that pointer). The second pointer in this chain, however, never
changes, so we can alter codegen to account for this and remove some
extraneous load instructions and hopefully reduce some register
pressure even maybe.
* Add tests fuel can abort infinite loops
* More fuzzing with fuel
Use fuel to time out modules in addition to time, using fuzz input to
figure out which.
* Update docs on trapping instructions
* Fix doc links
* Fix a fuzz test
* Change setting fuel to adding fuel
* Fix a doc link
* Squelch some rustdoc warnings
This will hopefully remove a small thorn in our side with periodic
nightly breakage due to nightly features changing. This commit moves
lightbeam to stable Rust, swapping out `staticvec` for `arrayvec` and
otherwise updating some dependencies (namely `dynasm`) to compile with
stable.
This then also updates CI appropriately to not use a pinned nightly and
instead us a floating `nightly` channel so we can head off any breakage
coming up ASAP.
* Rewrite interpreter generically
This change re-implements the Cranelift interpreter to use generic values; this makes it possible to do abstract interpretation of Cranelift instructions. In doing so, the interpretation state is extracted from the `Interpreter` structure and is accessed via a `State` trait; this makes it possible to not only more clearly observe the interpreter's state but also to interpret using a dummy state (e.g. `ImmutableRegisterState`). This addition made it possible to implement more of the Cranelift instructions (~70%, ignoring the x86-specific instructions).
* Replace macros with closures
* clif-util: do not convert `anyhow::Error`s into strings into `anyhow::Error`s
* filetests: Use the debug formatting of `anyhow::Error`s
This provides the full error context, not just the source error's message.
This allows for more flexibility of when/where to harvest LHS candidates. For
example, we could choose to harvest candidates that overlap with and supercede
our current preopt peepholes.
This commit also makes sure that we compute the CFG before running preopt, when
harvesting LHS candidates via `clif-util souper-harvest`.
Given a clif function, harvest all its integer subexpressions, so that they can
be fed into [Souper](https://github.com/google/souper) as candidates for
superoptimization. For some of these candidates, Souper will successfully
synthesize a right-hand side that is equivalent but has lower cost than the
left-hand side. Then, we can combine these left- and right-hand sides into a
complete optimization, and add it to our peephole passes.
To harvest the expression that produced a given value `x`, we do a post-order
traversal of the dataflow graph starting from `x`. As we do this traversal, we
maintain a map from clif values to their translated Souper values. We stop
traversing when we reach anything that can't be translated into Souper IR: a
memory load, a float-to-int conversion, a block parameter, etc. For values
produced by these instructions, we create a Souper `var`, which is an input
variable to the optimization. For instructions that have a direct mapping into
Souper IR, we get the Souper version of each of its operands and then create the
Souper version of the instruction itself. It should now be clear why we do a
post-order traversal: we need an instruction's translated operands in order to
translate the instruction itself. Once this instruction is translated, we update
the clif-to-souper map with this new translation so that any other instruction
that uses this result as an operand has access to the translated value. When the
traversal is complete we return the translation of `x` as the root of left-hand
side candidate.