* Replace resize+copy_from_slice with extend_from_slice
Vec::resize initializes the new space, which is wasted effort if we're
just going to call `copy_from_slice` on it immediately afterward. Using
`extend_from_slice` is simpler, and very slightly faster.
If the new size were bigger than the buffer we're copying from, then it
would make sense to initialize the excess. But it isn't: it's always
exactly the same size.
* Move helpers from Context to CompiledCode
These methods only use information from Context::compiled_code, so they
should live on CompiledCode instead.
* Remove an unnecessary #[cfg_attr]
There are other uses of `#[allow(clippy::too_many_arguments)]` in this
file, so apparently it doesn't need to be guarded by the "cargo-clippy"
feature.
* Fix a few comments
Two of these were wrong/misleading:
- `FunctionBuilder::new` does not clear the provided func_ctx. It does
debug-assert that the context is already clear, but I don't think
that's worth a comment.
- `switch_to_block` does not "create values for the arguments." That's
done by the combination of `append_block_params_for_function_params`
and `declare_wasm_parameters`.
* wasmtime-cranelift: Misc cleanups
The main change is to use the `CompiledCode` reference we already had
instead of getting it out of `Context` repeatedly. This removes a bunch
of `unwrap()` calls.
* wasmtime-cranelift: Factor out uncached compile
* Initial forward-edge CFI implementation
Give the user the option to start all basic blocks that are targets
of indirect branches with the BTI instruction introduced by the
Branch Target Identification extension to the Arm instruction set
architecture.
Copyright (c) 2022, Arm Limited.
* Refactor `from_artifacts` to avoid second `make_executable` (#1)
This involves "parsing" twice but this is parsing just the header of an
ELF file so it's not a very intensive operation and should be ok to do
twice.
* Address the code review feedback
Copyright (c) 2022, Arm Limited.
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
Add a function_alignment function to the TargetIsa trait, and use it to align functions when generating objects. Additionally, collect the maximum alignment required for pc-relative constants in functions and pass that value out. Use the max of these two values when padding functions for alignment.
This fixes a bug on x86_64 where rip-relative loads to sse registers could cause a segfault, as functions weren't always guaranteed to be aligned to 16-byte addresses.
Fixes#4812
This fixes two problems: minimum symbol alignment for the LARL
instruction, and alignment requirements for LRL/LGRL etc.
The first problem is that the LARL instruction used to load a
symbol address (PC relative) requires that the target symbol
is at least 2-byte aligned. This is always guaranteed for code
symbols (all instructions must be 2-aligned anyway), but not
necessarily for data symbols.
Other s390x compilers fix this problem by ensuring that all
global symbols are always emitted with a minimum 2-byte
alignment. This patch introduces an equivalent mechanism
for cranelift:
- Add a symbol_alignment routine to TargetIsa, similar to the
existing code_section_alignment routine.
- Respect symbol_alignment as minimum alignment for all symbols
emitted in the object backend (code and data).
The second problem is that PC-relative instructions that
directly *access* data (like LRL/LGRL, STRL/STGRL etc.)
not only have the 2-byte requirement like LARL, but actually
require that their memory operand is *naturally* aligned
(i.e. alignment is at least the size of the access).
This property (natural alignment for memory accesses) is
supposed to be provided by the "aligned" flag in MemFlags;
however, this is not implemented correctly at the moment.
To fix this, this patch:
- Only emits PC-relative memory access instructions if the
"aligned" flag is set in the associated MemFlags.
- Fixes a bug in emit_small_memory_copy and emit_small_memset
which currently set the aligned flag unconditionally, ignoring
the actual alignment info passed by their caller.
Tested with wasmtime and cg_clif.
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime.
After the suggestion of Chris, `Function` has been split into mostly two parts:
- on the one hand, `FunctionStencil` contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`.
- on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on.
Most changes are here to accomodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache:
- most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location, it's automatically detected instead the first time a non-default `SourceLoc` is set.
- user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`.
- some refactorings have been made for function names:
- `ExternalName` was used as the type for a `Function`'s name; while it thus allowed `ExternalName::Libcall` in this place, this would have been quite confusing to use it there. Instead, a new enum `UserFuncName` is introduced for this name, that's either a user-defined function name (the above `UserExternalName`) or a test case name.
- The future of `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with.
The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.
A basic fuzz target has been introduced that tries to do the bare minimum:
- check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function.
- check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache.
- This last check is less efficient and less likely to happen, so probably should be rethought a bit.
Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.
Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement.
Fixes#4155.
* Move `emit_to_memory` to `MachCompileResult`
This small refactoring makes it clearer to me that emitting to memory
doesn't require anything else from the compilation `Context`. While it's
a trivial change, it's a small public API change that shouldn't cause
too much trouble, and doesn't seem RFC-worthy. Happy to hear different
opinions about this, though!
* hide the MachCompileResult behind a method
* Add a `CompileError` wrapper type that references a `Function`
* Rename MachCompileResult to CompiledCode
* Additionally remove the last unsafe API in cranelift-codegen
Introduce a new concept in the IR that allows a producer to create
dynamic vector types. An IR function can now contain global value(s)
that represent a dynamic scaling factor, for a given fixed-width
vector type. A dynamic type is then created by 'multiplying' the
corresponding global value with a fixed-width type. These new types
can be used just like the existing types and the type system has a
set of hard-coded dynamic types, such as I32X4XN, which the user
defined types map onto. The dynamic types are also used explicitly
to create dynamic stack slots, which have no set size like their
existing counterparts. New IR instructions are added to access these
new stack entities.
Currently, during codegen, the dynamic scaling factor has to be
lowered to a constant so the dynamic slots do eventually have a
compile-time known size, as do spill slots.
The current lowering for aarch64 just targets Neon, using a dynamic
scale of 1.
Copyright (c) 2022, Arm Limited.
This PR switches Cranelift over to the new register allocator, regalloc2.
See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801)
for a summary of the design changes. This switchover has implications for
core VCode/MachInst types and the lowering pass.
Overall, this change brings improvements to both compile time and speed of
generated code (runtime), as reported in #3942:
```
Benchmark Compilation (wallclock) Execution (wallclock)
blake3-scalar 25% faster 28% faster
blake3-simd no diff no diff
meshoptimizer 19% faster 17% faster
pulldown-cmark 17% faster no diff
bz2 15% faster no diff
SpiderMonkey, 21% faster 2% faster
fib(30)
clang.wasm 42% faster N/A
```
Addresses #3809: when we are asked to create a Cranelift backend with
shared flags that indicate support for SIMD, we should check that the
ISA level needed for our SIMD lowerings is present.
In #3721, we have been discussing what to do about the ARM32 backend in
Cranelift. Currently, this backend supports only 32-bit types, which is
insufficient for full Wasm-MVP; it's missing other critical bits, like
floating-point support; and it has only ever been exercised, AFAIK, via
the filetests for the individual CLIF instructions that are implemented.
We were very very thankful for the original contribution of this
backend, even in its partial state, and we had hoped at the time that we
could eventually mature it in-tree until it supported e.g. Wasm and
other use-cases. But that hasn't yet happened -- to the blame of no-one,
to be clear, we just haven't had a contributor with sufficient time.
Unfortunately, the existence of the backend and lack of active
maintainer now potentially pose a bit of a burden as we hope to make
continuing changes to the backend framework. For example, the ISLE
migration, and the use of regalloc2 that it will allow, would need all
of the existing lowering patterns in the hand-written ARM32 backend to
be rewritten as ISLE rules.
Given that we don't currently have the resources to do this, we think
it's probably best if we, sadly, for now remove this partial backend.
This is not in any way a statement of what we might accept in the
future, though. If, in the future, an ARM32 backend updated to our
latest codebase with an active maintainer were to appear, we'd be happy
to merge it (and likewise for any other architecture!). But for now,
this is probably the best path. Thanks again to the original contributor
@jmkrauz and we hope that this work can eventually be brought back and
reused if someone has the time to do so!
Some platforms such as AArch64 Linux support different memory page
sizes, so we need to be conservative when choosing the code section
alignment (which is equal to the page size) by using the maximum.
Copyright (c) 2021, Arm Limited.
This also paves the way for unifying TargetIsa and MachBackend, since now they map one to one. In theory the two traits could be merged, which would be nice to limit the number of total concepts. Also they have quite different responsibilities, so it might be fine to keep them separate.
Interestingly, this PR started as removing RegInfo from the TargetIsa trait since the adapter returned a dummy value there. From the fallout, noticed that all Display implementations didn't needed an ISA anymore (since these were only used to render ISA specific registers). Also the whole family of RegInfo / ValueLoc / RegUnit was exclusively used for the old backend, and these could be removed. Notably, some IR instructions needed to be removed, because they were using RegUnit too: this was the oddball of regfill / regmove / regspill / copy_special, which were IR instructions inserted by the old regalloc. Fare thee well!
These backends will be removed in the future (see
bytecodealliance/rfcs#12 and the pending #3009 in this repo).
In the meantime, to more clearly communicate that they are using
"legacy" APIs and will eventually be removed, this PR places them in an
`isa/legacy/` subdirectory. No functional changes otherwise.
This adds support for the IBM z/Architecture (s390x-ibm-linux).
The status of the s390x backend in its current form is:
- Wasmtime is fully functional and passes all tests on s390x.
- All back-end features supported, with the exception of SIMD.
- There is still a lot of potential for performance improvements.
- Currently the only supported processor type is z15.
* Fully support multiple returns in Wasmtime
For quite some time now Wasmtime has "supported" multiple return values,
but only in the mose bare bones ways. Up until recently you couldn't get
a typed version of functions with multiple return values, and never have
you been able to use `Func::wrap` with functions that return multiple
values. Even recently where `Func::typed` can call functions that return
multiple values it uses a double-indirection by calling a trampoline
which calls the real function.
The underlying reason for this lack of support is that cranelift's ABI
for returning multiple values is not possible to write in Rust. For
example if a wasm function returns two `i32` values there is no Rust (or
C!) function you can write to correspond to that. This commit, however
fixes that.
This commit adds two new ABIs to Cranelift: `WasmtimeSystemV` and
`WasmtimeFastcall`. The intention is that these Wasmtime-specific ABIs
match their corresponding ABI (e.g. `SystemV` or `WindowsFastcall`) for
everything *except* how multiple values are returned. For multiple
return values we simply define our own version of the ABI which Wasmtime
implements, which is that for N return values the first is returned as
if the function only returned that and the latter N-1 return values are
returned via an out-ptr that's the last parameter to the function.
These custom ABIs provides the ability for Wasmtime to bind these in
Rust meaning that `Func::wrap` can now wrap functions that return
multiple values and `Func::typed` no longer uses trampolines when
calling functions that return multiple values. Although there's lots of
internal changes there's no actual changes in the API surface area of
Wasmtime, just a few more impls of more public traits which means that
more types are supported in more places!
Another change made with this PR is a consolidation of how the ABI of
each function in a wasm module is selected. The native `SystemV` ABI,
for example, is more efficient at returning multiple values than the
wasmtime version of the ABI (since more things are in more registers).
To continue to take advantage of this Wasmtime will now classify some
functions in a wasm module with the "fast" ABI. Only functions that are
not reachable externally from the module are classified with the fast
ABI (e.g. those not exported, used in tables, or used with `ref.func`).
This should enable purely internal functions of modules to have a faster
calling convention than those which might be exposed to Wasmtime itself.
Closes#1178
* Tweak some names and add docs
* "fix" lightbeam compile
* Fix TODO with dummy environ
* Unwind info is a property of the target, not the ABI
* Remove lightbeam unused imports
* Attempt to fix arm64
* Document new ABIs aren't stable
* Fix filetests to use the right target
* Don't always do 64-bit stores with cranelift
This was overwriting upper bits when 32-bit registers were being stored
into return values, so fix the code inline to do a sized store instead
of one-size-fits-all store.
* At least get tests passing on the old backend
* Fix a typo
* Add some filetests with mixed abi calls
* Get `multi` example working
* Fix doctests on old x86 backend
* Add a mixture of wasmtime/system_v tests
This PR switches the default backend on x86, for both the
`cranelift-codegen` crate and for Wasmtime, to the new
(`MachInst`-style, `VCode`-based) backend that has been under
development and testing for some time now.
The old backend is still available by default in builds with the
`old-x86-backend` feature, or by requesting `BackendVariant::Legacy`
from the appropriate APIs.
As part of that switch, it adds some more runtime-configurable plumbing
to the testing infrastructure so that tests can be run using the
appropriate backend. `clif-util test` is now capable of parsing a
backend selector option from filetests and instantiating the correct
backend.
CI has been updated so that the old x86 backend continues to run its
tests, just as we used to run the new x64 backend separately.
At some point, we will remove the old x86 backend entirely, once we are
satisfied that the new backend has not caused any unforeseen issues and
we do not need to revert.
This commit changes how both the shared flags and ISA flags are stored in the
serialized module to detect incompatibilities when a serialized module is
instantiated.
It improves the error reporting when a compiled module has mismatched shared
flags.
This commit adds the `wasmtime settings` command to print out available
Cranelift settings for a target (defaults to the host).
The compile command has been updated to remove the Cranelift ISA options in
favor of encouraging users to use `wasmtime settings` to discover what settings
are available. This will reduce the maintenance cost for syncing the compile
command with Cranelift ISA flags.
This commit adds a `compile` command to the Wasmtime CLI.
The command can be used to Ahead-Of-Time (AOT) compile WebAssembly modules.
With the `all-arch` feature enabled, AOT compilation can be performed for
non-native architectures (i.e. cross-compilation).
The `Module::compile` method has been added to perform AOT compilation.
A few of the CLI flags relating to "on by default" Wasm features have been
changed to be "--disable-XYZ" flags.
A simple example of using the `wasmtime compile` command:
```text
$ wasmtime compile input.wasm
$ wasmtime input.cwasm
```
* Remove some uses of riscv in tests
* Fix typo
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Benjamin Bouvier <public@benj.me>
With `Module::{serialize,deserialize}` it should be possible to share
wasmtime modules across machines or CPUs. Serialization, however, embeds
a hash of all configuration values, including cranelift compilation
settings. By default wasmtime's selection of the native ISA would enable
ISA flags according to CPU features available on the host, but the same
CPU features may not be available across two machines.
This commit adds a `Config::cranelift_clear_cpu_flags` method which
allows clearing the target-specific ISA flags that are automatically
inferred by default for the native CPU. Options can then be
incrementally built back up as-desired with teh `cranelift_other_flag`
method.
This PR propagates "value labels" all the way from CLIF to DWARF
metadata on the emitted machine code. The key idea is as follows:
- Translate value-label metadata on the input into "value_label"
pseudo-instructions when lowering into VCode. These
pseudo-instructions take a register as input, denote a value label,
and semantically are like a "move into value label" -- i.e., they
update the current value (as seen by debugging tools) of the given
local. These pseudo-instructions emit no machine code.
- Perform a dataflow analysis *at the machine-code level*, tracking
value-labels that propagate into registers and into [SP+constant]
stack storage. This is a forward dataflow fixpoint analysis where each
storage location can contain a *set* of value labels, and each value
label can reside in a *set* of storage locations. (Meet function is
pairwise intersection by storage location.)
This analysis traces value labels symbolically through loads and
stores and reg-to-reg moves, so it will naturally handle spills and
reloads without knowing anything special about them.
- When this analysis converges, we have, at each machine-code offset, a
mapping from value labels to some number of storage locations; for
each offset for each label, we choose the best location (prefer
registers). Note that we can choose any location, as the symbolic
dataflow analysis is sound and guarantees that the value at the
value_label instruction propagates to all of the named locations.
- Then we can convert this mapping into a format that the DWARF
generation code (wasmtime's debug crate) can use.
This PR also adds the new-backend variant to the gdb tests on CI.
This PR adds a new `isa::lookup_variant()` that takes a `BackendVariant`
(`Legacy`, `MachInst` or `Any`), and exposes both x86 backends as
separate variants if both are compiled into the build.
This will allow some new use-cases that require both backends in the
same process: for example, differential fuzzing between old and new
backends, or perhaps allowing for dynamic feature-flag selection between
the backends.
WebAssembly memory operations are by definition little-endian even on
big-endian target platforms. However, other memory accesses will require
native target endianness (e.g. to access parts of the VMContext that is
also accessed by VM native code). This means on big-endian targets,
the code generator will have to handle both little- and big-endian
memory accesses. However, there is currently no way to encode that
distinction into the Cranelift IR that describes memory accesses.
This patch provides such a way by adding an (optional) explicit
endianness marker to an instance of MemFlags. Since each Cranelift IR
instruction that describes memory accesses already has an instance of
MemFlags attached, this can now be used to provide endianness
information.
Note that by default, memory accesses will continue to use the native
target ISA endianness. To override this to specify an explicit
endianness, a MemFlags value that was built using the set_endianness
routine must be used. This patch does so for accesses that implement
WebAssembly memory operations.
This patch addresses issue #2124.
Some of the test failures tracked by #2079 are in unwind tests that are
specific to the old x86 backend: namely, these tests invoke the unwind
implementation that is paired with the old backend, rather than generic
over all backends. It thus doesn't make sense to try to run these tests
with the new backend. (The new backend's unwind code should have
analogous tests written/ported over eventually.)
It seems that we were actually building *both* x86 backends when the
`x64` feature was enabled, except that the old x86 backend would never
be instantiated by the usual ISA-lookup logic because a `x86-64` target
triple unconditionally resolves to the new one.
This PR resolves both of the issues by tweaking the feature-config
directives to exclude the `x86` backend when `x64` is enabled.
Before this patch, running the x64 new backend would require both
compiling with --features experimental_x64 and running with
`use_new_backend`.
This patches changes this behavior so that the runtime flag is not
needed anymore: using the feature flag will enforce usage of the new
backend everywhere, making using and testing it much simpler:
cargo run --features experimental_x64 ;; other CLI options/flags
This also gives a hint at what the meta language generation would look
like after switching to the new backend.
Compiling only with the x64 codegen flag gives a nice compile time speedup.
This introduces two changes:
- first, a Cargo feature is added to make it possible to use the
Cranelift x64 backend directly from wasmtime's CLI.
- second, when passing a `cranelift-flags` parameter, and the given
parameter's name doesn't exist at the target-independent flag level, try
to set it as a target-dependent setting.
These two changes make it possible to try out the new x64 backend with:
cargo run --features experimental_x64 -- run --cranelift-flags use_new_backend=true -- /path/to/a.wasm
Right now, this will fail because most opcodes required by the
trampolines are actually not implemented yet.