This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime.
After the suggestion of Chris, `Function` has been split into mostly two parts:
- on the one hand, `FunctionStencil` contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`.
- on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on.
Most changes are here to accomodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache:
- most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location, it's automatically detected instead the first time a non-default `SourceLoc` is set.
- user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`.
- some refactorings have been made for function names:
- `ExternalName` was used as the type for a `Function`'s name; while it thus allowed `ExternalName::Libcall` in this place, this would have been quite confusing to use it there. Instead, a new enum `UserFuncName` is introduced for this name, that's either a user-defined function name (the above `UserExternalName`) or a test case name.
- The future of `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with.
The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.
A basic fuzz target has been introduced that tries to do the bare minimum:
- check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function.
- check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache.
- This last check is less efficient and less likely to happen, so probably should be rethought a bit.
Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.
Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement.
Fixes#4155.
Implement the tls_value for s390 in the ELF general-dynamic mode.
Notable differences to the x86_64 implementation are:
- We use a __tls_get_offset libcall instead of __tls_get_addr.
- The current thread pointer (stored in a pair of access registers)
needs to be added to the result of __tls_get_offset.
- __tls_get_offset has a variant ABI that requires the address of
the GOT (global offset table) is passed in %r12.
This means we need a new libcall entries for __tls_get_offset.
In addition, we also need a way to access _GLOBAL_OFFSET_TABLE_.
The latter is a "magic" symbol with a well-known name defined
by the ABI and recognized by the linker. This patch introduces
a new ExternalName::KnownSymbol variant to support such names
(originally due to @afonso360).
We also need to emit a relocation on a symbol placed in a
constant pool, as well as an extra relocation on the call
to __tls_get_offset required for TLS linker optimization.
Needed by the cg_clif frontend.
The comment says the enum is "likely to grow" and the function's been in libc since C89, so hopefully this is ok.
I'd like to use it for emitting things like array equality.
This commit is a slight refactoring of the `Module` trait and backend in
`cranelift-object`. The goal is to enable parallelization of compilation
when using `cranelift-object`. Currently this is difficult because
`ObjectModule::define_function` requires `&mut self`. This instead
soups up the `define_function_bytes` interface to handle relocations so
compilation can happen externally before defining it in a `Module`. This
also means that `define_function` is now a convenience wrapper around
`define_function_bytes`.
* move trap site definitions into cranelift-module
`cranelift-faerie` and `cranelift-object` already have identical
definitions of structures to represent trap sites. We might as well
merge them ahead of work to define functions via a raw slice of bytes
with associated traps, which will need some kind of common structure for
representing traps anyway.
* cranelift-module: add `define_function_bytes` interface
This interface is useful when the client needs to precisely specify the
ordering of bytes in a particular function.
* add comment about saving files for `perf`
* the target-lexicon crate no longer has or needs the std feature
in cargo, so we can delete all default-features=false, any mentions
of its std feature, and the nostd configs in many lib.rs files
* the representation of arm architectures has changed, so some case
statements needed refactoring