* Introduce new_got_entry and new_plt_entry functions
* Return NonNull<*const u8> from get_got_address
* Make GOT entry writes atomic
* Defer GOT updates until relocations and protection
Co-authored-by: Alan Egerton <eggyal@gmail.com>
This is currently a very common operation in host bindings where if wasm
gives a host function a relative pointer you'll want to simulataneously
work with the host state and the wasm memory. These two regions are
distinct and safe to borrow mutably simulataneously but it's not obvious
in the Rust type system that this is so, so add a helper method here to
assist in doing so.
* wasmtime_runtime: move ResourceLimiter defaults into this crate
In preparation of changing wasmtime::ResourceLimiter to be a re-export
of this definition, because translating between two traits was causing
problems elsewhere.
* wasmtime: make ResourceLimiter a re-export of wasmtime_runtime::ResourceLimiter
* refactor Store internals to support ResourceLimiter as part of store's data
* add hooks for entering and exiting native code to Store
* wasmtime-wast, fuzz: changes to adapt ResourceLimiter API
* fix tests
* wrap calls into wasm with entering/exiting exit hooks as well
* the most trivial test found a bug, lets write some more
* store: mark some methods as #[inline] on Store, StoreInner, StoreInnerMost
Co-authored-By: Alex Crichton <alex@alexcrichton.com>
* improve tests for the entering/exiting native hooks
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
* Add support for x64 packed promote low
* Add support for x64 packed floating point demote
* Update vector promote low and demote by adding constraints
Also does some renaming and minor refactoring
* make Module::deserialize's version check optional via Config
A SerializedModule contains the CARGO_PKG_VERSION string, which is
checked for equality when loading. This is a great guard-rail but
some users may want to disable this check (e.g. so they can implement
their own versioning scheme)
* rename config to deserialize_check_wasmtime_version
* add test
* fix doc links
* fix
* thank you rustdoc
* Add the ability to cache typechecking an instance
This commit adds the abilty to cache the type-checked imports of an
instance if an instance is going to be instantiated multiple times. This
can also be useful to do a "dry run" of instantiation where no wasm code
is run but it's double-checked that a `Linker` possesses everything
necessary to instantiate the provided module.
This should ideally help cut down repeated instantiation costs slightly
by avoiding type-checking and allocation a `Vec<Extern>` on each
instantiation. It's expected though that the impact on instantiation
time is quite small and likely not super significant. The functionality,
though, of pre-checking can be useful for some embeddings.
* Fix build with async
Commit 7d36fd9a1e avoided these
x86-specific tests altogether. This change avoids any dependency on x86
entirely by specifying a frontend configuration (SystemV + U64); this is
enough information for the `FunctionBuilder` to correctly generate the
syscalls.
In commit 33c791e1f5 (PR #2944), I added LICENSE files to all published
crates. However, since then PR #2897 has been merged and remove 3 crates,
resulting in license files in empty directories.
Implement Wasmtime's new API as designed by RFC 11. This is quite a large commit which has had lots of discussion externally, so for more information it's best to read the RFC thread and the PR thread.
Previously, the multiple flags for certain AVX512 instructions were
checked using `OR`: e.g., if the CPU has AVX512VL `OR` AVX512DQ,
emit `VPMULLQ`. This is incorrect--the logic should be `AND`. The Intel
Software Developer Manual, vol. 1, sec. 15.4, has more information on
this (notable there is the suggestion to check with `XGETBV` that the OS
is allowing the use of the XMM registers--but that is a separate issue).
This change switches to `AND` logic in the new backend.
When shuffling values from two different registers, the x64 lowering for
`i8x16.shuffle` must first shuffle each register separately and then OR
the results with SSE instructions. With `VPERMI2B`, available in
AVX512VL + AVX512VBMI, this can be done in a single instruction after
the shuffle mask has been moved into the destination register. This
change uses `VPERMI2B` for that case when the CPU supports it.
The previous choice to use the WasmtimeSystemV calling convention for
apple-aarch64 devices was incorrect: padding of arguments was
incorrectly computed. So we have to use some flavor of the apple-aarch64
ABI there.
Since we want to support the wasmtime custom convention for multiple
returns on apple-aarch64 too, a new custom Wasmtime calling convention
was introduced to support this.
cranelift-codegen's build failed on s390x, with this error:
```
error[E0432]: unresolved import `crate::isa::unwind::systemv`
--> cranelift/codegen/src/isa/s390x/mod.rs:6:25
|
6 | use crate::isa::unwind::systemv::RegisterMappingError;
| ^^^^^^^ could not find `systemv` in `unwind`
```
This import should be used only with `unwind` feature enabled.
This commit adds LICENSE files to all **published** crates which do
not have it already (most of the crates have it).
Providing the license files is a requiment of the Apache 2.0 License.
I had no idea this was still in the repository, much less building!
There are much different ways to use wasmtime in Rust nowadays, such as
the `wasmtime` crate!
Moves the slow path which resizes the vector out-of-line. The actual
indexing is also done in the out-of-line path which avoids the need for
a second bounds check in the fast path after a potential resize.
Apparently `powershell Compress-Archive` produces zip files with
backslashes in filesnames which makes them unable to be extracted with
some Unix variants of extraction. For example [this failure][build] and
using macOS's built-in unzip feature it creates filenames with
backslashes in them rather than subdirectories.
[build]: https://github.com/bytecodealliance/wasmtime-go/runs/2680596219?check_suite_focus=true
This commit attempts to slim down our CI (more from #2933) by removing
testing both in debug and release mode. I can't actually recall a
concrete issue that this has turned up on CI itself, and otherwise we're
spending quite a lot of time building all of the dev-dependencies in
release mode when testing.
Additionally it removes testing for nightly/beta channels of Rust. One
of the main benefits of this, staying on top of breakage, is already
moot because we pin to a nightly anyway. We have a few nightly
references elsewhere in CI (fuzzing/docs) so we can largely rely on that
(and upstream testing with rust-lang/rust). We in general shouldn't need
to do nightly/beta testing on all builds.
The release builders were actually the only location that MinGW and
AArch64 was tested however. This means that the old nightly/beta
builders are now replaced with AArch64 and MinGW builders. Overall, the
changes made to CI here are:
* Upgrade to QEMU 6.0.0. I thought this would make aarch64 emulation
faster, but it didn't. Seems good to stay up to date though.
* Replace nightly/beta testing in debug mode with MinGW and AArch64 testing.
* Use `-g0` for C compilation on MinGW because otherwise `gcc` as used
on CI generates an ICE (!!)
* Exclude `wasi-crypto` from testing. We already exclude
`wasmtime-wasi-crypto` and it was an accident we were testing the
`wasi-crypto` crate (which isn't even part of this workspace).
* Remove testing DWARF on the old backend step, which nowadays didn't
actually do that.
* Remove testing on release builders, making then purely tasked with
release builds, nothing else.
* Rename `QEMU_VERSION` to `QEMU_BUILD_VERSION` so qemu doesn't just
immediately exit after printing its version.
Timing wise the release builds are ~20-30 minutes faster, depending on
the platform. This is not really because of testing time but rather we
have a huge dependency tree when `dev-dependencies` are considered
(criterion, tokio, proptest, ...).
MinGW tests are pretty fast since we don't run examples (we're not too
interested in doing examples there, just windows/mac/linux coverage).
AArch64 tests are run with optimizations enabled because unoptimized
tests take ~45 minutes to finish while optimized tests take ~20 minutes.
The build is naturally much faster in debug mode but apparently under
QEMU emulation the debug mode binaries are *extremely* slow compared to
the release binaries, which means that extra time we spend compiling
release tests is more than made up by faster test emulation time.
Closes#2938
When AVX512VL or AVX512BITALG are available, Wasm SIMD's `popcnt`
instruction can be lowered to a single x64 instruction, `VPOPCNTB`,
instead of 8+ instructions.
This commit removes the publish step in GitHub actions, insteading
folding all functionality into the release build steps. This avoids
having a separately scheduled job after all the release build jobs which
ends up getting delayed for quite a long time given the current
scheduling algorithm.
This involves refactoring the tarball assembly scripts and refactoring the
github asset upload script too. Tarball assembly now manages everything
internally and does platform-specific bits where necessary. The upload
script is restructured to be run in parallel (in theory) and hopefully
catches various errors and tries to not stomp over everyone else's work.
The main trickiness here is handling `dev`, which is less critical for
correctness than than tags themselves.
As a small tweak build-wise the QEMU build for cross-compiled builders
is now cached unlike before where it was unconditionally built, shaving
a minute or two off build time.
We used to allow at most one instantiation per compilation, but there is no
fundamental reason why that should be the case. Allowing multiple instantiations
per compilation allows us to, for example, benchmark repeated instantiation
within Wasmtime's pooling allocator.
This additionally switches to using host functions for WASI and for
`bench_{start,end}` rather than defining them on the linker, this way we can use
a new store for every instantiation and don't need to keep other instances alive
when instantiating new instances.
Finally, we switch all timing to be done through callback functions, rather than
having the bench API caller implicitly start/end timers around bench API
calls. This allows us to more precisely measure phases and exclude things like
file I/O performed when creating a WASI context.
Also move the gh-pages pushing step from the `publish` phase to just
this singular doc builder.
The motivation for this is to eventually remove the `publish` step since
it interacts badly with GitHub's scheduling of actions. This is
hopefully the first step towards that by removing the doc publish part
of the phase.
Instead of inheriting stdio, pass in explicit file paths that are opened for
reading (stdin) or writing (stderr/stdout). This will allow sightglass to assert
that benchmarks produce the expected output.
* First remove `fail-fast: false` annotations to fail faster. If desired
this could always be added in a on-off fashion to PRs.
* Next use the new `concurrency` feature to try to cancel previous
builds, ideally meaning that if a branch is pushed to multiple times
it only runs CI once.