Commit Graph

9545 Commits

Author SHA1 Message Date
wasmtime-publish
39b88e4e9e Release Wasmtime 0.34.0 (#3768)
* Bump Wasmtime to 0.34.0

[automatically-tag-and-release-this-commit]

* Add release notes for 0.34.0

* Update release date to today

Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
2022-02-07 19:16:26 -06:00
Chris Fallin
ddd39cdb84 Patch qemu in CI to fix madvise semantics. (#3770)
We currently skip some tests when running our qemu-based tests for
aarch64 and s390x. Qemu has broken madvise(MADV_DONTNEED) semantics --
specifically, it just ignores madvise() [1].

We could continue to whack-a-mole the tests whenever we create new
functionality that relies on madvise() semantics, but ideally we'd just
have emulation that properly emulates!

The earlier discussions on the qemu mailing list [2] had a proposed
patch for this, but (i) this patch doesn't seem to apply cleanly anymore
(it's 3.5 years old) and (ii) it's pretty complex due to the need to
handle qemu's ability to emulate differing page sizes on host and guest.

It turns out that we only really need this for CI when host and guest
have the same page size (4KiB), so we *could* just pass the madvise()s
through. I wouldn't expect such a patch to ever land upstream in qemu,
but it satisfies our needs I think. So this PR modifies our CI setup to
patch qemu before building it locally with a little one-off patch.

[1]
https://github.com/bytecodealliance/wasmtime/pull/2518#issuecomment-747280133

[2]
https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05416.html
2022-02-07 15:56:54 -08:00
Alex Crichton
43b37944ff Tweak parallelism and the instantiation benchmark (#3775)
Currently the "sequential" and "parallel" benchmarks reports somewhat
different timings. For sequential it's time-to-instantiate, but for
parallel it's time-to-instantiate-10k instances. The parallelism in the
parallel benchmark can also theoretically be affected by rayon's
work-stealing. For example if rayon doesn't actually do any work
stealing at all then this ends up being a sequential test again.
Otherwise though it's possible for some threads to finish much earlier
as rayon isn't guaranteed to keep threads busy.

This commit applies a few updates to the benchmark:

* First an `InstancePre<T>` is now used instead of a `Linker<T>` to
  front-load type-checking and avoid that on each instantiation (and
  this is generally the fastest path to instantiate right now).

* Next the instantiation benchmark is changed to measure one
  instantiation-per-iteration to measure per-instance instantiation to
  better compare with sequential numbers.

* Finally rayon is removed in favor of manually creating background
  threads that infinitely do work until we tell them to stop. These
  background threads are guaranteed to be working for the entire time
  the benchmark is executing and should theoretically exhibit what the
  situation that there's N units of work all happening at once.

I also applied some minor updates here such as having the parallel
instantiation defined conditionally for multiple modules as well as
upping the limits of the pooling allocator to handle a large module
(rustpython.wasm) that I threw at it.
2022-02-07 15:55:38 -08:00
Harald Hoyer
fa889b4fd2 wasmtime: add CLI options for pre-opened TCP listen sockets (#3729)
This patch  implements CLI options to insert pre-opened sockets.

`--listenfd` : Inherit environment variables and file descriptors following
               the systemd listen fd specification (UNIX only).

`--tcplisten <SOCKET ADDRESS>`: Grant access to the given TCP listen socket.

Signed-off-by: Harald Hoyer <harald@profian.com>
2022-02-07 14:26:38 -08:00
Chris Fallin
88b53b12aa Turn off memfd by default, at least for this upcoming release. (#3774)
Since memfd support just landed, and has had only ~0.5 weeks to bake
with fuzzing, we want to make release 0.34.0 of Wasmtime without it
enabled by default. This PR disables memfd by default; it can be enabled
by specifying the `memfd` feature for the `wasmtime` crate, or when
building the commandline binary.

We plan to explicitly add memfd-enabled fuzzing targets, let that go for
a while, then probably re-enable memfd in the subsequent release if no
issues come up.
2022-02-07 15:44:53 -06:00
Nick Fitzgerald
ff622667f7 Merge pull request #3773 from fitzgen/x64-traps-safepoints
ISLE: emit traps as safepoints on x64
2022-02-07 10:57:27 -08:00
Nick Fitzgerald
bb7ae46ecd ISLE: emit traps as safepoints on x64 2022-02-07 10:01:23 -08:00
Nick Fitzgerald
31e2d6b21c Merge pull request #3769 from cfallin/fix-debuginfo-cold-blocks
Cranelift: fix debuginfo wrt cold blocks and non-monotonic layout.
2022-02-07 08:58:50 -08:00
Jonas Kruckenberg
79af8cd9ce chore: update zstd (#3771) 2022-02-07 09:38:12 -06:00
Chris Fallin
2cf3069b6b Extend cold-blocks test to test debuginfo as well. 2022-02-04 23:15:16 -08:00
Chris Fallin
d9d6469422 Cranelift: fix debuginfo wrt cold blocks and non-monotonic layout.
The debuginfo analyses are written with the assumption that the order of
instructions in the VCode is the order of instructions in the final
machine ocde. This was previously a strong invariant, until we
introduced support for cold blocks. Cold blocks are implemented by
reordering during emission, because the VCode ordering has other
requirements related to lowering (respecting def-use dependencies in the
reverse pass), so it is much simpler to reorder instructions at the last
moment. Unfortunately, this causes the breakage we now see.

This commit fixes the issue by skipping all cold instructions when
emitting value-label ranges (which are translated into debuginfo). This
means that variables defined in cold blocks will not have DWARF
metadata. But cold blocks are usually compiler-inserted slowpaths, not
user code, so this is probably OK. Debuginfo is always best-effort, so
in any case this does not violate any correctness constraints.
2022-02-04 23:15:04 -08:00
Chris Fallin
04269355ca Merge pull request #3767 from avanhatt/patch-1
Add item to Cranelift 02-07 meeting agenda
2022-02-04 12:02:52 -08:00
Alexa VanHattum
f016a1d266 Add item to 02-07 meeting agenda 2022-02-04 14:58:39 -05:00
Alex Crichton
04d2caea7b Consolidate methods of memory initialization (#3766)
* Consolidate methods of memory initialization

This commit consolidates the few locations that we have which are
performing memory initialization. Namely the uffd logic for creating
paged memory as well as the memfd logic for creating a memory image now
share an implementation to avoid duplicating bounds-checks or other
validation conditions. The main purpose of this commit is to fix a
fuzz-bug where a multiplication overflowed. The overflow itself was
benign but it seemed better to fix the overflow in only one place
instead of multiple.

The overflow in question is specifically when an initializer is checked
to be statically out-of-bounds and multiplies a memory's minimum size by
the wasm page size, returning the result as a `u64`. For
memory64-memories of size `1 << 48` this multiplication will overflow.
This was actually a preexisting bug with the `try_paged_init` function
which was copied for memfd, but cropped up here since memfd is used more
often than paged initialization. The fix here is to skip validation of
the `end` index if the size of memory is `1 << 64` since if the `end`
index can be represented as a `u64` then it's in-bounds. This is
somewhat of an esoteric case, though, since a memory of minimum size `1
<< 64` can't ever exist (we can't even ask the os for that much memory,
and even if we could it would fail).

* Fix memfd test

* Fix some tests

* Remove InitMemory enum

* Add an `is_segmented` helper method

* More clear variable name

* Make arguments to `init_memory` more descriptive
2022-02-04 13:17:25 -06:00
Nick Fitzgerald
a519e5ab64 Merge pull request #3752 from fitzgen/newtypes-for-register-classes
cranelift: Add newtype wrappers for x64 register classes
2022-02-03 14:56:57 -08:00
Nick Fitzgerald
2c77cf866a ISLE: Rename {gpr,xmm}_mem_new constructors to reg_mem_to_{gpr,xmm}_mem 2022-02-03 14:08:08 -08:00
Nick Fitzgerald
795b0aaf9a cranelift: Add newtype wrappers for x64 register classes
This primary motivation of this large commit (apologies for its size!) is to
introduce `Gpr` and `Xmm` newtypes over `Reg`. This should help catch
difficult-to-diagnose register class mixup bugs in x64 lowerings.

But having a newtype for `Gpr` and `Xmm` themselves isn't enough to catch all of
our operand-with-wrong-register-class bugs, because about 50% of operands on x64
aren't just a register, but a register or memory address or even an
immediate! So we have `{Gpr,Xmm}Mem[Imm]` newtypes as well.

Unfortunately, `GprMem` et al can't be `enum`s and are therefore a little bit
noisier to work with from ISLE. They need to maintain the invariant that their
registers really are of the claimed register class, so they need to encapsulate
the inner data. If they exposed the underlying `enum` variants, then anyone
could just change register classes or construct a `GprMem` that holds an XMM
register, defeating the whole point of these newtypes. So when working with
these newtypes from ISLE, we rely on external constructors like `(gpr_to_gpr_mem
my_gpr)` instead of `(GprMem.Gpr my_gpr)`.

A bit of extra lines of code are included to add support for register mapping
for all of these newtypes as well. Ultimately this is all a bit wordier than I'd
hoped it would be when I first started authoring this commit, but I think it is
all worth it nonetheless!

In the process of adding these newtypes, I didn't want to have to update both
the ISLE `extern` type definition of `MInst` and the Rust definition, so I move
the definition fully into ISLE, similar as aarch64.

Finally, this process isn't complete. I've introduced the newtypes here, and
I've made most XMM-using instructions switch from `Reg` to `Xmm`, as well as
register class-converting instructions, but I haven't moved all of the GPR-using
instructions over to the newtypes yet. I figured this commit was big enough as
it was, and I can continue the adoption of these newtypes in follow up commits.

Part of #3685.
2022-02-03 14:08:08 -08:00
Nick Fitzgerald
e1f4e29efe ISLE: Add a nodebug type attribute to disable derive(Debug)
I need this to move the x64 `Inst` definition into ISLE without losing its
custom `Debug` implementation that prints the assembly for the `Inst`.
2022-02-03 14:08:08 -08:00
Chris Fallin
b3b83efdbe Merge pull request #3760 from cfallin/memfd-lazy-create
Make memfd image creation lazy (on first instantiation).
2022-02-03 13:20:24 -08:00
Chris Fallin
2a24a0fbde Make memfd image creation lazy (on first instantiation).
As a followup to the recent memfd allocator work, this PR makes the
memfd image creation occur on the first instantiation, rather than
immediately when the `Module` is loaded.

This shaves off a potentially surprising cost spike that would have
otherwise occurred: prior to the memfd work, no allocator eagerly read
the module's initial heap state into RAM. The behavior should now more
closely resemble what happened before (and the improvements in overall
instantiation time and performance, as compared to either eager init
with pure-mmap memory or user-mode pagefault handling with uffd,
remain).
2022-02-03 12:46:34 -08:00
Nick Fitzgerald
605c79fd05 Merge pull request #3756 from alexcrichton/update-wasm-tools
Update wasm-tools crates
2022-02-03 11:19:55 -08:00
Chris Fallin
43de2dca1f Merge pull request #3765 from cfallin/cranelift-isle-license
Add LICENSE file to cranelift/isle/.
2022-02-03 10:39:19 -08:00
Alex Crichton
4ba3404ca3 Fix MemFd's allocated memory for dynamic memories (#3763)
This fixes a bug in the memfd-related management of a linear memory
where for dynamic memories memfd wasn't informed of the extra room that
the dynamic memory could grow into, only the actual size of linear
memory, which ended up tripping an assert once the memory was grown. The
fix here is pretty simple which is to factor in this extra space when
passing the allocation size to the creation of the `MemFdSlot`.
2022-02-03 11:56:16 -06:00
Chris Fallin
695c64f2b2 Add LICENSE file to cranelift/isle/.
This is the same Apache 2.0 license file that is in all of our other
crates; omitting it here was just an oversight.

Fixes #3761.
2022-02-03 09:54:58 -08:00
Andrew Brown
31d4d2cbe0 meeting: add notes (#3764) 2022-02-03 09:46:17 -08:00
Chris Fallin
8fb7cbae9e Merge pull request #3762 from cfallin/meeting-20220203
Add item to Wasmtime meeting
2022-02-03 08:54:55 -08:00
Chris Fallin
aacf563e38 Add item to Wasmtime meeting 2022-02-03 08:53:37 -08:00
Alex Crichton
b647561c44 memfd: Some minor follow-ups (#3759)
* Tweak memfd-related features crates

This commit changes the `memfd` feature for the `wasmtime-cli` crate
from an always-on feature to a default-on feature which can be disabled
at compile time. Additionally the `pooling-allocator` feature is also
given similar treatment.

Additionally some documentation was added for the `memfd` feature on the
`wasmtime` crate.

* Don't store `Arc<T>` in `InstanceAllocationRequest`

Instead store `&Arc<T>` to avoid having the clone that lives in
`InstanceAllocationRequest` not actually going anywhere. Otherwise all
instance allocation requires an extra clone to create it for the request
and an extra decrement when the request goes away. Internally clones are
made as necessary when creating instances.

* Enable the pooling allocator by default for `wasmtime-cli`

While perhaps not the most useful option since the CLI doesn't have a
great way to take advantage of this it probably makes sense to at least
match the features of `wasmtime` itself.

* Fix some lints and issues

* More compile fixes
2022-02-03 09:17:04 -06:00
Alex Crichton
8ed79c8f57 memfd: Reduce some syscalls in the on-demand case (#3757)
* memfd: Reduce some syscalls in the on-demand case

This tweaks the internal organization of the `MemFdSlot` to avoid some
syscalls in the default case as well as opportunistically in the pooling
case. The two cases added here are:

* A `MemFdSlot` is now created with a specified initial size. For
  pooling this is 0 but for the on-demand case this can be non-zero.

* When `instantiate` is called with no prior image and the sizes match
  (as will be the case for on-demand allocation) then `mprotect` is
  skipped entirely.

* In the `clear_and_remain-ready` case the `mprotect` is skipped if the
  heap wasn't grown at all.

This should avoid ever using `mprotect` unnecessarily and makes the
ranges we `mprotect` a bit smaller as well.

* Review comments

* Tweak allow to apply to whole crate
2022-02-02 16:09:47 -06:00
Chris Fallin
5deb1f1fbf Merge pull request #3738 from cfallin/pooling-affinity
Pooling allocator: add a reuse-affinity policy.
2022-02-02 13:11:39 -08:00
Chris Fallin
99ed8cc9be Merge pull request #3697 from cfallin/memfd-cow
memfd/madvise-based CoW pooling allocator
2022-02-02 13:04:26 -08:00
Chris Fallin
1cbd393930 Review comments. 2022-02-02 12:25:30 -08:00
Chris Fallin
6011420557 Pooling allocator: add a reuse-affinity policy.
This policy attempts to reuse the same instance slot for subsequent
instantiations of the same module. This is particularly useful when
using a pooling backend such as memfd that benefits from this reuse: for
example, in the memfd case, instantiating the same module into the same
slot allows us to avoid several calls to mmap() because the same
mappings can be reused.

The policy tracks a freelist per "compiled module ID", and when
allocating a slot for an instance, tries these three options in order:

1. A slot from the freelist for this module (i.e., last used for another
   instantiation of this particular module), or
3. A slot that was last used by some other module or never before.

The "victim" slot for choice 2 is randomly chosen.

The data structures are carefully designed so that all updates are O(1),
and there is no retry-loop in any of the random selection.

This policy is now the default when the memfd backend is selected via
the `memfd-allocator` feature flag.
2022-02-02 12:25:30 -08:00
Chris Fallin
9880eba2a8 Skip memfd tests when on qemu, due to differing madvise semantics. 2022-02-02 12:25:20 -08:00
Chris Fallin
d7b04f5ced Review comments. 2022-02-02 11:41:31 -08:00
Chris Fallin
0ec45d3ae4 Add additional tests for MemFdSlot. 2022-02-02 11:33:05 -08:00
Alex Crichton
3f5cbddab5 Fix a text format test expectation 2022-02-02 10:17:18 -08:00
Chris Fallin
94410a8d4b Review comments. 2022-02-02 10:03:31 -08:00
Alex Crichton
9d1e517615 Update some more version reqs 2022-02-02 09:51:27 -08:00
Alex Crichton
65486a0680 Update wasm-tools crates
Nothing major here, just a routine update with a few extra things to
handle here-and-there.
2022-02-02 09:50:08 -08:00
Alex Crichton
c83968575a Lazily populate a store's trampoline map (#3742)
* Lazily populate a store's trampoline map

This commit is another installment of "how fast can we make
instantiation". Currently when instantiating a module with many function
imports each function, typically from the host, is inserted into the
store. This insertion process stores the `VMTrampoline` for the host
function in a side table so it can be looked up later if the host
function is called through the `Func` interface. This insertion process,
however, involves a hash map insertion which can be relatively expensive
at the scale of the rest of the instantiation process.

The optimization implemented in this commit is to avoid inserting
trampolines into the store at `Func`-insertion-time (aka instantiation
time) and instead only lazily populate the map of trampolines when
needed. The theory behind this is that almost all `Func` instances that
are called indirectly from the host are actually wasm functions, not
host-defined functions. This means that they already don't need to go
through the map of host trampolines and can instead be looked up from
the module they're defined in. With the assumed rarity of host functions
making `lookup_trampoline` a bit slower seems ok.

The `lookup_trampoline` function will now, on a miss from the wasm
modules and `host_trampolines` map, lazily iterate over the functions
within the store and insert trampolines into the `host_trampolines` map.
This process will eventually reach something which matches the function
provided because it should at least hit the same host function. The
relevant `lookup_trampoline` now sports a new documentation block
explaining all this as well for future readers.

Concretely this commit speeds up instantiation of an empty module with
100 imports and ~80 unique signatures from 10.6us to 6.4us, a 40%
improvement.

* Review comments

* Remove debug assert
2022-02-02 09:43:29 -06:00
Dan Gohman
ffa9fe32aa Use is-terminal instead of atty.
Following up on #3696, use the new is-terminal crate to test for a tty
rather than having platform-specific logic in Wasmtime. The is-terminal
crate has a platform-independent API which takes a handle.

This also updates the tree to cap-std 0.24 etc., to avoid depending on
multiple versions of io-lifetimes at once, as enforced by the cargo deny
check.
2022-02-01 17:48:49 -08:00
Chris Fallin
84a8368e88 Fix to the optimization: mprotect(NONE) sometimes needed after skipping the initial mmap. 2022-02-01 16:34:06 -08:00
Chris Fallin
01e6bb81fb Review feedback. 2022-02-01 15:49:44 -08:00
Nick Fitzgerald
491e98233e Merge pull request #3750 from bytecodealliance/pch/fix_3749
fix #3749: returns count should count the returns, not the params.
2022-02-01 10:29:17 -08:00
Pat Hickey
aa4c81a4e7 fix #3749: returns count should count the returns, not the params. 2022-02-01 09:46:46 -08:00
Chris Fallin
0ff8f6ab20 Make build-config magic use memfd by default. 2022-01-31 22:39:20 -08:00
Chris Fallin
ccfa245261 Optimization: only mprotect the *new* bit of heap, not all of it.
(This was not a correctness bug, but is an obvious performance bug...)
2022-01-31 21:25:40 -08:00
Chris Fallin
982df2f2e5 Review feedback. 2022-01-31 16:40:14 -08:00
Harald Hoyer
853a025613 Implement sock_accept
With the addition of `sock_accept()` in `wasi-0.11.0`, wasmtime can now
implement basic networking for pre-opened sockets.

For Windows `AsHandle` was replaced with `AsRawHandleOrSocket` to cope
with the duality of Handles and Sockets.

For Unix a `wasi_cap_std_sync::net::Socket` enum was created to handle
the {Tcp,Unix}{Listener,Stream} more efficiently in
`WasiCtxBuilder::preopened_socket()`.

The addition of that many `WasiFile` implementors was mainly necessary,
because of the difference in the `num_ready_bytes()` function.

A known issue is Windows now busy polling on sockets, because except
for `stdin`, nothing is querying the status of windows handles/sockets.

Another know issue on Windows, is that there is no crate providing
support for `fcntl(fd, F_GETFL, 0)` on a socket.

Signed-off-by: Harald Hoyer <harald@profian.com>
2022-01-31 16:25:11 -08:00