Commit Graph

8232 Commits

Author SHA1 Message Date
Chris Fallin
8636359b71 Merge pull request #2763 from cfallin/fix-srem-trap
Handle `srem` properly when `avoid_div_traps` is false.
2021-03-24 23:58:26 -07:00
Chris Fallin
b429f77ee9 Handle srem properly when avoid_div_traps is false.
The codegen for div/rem ops has two modes, depending on the
`avoid_div_traps` flag: it can either do all checks for trapping
conditions explicitly, and use explicit trap instructions, then use a
hardware divide instruction that will not trap (`avoid_div_traps ==
true`); or it can run in a mode where a hardware FP fault on the divide
instruction implies a Wasm trap (`avoid_div_traps == false`). Wasmtime
uses the former while Lucet (for example) uses the latter.

It turns out that because we run all our spec tests under Wasmtime,
we missed a spec corner case that fails in the latter mode: INT_MIN % -1 == 0
per the spec, but causes a trap with the x86 signed divide/remainder
instruction. Hence, in Lucet, this specific remainder computation would
incorrectly result in a Wasm trap.

This PR fixes the issue by just forcing use of the explicit-checks
implementation for `srem` even when `avoid_div_traps` is false.
2021-03-24 22:30:07 -07:00
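
The spec-vs-hardware mismatch described above is easy to reproduce outside of Wasm. A minimal Rust sketch (illustrative only, not part of the change itself) of the two behaviors:

```rust
fn main() {
    // Per the Wasm spec, i32.rem_s(INT_MIN, -1) must yield 0 rather than trap;
    // wrapping_rem models that required result.
    assert_eq!(i32::MIN.wrapping_rem(-1), 0);

    // The x86 signed divide (`idiv`) faults on this input because the matching
    // quotient (INT_MIN / -1) overflows; checked_rem reports the same edge case.
    assert_eq!(i32::MIN.checked_rem(-1), None);
}
```
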
Pat Hickey
db7ec9552c Merge pull request #2760 from bytecodealliance/pch/wiggle_error_reporting
wiggle: delete GuestErrorConversion, improve some error reporting
2021-03-24 13:39:04 -07:00
Pat Hickey
df18b44c53 oops 2021-03-24 11:37:42 -07:00
Pat Hickey
1d663bfd71 delete straggler InDataField 2021-03-24 11:19:12 -07:00
Pat Hickey
e6c7e00a52 wiggle-using crates: delete GuestErrorConversion 2021-03-24 10:39:06 -07:00
Alex Crichton
d4b54ee0a8 More optimizations for calling into WebAssembly (#2759)
* Combine stack-based cleanups for faster wasm calls

This commit is an extension of #2757 where the goal is to optimize entry
into WebAssembly. Currently wasmtime has two stack-based cleanups when
entering wasm, one for the externref activation table and another for
stack limits getting reset. This commit fuses these two cleanups
together into one and moves some code around, enabling fewer captures
and closures and speeding up calls into wasm a bit more.
Overall this drops the execution time from 88ns to 80ns locally for me.

This also updates the atomic orderings when updating the stack limit
from `SeqCst` to `Relaxed`. While `SeqCst` is a reasonable starting
point, `Relaxed` should be safe here since we're not using the atomics
to protect any other memory; they simply receive signals from other
threads.

* Determine whether a pc is wasm via a global map

The macOS implementation of traps recently changed to using mach ports
for handlers instead of signal handlers. This means that a previously
relied-upon invariant, that each thread fixes its own trap, was broken.
The macOS implementation worked around this by maintaining a global map
from thread id to thread-local information.

This global map is quite slow though. It involves taking a lock and
updating a hash map on all calls into WebAssembly. In my local testing
this accounts for >70% of the overhead of calling into WebAssembly on
macOS. Naturally it'd be great to remove this!

This commit fixes this issue and removes the global lock/map that is
updated on all calls into WebAssembly. The fix is to maintain a global
map of wasm modules and their trap addresses in the `wasmtime` crate.
Doing so is relatively simple since we're already tracking this
information at the `Store` level.

Once we've got a global map, the macOS implementation can use it from a
foreign thread and everything works out.

Locally this brings the overhead, on macOS specifically, of calling into
wasm from 80ns to ~20ns.

* Fix compiles

* Review comments
2021-03-24 11:41:33 -05:00
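
To illustrate the global-map idea (a hypothetical sketch, not the actual `wasmtime` data structures): each compiled module registers its code range once, and a trap handler on any thread can then classify a faulting pc without per-call locking.

```rust
use std::collections::BTreeMap;

/// Map from a code region's start address to its end address (exclusive).
type CodeRegions = BTreeMap<usize, usize>;

/// Returns true if `pc` falls inside some registered wasm code region.
fn is_wasm_pc(regions: &CodeRegions, pc: usize) -> bool {
    match regions.range(..=pc).next_back() {
        Some((_start, &end)) => pc < end,
        None => false,
    }
}

fn main() {
    let mut regions = CodeRegions::new();
    regions.insert(0x1000, 0x2000); // registered when a module is created
    assert!(is_wasm_pc(&regions, 0x1800));
    assert!(!is_wasm_pc(&regions, 0x2800));
}
```
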
Pat Hickey
1c4af27f2d delete GuestErrorConversion from docs, tests 2021-03-23 22:20:29 -07:00
Pat Hickey
f74b0291ad dead code: remove GuestErrorConversion, it now is never called 2021-03-23 22:14:49 -07:00
Pat Hickey
4a9ce90d34 GuestError::InDataField never constructed, so delete it 2021-03-23 22:04:34 -07:00
Pat Hickey
1151f630b8 wiggle GuestError: improve Display of InFunc, InDataField 2021-03-23 22:03:25 -07:00
Pat Hickey
6b2da3d299 Merge pull request #2756 from bytecodealliance/pch/wasi_sleep_fallible
wasi: make WasiSched::sleep fallible
2021-03-23 14:40:21 -07:00
Alex Crichton
c95971ab59 Optimize calling a WebAssembly function (#2757)
This commit implements a few optimizations, mainly inlining, that should
improve the performance of calling a WebAssembly function. This code
path can be quite hot depending on the embedding case and we hadn't
really put much effort into optimizing the nitty gritty.

The predominant optimization here is adding `#[inline]` to trivial
functions so performance is improved without having to compile with LTO.
Another optimization is to call `lazy_per_thread_init` when traps are
initialized per-thread (when a `Store` is created) rather than each time
a function is called. The next optimization is to change the unwind
reason in the `CallThreadState` to `MaybeUninit` to avoid extra checks
in the default case about whether we need to drop its variants (since in
the happy path we never need to drop it). The final optimization is to
optimize out a few checks when `async` support is disabled for a small
speed boost.

In a small benchmark where wasmtime calls a simple wasm function my
macOS computer dropped from 110ns to 86ns overhead, a 20% decrease. The
macOS overhead is still largely dominated by the global lock acquisition
and hash table management for traps right now, but I suspect the Linux
overhead is much better (should be on the order of ~30 or so ns).

We still have a long way to go to compete with SpiderMonkey which, in
testing, seems to have ~6ns overhead in calling the same wasm function on
my computer.
2021-03-23 15:22:37 -05:00
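
A rough sketch of the `MaybeUninit` idea (hypothetical types, not the real `CallThreadState`): the unwind reason is only written on the rare trapping path, so the happy path never constructs or drops it.

```rust
use std::mem::MaybeUninit;

enum UnwindReason {
    Trap(String),
    Panic(Box<dyn std::any::Any + Send>),
}

struct CallState {
    // Left uninitialized on the happy path; no drop handling is needed there.
    unwind_reason: MaybeUninit<UnwindReason>,
}

impl CallState {
    #[inline] // trivial constructor: inlining avoids a cross-crate call without LTO
    fn new() -> CallState {
        CallState { unwind_reason: MaybeUninit::uninit() }
    }

    fn record_trap(&mut self, msg: String) {
        // Only initialized when a trap actually occurs; a real implementation
        // must eventually take and drop this value on the unwind path.
        self.unwind_reason.write(UnwindReason::Trap(msg));
    }
}

fn main() {
    let mut state = CallState::new();
    state.record_trap("unreachable executed".to_string());
}
```
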
Pat Hickey
81dfb9c458 wasi: make WasiSched::sleep fallible
Some systems do not support sleeping and may want to return EINVAL here.
2021-03-23 10:20:03 -07:00
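
A minimal sketch of what "fallible" means here, using hypothetical names rather than the actual `wasi-common` signatures: the trait method returns a `Result`, so hosts without a usable timer can report an error instead of being forced to succeed.

```rust
use std::time::Duration;

#[derive(Debug)]
enum SchedError {
    /// The host has no way to sleep; roughly what returning EINVAL expresses.
    NotSupported,
}

trait Sched {
    fn sleep(&self, duration: Duration) -> Result<(), SchedError>;
}

struct NoTimer;

impl Sched for NoTimer {
    fn sleep(&self, _duration: Duration) -> Result<(), SchedError> {
        Err(SchedError::NotSupported)
    }
}

fn main() {
    let sched = NoTimer;
    assert!(sched.sleep(Duration::from_millis(10)).is_err());
}
```
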
Benjamin Bouvier
49ef2c652a Cranelift: remove logging of vcode when the log level isn't debug or more (#2755)
This logging step may be quite expensive, since logging has never been
optimized at all. Removing it is a clear win in compile times on my
machine for a large wasm module, for which parallel compilation time
drops from 6 seconds to 1.5 seconds.

Co-authored-by: bjorn3 <bjorn3@users.noreply.github.com>
2021-03-23 16:07:32 +01:00
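
A hedged sketch of the kind of guard this implies (assuming the `log` crate; the exact Cranelift change may differ): pretty-printing a large vcode is expensive, so skip it entirely unless debug logging is actually enabled.

```rust
use log::{debug, log_enabled, Level};

fn log_vcode<T: std::fmt::Debug>(vcode: &T) {
    // Only pay for formatting the (potentially huge) vcode when someone
    // will actually see it.
    if log_enabled!(Level::Debug) {
        debug!("vcode: {:#?}", vcode);
    }
}

fn main() {
    // No logger installed here, so this prints nothing and formats nothing.
    log_vcode(&vec![1, 2, 3]);
}
```
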
Dan Gohman
2880dab8f8 Add a sleep function to the WasiSched trait. 2021-03-22 12:50:16 -07:00
Dan Gohman
dd7e16762c Arrange for the new test to be called. 2021-03-22 12:50:16 -07:00
Dan Gohman
6b40724d18 Support "sleep" forms of poll_oneoff.
Add support for `poll_oneoff` calls which just sleep on a relative
timeout. This fixes a bug handling code compiled with WASI libc's `sleep`
family of functions, which call `poll_oneoff` with a `CLOCK_REALTIME`
timer; that case wasn't previously implemented.
2021-03-22 12:50:16 -07:00
Dan Gohman
cba0144612 Use min_by instead of sort_by when we only want the minimum element.
This is just a minor code simplification I happened to notice while
doing unrelated work on `poll_oneoff`.
2021-03-22 11:08:28 -07:00
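
For illustration (generic Rust, not the actual `poll_oneoff` code): when only the earliest deadline is needed, a single `min_by` pass replaces sorting the whole list.

```rust
fn main() {
    let deadlines: Vec<u64> = vec![30, 10, 20];

    // Before: sort everything, then take the first element.
    let mut sorted = deadlines.clone();
    sorted.sort_unstable();
    let earliest_sorted = sorted[0];

    // After: one pass over the list.
    let earliest = deadlines.iter().copied().min_by(|a, b| a.cmp(b)).unwrap();

    assert_eq!(earliest, earliest_sorted);
    assert_eq!(earliest, 10);
}
```
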
Peter Huene
4471d27567 Merge pull request #2741 from peterhuene/refactor-fiber-stacks
Split out fiber stacks from fibers.
2021-03-22 11:05:16 -07:00
bjorn3
b321a7291d Clarify ownership of data returned by get_finalized_* 2021-03-22 09:48:09 -07:00
Benjamin Bouvier
6e6713ae0b cranelift: add support for the Mac aarch64 calling convention
This bumps target-lexicon and adds support for the AppleAarch64 calling
convention. Specifically for WebAssembly support, we only have to worry
about the new stack-slot convention. Stack slots don't need to be at
least 8 bytes; they can be as small as the data type's size. For
instance, if we need stack slots for (i32, i32), they can be located at
offsets (+0, +4). Note that they still need to be properly aligned to
the type they contain, though, so if we need stack slots for
(i32, i64), we can't start the i64 slot at the +4 offset (it must start
at the +8 offset).

Added one test that was failing on the Mac M1, as well as other tests
stressing different yet similar situations.
2021-03-22 10:06:13 +01:00
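
The layout rule above can be captured in a few lines (an illustrative sketch, not Cranelift's ABI code): each argument's stack slot takes exactly its size, but the offset must first be rounded up to the type's natural alignment.

```rust
/// Compute stack-slot offsets given each argument's (size, alignment) in bytes.
fn apple_aarch64_slots(args: &[(usize, usize)]) -> Vec<usize> {
    let mut offset = 0;
    let mut offsets = Vec::new();
    for &(size, align) in args {
        offset = (offset + align - 1) / align * align; // round up to alignment
        offsets.push(offset);
        offset += size; // slots are packed: only `size` bytes, not a fixed 8
    }
    offsets
}

fn main() {
    assert_eq!(apple_aarch64_slots(&[(4, 4), (4, 4)]), vec![0, 4]); // (i32, i32)
    assert_eq!(apple_aarch64_slots(&[(4, 4), (8, 8)]), vec![0, 8]); // (i32, i64)
}
```
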
bjorn3
cc89111463 Support declaring anonymous functions and data objects 2021-03-21 18:00:26 +01:00
bjorn3
1639f2c844 Allow passing arbitrary MemFlags to emit_small_mem{cpy,move} 2021-03-20 19:22:55 +01:00
Peter Huene
8e43e96410 Merge pull request #2743 from dalinaum/patch-1
Fix an incorrect link.
2021-03-20 00:07:54 -07:00
Peter Huene
e6dda413a4 Code review feedback.
* Add assert to `StackPool::deallocate` to ensure the fiber stack given to it
  comes from the pool.
* Remove outdated comment about windows and stacks as the allocator now returns
  fiber stacks.
* Remove conditional compilation around `stack_size` in the allocators as it
  was just clutter.
2021-03-20 00:05:08 -07:00
Peter Huene
f556bd18a7 Set the thread stack guarantee for fibers on Windows.
This commit fixes the Windows implementation of fibers in Wasmtime to
reserve enough stack space for Rust to handle any stack overflow
exceptions.
2021-03-19 14:48:36 -07:00
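
For reference, the Win32 call involved is `SetThreadStackGuarantee`; a hedged sketch of invoking it from Rust (real code would normally go through the `windows-sys`/`winapi` crates rather than a hand-written extern block):

```rust
#[cfg(windows)]
extern "system" {
    // BOOL SetThreadStackGuarantee(PULONG StackSizeInBytes);
    fn SetThreadStackGuarantee(stack_size_in_bytes: *mut u32) -> i32;
}

#[cfg(windows)]
fn reserve_guard_space(bytes: u32) -> bool {
    let mut size = bytes;
    // Ask Windows to keep `bytes` of stack committed for the current thread so
    // Rust's stack-overflow handler has room to run on the fiber's stack.
    unsafe { SetThreadStackGuarantee(&mut size) != 0 }
}

#[cfg(not(windows))]
fn reserve_guard_space(_bytes: u32) -> bool {
    true // nothing to do on non-Windows platforms
}

fn main() {
    println!("stack guarantee set: {}", reserve_guard_space(32 * 1024));
}
```
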
LYK
3f2d36d532 Fix an incorrect link. 2021-03-20 03:41:03 +09:00
Peter Huene
8e34022784 Add tests for hitting fiber stack guard pages. 2021-03-18 23:57:42 -07:00
Peter Huene
f8f51afac1 Split out fiber stacks from fibers.
This commit splits out a `FiberStack` from `Fiber`, allowing the instance
allocator trait to return `FiberStack` rather than raw stack pointers. This
keeps the stack creation mostly in `wasmtime_fiber`, but now the on-demand
instance allocator can make use of it.

The instance allocators no longer have to return a "not supported" error to
indicate that the store should allocate its own fiber stack.

This includes a bunch of cleanup in the instance allocator to scope stacks to
the new "async" feature in the runtime.

Closes #2708.
2021-03-18 20:21:02 -07:00
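
A rough sketch of the shape of the split (hypothetical types, not the actual `wasmtime_fiber` API): the stack becomes a value an allocator can create and hand back on its own, and a fiber is then constructed on top of it.

```rust
/// A stack that can be allocated (on demand or from a pool) independently
/// of any fiber that later runs on it.
struct FiberStack {
    base: *mut u8,
    len: usize,
}

/// A fiber takes ownership of a stack it did not have to allocate itself.
struct Fiber {
    stack: FiberStack,
}

impl Fiber {
    fn new(stack: FiberStack) -> Fiber {
        Fiber { stack }
    }
}

fn main() {
    // Stand-in for an instance allocator returning a stack it manages.
    let stack = FiberStack { base: std::ptr::null_mut(), len: 0 };
    let _fiber = Fiber::new(stack);
}
```
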
Pat Hickey
0394a01194 Merge pull request #2739 from wrbs/stack-map-sink
cranelift-module: Add support for passing a StackMapSink when defining functions
2021-03-18 17:49:36 -07:00
Will Robson
38926fb1fc cranelift-module: Add support for passing a StackMapSink when defining functions
Fixes #2738

This follows the convention set by the existing method of passing a
TrapSink by adding another argument for a StackMapSink.
2021-03-19 00:02:15 +00:00
Chris Fallin
59dfe4b9f4 Merge pull request #2740 from cfallin/fix-lldb-ci
Explicitly install LLDB in CI to fix intermittent failure on Ubuntu 20.04 image.
2021-03-18 12:34:52 -07:00
Chris Fallin
69f27c06d2 Explicitly install LLDB in CI to fix intermittent failure on 20.04 image. 2021-03-18 11:13:14 -07:00
Benjamin Bouvier
5fecdfa491 Mach ports continued + support aarch64-apple unwinding (#2723)
* Switch macOS to using mach ports for trap handling

This commit moves macOS to using mach ports instead of signals for
handling traps. The motivation for this is listed in #2456, namely that
once mach ports are used in a process that means traditional UNIX signal
handlers won't get used. This means that if Wasmtime is integrated with
Breakpad, for example, then Wasmtime's trap handler never fires and
traps don't work.

The `traphandlers` module is refactored as part of this commit to split
the platform-specific bits into their own files (it was growing quite a
lot for one inline `cfg_if!`). The `unix.rs` and `windows.rs` files
remain the same as they were before with a few minor tweaks for some
refactored interfaces. The `macos.rs` file is brand new and lifts almost
its entire implementation from SpiderMonkey, adapted for Wasmtime
though.

The main gotcha with mach ports is that a separate thread is what
services the exception. Some unsafe magic allows this separate thread to
read non-`Send` and temporary state from other threads, which is hoped to
be safe in this context. The unfortunate downside is that calling wasm
on macOS now involves taking a global lock and modifying a global hash
map twice-per-call. I'm not entirely sure how to get out of this cost
for now, but hopefully for any embeddings on macOS it's not the end of
the world.

Closes #2456

* Add a sketch of arm64 apple support

* store: maintain CallThreadState mapping when switching fibers

* cranelift/aarch64: generate unwind directives to disable pointer auth

Aarch64 post ARMv8.3 has a feature called pointer authentication,
designed to fight ROP/JOP attacks: some pointers may be signed using new
instructions, adding payloads to the high (previously unused) bits of
the pointers. More on this here: https://lwn.net/Articles/718888/

Unwinders on aarch64 need to know if some pointers contained on the call
frame contain an authentication code or not, to be able to properly
authenticate them or use them directly. Since native code may have
enabled it by default (as is the case on the Mac M1), and the default is
that this configuration value is inherited, we need to explicitly
disable it, for the only kind of supported pointers (return addresses).

To do so, we set the value of a non-existent DWARF pseudo register (34)
to 0, as documented in
https://github.com/ARM-software/abi-aa/blob/master/aadwarf64/aadwarf64.rst#note-8.

This is done at function granularity, in the spirit of Cranelift's
compilation model. Alternatively, a single directive could be generated
in the CIE, emitting less information per module.

* Make exception handling work on Mac aarch64 too

* fibers: use a breakpoint instruction after the final call in wasmtime_fiber_start

Co-authored-by: Alex Crichton <alex@alexcrichton.com>
2021-03-17 09:43:22 -05:00
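
To make the pointer-authentication concern concrete (purely illustrative arithmetic, not how PAC codes are actually computed): a signed return address carries extra bits that an unwinder must either authenticate or strip before using.

```rust
// Fake "signing": put a stand-in authentication code into the unused high
// bits of a 48-bit virtual address. Real hardware computes this
// cryptographically (PACIASP and friends).
fn sign(addr: u64, pac: u64) -> u64 {
    addr | ((pac & 0x7f) << 48)
}

// What an unwinder must conceptually do before treating the value as a
// return address (strip, or better, authenticate).
fn strip(addr: u64) -> u64 {
    addr & 0x0000_ffff_ffff_ffff
}

fn main() {
    let ra = 0x0000_0001_2345_6789u64;
    let signed = sign(ra, 0x5a);
    assert_ne!(signed, ra);        // naive use of `signed` would jump to garbage
    assert_eq!(strip(signed), ra); // hence the directive: "not signed here"
}
```
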
Benjamin Bouvier
4603b3b292 Bump dependencies to get a single version of rand (#2733)
This removes a few crates in the dependencies, and a few exceptions (at
the price of a new one) in the cargo-deny configuration.
2021-03-17 09:07:50 -05:00
Nick Fitzgerald
a8aaf812ef Merge pull request #2731 from fitzgen/make-0.25.0-release
Make 0.25.0 release
2021-03-16 13:20:42 -07:00
Nick Fitzgerald
fe933c601a Fix day on date in RELEASES.md 2021-03-16 12:34:09 -07:00
Nick Fitzgerald
72b2bde808 Fix year on date in RELEASES.md
Co-authored-by: Rémy Rakic <remy.rakic+github@gmail.com>
2021-03-16 12:33:22 -07:00
Nick Fitzgerald
2b57cd16c8 Fix date formatting in RELEASES.md
Co-authored-by: bjorn3 <bjorn3@users.noreply.github.com>
2021-03-16 11:26:04 -07:00
Nick Fitzgerald
b92893e11c Add the release date for 0.25.0 in RELEASES.md 2021-03-16 11:04:24 -07:00
Nick Fitzgerald
d081ef9c2e Bump Wasmtime to 0.25.0; Cranelift to 0.72.0 2021-03-16 11:02:56 -07:00
Dan Gohman
2d3f2adf04 Fix nondeterministic failures in poll_oneoff_stdio.
Adjust this test so that it tolerates poll_oneoff returning that both a
timeout occurred and an input is ready for reading, at the same time.
2021-03-15 11:48:19 -07:00
Chris Fallin
a46daa7eee Merge pull request #2724 from akirilov-arm/aarch64_atomics
Cranelift AArch64: Add initial support for the Armv8.1 atomics
2021-03-15 11:21:48 -07:00
Anton Kirilov
07c27039b1 Cranelift AArch64: Add initial support for the Armv8.1 atomics
This commit enables Cranelift's AArch64 backend to generate code
for instruction set extensions (previously only the base Armv8-A
architecture was supported); also, it makes it possible to detect
the extensions supported by the host when JIT compiling. The new
functionality is applied to the IR instruction `AtomicCas`.

Copyright (c) 2021, Arm Limited.
2021-03-13 02:31:51 +00:00
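
For a sense of what `AtomicCas` corresponds to at the source level (plain Rust, not CLIF): a compare-and-swap like the one below can lower to a single Armv8.1 LSE instruction (e.g. CASAL) instead of the base-ISA load-exclusive/store-exclusive retry loop.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

fn cas(cell: &AtomicU64, expected: u64, new: u64) -> Result<u64, u64> {
    // On success returns the previous value (== expected); on failure returns
    // the value actually observed.
    cell.compare_exchange(expected, new, Ordering::SeqCst, Ordering::SeqCst)
}

fn main() {
    let cell = AtomicU64::new(1);
    assert_eq!(cas(&cell, 1, 2), Ok(1));
    assert_eq!(cas(&cell, 1, 3), Err(2));
    assert_eq!(cell.load(Ordering::SeqCst), 2);
}
```
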
Chris Fallin
df6812b855 Merge pull request #2710 from cfallin/x64-fastcall-unwind
Rework/simplify unwind infrastructure, implement Windows unwind, and add Windows/new-backend to CI.
2021-03-11 20:47:58 -08:00
Chris Fallin
2d5db92a9e Rework/simplify unwind infrastructure and implement Windows unwind.
Our previous implementation of unwind infrastructure was somewhat
complex and brittle: it parsed generated instructions in order to
reverse-engineer unwind info from prologues. It also relied on some
fragile linkage to communicate instruction-layout information that VCode
was not designed to provide.

A much simpler, more reliable, and easier-to-reason-about approach is to
embed unwind directives as pseudo-instructions in the prologue as we
generate it. That way, we can say what we mean and just emit it
directly.

The usual reasoning that leads to the reverse-engineering approach is
that metadata is hard to keep in sync across optimization passes; but
here, (i) prologues are generated at the very end of the pipeline, and
(ii) if we ever do a post-prologue-gen optimization, we can treat unwind
directives as black boxes with unknown side-effects, just as we do for
some other pseudo-instructions today.

It turns out that it was easier to just build this for both x64 and
aarch64 (since they share a factored-out ABI implementation), and wire
up the platform-specific unwind-info generation for Windows and SystemV.
Now we have simpler unwind on all platforms and we can delete the old
unwind infra as soon as we remove the old backend.

There were a few consequences to supporting Fastcall unwind in
particular that led to a refactor of the common ABI. Windows only
supports naming clobbered-register save locations within 240 bytes of
the frame-pointer register, whatever one chooses that to be (RSP or
RBP). We had previously saved clobbers below the fixed frame (and below
nominal-SP). The 240-byte range has to include the old RBP too, so we're
forced to place clobbers at the top of the frame, just below saved
RBP/RIP. This is fine; we always keep a frame pointer anyway because we
use it to refer to stack args. It does mean that offsets of fixed-frame
slots (spillslots, stackslots) from RBP are no longer known before we do
regalloc, so if we ever want to index these off of RBP rather than
nominal-SP because we add support for `alloca` (dynamic frame growth),
then we'll need a "nominal-BP" mode that is resolved after regalloc and
clobber-save code is generated. I added a comment to this effect in
`abi_impl.rs`.

The above refactor touched both x64 and aarch64 because of shared code.
This had a further effect in that the old aarch64 prologue generation
subtracted from `sp` once to allocate space, then used stores to `[sp,
offset]` to save clobbers. Unfortunately the offset only has 7-bit
range, so if there are enough clobbered registers (and there can be --
aarch64 has 384 bytes of registers; at least one unit test hits this)
the stores/loads will be out-of-range. I really don't want to synthesize
large-offset sequences here; better to go back to the simpler
pre-index/post-index `stp x1, x2, [sp, #-16]!` form that works just like
a "push". It's likely not much worse microarchitecturally (dependence
chain on SP, but oh well) and it actually saves an instruction if
there's no other frame to allocate. As a further advantage, it's much
simpler to understand; simpler is usually better.

This PR adds the new backend on Windows to CI as well.
2021-03-11 20:03:52 -08:00
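
A minimal sketch of the "say what we mean" approach (hypothetical names, not Cranelift's actual instruction types): the prologue emitter interleaves unwind directives with the real instructions, and the platform-specific backend later translates only the directives.

```rust
#[derive(Debug)]
enum UnwindDirective {
    PushFrameRegs { frame_size: u32 },
    SaveReg { reg: u8, offset_from_fp: i32 },
}

#[derive(Debug)]
enum Inst {
    Machine(&'static str),   // stand-in for a real machine instruction
    Unwind(UnwindDirective), // emits no code, only unwind metadata
}

fn main() {
    // Prologue generation emits directives right next to the code they describe.
    let prologue = vec![
        Inst::Machine("push rbp; mov rbp, rsp"),
        Inst::Unwind(UnwindDirective::PushFrameRegs { frame_size: 16 }),
        Inst::Machine("mov [rbp - 8], r12"),
        Inst::Unwind(UnwindDirective::SaveReg { reg: 12, offset_from_fp: -8 }),
    ];

    // A SystemV or Fastcall backend would walk the directives and emit
    // .eh_frame / UNWIND_INFO; here we just print them.
    for inst in &prologue {
        if let Inst::Unwind(d) = inst {
            println!("unwind: {:?}", d);
        }
    }
}
```
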
Peter Huene
71093ff91b Merge pull request #2722 from peterhuene/update-release-notes
Update RELEASES.md to mention the change to `Engine::new`.
2021-03-11 16:40:48 -08:00
Peter Huene
6925314738 Update RELEASES.md to mention the change to Engine::new. 2021-03-11 14:14:32 -08:00
Alex Crichton
fb0dc1045f Update release notes for next version (#2721)
Add some notes for major features which have landed
2021-03-11 16:06:41 -06:00