wasmtime

Author	SHA1	Message	Date
Nick Fitzgerald	3c2f695dd3	Merge pull request #3724 from uweigand/s390x-isle-branchtrap s390x: Migrate branches and traps to ISLE	2022-01-31 13:42:49 -08:00
Alex Crichton	a25f7bdba5	Don't copy `VMBuiltinFunctionsArray` into each `VMContext` (#3741) * Don't copy `VMBuiltinFunctionsArray` into each `VMContext` This is another PR along the lines of "let's squeeze all possible performance we can out of instantiation". Before this PR we would copy, by value, the contents of `VMBuiltinFunctionsArray` into each `VMContext` allocated. This array of function pointers is modestly-sized but growing over time as we add various intrinsics. Additionally it's the exact same for all `VMContext` allocations. This PR attempts to speed up instantiation slightly by instead storing an indirection to the function array. This means that calling a builtin intrinsic is a tad bit slower since it requires two loads instead of one (one to get the base pointer, another to get the actual address). Otherwise though `VMContext` initialization is now simply setting one pointer instead of doing a `memcpy` from one location to another. With some macro-magic this commit also replaces the previous implementation with one that's more `const`-friendly which also gets us compile-time type-checks of libcalls as well as compile-time verification that all libcalls are defined. Overall, as with #3739, the win is very modest here. Locally I measured a speedup from 1.9us to 1.7us taken to instantiate an empty module with one function. While small at these scales it's still a 10% improvement! * Review comments	2022-01-28 16:24:34 -06:00
Alex Crichton	2f494240f8	Lazily allocate the bump-alloc chunk in the externref table (#3739) This commit updates the allocation of a `VMExternRefActivationsTable` structure to perform zero malloc memory allocations. Previously it would allocate a page-size of `chunk` plus some space in hash sets for future insertions. The main trick implemented here is that after the first gc during the slow path the fast chunk allocation is allocated and configured. The motivation for this PR is that given our recent work to further refine and optimize the instantiation process this allocation started to show up in a nontrivial fashion. Most modules today never touch this table anyway as almost none of them use reference types, so the time spent allocating and deallocating the table per-store was largely wasted time. Concretely on a microbenchmark this PR speeds up instantiation of a module with one function by 30%, decreasing the instantiation cost from 1.8us to 1.2us. Overall a pretty minor win but when the instantiation times we're measuring start being in the single-digit microseconds this win ends up getting magnified!	2022-01-28 16:10:05 -06:00
Nick Fitzgerald	34537a39cd	Merge pull request #3734 from fitzgen/externref-debug-assertion-fix Fix a debug assertion in `externref` garbage collections	2022-01-28 11:03:03 -08:00
Nick Fitzgerald	19f8d94959	Expand on activations table invariants comment in `libcalls.rs`	2022-01-28 09:47:05 -08:00
Nick Fitzgerald	f292ff55cf	Do another GC after running Wasm in the `table_ops` test oracle	2022-01-27 16:53:10 -08:00
Nick Fitzgerald	cc8d7778e2	Make the `table_ops` test case generator use globals as well This will make it generate `table.set`s that come from `global.get`s and `global.get`s that come from `table.set`s. Ultimately, it should give us much more fuzzing coverage of `externref` globals, their barriers, and passing `externref`s into and out of Wasm to get or set globals.	2022-01-27 16:53:10 -08:00
Andrew Brown	90bfa123e0	docs: render the `fcmp` tables as code (#3735) Looking at [the `fcmp` documentation](https://docs.rs/cranelift-codegen/0.80.0/cranelift_codegen/ir/trait.InstBuilder.html#method.fcmp)--generated from Cranelift's instruction definitions, the charts explaining the logic for the various conditions are unreadable. Since rendering those charts as plain text is problematic, this change wraps them as code sections for a consistent layout.	2022-01-27 16:36:33 -08:00
Nick Fitzgerald	cbc6f6071f	Fix a debug assertion in `externref` garbage collections When we GC, we assert the invariant that all `externref`s we find on the stack have a corresponding entry in the `VMExternRefActivationsTable`. However, we also might be in code that is in the process of fixing up this invariant and adding an entry to the table, but the table's bump chunk is full, and so we do a GC and then add the entry into the table. This will ultimately maintain our desired invariant, but there is a moment in time when we are doing the GC where the invariant is relaxed which is okay because the reference will be in the table before we return to Wasm or do anything else. This isn't a possible UAF, in other words. To make it so that the assertion won't trip, we explicitly insert the reference into the table before we GC, so that the invariant is not relaxed across a possibly-GCing operation (even though it would be safe in this particular case).	2022-01-27 14:46:01 -08:00
Alex Crichton	7928a3ffb4	Lazily load types into `Func` (#3727) * Lazily load types into `Func` This commit changes the construction of a `Func` to lazily load the type information if required instead of always loading the type information at `Func`-construction time. The main purpose of this change is to accelerate instantiation of instances which have many imports. Currently in the fast way of doing this the instantiation loop looks like: let mut store = Store::new(&engine, ...); let instance = instance_pre.instantiate(&mut store); In this situation the `instance_pre` will typically load host-defined functions (defined via `Linker` APIs) into the `Store` as individual `Func` items and then perform the instantiation process. The operation of loading a `HostFunc` into a `Store` however currently involves two expensive operations: * First a read-only lock is taken on the `RwLock` around engine signatures. * Next a clone of the wasm type is made to pull it out of the engine signature registry. Neither of these is actually necessary for imported functions. The `FuncType` for imported functions is never used since all comparisons happen with the intern'd indices instead. The only time a `FuncType` is used typically is for exported functions when using `Func::typed` or similar APIs which need type information. This commit makes this path faster by storing `Option<FuncType>` instead of `FuncType` within a `Func`. This means that it starts out as `None` and is only filled in on-demand as necessary. This means that when instantiating a module with many imports no clones/locks are done. On a simple microbenchmark where a module with 100 imports is instantiated this PR improves instantiation time by ~35%. Due to the rwlock used here and the general inefficiency of pthreads rwlocks the effect is even more profound when many threads are performing the same instantiation process.
On x86_64 with 8 threads performing instantiation this PR improves instantiation time by 80% and on arm64 it improves by 97% (wow read-contended glibc rwlocks on arm64 are slow). Note that much of the improvement here is also from memory allocations/deallocations no longer being performed because dropping functions within a store no longer requires deallocating the `FuncType` if it's not present. A downside of this PR is that `Func::ty` is now unconditionally taking an rwlock if the type hasn't already been filled in. (it uses the engine). If this is an issue in the future though we can investigate at that time using something like a `Once` to lazily fill in even when mutable access to the store isn't available. * Review comments	2022-01-26 14:23:37 -06:00
Nick Fitzgerald	595e3227c5	Merge pull request #3725 from fitzgen/use-emit-helper-aarch64 cranelift: Use `emit` helper function on aarch64	2022-01-25 10:35:59 -08:00
Nick Fitzgerald	9c7de7692b	cranelift: Use `emit` helper function on aarch64 ...instead of pushing to the `emitted_insts` vec directly.	2022-01-25 09:59:39 -08:00
Ulrich Weigand	36369a6f35	s390x: Migrate branches and traps to ISLE In order to migrate branches to ISLE, we define a second entry point `lower_branch` which gets the list of branch targets as additional argument. This requires a small change to `lower_common`: the `isle_lower` callback argument is changed from a function pointer to a closure. This allows passing the extra argument via a closure. Traps make use of the recently added facility to emit safepoints from ISLE, but are otherwise straightforward.	2022-01-25 18:15:32 +01:00
Chris Fallin	cd6b73fc90	Merge pull request #3723 from uweigand/isle-safepoint ISLE: Allow emitting safepoint insns	2022-01-25 08:56:22 -08:00
Chris Fallin	ce63a113ab	Merge pull request #3717 from uweigand/s390x-branchtarget s390x: Refactor branch and jumptable emission	2022-01-25 08:55:31 -08:00
Ulrich Weigand	906f6a35cf	ISLE: Allow emitting safepoint insns Change the implementation of emitted_insts in IsleContext from a plain vector of instructions into a vector of tuples, where the second element is a boolean that indicates whether this instruction should be emitted as a safepoint. This allows targets to emit safepoint insns via ISLE.	2022-01-25 14:21:41 +01:00
Dan Gohman	5fc01bafc7	Fix `isatty` in WASI. (#3696) WASI doesn't have an `isatty` ioctl or syscall, so wasi-libc's `isatty` implementation uses the file descriptor type and rights to determine if the file descriptor is likely to be a tty. The real fix here will be to add an `isatty` call to WASI. But for now, have Wasmtime set the filetype and rights for file descriptors so that wasi-libc's `isatty` works as expected.	2022-01-24 11:45:16 -08:00
Andrew Brown	b1ad02e43a	docs: remove fuzzit badge (#3719) The badge no longer displays. I believe this is due to the acquisition by GitLab; other repositories seem to be reporting similar issues.	2022-01-24 10:46:34 -08:00
Nick Fitzgerald	20225e2f88	Merge pull request #3718 from uweigand/clif-insndata ISLE: Fix clif.isle InstructionData entries	2022-01-24 09:33:26 -08:00
Chris Fallin	fa29c6f029	Merge pull request #3720 from bytecodealliance/fitzgen-patch-1 Add cranelift meeting notes 2022-01-24	2022-01-24 09:32:13 -08:00
Nick Fitzgerald	5e28b4e3bc	Add cranelift meeting notes 2022-01-24	2022-01-24 09:28:52 -08:00
Tim Park	198d4d64b0	Make examples compile without warnings (#3716)	2022-01-24 09:26:11 -06:00
Ulrich Weigand	071d3a68d0	ISLE: Fix clif.isle InstructionData entries Attempt to match a Jump instruction in ISLE will currently lead to the generated files not compiling. This is because the definition of the InstructionData enum in clif.isle does not match the actual type used in Rust code. Specifically, clif.isle erroneously omits the ValueList variable-length argument entry if the format does not use a typevar operand. This is the case for Jump and a few other formats. The problem is caused by a bug in the gen_isle routine in meta/src/gen_inst.rs.	2022-01-24 12:54:16 +01:00
Ulrich Weigand	cee00c6591	s390x: Refactor branch and jumptable emission The BranchTarget abstraction is no longer needed, since all branches are being emitted using a MachLabel target. Remove BranchTarget and simply use MachLabel everywhere a branch target is required. (This brings the s390x back-end in line with what x64 does as well.) In addition, simplify jumptable emission by moving all instructions that do not depend on the internal label (i.e. the conditional branch to the default label, as well as the scaling of the index register) out of the combined JTSequence instruction. This refactoring will make moving branch generation to ISLE easier.	2022-01-24 12:22:53 +01:00
Dan Gohman	881c19473d	Use `ptr::cast` instead of `as` casts in several places. (#3507) `ptr::cast` has the advantage of being unable to silently cast `const T` to `mut T`. This turned up several places that were performing such casts, which this PR also fixes.	2022-01-21 13:03:17 -08:00
Chris Fallin	a9d209732d	Merge pull request #3706 from uweigand/isle-s390x s390x: Add ISLE support	2022-01-21 11:25:49 -08:00
Chris Fallin	d61e4e0559	Merge pull request #3709 from cfallin/cold-blocks-dead-code-bug Cranelift: Fix cold-blocks-related lowering bug.	2022-01-21 11:18:48 -08:00
Chris Fallin	ef1b2d2fa8	Cranelift: Fix cold-blocks-related lowering bug. If a block is marked cold but has side-effect-free code that is only used by side-effectful code in non-cold blocks, we will erroneously fail to emit it, causing a regalloc failure. This is due to the interaction of block ordering and lowering: we rely on block ordering to visit uses before defs (except for backedges) so that we can effectively do an inline liveness analysis and skip lowering operations that are not used anywhere. This "inline DCE" is needed because instruction lowering can pattern-match and merge one instruction into another, removing the need to generate the source instruction. Unfortunately, the way that I added cold-block support in #3698 was oblivious to this -- it just changed the block sort order. For efficiency reasons, we generate code in its final order directly, so it would not be tenable to generate it in e.g. RPO first and then reorder cold blocks to the bottom; we really do want to visit in the same order as the final code. This PR fixes the bug by moving the point at which cold blocks are sunk to emission-time instead. This is cheaper than either trying to visit blocks during lowering in RPO but add to VCode out-of-order, or trying to do some expensive analysis to recover proper liveness. It's not clear that the latter would be possible anyway -- the need to lower some instructions depends on other instructions' isel results/merging success, so we really do need to visit in RPO, and we can't simply lower all instructions as side-effecting roots (some can't be toplevel nodes). The one downside of this approach is that the VCode itself still has cold blocks inline; so in the text format (and hence compile-tests) it's not possible to see the sinking. This PR adds a test for cold-block sinking that actually verifies the machine code. (The test also includes an add-instruction in the cold path that would have been incorrectly skipped prior to this fix.) 
Fortunately this bug would not have been triggered by the one current use of cold blocks in #3699, because there the only operation in the cold block was an (always effectful) call instruction. The worst-case effect of the bug in other code would be a regalloc panic; no silent miscompilations could result.	2022-01-21 10:47:49 -08:00
Ulrich Weigand	a94e72b5b7	s390x: Add ISLE support This adds ISLE support for the s390x back-end and moves lowering of most instructions to ISLE. The only instructions still remaining are calls, returns, traps, and branches, most of which will need additional support in common code. Generated code is not intended to be (significantly) different than before; any additional optimizations now made easier to implement due to the ISLE layer can be added in follow-on patches. There were a few differences in some filetests, but those are all just simple register allocation changes (and all to the better!).	2022-01-21 19:30:56 +01:00
Nick Fitzgerald	78ff829733	Merge pull request #3708 from cfallin/cold-blocks-clif Add syntax for cold blocks to CLIF.	2022-01-21 10:19:46 -08:00
Alex Crichton	2b04466955	Document epoch interruption in async documentation (#3712) I, after-the-fact, now recall that we document fuel and other interruption schemes in the `Config::async_support` documentation so I've re-worded that section to mention epoch-based interruption as well.	2022-01-21 11:16:27 -06:00
Chris Fallin	51649d56b7	Add syntax for cold blocks to CLIF. This commit adds support for denoting cold blocks in the CLIF text format as follows: ```plain function %f() { block0(...): ... block1 cold: ... block2(...) cold: ... block3: ... ``` With this syntax, we are able to see the cold-block flag in CLIF, we can write tests using it, and it is preserved when round-tripping. Fixes #3701.	2022-01-20 16:49:52 -08:00
Chris Fallin	90e7cef56c	Merge pull request #3699 from cfallin/epoch-interruption Add epoch-based interruption for cooperative async timeslicing.	2022-01-20 14:45:30 -08:00
Chris Fallin	8a55b5c563	Add epoch-based interruption for cooperative async timeslicing. This PR introduces a new way of performing cooperative timeslicing that is intended to replace the "fuel" mechanism. The tradeoff is that this mechanism interrupts with less precision: not at deterministic points where fuel runs out, but rather when the Engine enters a new epoch. The generated code instrumentation is substantially faster, however, because it does not need to do as much work as when tracking fuel; it only loads the global "epoch counter" and does a compare-and-branch at backedges and function prologues. This change has been measured as ~twice as fast as fuel-based timeslicing for some workloads, especially control-flow-intensive workloads such as the SpiderMonkey JS interpreter on Wasm/WASI. The intended interface is that the embedder of the `Engine` performs an `engine.increment_epoch()` call periodically, e.g. once per millisecond. An async invocation of a Wasm guest on a `Store` can specify a number of epoch-ticks that are allowed before an async yield back to the executor's event loop. (The initial amount and automatic "refills" are configured on the `Store`, just as for fuel.) This call does only signal-safe work (it increments an `AtomicU64`) so could be invoked from a periodic signal, or from a thread that wakes up once per period.	2022-01-20 13:58:17 -08:00
Chris Fallin	2615ef967f	Merge pull request #3702 from uweigand/isle-prep-s390x s390x: Codegen fixes and preparation for ISLE migration	2022-01-20 12:02:08 -08:00
Nick Fitzgerald	0670d7beb5	Merge pull request #3703 from uweigand/isle-prep-common ISLE standard prelude: Additional types and helpers	2022-01-20 10:09:51 -08:00
Ulrich Weigand	be60a19623	ISLE standard prelude: Additional types and helpers In preparing to move the s390x back-end to ISLE, I noticed a few missing pieces in the common prelude code. This patch: - Defines the reference types $R32 / $R64. - Provides a trap_code_bad_conversion_to_integer helper. - Provides an avoid_div_traps helper. This requires passing the generic flags in addition to the ISA-specific flags into the ISLE lowering context.	2022-01-20 17:23:31 +01:00
Ulrich Weigand	c08a013b53	s390x: Codegen fixes and preparation for ISLE migration In preparing the back-end to move to ISLE, I detected a number of codegen bugs in the existing code, which are fixed here: - Fix internal compiler error with uload16/icmp corner case. - Fix broken Cls lowering. - Correctly mask shift count for i8/i16 shifts. In addition, I made several changes to operand encodings in various MInst patterns. These should not have any functional effect, but will make the ISLE migration easier: - Encode floating-point constants as u32/u64 in MInst patterns. - Encode shift amounts as u8 and Reg in ShiftOp pattern. - Use MemArg in LoadMultiple64 and StoreMultiple64 patterns.	2022-01-20 16:59:18 +01:00
Chris Fallin	9321a9db88	Add some agenda items to next Cranelift and Wasmtime meetings. (#3700) - Cranelift: items to discuss 2022 roadmap, coordinate who's working on what in ISLE transition for all platforms, and discuss platform tiers and arm32 (wrt above) - Wasmtime: item to discuss new revelations re: memfd/CoW and epoch interruption schemes	2022-01-19 18:18:04 -06:00
Chris Fallin	ae476fde60	Merge pull request #3698 from cfallin/cold-blocks Cranelift: add support for cold blocks.	2022-01-19 12:58:33 -08:00
Chris Fallin	f489b83835	Cranelift: add support for cold blocks. This PR adds a flag to each block that can be set via the frontend/builder interface that indicates that the block will not be frequently executed. As such, the compiler backend should place the block "out of line" in the final machine code, so that the ordinary, more frequent execution path that excludes the block does not have to jump around it. This is useful for adding handlers for exceptional conditions (slow-paths, guard violations) in a way that minimizes performance cost. Fixes #2747.	2022-01-19 12:17:41 -08:00
Chris Fallin	4a331b8981	Merge pull request #3679 from FreddieLiardet/fp_const_fmov Improve code generation for floating-point constants	2022-01-19 09:59:34 -08:00
Benjamin Bouvier	2649d2352c	Support vtune profiling of trampolines too (#3687) * Provide helpers for demangling function names * Profile trampolines in vtune too * get rid of mapping * avoid code duplication with jitdump_linux * maintain previous default display name for wasm functions * no dash, grrr * Remove unused profiling error type	2022-01-19 09:49:23 -06:00
Mrmaxmeier	2afd6900f4	runtime: expose DefaultMemoryCreator (#3670)	2022-01-18 09:17:33 -06:00
Freddie Liardet	b5531580e7	Improve code generation for floating-point constants Copyright (c) 2022, Arm Limited.	2022-01-18 10:39:05 +00:00
Chris Fallin	06a7bfdcbd	Merge pull request #3692 from akirilov-arm/abi_isa_flags Cranelift: Pass the ISA-specific compilation flags to the ABI implementations	2022-01-17 22:46:58 -08:00
Anton Kirilov	89919f4b1f	Pass the ISA-specific compilation flags to the ABI implementations Copyright (c) 2021, Arm Limited.	2022-01-14 14:18:01 +00:00
Nick Fitzgerald	df37074218	Merge pull request #3690 from fitzgen/a-bunch-more-isle cranelift: Port a bunch more lowerings to ISLE on x64	2022-01-13 18:08:31 -08:00
Nick Fitzgerald	a052285340	Fix typo: s/sentinals/sentinels/	2022-01-13 16:50:15 -08:00
Nick Fitzgerald	658c5d33c1	cranelift: Port `trap` and `resumable_trap` lowering to ISLE on x64	2022-01-13 15:57:17 -08:00
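
The epoch-based interruption commit above (8a55b5c563) describes instrumented code that loads a global epoch counter and does a compare-and-branch at function prologues and loop backedges, while the embedder bumps the counter with a single atomic increment. The following is a minimal sketch of that idea using only the Rust standard library. It is not Wasmtime's implementation: `EpochCounter`, `Outcome`, and `run_guest` are hypothetical names, and the "yield once the current epoch is greater than or equal to the deadline" rule is an assumption made for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for an engine-wide epoch counter.
/// Cloning shares the same underlying atomic.
#[derive(Clone, Default)]
pub struct EpochCounter(Arc<AtomicU64>);

impl EpochCounter {
    pub fn new() -> Self {
        Self::default()
    }

    /// Embedder side: one atomic increment, so it is cheap enough to call
    /// from a timer thread (or, as the commit notes, a periodic signal).
    pub fn increment(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    /// Guest side: a single load of the current epoch.
    pub fn current(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Result of running the simulated guest loop.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// The loop ran to completion after this many iterations.
    Finished(u64),
    /// The deadline epoch was reached; control goes back to the caller.
    Yielded { iterations_done: u64 },
}

/// Simulates an instrumented guest loop. Before each iteration (covering
/// both the "prologue" and every "backedge") it loads the epoch and compares
/// it against `deadline`. Assumption: yield when current >= deadline.
pub fn run_guest(
    epoch: &EpochCounter,
    deadline: u64,
    iterations: u64,
    mut body: impl FnMut(u64),
) -> Outcome {
    let mut done = 0;
    while done < iterations {
        // The whole instrumentation: one load plus one compare-and-branch.
        if epoch.current() >= deadline {
            return Outcome::Yielded { iterations_done: done };
        }
        body(done);
        done += 1;
    }
    Outcome::Finished(done)
}
```

In a real embedding the increment would come from another thread or a timer rather than from the loop body. Unlike fuel, the interruption point is not deterministic: it depends on when the counter happens to advance relative to the next check.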

9490 Commits