wasmtime

Author	SHA1	Message	Date
Alex Crichton	c89dc55108	Add a two-week delay to Wasmtime's release process (#3955 ) * Bump to 0.36.0 * Add a two-week delay to Wasmtime's release process This commit is a proposal to update Wasmtime's release process with a two-week delay from branching a release until it's actually officially released. We've had two issues lately that came up which led to this proposal: * In #3915 it was realized that changes just before the 0.35.0 release weren't enough for an embedding use case, but the PR didn't meet the expectations for a full patch release. * At Fastly we were about to start rolling out a new version of Wasmtime when over the weekend the fuzz bug #3951 was found. This led to the desire internally to have a "must have been fuzzed for this long" period of time for Wasmtime changes which we felt were better reflected in the release process itself rather than something about Fastly's own integration with Wasmtime. This commit updates the automation for releases to unconditionally create a `release-X.Y.Z` branch on the 5th of every month. The actual release from this branch is then performed on the 20th of every month, roughly two weeks later. This should provide a period of time to ensure that all changes in a release are fuzzed for at least two weeks and avoid any further surprises. This should also help with any last-minute changes made just before a release if they need tweaking since backporting to a not-yet-released branch is much easier. Overall there are some new properties about Wasmtime with this proposal as well: * The `main` branch will always have a section in `RELEASES.md` which is listed as "Unreleased" for us to fill out. * The `main` branch will always be a version ahead of the latest release. For example it will be bump pre-emptively as part of the release process on the 5th where if `release-2.0.0` was created then the `main` branch will have 3.0.0 Wasmtime. * Dates for major versions are automatically updated in the `RELEASES.md` notes. The associated documentation for our release process is updated and the various scripts should all be updated now as well with this commit. * Add notes on a security patch * Clarify security fixes shouldn't be previewed early on CI	2022-04-01 13:11:10 -05:00
Chris Fallin	666c2554ea	Merge pull request from GHSA-gwc9-348x-qwv2 * Run the GC smoketest with epoch support enabled as well. * Handle safepoints in cold blocks properly. Currently, the way that we find safepoint slots for a given instruction relies on the instruction index order in the safepoint list matching the order of instruction emission. Previous to the introduction of cold-block support, this was trivially satisfied by sorting the safepoint list: we emit instructions 0, 1, 2, 3, 4, ..., and so if we have safepoints at instructions 1 and 4, we will encounter them in that order. However, cold blocks are supported by swizzling the emission order at the last moment (to avoid having to renumber instructions partway through the compilation pipeline), so we actually emit instructions out of index order when cold blocks are present. Reference-type support in Wasm in particular uses cold blocks for slowpaths, and has live refs and safepoints in these slowpaths, so we can reliably "skip" a safepoint (not emit any metadata for it) in the presence of reftype usage. This PR fixes the emission code by building a map from instruction index to safepoint index first, then doing lookups through this map, rather than following along in-order as it emits instructions.	2022-03-31 14:26:01 -07:00
Andrew Brown	bd6fe11ca9	cranelift: remove `load_complex` and `store_complex` (#3976 ) This change removes all variants of `load_complex` and `store_complex` from Cranelift; this is a breaking change to the instructions exposed by CLIF. The complete list of instructions removed is: `load_complex`, `store_complex`, `uload8_complex`, `sload8_complex`, `istore8_complex`, `sload8_complex`, `uload16_complex`, `sload16_complex`, `istore16_complex`, `uload32_complex`, `sload32_complex`, `istore32_complex`, `uload8x8_complex`, `sload8x8_complex`, `sload16x4_complex`, `uload16x4_complex`, `uload32x2_complex`, `sload32x2_complex`. The rationale for this removal is that the Cranelift backend now has the ability to pattern-match multiple upstream additions in order to calculate the address to access. Previously, this was not possible so the `*_complex` instructions were needed. Over time, these instructions have fallen out of use in this repository, making the additional overhead of maintaining them a chore.	2022-03-31 10:05:10 -07:00
Andrew Brown	e8dd13cf87	x64: port the remainder of `select` to ISLE (#3973 ) Previous changes had ported the difficult "`select` based on an `fcmp`" patterns to ISLE; this completes porting of `select` by moving over the final two kinds of patterns: - `select` based on an `icmp` - `select` based on a value	2022-03-30 13:32:26 -07:00
Andrew Brown	5d8dd648d7	x64: port `fcmp` to ISLE (#3967 ) * x64: port scalar `fcmp` to ISLE Implement the CLIF lowering for the `fcmp` to ISLE. This adds a new type-matcher, `ty_scalar_float`, for detecting uses of `F32` and `F64`. * isle: rename `vec128` to `ty_vec12` This refactoring changes the name of the `vec128` matcher function to follow the `ty_` convention of the other type matchers. It also makes the helper an inline function call. x64: port vector `fcmp` to ISLE	2022-03-29 15:41:49 -07:00
Damian Heaton	6c8c94723a	Scalar values in `vectorizelanes` & `extractlanes` (#3922 ) - `extractlanes` will now function on a scalar value, returning the value as a single-element array. - `vectorizelanes` will accept a single-element array, returning the contained value. Existing `if !x.is_vector()` code-patterns have been simplified as a result. Copyright (c) 2022 Arm Limited	2022-03-28 09:32:59 -07:00
Alex Crichton	76b82910c9	Remove the module linking implementation in Wasmtime (#3958 ) * Remove the module linking implementation in Wasmtime This commit removes the experimental implementation of the module linking WebAssembly proposal from Wasmtime. The module linking is no longer intended for core WebAssembly but is instead incorporated into the component model now at this point. This means that very large parts of Wasmtime's implementation of module linking are no longer applicable and would change greatly with an implementation of the component model. The main purpose of this is to remove Wasmtime's reliance on the support for module-linking in `wasmparser` and tooling crates. With this reliance removed we can move over to the `component-model` branch of `wasmparser` and use the updated support for the component model. Additionally given the trajectory of the component model proposal the embedding API of Wasmtime will not look like what it looks like today for WebAssembly. For example the core wasm `Instance` will not change and instead a `Component` is likely to be added instead. Some more rationale for this is in #3941, but the basic idea is that I feel that it's not going to be viable to develop support for the component model on a non-`main` branch of Wasmtime. Additionaly I don't think it's viable, for the same reasons as `wasm-tools`, to support the old module linking proposal and the new component model at the same time. This commit takes a moment to not only delete the existing module linking implementation but some abstractions are also simplified. For example module serialization is a bit simpler that there's only one module. Additionally instantiation is much simpler since the only initializer we have to deal with are imports and nothing else. Closes #3941 * Fix doc link * Update comments	2022-03-23 14:57:34 -05:00
Andrew Brown	4d5bd5f90e	x64: fix register allocation panic due to load-coalesced value (#3954 ) Fuzz testing identified a lowering case for CLIF's `icmp` in which the double use of a loaded operand resulted in a register allocation error. This change manually adds `put_in_xmm` to avoid load-coalescing these values and includes a CLIF filetest to trigger this issue. Closes #3951. I opened #3953 to discuss a way in which this kind of mistake (i.e., forgetting to add `put_in_*` in certain situations) could be avoided.	2022-03-21 18:46:27 -07:00
Andrew Brown	3bfbb3226e	x64: prefix all machine instructions with `x64_` (#3947 ) This change is refactoring only--it should have no logic changes. As discussed previously, prefixing all machine code instructions with `x64_` will make it easier to identify what parts of the ISLE code correspond to single instructions and what parts rely on helpers that may emit more than one instruction.	2022-03-18 17:53:15 -07:00
Andrew Brown	5fa104205d	x64: improve generation of i128 `icmp` (#3946 ) Previously, we used the flags of `AND` for `SETcc`. This change uses `TEST` instead, which discards the AND result but sets the flags needed for `SETcc`. This reduces register pressure slightly for this sequence.	2022-03-18 16:36:31 -07:00
Andrew Brown	e92cbfb283	x64: port `icmp` to ISLE (#3886 ) * x64: port GPR-held `icmp` to ISLE * x64: port equality `icmp` for i128 type * x64: port `icmp` for vector types * x64: rename from_intcc to intcc_to_cc	2022-03-18 11:22:09 -07:00
Chris Fallin	58062b5efe	x64 backend: fix fpcmp to avoid load-op merging. (#3934 ) The `fpcmp` helper in the x64 backend uses `put_in_xmm_mem` for one of its operands, which allows the compiler to merge a load with the compare instruction (`ucomiss` or `ucomisd`). Unfortunately, as we saw in #2576 for the integer-compare case, this does not work with our lowering algorithm because compares can be lowered more than once (unlike all other instructions) to reproduce the flags where needed. Merging a load into an op that executes more than once is invalid in general (the two loads may observe different values, which violates the original program semantics because there was only one load originally). This does not result in a miscompilation, but instead will cause a panic at regalloc time because the register that should have been defined by the separate load is never written (the load is never emitted separately). I think this (very subtle, easy to miss) condition was unfortunately not ported over when we moved the logic in #3682. The existing fcmp-of-load test in `cmp-mem-bug` (from #2576) does not seem to trigger it, for a reason I haven't fully deduced. I just added the verbatim function body (happens to come from `clang.wasm`) that triggers the bug as a test. Discovered while bringing up regalloc2 support. It's pretty unlikely to hit by chance, which is why I think none of our fuzzing has hit it yet.	2022-03-16 09:48:20 -07:00
Alex Crichton	62a6a7ab6c	Use const-initialized thread locals (#3923 ) This was a relatively recent feature added to the Rust standard library which should help accelerate calls into WebAssembly slightly.	2022-03-14 12:29:58 -05:00
FreddieLiardet	13b9396931	Add vector compare to 0 optims (#3887 ) Signed-off-by: Freddie Liardet <frederick.liardet@arm.com>	2022-03-09 16:20:06 -08:00
Alex Crichton	4d404c90b4	Ensure functions are aligned properly on AArch64 (#3908 ) Previously (as in an hour ago) #3905 landed a new ability for fuzzing to arbitrarily insert padding between functions. Running some fuzzers locally though this instantly hit a lot of problems on AArch64 because the arbitrary padding isn't aligned to 4 bytes like all other functions are. To fix this issue appending functions now correctly aligns the output as appropriate for the platform. The alignment argument for appending was switched to `None` where `None` means "use the platform default" and otherwise and explicit alignment can be specified for inserting other data (like arbitrary padding or Windows unwind tables).	2022-03-09 15:45:30 -06:00
Chris Fallin	26ce9a3853	Fix uextend on x64 for non-i32-source cases. (#3906 ) In #3849, I moved uextend over to ISLE in the x64 backend. Unfortunately, the lowering patterns had a bug in the i32-to-i64 special case (when we know the generating instruction zeroes the upper 32 bits): it wasn't actually special casing for an i32 source! This meant that e.g. zero extends of the results of i8 adds did not work properly. This PR fixes the bug and updates the runtest for extends significantly to cover the narrow-value cases. No security impact to Wasm as Wasm does not use narrow integer types. Thanks @bjorn3 for reporting!	2022-03-09 11:10:59 -08:00
wasmtime-publish	9137b4a50e	Bump Wasmtime to 0.35.0 (#3885 ) [automatically-tag-and-release-this-commit] Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-03-07 15:18:34 -06:00
Alex Crichton	ad5ce38467	Remove `all-arch` from `cranelift-interpreter` (#3872 ) I frequently notice that the fuzz build of `cranelift-codegen` takes an extremely long time and recently realized that one issue is that when fuzzers are built we enable all of the backends in `cranelift-codegen` but AFAIK only the native backend is actually fuzzed. I traced the inclusion of `all-arch` back to #2323, specifically [this comment][1] and it looks like now that the old backend is removed this should be able to be removed as well. [1]: https://github.com/bytecodealliance/wasmtime/pull/2323#discussion_r515228552	2022-03-02 12:03:16 -06:00
Alexa VanHattum	ceab1e62fa	Fix broken ISLE language reference link (#3865 ) * Fix broken ISLE language reference link * Update README.md * 3rd time's a charm	2022-02-28 17:11:30 -08:00
Chris Fallin	cd173cfe8e	ISLE: port fmin, fmax, fmin_pseudo, fmax_pseudo on x64. (#3856 )	2022-02-28 14:40:26 -08:00
Chris Fallin	d9dfc44c32	ISLE: port more ops on x64 to lowering patterns. (#3855 )	2022-02-28 13:28:42 -08:00
Chris Fallin	90a081a731	ISLE: port extend/reduce opcodes on x64. (#3849 )	2022-02-28 11:49:28 -08:00
Chris Fallin	24f145cd1e	Migrate clz, ctz, popcnt, bitrev, is_null, is_invalid on x64 to ISLE. (#3848 )	2022-02-28 09:45:13 -08:00
Ulrich Weigand	b064e60087	ISLE: Re-implement ValueSlice (#3784 ) The current definition of `ValueSlice` is not usable, since any call to a constructor returning a `ValueSlice` will extend the mutable borrow on the context taken by the constructor call, with the result that it cannot be passed to any other constructor ever. Re-implement `ValueSlice` as a pair of a `ValueList` identifer plus an offset into the list. This type can simply be copied without requiring a borrow on the context.	2022-02-24 15:24:40 -08:00
Ulrich Weigand	07d615d3f7	ISLE: Lowering of multi-output instructions (#3783 ) This changes the output of the `lower` constructor from a `ValueRegs` to a new `InstOutput` type, which is a vector of `ValueRegs`. Code in `lower_common` is updated to use this new type to handle instructions with multiple outputs. All back-ends are updated to use the new type.	2022-02-24 14:03:06 -08:00
Chris Fallin	e8881b2cc0	ISLE lowering rules: make use of implicit conversions. (#3847 ) This PR makes use of the new implicit-conversion feature of the ISLE DSL that was introduced in #3807 in order to make the lowering rules significantly simpler and more concise. The basic idea is to eliminate the repetitive and mechanical use of terms that convert from one type to another when there is only one real way to do the conversion -- for example, to go from a `WritableReg` to a `Reg`, the only sensible way is to use `writable_reg_to_reg`. This PR generally takes any term of the form "A_to_B" and makes it an automatic conversion, as well as some others that are similar in spirit. The notable exception to the pure-value-convsion category is the `put_in_reg` family of operations, which actually do have side-effects. However, as noted in the doc additions in #3807, this is fine as long as the side-effects are idempotent. And on balance, making `put_in_reg` automatic is a significant clarity win -- together with other operand converters, it enables rules like: ``` ;; Add two registers. (rule (lower (has_type (fits_in_64 ty) (iadd x y))) (add ty x y)) ``` There may be other converters that we could define to make the rules even simpler; we can make such improvements as we think of them, but this should be a good start!	2022-02-23 16:14:38 -08:00
Chris Fallin	56236fb58e	ISLE: fix compile fuzz target, and fix a simple error-reporting bug. (#3845 ) It seems our `compile` fuzz target for ISLE has not been regularly tested, as it was never updated for the `isle` -> `cranelift_isle` crate renaming. This PR fixes it to compile again. This also includes a simple fix in the typechecking: when verifying that a term decl is valid, we might insert a term ID into the name->ID map before fully checking that all of the types exist, and then skipping (for error recovery purposes) the actual push onto the term-signature vector if one of the types does have an error. This phantom TID can later cause a panic. The fix is to avoid adding to the map until we have fully verified the term decl.	2022-02-23 14:34:32 -08:00
Chris Fallin	9dbb8c25c5	Implicit type conversions in ISLE (#3807 ) Add support for implicit type conversions to ISLE. This feature allows the DSL user to register to the compiler that a particular term (used as a constructor or extractor) converts from one type to another. The compiler will then automatically insert this term whenever a type mismatch involving that specific pair of types occurs. This significantly cleans up many uses of the ISLE DSL. For example, when defining the compiler backends, we often have newtypes like `Gpr` around `Reg` (signifying a particular type of register); we can define a conversion from Gpr to Reg automatically. Conversions can also have side-effects, as long as these side-effects are idempotent. For example, `put_value_in_reg` in a compiler backend has the effect of marking the value as used, causing codegen to produce it, and assigns a register to the value; but multiple invocations of this will return the same register for the same value. Thus it is safe to use it as an implicit conversion that may be invoked multiple times. This is documented in the ISLE-Cranelift integration document. This PR also adds some testing infrastructure to the ISLE compiler, checking that "pass" tests pass through the DSL compiler, "fail" tests do not, and "link" tests are able to generate code and link that code with corresponding Rust code.	2022-02-23 13:15:27 -08:00
Nick Fitzgerald	bad9a35418	`wasm-mutate` fuzz targets (#3836 ) * fuzzing: Add a custom mutator based on `wasm-mutate` * fuzz: Add a version of the `compile` fuzz target that uses `wasm-mutate` * Update `wasmparser` dependencies	2022-02-23 12:14:11 -08:00
Sam Parker	5b7df72bce	[AArch64] Merge 32- and 64-bit BitOps (#3840 ) Copyright (c) 2022, Arm Limited.	2022-02-23 11:36:23 -08:00
Sam Parker	d307a4ab9a	[AArch64] Improve AtomicRMWLoop (#3839 ) Add more tests, use accurate disassembly, respect data sizes and simplify the Xchg implementation. Copyright (c) 2022, Arm Limited	2022-02-23 10:47:59 -08:00
bjorn3	141af7523a	Allow cloning DataDescriptor (#3377 )	2022-02-23 10:19:54 -08:00
Andrew Brown	f87c61176a	x64: port select to ISLE (#3682 ) * x64: port `select` using an FP comparison to ISLE This change includes quite a few interlocking parts, required mainly by the current x64 conventions in ISLE: - it adds a way to emit a `cmove` with multiple OR-ing conditions; because x64 ISLE cannot currently safely emit a comparison followed by several jumps, this adds `MachInst::CmoveOr` and `MachInst::XmmCmoveOr` macro instructions. Unfortunately, these macro instructions hide the multi-instruction sequence in `lower.isle` - to properly keep track of what instructions consume and produce flags, @cfallin added a way to pass around variants of `ConsumesFlags` and `ProducesFlags`--these changes affect all backends - then, to lower the `fcmp + select` CLIF, this change adds several `cmove*_from_values` helpers that perform all of the awkward conversions between `Value`, `ValueReg`, `Reg`, and `Gpr/Xmm`; one upside is that now these lowerings have much-improved documentation explaining why the various `FloatCC` and `CC` choices are made the the way they are. Co-authored-by: Chris Fallin <chris@cfallin.org>	2022-02-23 10:03:16 -08:00
Alex Crichton	709f7e0c8a	Enable SSE 4.2 unconditionally (#3833 ) * Enable SSE 4.2 unconditionally Fuzzing over the weekend found that `i64x2` comparison operators require `pcmpgtq` which is an SSE 4.2 instruction. Along the lines of #3816 this commit unconditionally enables and requires SSE 4.2 for compilation and fuzzing. It will no longer be possible to create a compiler for x86_64 with simd enabled if SSE 4.2 is disabled. * Update comment	2022-02-22 13:23:51 -06:00
bjorn3	2ca01ae947	Add a way to define a symbol lookup fn for the JIT (#2745 ) * Couple of cranelift-jit cleanups * Add a way to define a symbol lookup fn for the JIT	2022-02-19 08:33:39 -08:00
Sam Parker	e572198f85	[AArch64] Merge 32- and 64-bit ALUOps (#3802 ) Combine the two opcodes into one and pass and add an OperandSize field to these instructions, as well as an ISLE helper to perform the conversion from Type. This saves us from having having to write ISLE helpers to select the correct opcode, based on type, and reduces the amount of code needed for emission. Copyright (c) 2022, Arm Limited.	2022-02-17 10:03:54 -08:00
Chris Fallin	1c014d129a	Cranelift: ensure ISA level needed for SIMD is present when SIMD is enabled. (#3816 ) Addresses #3809: when we are asked to create a Cranelift backend with shared flags that indicate support for SIMD, we should check that the ISA level needed for our SIMD lowerings is present.	2022-02-16 17:29:30 -08:00
Chris Fallin	ca0e8d0a1d	Remove incomplete/unmaintained ARM32 backend (for now). (#3799 ) In #3721, we have been discussing what to do about the ARM32 backend in Cranelift. Currently, this backend supports only 32-bit types, which is insufficient for full Wasm-MVP; it's missing other critical bits, like floating-point support; and it has only ever been exercised, AFAIK, via the filetests for the individual CLIF instructions that are implemented. We were very very thankful for the original contribution of this backend, even in its partial state, and we had hoped at the time that we could eventually mature it in-tree until it supported e.g. Wasm and other use-cases. But that hasn't yet happened -- to the blame of no-one, to be clear, we just haven't had a contributor with sufficient time. Unfortunately, the existence of the backend and lack of active maintainer now potentially pose a bit of a burden as we hope to make continuing changes to the backend framework. For example, the ISLE migration, and the use of regalloc2 that it will allow, would need all of the existing lowering patterns in the hand-written ARM32 backend to be rewritten as ISLE rules. Given that we don't currently have the resources to do this, we think it's probably best if we, sadly, for now remove this partial backend. This is not in any way a statement of what we might accept in the future, though. If, in the future, an ARM32 backend updated to our latest codebase with an active maintainer were to appear, we'd be happy to merge it (and likewise for any other architecture!). But for now, this is probably the best path. Thanks again to the original contributor @jmkrauz and we hope that this work can eventually be brought back and reused if someone has the time to do so!	2022-02-14 15:03:52 -08:00
Nick Fitzgerald	dc86e7a6dc	cranelift: Use GPR newtypes extensively in x64 lowering (#3798 ) We already defined the `Gpr` newtype and used it in a few places, and we already defined the `Xmm` newtype and used it extensively. This finishes the transition to using the newtypes extensively in lowering by making use of `Gpr` in more places. Fixes #3685	2022-02-14 12:54:41 -08:00
Mrmaxmeier	84b9c7bb8a	cranelift/x64: lower min and max for <= `i64` (#3748 ) * cranelift/x64: lower min and max for <= `i64` * cranelift: add runtests for integer min/max	2022-02-14 10:21:19 -08:00
wackbyte	05ace6c0e2	Fix a typo in `cranelift-frontend`'s docs (#3796 ) Specifically that of `Variable`.	2022-02-13 11:08:19 -08:00
Chris Fallin	39a52ceb4f	Implement lazy funcref table and anyfunc initialization. (#3733 ) During instance initialization, we build two sorts of arrays eagerly: - We create an "anyfunc" (a `VMCallerCheckedAnyfunc`) for every function in an instance. - We initialize every element of a funcref table with an initializer to a pointer to one of these anyfuncs. Most instances will not touch (via call_indirect or table.get) all funcref table elements. And most anyfuncs will never be referenced, because most functions are never placed in tables or used with `ref.func`. Thus, both of these initialization tasks are quite wasteful. Profiling shows that a significant fraction of the remaining instance-initialization time after our other recent optimizations is going into these two tasks. This PR implements two basic ideas: - The anyfunc array can be lazily initialized as long as we retain the information needed to do so. For now, in this PR, we just recreate the anyfunc whenever a pointer is taken to it, because doing so is fast enough; in the future we could keep some state to know whether the anyfunc has been written yet and skip this work if redundant. This technique allows us to leave the anyfunc array as uninitialized memory, which can be a significant savings. Filling it with initialized anyfuncs is very expensive, but even zeroing it is expensive: e.g. in a large module, it can be >500KB. - A funcref table can be lazily initialized as long as we retain a link to its corresponding instance and function index for each element. A zero in a table element means "uninitialized", and a slowpath does the initialization. Funcref tables are a little tricky because funcrefs can be null. We need to distinguish "element was initially non-null, but user stored explicit null later" from "element never touched" (ie the lazy init should not blow away an explicitly stored null). We solve this by stealing the LSB from every funcref (anyfunc pointer): when the LSB is set, the funcref is initialized and we don't hit the lazy-init slowpath. We insert the bit on storing to the table and mask it off after loading. We do have to set up a precomputed array of `FuncIndex`s for the table in order for this to work. We do this as part of the module compilation. This PR also refactors the way that the runtime crate gains access to information computed during module compilation. Performance effect measured with in-tree benches/instantiation.rs, using SpiderMonkey built for WASI, and with memfd enabled: ``` BEFORE: sequential/default/spidermonkey.wasm time: [68.569 us 68.696 us 68.856 us] sequential/pooling/spidermonkey.wasm time: [69.406 us 69.435 us 69.465 us] parallel/default/spidermonkey.wasm: with 1 background thread time: [69.444 us 69.470 us 69.497 us] parallel/default/spidermonkey.wasm: with 16 background threads time: [183.72 us 184.31 us 184.89 us] parallel/pooling/spidermonkey.wasm: with 1 background thread time: [69.018 us 69.070 us 69.136 us] parallel/pooling/spidermonkey.wasm: with 16 background threads time: [326.81 us 337.32 us 347.01 us] WITH THIS PR: sequential/default/spidermonkey.wasm time: [6.7821 us 6.8096 us 6.8397 us] change: [-90.245% -90.193% -90.142%] (p = 0.00 < 0.05) Performance has improved. sequential/pooling/spidermonkey.wasm time: [3.0410 us 3.0558 us 3.0724 us] change: [-95.566% -95.552% -95.537%] (p = 0.00 < 0.05) Performance has improved. parallel/default/spidermonkey.wasm: with 1 background thread time: [7.2643 us 7.2689 us 7.2735 us] change: [-89.541% -89.533% -89.525%] (p = 0.00 < 0.05) Performance has improved. parallel/default/spidermonkey.wasm: with 16 background threads time: [147.36 us 148.99 us 150.74 us] change: [-18.997% -18.081% -17.285%] (p = 0.00 < 0.05) Performance has improved. parallel/pooling/spidermonkey.wasm: with 1 background thread time: [3.1009 us 3.1021 us 3.1033 us] change: [-95.517% -95.511% -95.506%] (p = 0.00 < 0.05) Performance has improved. parallel/pooling/spidermonkey.wasm: with 16 background threads time: [49.449 us 50.475 us 51.540 us] change: [-85.423% -84.964% -84.465%] (p = 0.00 < 0.05) Performance has improved. ``` So an improvement of something like 80-95% for a very large module (7420 functions in its one funcref table, 31928 functions total).	2022-02-09 13:56:53 -08:00
Ulrich Weigand	10198553c7	ISLE: Common accessors for some insn data fields (#3781 ) Add accessors to prelude.isle to access data fields of `func_addr` and `symbol_value` instructions. These are based on similar versions I had added to the s390x back-end, but are a bit more straightforward to use. - func_ref_data: Extract SigRef, ExternalName, and RelocDistance fields given a FuncRef. - symbol_value_data: Extract ExternalName, RelocDistance, and offset fields given a GlobalValue representing a Symbol. - reloc_distance_near: Test for RelocDistance::Near. The s390x back-end is changed to use these common versions. Note that this exposed a bug in common isle code: This extractor: (extractor (load_sym inst) (and inst (load _ (def_inst (symbol_value (symbol_value_data _ (reloc_distance_near) offset))) (i64_from_offset (memarg_symbol_offset_sum <offset _))))) would raise an assertion in sema.rs due to a supposed cycle in extractor definitions. But there was no actual cycle, it was simply that the extractor tree refers twice to the `insn_data` extractor (once via the `load` and once via the `symbol_value` extractor). Fixed by checking for pre-existing definitions only along one path in the tree, not across the whole tree.	2022-02-08 17:57:27 -08:00
Ulrich Weigand	9c5c872b3b	s390x: Add support for all remaining atomic operations (#3746 ) This adds support for all atomic operations that were unimplemented so far in the s390x back end: - atomic_rmw operations xchg, nand, smin, smax, umin, umax - $I8 and $I16 versions of atomic_rmw and atomic_cas - little endian versions of atomic_rmw and atomic_cas All of these have to be implemented by a compare-and-swap loop; and for the $I8 and $I16 versions the actual atomic instruction needs to operate on the surrounding aligned 32-bit word. Since we cannot emit new control flow during ISLE instruction selection, these compare-and-swap loops are emitted as a single meta-instruction to be expanded at emit time. However, since there is a large number of different versions of the loop required to implement all the above operations, I've implemented a facility to allow specifying the loop bodies from within ISLE after all, by creating a vector of MInst structures that will be emitted as part of the meta-instruction. There are still restrictions, in particular instructions that are part of the loop body may not modify any virtual register. But even so, this approach looks preferable to doing everything in emit.rs. A few instructions needed in those compare-and-swap loop bodies were added as well, in particular the RxSBG family of instructions as well as the LOAD REVERSED in-register byte-swap instructions. This patch also adds filetest runtests to verify the semantics of all operations, in particular the subword and little-endian variants (those are currently only executed on s390x).	2022-02-08 13:48:44 -08:00
wasmtime-publish	39b88e4e9e	Release Wasmtime 0.34.0 (#3768 ) * Bump Wasmtime to 0.34.0 [automatically-tag-and-release-this-commit] * Add release notes for 0.34.0 * Update release date to today Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com> Co-authored-by: Alex Crichton <alex@alexcrichton.com>	2022-02-07 19:16:26 -06:00
Nick Fitzgerald	bb7ae46ecd	ISLE: emit traps as safepoints on x64	2022-02-07 10:01:23 -08:00
Chris Fallin	2cf3069b6b	Extend cold-blocks test to test debuginfo as well.	2022-02-04 23:15:16 -08:00
Chris Fallin	d9d6469422	Cranelift: fix debuginfo wrt cold blocks and non-monotonic layout. The debuginfo analyses are written with the assumption that the order of instructions in the VCode is the order of instructions in the final machine ocde. This was previously a strong invariant, until we introduced support for cold blocks. Cold blocks are implemented by reordering during emission, because the VCode ordering has other requirements related to lowering (respecting def-use dependencies in the reverse pass), so it is much simpler to reorder instructions at the last moment. Unfortunately, this causes the breakage we now see. This commit fixes the issue by skipping all cold instructions when emitting value-label ranges (which are translated into debuginfo). This means that variables defined in cold blocks will not have DWARF metadata. But cold blocks are usually compiler-inserted slowpaths, not user code, so this is probably OK. Debuginfo is always best-effort, so in any case this does not violate any correctness constraints.	2022-02-04 23:15:04 -08:00
Nick Fitzgerald	2c77cf866a	ISLE: Rename `{gpr,xmm}_mem_new` constructors to `reg_mem_to_{gpr,xmm}_mem`	2022-02-03 14:08:08 -08:00
Nick Fitzgerald	795b0aaf9a	cranelift: Add newtype wrappers for x64 register classes This primary motivation of this large commit (apologies for its size!) is to introduce `Gpr` and `Xmm` newtypes over `Reg`. This should help catch difficult-to-diagnose register class mixup bugs in x64 lowerings. But having a newtype for `Gpr` and `Xmm` themselves isn't enough to catch all of our operand-with-wrong-register-class bugs, because about 50% of operands on x64 aren't just a register, but a register or memory address or even an immediate! So we have `{Gpr,Xmm}Mem[Imm]` newtypes as well. Unfortunately, `GprMem` et al can't be `enum`s and are therefore a little bit noisier to work with from ISLE. They need to maintain the invariant that their registers really are of the claimed register class, so they need to encapsulate the inner data. If they exposed the underlying `enum` variants, then anyone could just change register classes or construct a `GprMem` that holds an XMM register, defeating the whole point of these newtypes. So when working with these newtypes from ISLE, we rely on external constructors like `(gpr_to_gpr_mem my_gpr)` instead of `(GprMem.Gpr my_gpr)`. A bit of extra lines of code are included to add support for register mapping for all of these newtypes as well. Ultimately this is all a bit wordier than I'd hoped it would be when I first started authoring this commit, but I think it is all worth it nonetheless! In the process of adding these newtypes, I didn't want to have to update both the ISLE `extern` type definition of `MInst` and the Rust definition, so I move the definition fully into ISLE, similar as aarch64. Finally, this process isn't complete. I've introduced the newtypes here, and I've made most XMM-using instructions switch from `Reg` to `Xmm`, as well as register class-converting instructions, but I haven't moved all of the GPR-using instructions over to the newtypes yet. I figured this commit was big enough as it was, and I can continue the adoption of these newtypes in follow up commits. Part of #3685.	2022-02-03 14:08:08 -08:00

1 2 3 4 5 ...

3627 Commits