wasmtime

Author	SHA1	Message	Date
bjorn3	108f7917c8	Support plugging external profilers into the Cranelift timing infrastructure (#5749 ) * Remove no-std code for cranelift_codegen::timings no-std mode isn't supported by Cranelift anymore * Simplify define_passes macro * Add egraph opt timings * Replace the add_to_current api with PassTimes::add * Omit a couple of unused time measurements * Reduce divergence between run and run_passes a bit * Introduce a Profiler trait This allows plugging in external profilers into the Cranelift profiling framework. * Add Pass::description method * Remove duplicate usage of the compile pass timing * Rustfmt	2023-03-10 19:33:56 +00:00
Alex Crichton	0751cba6e2	Add a `--dynamic-memory-reserved-for-growth` CLI flag (#5980 ) Maps to the corresponding `wasmtime::Config` option. The motivation here is largely completeness and was something I was looking into with the failures in #5970	2023-03-10 17:58:02 +00:00
Alex Crichton	7650d857fa	Update the spec test suite submodule (#5970 ) * Update the spec test suite submodule Delete the local copies of the relaxed-simd test suite as well as they're now incorporated. Closes #5914 * Remove page guards in QEMU emulation Otherwise `(memory 0 0)` was being compiled as a static memory with huge guards which we're trying to avoid in QEMU.	2023-03-10 16:50:20 +00:00
Afonso Bordado	e64fb6ab39	riscv64: Fix underflow in call relocation handling (#5951 ) Under some test case layouts the call relocation was panicking with an underflow. Use `wrapping_sub` to signal that this is expected. The fuzzer took a while to generate such a test case. And I can't introduce it as a regression test because when running via the regular clif-util run tests the layout is different and the test case passes! I think this is because in the fuzzer we only add one trampoline, while in clif-util we build trampolines for each function in the file. Co-authored-by: Jamey Sharp <jsharp@fastly.com>	2023-03-10 11:43:27 +00:00
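As a rough illustration of the fix described above (this is not the actual Cranelift relocation code, and the helper name `pc_relative_offset` is hypothetical), a PC-relative offset can be computed with `wrapping_sub` so that a backward reference wraps intentionally instead of panicking in builds with overflow checks:

```rust
/// Hypothetical helper: signed distance from `pc` to `target`.
/// `wrapping_sub` documents that wrap-around is expected when `target < pc`;
/// a plain `target - pc` would panic in that case when overflow checks are on.
fn pc_relative_offset(target: u64, pc: u64) -> i64 {
    // Reinterpreting the wrapped difference as i64 yields the signed offset.
    target.wrapping_sub(pc) as i64
}
```

A call target that lies before the call site simply produces a negative offset.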
yuyang	4e875f33a7	Codegen fix `fcvt_from_sint.f32` with small types on riscv64. (#5964 ) * fix issue5952 * We should only extend i8 and i16 * remove extra space * move some code	2023-03-10 10:29:55 +00:00
Alex Crichton	0ec7b872fa	x64: Optimize store-of-extract-lane-0 (#5924 ) * x64: Optimize store-of-extract-lane-0 The `movss` and `movsd` instructions can be used to store the 0th lane of a `t32x4` or a `t64x2` vector into memory, enabling fusing a `store` and an `extractlane` instruction. * Fix merge conflict with `main`	2023-03-10 01:06:38 +00:00
Alex Crichton	83f21e784a	x64: Add more support for more AVX instructions (#5931 ) * x64: Add a smattering of lowerings for `shuffle` specializations (#5930) * x64: Add lowerings for `punpck{h,l}wd` Add some special cases for `shuffle` for more specialized x86 instructions. * x64: Add `shuffle` lowerings for `pshufd` This commit adds special-cased lowerings for the x64 `shuffle` instruction when the `pshufd` instruction alone is necessary. This is possible when the shuffle immediate permutes 32-bit values within one of the vector inputs of the `shuffle` instruction, but not both. * x64: Add shuffle lowerings for `punpck{h,l}{q,}dq` This adds specific permutations for some x86 instructions which specifically interleave high/low bytes for 32 and 64-bit values. This corresponds to the preexisting specific lowerings for interleaving 8 and 16-bit values. * x64: Add `shuffle` lowerings for `shufps` This commit adds targeted lowerings for the `shuffle` instruction that match the pattern that `shufps` supports. The `shufps` instruction selects two elements from the first vector and two elements from the second vector which means while it's not generally applicable it should still be more useful than the catch-all lowering of `shuffle`. * x64: Add shuffle support for `pshuf{l,h}w` This commit adds special lowering cases for these instructions which permute 16-bit values within a 128-bit value either within the upper or lower half of the 128-bit value. * x64: Specialize `shuffle` with an all-zeros immediate Instead of loading the all-zeros immediate from a rip-relative address at the end of the function instead generate a zero with a `pxor` instruction and then use `pshufb` to do the broadcast. 
* Review comments * x64: Add an AVX encoding for the `pshufd` instruction This will benefit from lack of need for alignment vs the `pshufd` instruction if working with a memory operand and additionally, as I've just learned, this reduces dependencies between instructions because the `v` instructions zero the upper bits as opposed to preserving them which could accidentally create false dependencies in the CPU between instructions. x64: Add more support for AVX loads/stores This commit adds VEX-encoded versions of instructions such as `mov{ss,sd,upd,ups,dqu}` for load and store operations. This also changes some signatures so the `load` helpers specifically take a `SyntheticAmode` argument which ended up doing a small refactoring of the `_regmove` variant used for `insertlane 0` into f64x2 vectors. x64: Enable using AVX instructions for zero regs This commit refactors the internal ISLE helpers for creating zero'd xmm registers to leverage the AVX support for all other instructions. This moves away from picking opcodes to instead picking instructions with a bit of reorganization. * x64: Remove `XmmConstOp` as an instruction All existing users can be replaced with usage of the `xmm_uninit_value` helper instruction so there's no longer any need for these otherwise constant operations. This additionally reduces manual usage of opcodes in favor of instruction helpers. * Review comments * Update test expectations	2023-03-09 23:57:42 +00:00
Alex Crichton	1c3a1bda6c	x64: Add a smattering of lowerings for `shuffle` specializations (#5930 ) * x64: Add lowerings for `punpck{h,l}wd` Add some special cases for `shuffle` for more specialized x86 instructions. * x64: Add `shuffle` lowerings for `pshufd` This commit adds special-cased lowerings for the x64 `shuffle` instruction when the `pshufd` instruction alone is necessary. This is possible when the shuffle immediate permutes 32-bit values within one of the vector inputs of the `shuffle` instruction, but not both. * x64: Add shuffle lowerings for `punpck{h,l}{q,}dq` This adds specific permutations for some x86 instructions which specifically interleave high/low bytes for 32 and 64-bit values. This corresponds to the preexisting specific lowerings for interleaving 8 and 16-bit values. * x64: Add `shuffle` lowerings for `shufps` This commit adds targeted lowerings for the `shuffle` instruction that match the pattern that `shufps` supports. The `shufps` instruction selects two elements from the first vector and two elements from the second vector which means while it's not generally applicable it should still be more useful than the catch-all lowering of `shuffle`. * x64: Add shuffle support for `pshuf{l,h}w` This commit adds special lowering cases for these instructions which permute 16-bit values within a 128-bit value either within the upper or lower half of the 128-bit value. * x64: Specialize `shuffle` with an all-zeros immediate Instead of loading the all-zeros immediate from a rip-relative address at the end of the function instead generate a zero with a `pxor` instruction and then use `pshufb` to do the broadcast. * Review comments	2023-03-09 22:58:19 +00:00
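The `pshufd` case above can be sketched in plain Rust, assuming the `shuffle` immediate is a 16-entry byte mask. This is only an illustration of the pattern check, not the ISLE rule from the commit; `pshufd_imm` is a hypothetical name, and only the "first input" case is handled:

```rust
/// Hypothetical check: does a 16-byte shuffle mask only permute whole 32-bit
/// lanes of the first input vector? If so, return the 8-bit `pshufd`
/// immediate (two bits per destination lane).
fn pshufd_imm(mask: [u8; 16]) -> Option<u8> {
    let mut imm = 0u8;
    for (lane, chunk) in mask.chunks_exact(4).enumerate() {
        let first = chunk[0];
        // Each group of four bytes must start on a 32-bit boundary of the
        // first vector (byte indices 0..16) ...
        if first >= 16 || first % 4 != 0 {
            return None;
        }
        // ... and then select the four consecutive bytes of that lane.
        if !chunk.iter().enumerate().all(|(j, &b)| b == first + j as u8) {
            return None;
        }
        imm |= (first / 4) << (2 * lane);
    }
    Some(imm)
}
```

Masks that mix bytes from different lanes, or that reach into the second input, return `None` and would fall through to a more general lowering.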
Nick Fitzgerald	8a2bf29444	wasmtime: Privately expose a module's address map and its function's bytes (#5973 ) This will allow us to build developer tools for Wasmtime and Cranelift like WAT and asm side-by-side viewers (a la Godbolt). These are not proper public APIs, so they are marked `doc(hidden)` and have comments saying they are only for use within this repo's workspace.	2023-03-09 20:04:51 +00:00
Chris Fallin	7f3500a172	Cranelift: x64, aarch64, s390x, riscv64: ensure addresses are I64s. (#5972 ) * Cranelift: x64, aarch64, s390x, riscv64: ensure addresses are I64s. @avanhatt has been looking at our address-mode lowering and found an example where when feeding an `I32`-typed address into a load or store, we can violate assumptions and get incorrect codegen. This should never be reachable in practice, because all producers on 64-bit architectures use 64-bit types for addresses. However, our IR is insufficiently constrained, and allows loads/stores to `I32` addresses as well. This is nonsensical on a 64-bit architecture. Initially I had thought we should tighten either the instruction definition's accepted types, or the CLIF verifier, to reject this. However both are target-independent, and we don't want to bake an assumption of 64-bit-ness into the compiler core. Instead this PR tightens specific backends' lowerings to reject loads/stores of `I32`-typed addresses. tl;dr: no security implications as all producers use I64-typed addresses (and must, for correct operation); but we currently accept I32-typed addresses too, and this breaks other assumptions. * Allow R64 as well as I64 types. * Add an explicit extractor to match 64-bit address types.	2023-03-09 19:08:16 +00:00
Jamey Sharp	f877141668	cranelift-meta: Don't let-bind operand definitions (#5969 ) We've adopted this pattern in Cranelift's instruction definitions where we let-bind some calls to `Operand::new` and then later use them in one or more calls to `Inst::new`. That pattern has two problems: - It puts the type of each operand somewhere potentially far removed from the instruction in which it's used. - We let-bind the same name for many different operands, compounding the first problem by making it harder to find _which_ definition is used. So instead this commit removes all let-bindings for operand definitions and constructs a new `Operand` every time. Constructing an `Operand` at every use means we duplicate some documentation strings, but not all that many of them as it turns out. I've left the let-bound type-sets alone, so those are currently still shared across many instructions. They have some of the same problems and should be reviewed as well.	2023-03-09 17:24:19 +00:00
Alex Crichton	9141fcf8cf	components: Fix bindgen! with renamed interfaces (#5968 ) This follows the same strategy pioneered by the `wit-bindgen` guest Rust bindgen which keeps track of the latest name of an interface for how to refer to an interface. Closes #5961	2023-03-08 23:15:58 +00:00
Alex Crichton	8a08fedc69	components: Assert sizes match `wit-parser` sizes (#5943 ) I was debugging [an issue] recently where it appears that the underlying cause was a discrepancy in the size/align of a WIT type between Wasmtime and `wit-parser`. This commit adds compile-time assertions that the size of a WIT type is the same with `wit-parser` as it is in Wasmtime since the two have different systems to calculate the size of a type. The hope is that this will head off any future issues if they crop up. [an issue]: https://github.com/bytecodealliance/wit-bindgen/issues/526	2023-03-08 21:42:03 +00:00
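The idea of a compile-time size assertion can be sketched as follows. The type `Pair` and the expected constants are hypothetical stand-ins (the commit compares Wasmtime's layout calculation against `wit-parser`'s, which is not reproduced here), and the numbers assume a typical 64-bit target:

```rust
use std::mem::{align_of, size_of};

/// Hypothetical stand-in for a type whose layout is computed in two places.
#[repr(C)]
struct Pair {
    tag: u32,
    payload: u64,
}

// Hypothetical values reported by the "other" layout calculation.
const EXPECTED_SIZE: usize = 16;
const EXPECTED_ALIGN: usize = 8;

// Evaluated at compile time: a mismatch is a build error, not a runtime bug.
const _: () = assert!(size_of::<Pair>() == EXPECTED_SIZE);
const _: () = assert!(align_of::<Pair>() == EXPECTED_ALIGN);
```

Because the checks live in `const` items, a layout discrepancy fails the build immediately rather than surfacing later as corrupted data.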
Alex Crichton	46386cca0e	Add release notes for 7.0.0 (#5963 )	2023-03-08 21:22:33 +00:00
Alex Crichton	63fb30e4b4	Merge pull request from GHSA-ff4p-7xrq-q5r8 * x64: Remove incorrect `amode_add` lowering rules This commit removes two incorrect rules as part of the x64 backend's computation of addressing modes. These two rules folded a zero-extended 32-bit computation into the address mode operand, but this isn't correct as the 32-bit computation should be truncated to 32-bits but when folded into the address mode computation it happens with 64-bit operands, meaning truncation doesn't happen. * Add release notes	2023-03-08 13:00:40 -06:00
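The arithmetic behind this fix can be shown with a small sketch. The function names are hypothetical and this is not the backend's code; it only contrasts truncating a 32-bit sum before zero-extension with folding both 32-bit operands into a 64-bit address computation:

```rust
/// Intended semantics: the 32-bit add wraps at 32 bits, and only then is the
/// result zero-extended and added to the 64-bit base.
fn address_with_truncation(base: u64, a: u32, b: u32) -> u64 {
    base.wrapping_add(u64::from(a.wrapping_add(b)))
}

/// What folding into the address mode effectively computes: both operands
/// are widened first, so the carry out of bit 31 is kept instead of dropped.
fn address_with_folding(base: u64, a: u32, b: u32) -> u64 {
    base.wrapping_add(u64::from(a)).wrapping_add(u64::from(b))
}
```

The two agree whenever `a + b` fits in 32 bits and differ by exactly 2^32 when it overflows, which is why the folded form can address memory outside the intended range.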
Alex Crichton	5dc2bbccbb	Merge pull request from GHSA-xm67-587q-r2vw This commit fixes an off-by-one error in the subtraction of indices when shuffling a vector with itself. Lanes 16-and-above are mapped to select from the first vector since the first and second element are the same, but the subtraction was with 15 rather than 16 by accident.	2023-03-08 13:00:00 -06:00
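A minimal sketch of the index mapping described above, assuming a 16-entry byte mask whose indices 16 through 31 refer to the second operand. The function name is hypothetical and the real fix lives in the x64 lowering, not in code like this:

```rust
/// When both shuffle operands are the same vector, indices 16..=31 (which
/// select from the "second" operand) can be redirected to the first operand.
/// The subtraction must be by 16; subtracting 15 maps lane 16 to lane 1.
fn normalize_self_shuffle(mask: [u8; 16]) -> [u8; 16] {
    mask.map(|lane| if lane >= 16 { lane - 16 } else { lane })
}
```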
Alex Crichton	e8331661cb	Add `Clone for component::InstancePre` (#5966 ) This is present on `wasmtime::InstancePre` and should be available for components as well. Closes #5965	2023-03-08 17:13:28 +00:00
Kevin Rizzo	013b35ff32	winch: Refactoring wasmtime compiler integration pieces to share more between Cranelift and Winch (#5944 ) * Enable the native target by default in winch Match cranelift-codegen's build script where if no architecture is explicitly enabled then the host architecture is implicitly enabled. * Refactor Cranelift's ISA builder to share more with Winch This commit refactors the `Builder` type to have a type parameter representing the finished ISA with Cranelift and Winch having their own typedefs for `Builder` to represent their own builders. The intention is to use this shared functionality to produce more shared code between the two codegen backends. * Moving compiler shared components to a separate crate * Restore native flag inference in compiler building This fixes an oversight from the previous commits to use `cranelift-native` to infer flags for the native host when using default settings with Wasmtime. * Move `Compiler::page_size_align` into wasmtime-environ The `cranelift-codegen` crate doesn't need this and winch wants the same implementation, so shuffle it around so everyone has access to it. * Fill out `Compiler::{flags, isa_flags}` for Winch These are easy enough to plumb through with some shared code for Wasmtime. * Plumb the `is_branch_protection_enabled` flag for Winch Just forwarding an isa-specific setting accessor. * Moving executable creation to shared compiler crate * Adding builder back in and removing from shared crate * Refactoring the shared pieces for the `CompilerBuilder` I decided to move a couple things around from Alex's initial changes. Instead of having the shared builder do everything, I went back to having each compiler have a distinct builder implementation. I refactored most of the flag setting logic into a single shared location, so we can still reduce the amount of code duplication. With them being separate, we don't need to maintain things like `LinkOpts` which Winch doesn't currently use. 
We also have an avenue to error when certain flags are sent to Winch if we don't support them. I'm hoping this will make things more maintainable as we build out Winch. I'm still unsure about keeping everything shared in a single crate (`cranelift_shared`). It's starting to feel like this crate is doing too much, which makes it difficult to name. There does seem to be a need for two distinct abstractions: creating the final executable and the handling of shared/ISA flags when building the compiler. I could make them into two separate crates, but there doesn't seem to be enough there yet to justify it. * Documentation updates, and renaming the finish method * Adding back in a default temporarily to pass tests, and removing some unused imports * Fixing winch tests with wrong method name * Removing unused imports from codegen shared crate * Apply documentation formatting updates Co-authored-by: Saúl Cabrera <saulecabrera@gmail.com> * Adding back in cranelift_native flag inferring * Adding new shared crate to publish list * Adding write feature to pass cargo check --------- Co-authored-by: Alex Crichton <alex@alexcrichton.com> Co-authored-by: Saúl Cabrera <saulecabrera@gmail.com>	2023-03-08 15:07:13 +00:00
Trevor Elliott	7d482345fb	Restrict fcvt_to_uint and fcvt_to_sint to produce scalar integers (#5956 ) * Restrict fcvt_to_uint and fcvt_to_sint to produce scalar integers * Review feedback	2023-03-08 05:00:15 +00:00
Alex Crichton	07518dfd36	Remove the Cranelift `vselect` instruction (#5918 ) * Remove the Cranelift `vselect` instruction This instruction is documented as selecting lanes based on the "truthy" value of the condition lane, but the current status of the implementation of this instruction is: * x64 - uses the high bit for `f32x4` and `f64x2` and otherwise uses the high bit of each byte doing a byte-wise lane select rather than whatever the controlling type is. * AArch64 - this is the same as `bitselect` which is a bit-wise selection rather than a lane-wise selection. * s390x - this is the same as AArch64, a bit-wise selection rather than lane-wise. * interpreter - the interpreter implements the documented semantics of selecting based on "truthy" values. Coupled with the status of the implementation is the fact that this instruction is not used by WebAssembly SIMD today either. The only use of this instruction in Cranelift is the nan-canonicalization pass. By moving nan-canonicalization to `bitselect`, since that has the desired semantics, there's no longer any need for `vselect`. Given this situation this commit subsequently removes `vselect` and all usage of it throughout Cranelift. Closes #5917 * Review comments * Bring back vselect opts as bitselect opts * Clean up vselect usage in the interpreter * Move bitcast in nan canonicalization * Add a comment about float optimization	2023-03-08 00:42:05 +00:00
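The difference between the two semantics discussed above can be sketched on four 32-bit lanes. Both functions are illustrative models with hypothetical names, not Cranelift code:

```rust
/// Bit-wise selection: each result bit comes from `x` where the condition
/// bit is 1 and from `y` where it is 0.
fn bitselect(c: [u32; 4], x: [u32; 4], y: [u32; 4]) -> [u32; 4] {
    let mut out = [0u32; 4];
    for i in 0..4 {
        out[i] = (c[i] & x[i]) | (!c[i] & y[i]);
    }
    out
}

/// Lane-wise "truthy" selection (the documented `vselect` behaviour): a
/// whole lane comes from `x` if the condition lane is non-zero, else from `y`.
fn lane_select_truthy(c: [u32; 4], x: [u32; 4], y: [u32; 4]) -> [u32; 4] {
    let mut out = [0u32; 4];
    for i in 0..4 {
        out[i] = if c[i] != 0 { x[i] } else { y[i] };
    }
    out
}
```

The two agree when every condition lane is all ones or all zeros, as with a comparison result, and diverge for any other condition value.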
Bobby Holley	fc45ccc125	Update cargo-vet imports (#5959 ) * Switch cargo-vet import to Mozilla's aggregated audit set. * Import audits from embark studios. * Import audits from chromeos. * Import audits from ISRG.	2023-03-08 00:24:56 +00:00
Alex Crichton	afde4ea4e3	Fix the original block for block params in egraphs (#5960 ) This fixes an issue where block params were always listed as being members of the current block in egraphs, even when the block param was actually defined in a separate block. This then enables instructions which depend on these parameters to get hoisted up out of inner loops at least to the block that defined the argument. Closes #5957	2023-03-07 23:58:03 +00:00
Nick Fitzgerald	d9cdbef920	Fix typo in `Config::generate_address_map` docs (#5958 )	2023-03-07 23:22:33 +00:00
Alex Crichton	8d3a881b52	Run `cargo vet` on PRs that modify `supply-chain` (#5955 ) Noted in #5954 this'll report `cargo vet` status checks on PRs that modify the `supply-chain` directory in addition to `Cargo.lock` modifications that already happen.	2023-03-07 19:43:28 +00:00
Bobby Holley	8baf645eae	Bump cargo-vet to 0.4.0 (#5954 ) * Bump cargo-vet to 0.4.0. * Run `cargo vet prune`.	2023-03-07 19:16:55 +00:00
Trevor Elliott	709257011e	Restrict uextend and sextend to scalar integers (#5953 )	2023-03-07 19:10:50 +00:00
Trevor Elliott	b44f67b6d7	Stop supporting vectors with `bmask` (#5948 ) Co-authored-by: Jamey Sharp <jsharp@fastly.com>	2023-03-07 17:34:29 +00:00
Alex Crichton	8bb183f16e	Implement the relaxed SIMD proposal (#5892 ) * Initial support for the Relaxed SIMD proposal This commit adds initial scaffolding and support for the Relaxed SIMD proposal for WebAssembly. Codegen is supported on the x64 and AArch64 backends at this time. The purpose of this commit is to get all the boilerplate out of the way in terms of plumbing through a new feature, adding tests, etc. The tests are copied from the upstream repository at this time while the WebAssembly/testsuite repository hasn't been updated. A summary of changes made in this commit: * Lowerings for all relaxed simd opcodes have been added, currently all exhibiting deterministic behavior. This means that few lowerings are optimal on the x86 backend, but on the AArch64 backend, for example, all lowerings should be optimal. * Support is added to codegen to, eventually, conditionally generate different code based on input codegen flags. This is intended to enable codegen to more efficient instructions on x86 by default, for example, while still allowing embedders to force architecture-independent semantics and behavior. One good example of this is the `f32x4.relaxed_fmadd` instruction which when deterministic forces the `fma` instruction, but otherwise if the backend doesn't have support for `fma` then intermediate operations are performed instead. * Lowerings of `iadd_pairwise` for `i16x8` and `i32x4` were added to the x86 backend as they're now exercised by the deterministic lowerings of relaxed simd instructions. * Sample codegen tests were added for x86 and aarch64 for some relaxed simd instructions. * Wasmtime embedder support for the relaxed-simd proposal and forcing determinism have been added to `Config` and the CLI. * Support has been added to the `.wast` runtime execution for the `(either ...)` matcher used in the relaxed-simd proposal. 
Tests for relaxed-simd are run both with a default `Engine` as well as a "force deterministic" `Engine` to test both configurations. * All tests from the upstream repository were copied into Wasmtime. These tests should be deleted when WebAssembly/testsuite is updated. * x64: Add x86-specific lowerings for relaxed simd This commit builds on the prior commit and adds an array of `x86_` instructions to Cranelift which have semantics that match their corresponding x86 equivalents. Translation for relaxed simd is then additionally updated to conditionally generate different CLIF for relaxed simd instructions depending on whether the target is x86 or not. This means that for AArch64 no changes are made but for x86 most relaxed instructions now lower to some x86-equivalent with slightly different semantics than the "deterministic" lowering. Add libcall support for fma to Wasmtime This will be required to implement the `f32x4.relaxed_madd` instruction (and others) when an x86 host doesn't specify the `has_fma` feature. * Ignore relaxed-simd tests on s390x and riscv64 * Enable relaxed-simd tests on s390x * Update cranelift/codegen/meta/src/shared/instructions.rs Co-authored-by: Andrew Brown <andrew.brown@intel.com> * Add a FIXME from review * Add notes about deterministic semantics * Don't default `has_native_fma` to `true` * Review comments and rebase fixes --------- Co-authored-by: Andrew Brown <andrew.brown@intel.com>	2023-03-07 15:52:41 +00:00
Qinxuan Chen	e2dcb19099	deps: update criterion dependency to v0.4.0 (#5935 )	2023-03-07 15:21:48 +00:00
yuyang	812b4b5229	Codegen fix atomic_cas with small types on riscv64 (#5919 ) * fix issue5901 * add regression test file. * fix regression targets. * fix a comment. * enable atomic-cas-little for riscv64 * specify little endian some s390x can pass test. * fix register error	2023-03-07 13:32:28 +00:00
Pat Hickey	d3fdb5fc2c	cargo vet: add audits for criterion upgrade, and its transitive dependencies (#5946 )	2023-03-07 00:20:03 +00:00
Jan-Justin van Tonder	a2beacd288	cranelift-interpreter: Add trap on misaligned memory accesses (#5921 ) * Add checks to `InterpreterState::checked_{load,store}` to trap on misaligned memory accesses where `aligned` memory flag is set. * Alter `stack_{load,store}` instructions to now rely on `MemFlags::new()` instead of `MemFlags::trusted` since `InterpreterState::checked_{load,store}` is only able to deduce type alignment and not stack slot alignment.	2023-03-07 00:06:19 +00:00
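A simplified model of the alignment check described above. The names `AccessError` and `check_access` are hypothetical and do not match the interpreter's real types; the sketch assumes the required alignment equals the access size in bytes:

```rust
/// Hypothetical error type standing in for an interpreter trap.
#[derive(Debug, PartialEq)]
enum AccessError {
    Misaligned { addr: u64, align: u64 },
}

/// Trap only when the `aligned` flag is set and the address is not a
/// multiple of the access size. Without the flag, unaligned accesses pass.
fn check_access(addr: u64, access_size: u64, aligned_flag: bool) -> Result<(), AccessError> {
    if aligned_flag && access_size != 0 && addr % access_size != 0 {
        Err(AccessError::Misaligned { addr, align: access_size })
    } else {
        Ok(())
    }
}
```

This also hints at why the commit stops using trusted flags for stack accesses: a check like this knows the type's alignment but nothing about how the stack slot itself is aligned.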
Alex Crichton	3c9fc3ec8c	Update wasm-tools crates (#5945 ) This notably updates `wasmparser` for updates to the relaxed-simd proposal and an implementation of the function-references proposal. Additionally there are some minor bug fixes being picked up for WIT and the component model.	2023-03-06 23:47:34 +00:00
Trevor Elliott	58430b1dd7	Determine the instruction generation strategy based on format (#5942 ) Co-authored-by: Jamey Sharp <jsharp@fastly.com>	2023-03-06 19:54:52 +00:00
Afonso Bordado	6d9ab2067b	cranelift: Make `sqmul_round_sat` vector only (#5941 )	2023-03-06 18:22:50 +00:00
Alex Crichton	3782ce7333	Update security release documentation slightly (#5940 ) This modernizes our process doc a bit with what I've been doing for the last security release and the upcoming one as well.	2023-03-06 17:19:53 +00:00
Alex Crichton	18ee645ebe	Allow hoisting `vconst` instructions out of loops (#5909 ) * Allow hoisting `vconst` instructions out of loops Staring at some SIMD code and what LLVM and v8 both generate it appears that a common technique for SIMD-loops is to hoist constants outside of loops since they're nontrivial to rematerialize unlike integer constants. This commit updates the `loop_hoist_level` calculation with egraphs to have a nonzero default for instructions that have no arguments (e.g. consts) which enables hoisting these instructions out of loops. Note, though, that for now I've listed the maximum as hoisting outside of one loop, but not all of them. While theoretically vconsts could move up to the top of the function I'd be worried about their impact on register pressure and having to save/restore around calls or similar, so hopefully if the hot part of a program is a single loop then hoisting out of one loop is a reasonable-enough heuristic for now. Locally on x64 with a benchmark that just encodes binary to hex this saw a 15% performance improvement taking hex encoding from ~6G/s to ~6.7G/s. * Test vconst is only hoisted one loop out	2023-03-06 15:29:43 +00:00
wasmtime-publish	354bc48015	Bump Wasmtime to 8.0.0 (#5932 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2023-03-06 15:08:16 +00:00
yuyang	20198d94c6	Codegen fix atomic_rmw_loop missing move result to `dst` register On riscv64. (#5898 ) * fix issue5884. * fix issue5884 * fix test failure * fix atomic rmw missing move result to dst register. * specify little endian some s390x can pass test.	2023-03-06 11:27:46 +00:00
Andrew Brown	ad584f428a	wasi-threads: run test suite (#5907 ) * wasi-threads: run test suite This change enables the running of the wasi-threads [test suite]. It relies on a Wasmtime CLI binary being available and runs all `.wasm` and `.wat` files present in the test suite directory. The results of each execution are compared against a JSON spec file with the same base name as the WebAssembly module. The spec file defines the expected exit code, e.g. This commit does not yet build any `.c` or `.s` files from the test suite. That could be done later, perhaps upstream; in the meantime, this work is still valuable as it lays the foundation for running other WASI tests from the in-progress [wasi-testsuite] which share the same JSON spec infrastructure. [test suite]: https://github.com/WebAssembly/wasi-threads/tree/main/test/testsuite [wasi-testsuite]: https://github.com/WebAssembly/wasi-testsuite * review: move testsuite to top-level tests * fix: remove now-unnecessary wasi-threads test * fix: update testsuite submodule name * fix: ignore tests on Windows prtest:full * fix: `cfg_attr` syntax prtest:full	2023-03-04 21:50:15 +00:00
Afonso Bordado	c24d4101ae	fuzzgen: Add Invalid inputs counter (#5928 )	2023-03-04 21:23:19 +00:00
Afonso Bordado	e96214968c	fuzzgen: Move `Arbitrary` structs into the fuzzers (#5820 ) * fuzzgen: Move `FunctionWithIsa` to icache fuzzer * fuzzgen: Move `Testcase` to fuzzgen fuzzer * fuzzgen: Move allowed libcalls to fuzzers * fuzzgen: Centralize printing of testcases	2023-03-04 19:17:28 +00:00
Alex Crichton	3ff3994a12	Add egraph optimization for fneg's cancelling out (#5910 ) This implements comments from #5895 to cancel out `fneg` operations in `fma` instructions. Additional support for `fmul` is added as well.	2023-03-02 18:28:32 +00:00
Tristan de Cacqueray	87672f7059	doc: fix WASI-api link (#5912 )	2023-03-02 13:22:33 +00:00
Jan-Justin van Tonder	db8fe0108f	cranelift: Add big and little endian memory accesses to interpreter (#5893 ) * Added `mem_flags` parameter to `State::checked_{load,store}` as the means for determining the endianness, typically derived from an instruction. * Added `native_endianness` property to `InterpreterState` as fallback when determining endianness, such as in cases where there are no memory flags available or set. * Added `to_be` and `to_le` methods to `DataValue`. * Added `AtomicCas` and `AtomicRmw` to list of instructions with retrievable memory flags for `InstructionData::memflags`. * Enabled `atomic-{cas,rmw}-subword-{big,little}.clif` for interpreter run tests.	2023-03-02 11:57:01 +00:00
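The fallback described above (explicit endianness from the memory flags, otherwise the state's native endianness) can be modelled roughly like this. The `Endianness` enum and `encode_u32` are hypothetical simplifications, not the interpreter's `MemFlags` or `DataValue` API:

```rust
/// Hypothetical, simplified endianness tag.
#[derive(Clone, Copy)]
enum Endianness {
    Big,
    Little,
}

/// Encode a 32-bit value for a store. `explicit` models an endianness taken
/// from the instruction's memory flags; `native` is the fallback used when
/// no flag is available or set.
fn encode_u32(value: u32, explicit: Option<Endianness>, native: Endianness) -> [u8; 4] {
    match explicit.unwrap_or(native) {
        Endianness::Big => value.to_be_bytes(),
        Endianness::Little => value.to_le_bytes(),
    }
}
```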
Alex Crichton	9984e959cd	aarch64: Add support for the `fmls` instruction (#5895 ) This commit adds lowerings to the AArch64 backend for the `fmls` instruction which is intended to be leveraged in the relaxed-simd proposal for WebAssembly. This should hopefully allow for a teeny-bit-more efficient codegen for this operator instead of using the `fmla` instruction plus a negation instruction.	2023-03-02 05:45:58 +00:00
Alex Crichton	52b4c48a1b	x64: Improve codegen for i8x16.shr_u (#5906 ) This catches a case that wasn't handled previously by #5880 to allow a constant load to be folded into an instruction rather than forcing it to be loaded into a temporary register.	2023-03-02 05:43:42 +00:00
Chris Fallin	7b8854f803	egraphs: fix handling of effectful-but-idempotent ops and GVN. (#5800 ) * Revert "egraphs: disable GVN of effectful idempotent ops (temporarily). (#5808)" This reverts commit `c7e2571866`. * egraphs: fix handling of effectful-but-idempotent ops and GVN. This PR addresses #5796: currently, ops that are effectful, i.e., remain in the side-effecting skeleton (which we keep in the `Layout` while the egraph exists), but are idempotent and thus mergeable by a GVN pass, are not handled properly. GVN is still possible on effectful but idempotent ops precisely because our GVN does not create partial redundancies: it removes an instruction only when it is dominated by an identical instruction. An instruction will not be "hoisted" to a point where it could execute in the optimized code but not in the original. However, there are really two parts to the egraph implementation that produce this effect: the deduplication on insertion into the egraph, and the elaboration with a scoped hashmap. The deduplication lets us give a single name (value ID) to all copies of an identical instruction, and then elaboration will re-create duplicates if GVN should not hoist or merge some of them. Because deduplication need not worry about dominance or scopes, we use a simple (non-scoped) hashmap to dedup/intern ops as "egraph nodes". When we added support for GVN'ing effectful but idempotent ops (#5594), we kept the use of this simple dedup'ing hashmap, but these ops do not get elaborated; instead they stay in the side-effecting skeleton. Thus, we inadvertently created potential for weird code-motion effects. The proposal in #5796 would solve this in a clean way by treating these ops as pure again, and keeping them out of the skeleton, instead putting "force" pseudo-ops in the skeleton. 
However, this is a little more complex than I would like, and I've realized that @jameysharp's earlier suggestion is much simpler: we can keep an actual scoped hashmap separately just for the effectful-but-idempotent ops, and use it to GVN while we build the egraph. In effect, we're fusing a separate GVN pass with the egraph pass (but letting it interact corecursively with egraph rewrites). This is in principle similar to how we keep a separate map for loads and fuse this pass with the egraph rewrite pass as well. Note that we can use a `ScopedHashMap` here without the "context" (as needed by `CtxHashMap`) because, as noted by @jameysharp, in practice the ops we want to GVN have all their args inline. Equality on the `InstructionData` itself is conservative: two insts whose struct contents compare shallowly equal are definitely identical, but identical insts in a deep-equality sense may not compare shallowly equal, due to list indirection. This is fine for GVN, because it is still sound to skip any given GVN opportunity (and keep the original instructions). Fixes #5796. * Add comments from review.	2023-03-02 02:10:42 +00:00
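To illustrate why a scoped map gives dominance-respecting GVN, here is a toy scoped hash map. It is not Cranelift's `ScopedHashMap`; the type `ScopedMap`, its methods, and the use of string keys for instructions are all assumptions made for this sketch:

```rust
use std::collections::HashMap;

/// Toy scoped map: one hash map per open scope (think: one per block on the
/// current path down the dominator tree). Keys stand in for instructions,
/// values for the value ID that already computes them.
struct ScopedMap {
    scopes: Vec<HashMap<String, u32>>,
}

impl ScopedMap {
    fn new() -> Self {
        Self { scopes: vec![HashMap::new()] }
    }

    /// Enter a dominated region.
    fn enter(&mut self) {
        self.scopes.push(HashMap::new());
    }

    /// Leave it again; entries recorded inside are forgotten.
    fn exit(&mut self) {
        if self.scopes.len() > 1 {
            self.scopes.pop();
        }
    }

    /// Look up from the innermost scope outwards.
    fn get(&self, key: &str) -> Option<u32> {
        self.scopes.iter().rev().find_map(|s| s.get(key).copied())
    }

    fn insert(&mut self, key: &str, value: u32) {
        self.scopes
            .last_mut()
            .expect("at least one scope is always open")
            .insert(key.to_string(), value);
    }
}
```

An instruction found by `get` was recorded in an enclosing scope, so it dominates the current point and the duplicate can be dropped. Entries from sibling scopes are gone after `exit`, so nothing is merged across non-dominating paths.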
Alex Crichton	f05babc744	x64: Add `shuffle` cases for `punpck{h,l}bw` (#5905 ) * x64: Add `shuffle` cases for `punpck{h,l}bw` I noticed this difference between LLVM and Cranelift for something I was looking at recently, and while it's probably not all that common I figured I'd add it here since it should be somewhat useful nevertheless. * Review feedback * Use u128 extractor instead	2023-03-01 21:49:00 +00:00
Alexa VanHattum	6f6fcfa437	Add filetest for unexpected imm12_from_negated aarch64 lowering (#5904 )	2023-03-01 20:31:24 +00:00


11191 Commits