wasmtime

Author	SHA1	Message	Date
Alex Crichton	453330b2db	x64: Add rudimentary support for some AVX instructions (#5795 ) * x64: Add rudimentary support for some AVX instructions I was poking around Spidermonkey's wasm backend and saw that the various assembler functions used are all `v`-prefixed which look like they're intended for use with AVX instructions. I looked at Cranelift and it currently doesn't have support for many AVX-based instructions, so I figured I'd take a crack at it! The support added here is a bit of a mishmash when viewed alone, but my general goal was to take a single instruction from the SIMD proposal for WebAssembly and migrate all of its component instructions to AVX. I, by random chance, picked a pretty complicated instruction of `f32x4.min`. This wasm instruction is implemented on x64 with 4 unique SSE instructions and ended up being a pretty good candidate. Further digging about AVX-vs-SSE shows that there should be two major benefits to using AVX over SSE: Primarily AVX instructions largely use a three-operand form where two input registers are operated with and an output register is also specified. This is in contrast to SSE's predominant one-register-is-input-but-also-output pattern. This should help free up the register allocator a bit and additionally remove the need for movement between registers. * As #4767 notes the memory-based operations of VEX-encoded instructions (aka AVX instructions) do not have strict alignment requirements which means we would be able to sink loads and stores into individual instructions instead of having separate instructions. So I set out on my journey to implement the instructions used by `f32x4.min`. The first few were fairly easy. The machinst backends are already of the shape "take these inputs and compute the output" where the x86 requirement of a register being both input and output is postprocessed in. This means that the `inst.isle` creation helpers for SSE instructions were already of the correct form to use AVX. 
I chose to add new `rule` branches for the instruction creation helpers, for example `x64_andnps`. The new `rule` conditionally only runs if AVX is enabled and emits an AVX instruction instead of an SSE instruction for achieving the same goal. This means that no lowerings of clif instructions were modified, instead just new instructions are being generated. The VEX encoding was previously not heavily used in Cranelift. The only current users are the FMA-style instructions that Cranelift has at this time. These FMA instructions have one more operand than `vandnps`, for example, so I split the existing `XmmRmRVex` into a few more variants to fit the shape of the instructions that needed generating for `f32x4.min`. This was accompanied then with more AVX opcode definitions, more emission support, etc. Upon implementing all of this it turned out that the test suite was failing on my machine due to the memory-operand encodings of VEX instructions not being supported. I didn't explicitly add those in myself but some preexisting RIP-relative addressing was leaking into the new instructions with existing tests. I opted to go ahead and fill out the memory addressing modes of VEX encoding to get the tests passing again. All-in-all this PR adds new instructions to the x64 backend for a number of AVX instructions, updates 5 existing instruction producers to use AVX instructions conditionally, implements VEX memory operands, and adds some simple tests for the new output of `f32x4.min`. The existing runtest for `f32x4.min` caught a few intermediate bugs along the way and I additionally added a plain `target x86_64` to that runtest to ensure that it executes with and without AVX to test the various lowerings. I'll also note that this, and future support, should be well-fuzzed through Wasmtime's fuzzing which may explicitly disable AVX support despite the machine having access to AVX, so non-AVX lowerings should be well-tested into the future.
It's also worth mentioning that I am not an AVX or VEX or x64 expert. Implementing the memory operand part for VEX was the hardest part of this PR and while I think it should be good someone else should definitely double-check me. Additionally I haven't added many instructions to the x64 backend yet so I may have missed obvious places to test or such, so I am happy to follow up with anything to be more thorough if necessary. Finally I should note that this is just the tip of the iceberg when it comes to AVX. My hope is to get some of the idioms sorted out to make it easier for future PRs to add one-off instruction lowerings or such. * Review feedback	2023-02-17 01:29:55 +00:00
Chris Fallin	c7e2571866	egraphs: disable GVN of effectful idempotent ops (temporarily). (#5808 ) This is a short-term fix to the same bug that #5800 is addressing (#5796), but with less risk: it simply turns off GVN'ing of effectful but idempotent ops. Because we have an upcoming release, and this is a miscompile (albeit to do with trapping behavior), we would like to make the simplest possible fix that avoids the bug, and backport it. I will then rebase #5800 on top of a revert of this followed by the more complete fix.	2023-02-16 21:29:03 +00:00
Alex Crichton	cae3b26623	x64: Improve codegen for vectors with constant shift amounts (#5797 ) I stumbled across this working on #5795 and figured this was a nice opportunity to improve the codegen here.	2023-02-16 20:47:59 +00:00
Trevor Elliott	80c147d9c0	Rework br_table to use BlockCall (#5731 ) Rework br_table to use BlockCall, allowing us to avoid adding new nodes during ssa construction to hold block arguments. Additionally, many places where we previously matched on InstructionData to extract branch destinations can be replaced with a use of branch_destination or branch_destination_mut.	2023-02-16 09:23:27 -08:00
Trevor Elliott	cc073593a4	Fix block label printing in precise-output tests (#5798 ) As a follow-up to #5780, disassemble the regions identified by bb_starts, falling back on disassembling the whole buffer. This ensures that instructions like br_table that introduce a lot of constants don't throw off capstone for the remainder of the function. --------- Co-authored-by: Jamey Sharp <jamey@minilop.net>	2023-02-16 02:35:26 +00:00
Trevor Elliott	f04decc4a1	Use capstone to validate precise-output tests (#5780 ) Use the capstone library to disassemble precise-output tests, in addition to pretty-printing their vcode.	2023-02-15 16:35:10 -08:00
Afonso Bordado	eabd43a178	aarch64: Support GOT Relative relocations in PIC mode (#5550 ) * cranelift: Add `adrp` encoding to AArch64 backend * cranelift: Support GOT Symbol References in AArch64 * cranelift: Add MachO GOT relocations * cranelift: Do not mark the GOT PageOffset12 MachO relocation as relative	2023-02-15 15:19:18 -08:00
Afonso Bordado	76539ef9f2	cranelift: Optimize `select+icmp` into `{s,u}{min,max}` (#5546 ) * cranelift: Optimize `select+icmp` into `{s,u}{min,max}` * cranelift: Add generic egraph icmp reverse rule * cranelift: Optimize `vselect+icmp` into `{s,u}{min,max}` * cranelift: Optimize some `vselect+fcmp` into `f{min,max}_pseudo` * cranelift: Add inverted forms of min/max rules	2023-02-15 15:06:21 -08:00
Trevor Elliott	f0137c2618	x64: Fix the formatting for `andn` (#5789 ) * Print AluRmRVex instructions with the destination last * Update andn tests	2023-02-15 11:16:59 -08:00
Ulrich Weigand	305000d14b	s390x: Fix instruction encoding and disassembly format bugs (#5786 ) - Fix encoding of the AHY instruction. - Fix disassembly format of FIEBR, FIDBR, and LEDBRA instructions.	2023-02-15 08:36:44 -08:00
Ulrich Weigand	e10094dcd6	s390x: Support scalar min/max clif instructions (#5762 ) We don't have ISA instructions for that, so simply expand them to icmp + select. Also enable fuzzing for those clif instructions now.	2023-02-15 11:45:09 +00:00
Nick Fitzgerald	6df3bbbe60	Cranelift: Collapse double extends into a single extend (#5772 )	2023-02-13 22:43:17 +00:00
Alex Crichton	a0a97f5e8f	Add (bnot (bxor x y)) lowerings for s390x/aarch64 (#5763 ) * Add (bnot (bxor x y)) lowerings for s390x/aarch64 I originally thought that s390x's original lowering in #5709 was incorrect, but as was rightfully pointed out `(bnot (bxor x y))` is equivalent to `(bxor x (bnot y))` so the special lowering for one should apply as a special lowering for the other. For the s390x and aarch64 backends that already have a fused lowering of the bxor/bnot, additionally add a lowering for the bnot/bxor combination. * Add bnot(bxor(..)) tests for s390x 128-bit sizes	2023-02-13 15:41:18 +00:00
Trevor Elliott	15fe9c7c93	Inline jump tables in parsed br_table instructions (#5755 ) As jump tables are used by at most one br_table instruction, inline their definition in those instructions instead of requiring them to be declared as function-level metadata.	2023-02-09 14:24:04 -08:00
Alex Crichton	72962c9f08	Add some minor souper-harvested optimizations (#5735 ) I was playing around with souper recently on some wasms I had lying around and these are some optimization opportunities that popped out which seemed easy-enough to add to the egraph-based optimizations.	2023-02-07 14:06:24 -06:00
Chris Fallin	75ae976adc	egraphs: fix accidental remat of call. (#5726 ) In the provided test case in #5716, the result of a call was then added to 0. We have a rewrite rule that sets the remat-bit on any add of a value and a constant, because these frequently appear (e.g. from address offset calculations) and this can frequently reduce register pressure (one long-lived base vs. many long-lived base+offset values). Separately, we have an algebraic rule that `x+0` rewrites to `x`. The result of this was that we had an eclass with the remat bit set on the add, but the add was also union'd into the call. We pick the latter during extraction, because it's cheaper not to do the add at all; but we still get the remat bit, and try to remat a call (!), which blows up later. This PR fixes the logic to look up the "best value" for a value (i.e., whatever extraction determined), and look up the remat bit on that node, not the canonical node. (Why did the canonical node become the iadd and not the call? Because the former had a lower value-number, as an accident of IR construction; we don't impose any requirements on the input CLIF's value-number ordering, and I don't think this breaks any of the important acyclic properties, even though there is technically a dependence from a lower-numbered to a higher-numbered node. In essence one can think of them as having "virtual numbers" in any true topologically-sorted order, and the only place the actual integer indices matter should be in choosing the "canonical ID", which is just used for dedup'ing, modulo this bug.) Fixes #5716.	2023-02-06 23:36:16 +00:00
Alex Crichton	de0e0bea3f	Legalize `b{and,or,xor}_not` into component instructions (#5709 ) * Remove trailing whitespace in `lower.isle` files * Legalize the `band_not` instruction into simpler form This commit legalizes the `band_not` instruction into `band`-of-`bnot`, or two instructions. This is intended to assist with egraph-based optimizations where the `band_not` instruction doesn't have to be specifically included in other bit-operation-patterns. Lowerings of the `band_not` instruction have been moved to a specialization of the `band` instruction. * Legalize `bor_not` into components Same as prior commit, but for the `bor_not` instruction. * Legalize bxor_not into bxor-of-bnot Same as prior commits. I think this also ended up fixing a bug in the s390x backend where `bxor_not x y` was actually translated as `bnot (bxor x y)` by accident given the test update changes. * Simplify not-fused operands for riscv64 Looks like some delegated-to rules have special-cases for "if this feature is enabled use the fused instruction" so move the clause for testing the feature up to the lowering phase to help trigger other rules if the feature isn't enabled. This should make the riscv64 backend more consistent with how other backends are implemented. * Remove B{and,or,xor}Not from cost of egraph metrics These shouldn't ever reach egraphs now that they're legalized away. * Add an egraph optimization for `x^-1 => ~x` This adds a simplification node to translate xor-against-minus-1 to a `bnot` instruction. This helps trigger various other optimizations in the egraph implementation and also various backend lowering rules for instructions. This is chiefly useful as wasm doesn't have a `bnot` equivalent, so it's encoded as `x^-1`. * Add a wasm test for end-to-end bitwise lowerings Test that end-to-end various optimizations are being applied for input wasm modules. * Specifically don't self-update rustup on CI I forget why this was here originally, but this is failing on Windows CI. 
In general there's no need to update rustup, so leave it as-is. * Cleanup some aarch64 lowering rules Previously a 32/64 split was necessary due to the `ALUOp` being different but that's been refactored away now, so there's no longer any need for duplicate rules. * Narrow a x64 lowering rule This previously made more sense when it was `band_not` and rarely used, but be more specific in the type-filter on this rule that it's only applicable to SIMD types with lanes. * Simplify xor-against-minus-1 rule No need to have the commutative version since constants are already shuffled right for egraphs * Optimize band-of-bnot when bnot is on the left Use some more rules in the egraph algebraic optimizations to canonicalize band/bor/bxor with a `bnot` operand to put the operand on the right. That way the lowerings in the backends only have to list the rule once, with the operand on the right, to optimize both styles of input. * Add commutative lowering rules * Update cranelift/codegen/src/isa/x64/lower.isle Co-authored-by: Jamey Sharp <jamey@minilop.net> --------- Co-authored-by: Jamey Sharp <jamey@minilop.net>	2023-02-06 13:53:40 -06:00
Trevor Elliott	6d8f2be9e1	Use `andn` for `band_not` when bmi1 is present (#5701 ) We can use the andn instruction for the lowering of band_not on x64 when bmi1 is available.	2023-02-03 16:23:18 -08:00
Nick Fitzgerald	e18d4cb711	Cranelift: Introduce support for `return_call` in the interpreter (#5697 ) Co-authored-by: Jamey Sharp <jsharp@fastly.com>	2023-02-03 15:53:54 -08:00
Nick Fitzgerald	72c8513411	Cranelift: Correctly wrap shifts in constant propagation (#5695 ) Fixes #5690 Fixes #5696 Co-authored-by: Jamey Sharp <jsharp@fastly.com>	2023-02-03 00:12:57 +00:00
Jun Ryung Ju	9cd4146939	Implemented `b{and,or,xor}_not` bitops for ty_int_ref_scalar_64 type. (#5604 ) * Implemented `b{and,or,xor}_not` bitops for ty_int_ref_scalar_64 type. * Added tests.	2023-02-01 21:57:18 -08:00
Jamey Sharp	ac4d28f4dd	Constant-fold icmp instructions (#5666 ) We found examples of icmp instructions with both operands constant in spidermonkey.wasm.	2023-02-01 21:55:36 +00:00
Nick Fitzgerald	bdfb746548	Cranelift: Introduce the `return_call` and `return_call_indirect` instructions (#5679 ) * Cranelift: Introduce the `tail` calling convention This is an unstable-ABI calling convention that we will eventually use to support Wasm tail calls. Co-Authored-By: Jamey Sharp <jsharp@fastly.com> * Cranelift: Introduce the `return_call` and `return_call_indirect` instructions These will be used to implement tail calls for Wasm and any other language targeting CLIF. The `return_call_indirect` instruction differs from the Wasm instruction of the same name by taking a native address callee rather than a Wasm function index. Co-Authored-By: Jamey Sharp <jsharp@fastly.com> * Cranelift: Implement verification rules for `return_call[_indirect]` They must: * have the same return types between the caller and callee, * have the same calling convention between caller and callee, * and that calling convention must support tail calls. Co-Authored-By: Jamey Sharp <jsharp@fastly.com> * cargo fmt --------- Co-authored-by: Jamey Sharp <jsharp@fastly.com>	2023-02-01 21:20:35 +00:00
Nick Fitzgerald	ffbbfbffce	Cranelift: Rewrite `or(and(x, y), not(y)) => or(x, not(y))` again (#5684 ) This rewrite was introduced in #5676 and then reverted in #5682 due to a footgun where we accidentally weren't actually checking the `y == !z` precondition. This commit fixes the precondition check. It also fixes the arithmetic to be correctly masked to the value type's width. This reverts commit `268f6bfc1d`.	2023-02-01 20:53:22 +00:00
yuyang	cb3b6c621f	fix rotl.i16 with i128 shift value. (#5611 ) * fix issue 5523. * fix. * add missing issue file. * fix issue. * fix duplicate shamt_128. * issue 5523 add test target, and fix some wrong comments. * fix output file. * enable llvm_abi_extensions for regression test file.	2023-02-01 03:44:13 +00:00
Trevor Elliott	268f6bfc1d	Revert "Cranelift: Rewrite `or(and(x, y), not(y)) => or(x, not(y))` (#5676 )" (#5682 ) This reverts commit `8c9eb9939b`. Fixes #5680	2023-02-01 02:53:23 +00:00
yuyang	0c66a1bba7	Fix issue 5528 (#5605 ) * fix parameter error. * fix float convert to i8 and i16 should extract sign bit. * add missing regression test file. * using tmp register. * float convert i8 will consume more instructions. * fix worse inst emit size. * fix worst_case_size.	2023-01-31 15:37:36 -08:00
Nick Fitzgerald	8c9eb9939b	Cranelift: Rewrite `or(and(x, y), not(y)) => or(x, not(y))` (#5676 ) Co-authored-by: Rainy Sinclair <844493+itsrainy@users.noreply.github.com>	2023-01-31 22:44:45 +00:00
Nick Fitzgerald	253e28ca4f	Cranelift: Rewrite `(x>>k)<<k` into masking off the bottom `k` bits (#5673 ) * Cranelift: Rewrite `(x>>k)<<k` into masking off the bottom `k` bits * Add a runtest for exercising our rewrite of `(x >> k) << k` into masking	2023-01-31 21:11:12 +00:00
Nick Fitzgerald	7aa240e0f2	Cranelift: constant propagate shifts (#5671 ) Thanks to Souper for pointing out we weren't doing this!	2023-01-31 12:06:53 -08:00
Nick Fitzgerald	c9d1c068bc	Cranelift: Add egraph rule to rewrite `x * C ==> x << log2(C)` when `C` is a power of two (#5647 )	2023-01-31 18:04:17 +00:00
Trevor Elliott	a5698cedf8	cranelift: Remove brz and brnz (#5630 ) Remove the brz and brnz instructions, as their behavior is now redundant with brif.	2023-01-30 20:34:56 +00:00
yuyang	77cf547f41	fix issue 5569. (#5612 ) * add regression test file. * fix issue 5569. * enable code length check.	2023-01-30 10:01:33 -08:00
Jamey Sharp	915801551b	Delete old cranelift-preopt crate (#5642 ) Most of these optimizations are in the egraph `cprop.isle` rules now, making a separate crate unnecessary. Also I think the `udiv` optimizations here are straight-up wrong (doing signed instead of unsigned division, and panicking instead of preserving traps on division by zero) so I'm guessing this crate isn't seriously used anywhere. At the least, bjorn3 confirms that cg_clif doesn't use this, and I've verified that Wasmtime doesn't either. Closes #1090.	2023-01-26 21:32:33 +00:00
Trevor Elliott	7926808e8e	riscv64: improve unordered comparison generated code (#5636 ) Improve the generated code for unordered floating point comparisons by negating the comparison and inverting the branches. This allows us to pick the unordered versions, which generate significantly better code.	2023-01-25 17:28:28 -08:00
Trevor Elliott	b58a197d33	cranelift: Add a conditional branch instruction with two targets (#5446 ) Add a conditional branch instruction with two targets: brif. This instruction will eventually replace brz and brnz, as it encompasses the behavior of both. This PR also changes the InstructionData layout for instruction formats that hold BlockCall values, taking the same approach we use for Value arguments. This allows branch_destination to return a slice to the BlockCall values held in the instruction, rather than requiring that we pattern match on InstructionData to fetch the then/else blocks. Function generation for fuzzing has been updated to generate uses of brif, and I've run the cranelift-fuzzgen target locally for hours without triggering any new failures.	2023-01-24 14:37:16 -08:00
Jamey Sharp	fef9f64d2c	x86: Test paired udiv/urem (#5573 ) Ideally these pairs of CLIF instructions should emit a single x86 instruction, but they don't today. This test will tell us if somebody fixes that. Similar tests might make sense for imul/umulhi as well as signed versions, but I haven't tried that.	2023-01-23 11:44:27 -08:00
yuyang	7e10bd1f58	fix issue #5497 #5524 #5526 . (#5595 ) * fix issue 5497. * fix issue 5524 * fix issue 5497 5524 5526. * some clif change because of reg alloc.	2023-01-20 14:06:26 -08:00
yuyang	299b8187f8	fix issue 5525. (#5603 ) * fix issue 5525. * reg alloc changed.	2023-01-20 09:53:54 -08:00
Chris Fallin	1faff8c2ce	Enable egraph-based optimization by default. (#5587 ) This PR follows up on #5382 and #5391, which rebuilt the egraph-based optimization framework to be more performant, by enabling it by default. Based on performance results in #5382 (my measurements on SpiderMonkey and bjorn3's independent confirmation with cg_clif), it seems that this is reasonable to enable. Now that we have been fuzzing compiler configurations with egraph opts (#5388) for 6 weeks, having fixed a few fuzzbugs that came up (#5409, #5420, #5438) and subsequently received no further reports from OSS-Fuzz, I believe it is stable enough to rely on. This PR enables `use_egraphs`, and also normalizes its meaning: previously it forced optimization (it basically meant "turn on the egraph optimization machinery"), now it runs egraph opts if the opt level indicates (it means "use egraphs to optimize if we are going to optimize"). The conditionals in the top-level pass driver are a little subtle, but will get simpler once we can remove the non-egraph path (which we plan to do eventually!). Fixes #5181.	2023-01-19 15:46:53 -08:00
Chris Fallin	704f5a5772	Cranelift/egraph mid-end: support merging effectful-but-idempotent ops (#5594 ) * Support mergeable-but-side-effectful (idempotent) operations in general in the egraph's GVN. This mirrors the similar change made in #5534. * Add tests for egraph case.	2023-01-19 11:51:19 -08:00
Kevin Rizzo	da03ff47f1	winch: Adding support for integration tests (#5588 ) * Adding in the foundations for Winch `filetests` This commit adds two new crates into the Winch workspace: `filetests` and `test-macros`. The intent is to mimic the structure of Cranelift `filetests`, but in a simpler way. * Updates to documentation This commit adds a high level document to outline how to test Winch through the `winch-tools` utility. It also updates some inline documentation which gets propagated to the CLI. * Updating test-macro to use a glob instead of only a flat directory	2023-01-19 07:34:48 -05:00
Afonso Bordado	3ae373b073	cranelift: Disable select rule for i128 types on riscv64 (#5584 ) * fuzzgen: Disable some selects for RISC-V * cranelift: Force disable gen_select_reg rule for i128 values	2023-01-17 10:01:23 -08:00
Afonso Bordado	82494661c1	cranelift: Add `atomic_{load,store}` and `fence` to the interpreter (#5503 ) * cranelift: Add `fence` to interpreter * cranelift: Add `atomic_{load,store}` to the interpreter * fuzzgen: Add `atomic_{load,store}` * Update cranelift/fuzzgen/src/function_generator.rs Co-authored-by: Jamey Sharp <jamey@minilop.net> * fuzzgen: Use type size as the alignment size. Co-authored-by: Jamey Sharp <jamey@minilop.net>	2023-01-12 08:36:04 -08:00
Afonso Bordado	9556cb190f	cranelift: Forbid argument extensions for floats and SIMD vectors (#5536 ) * fuzzgen: Generate argument extensions only for integer arguments * cranelift: Add verifier check for argument extensions	2023-01-10 10:26:30 -08:00
Alexa VanHattum	44913825b5	cranelift: fix register for `srem.i8` on x86_64 (#5540 ) * Change register written to in specific srem case. Add regression test as filetest case. Fixes #5470 * Add another test case, newline * Update comment	2023-01-06 22:18:16 +00:00
Sam Sartor	1efa3d6f8b	Add `clif-util compile` option to output object file (#5493 ) * add clif-util compile option to output object file * switch from a box to a borrow * update objectmodule tests to use borrowed isa * put targetisa into an arc	2023-01-06 12:53:48 -08:00
uint256_t	b00455135e	Cranelift: Implement 'iabs' for scalar types on x86_64 (#5527 ) * Implement 'iabs' for scalar types on x86_64 * Small fix	2023-01-05 21:33:12 -08:00
Nick Fitzgerald	c50bdf600e	Cranelift: GVN all idempotently trapping but otherwise pure instructions (#5534 )	2023-01-05 15:08:06 -08:00
Afonso Bordado	ee6a909ccb	cranelift: Cleanup SIMD `icmp` tests (#5530 ) * cranelift: Enable more SIMD tests * cranelift: Reorganize icmp tests * cranelift: Enable SIMD icmp tests for unsigned ops * cranelift: Cleanup trailing newlines	2023-01-05 09:19:03 -08:00
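
Several commits in this log add or repair algebraic rewrites in the egraph mid-end (#5647, #5673, #5676/#5684, #5709, #5763). As a rough sanity check of the identities behind them, here is a minimal Rust sketch that tests each one on plain `u32` values. The helper names are hypothetical and are not Cranelift APIs; the real rules are written in ISLE and operate on CLIF values of varying widths, which this sketch does not model.

```rust
// Illustrative only: these helpers are hypothetical, not Cranelift APIs.
// Each one checks a bit-level identity on u32 values.

/// `(x >> k) << k` equals `x` with the bottom `k` bits masked off (#5673).
fn shift_pair_is_mask(x: u32, k: u32) -> bool {
    let k = k % 32; // keep the shift amount below the type width
    ((x >> k) << k) == (x & (u32::MAX << k))
}

/// `x * C` equals `x << log2(C)` when `C` is a power of two (#5647).
fn mul_pow2_is_shift(x: u32, c: u32) -> bool {
    assert!(c.is_power_of_two());
    x.wrapping_mul(c) == (x << c.trailing_zeros())
}

/// `x ^ -1` equals `!x`; -1 is all ones in two's complement (#5709).
fn xor_all_ones_is_not(x: u32) -> bool {
    (x ^ u32::MAX) == !x
}

/// `!(x ^ y)` equals `x ^ !y` (#5763).
fn not_xor_is_xor_not(x: u32, y: u32) -> bool {
    !(x ^ y) == (x ^ !y)
}

/// `(x & y) | !y` equals `x | !y` (#5676, #5684).
fn or_and_not_simplifies(x: u32, y: u32) -> bool {
    ((x & y) | !y) == (x | !y)
}

fn main() {
    // A small, arbitrary set of sample inputs, including edge values.
    let samples = [0u32, 1, 2, 0x8000_0000, 0xDEAD_BEEF, u32::MAX];
    for &x in &samples {
        assert!(xor_all_ones_is_not(x));
        for k in 0..32 {
            assert!(shift_pair_is_mask(x, k));
            assert!(mul_pow2_is_shift(x, 1u32 << k));
        }
        for &y in &samples {
            assert!(not_xor_is_xor_not(x, y));
            assert!(or_and_not_simplifies(x, y));
        }
    }
    println!("all identities held on the sample inputs");
}
```

Sampling a few inputs only gives some confidence that an identity is plausible and proves nothing in general. The #5682 revert is a reminder that the precondition check of a rule (there, `y == !z`) has to be tested as carefully as the identity itself.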


1401 Commits