wasmtime

Author	SHA1	Message	Date
Afonso Bordado	853ff787f3	fuzzgen: Refactor name and signature generation (#5764) * fuzzgen: Move cranelift type generation into CraneliftArbitrary * fuzzgen: Deduplicate DataValue generation * fuzzgen: Remove unused code * fuzzgen: Pass allowed function calls into `FunctionGenerator`	2023-02-17 20:48:12 +00:00
Afonso Bordado	a7bd65d116	fuzzgen: Allow inline stackprobes for riscv64 (#5822)	2023-02-17 20:47:39 +00:00
Trevor Elliott	a139ed6d56	Fix the postorder traversal in the DominatorTree (#5821) Fix the postorder traversal computed by the `DominatorTree`. It was recording nodes in the wrong order depending on the order child nodes were visited. Consider the following program: ``` function %foo2(i8) -> i8 { block0(v0: i8): brif v0, block1, block2 block1: return v0 block2: jump block1 } ``` The postorder produced by the previous implementation was: ``` block2 block1 block0 ``` This is incorrect, as `block1` is branched to by `block2`. Changing the branch order in the function would also change the postorder result, yielding the expected order with `block1` emitted first. The problem was that when pushing successor nodes onto the stack, the old implementation would also mark them SEEN. This would then prevent them from being pushed on the stack again in the future, which is incorrect as they might be visited by other nodes that have not yet been pushed. This causes nodes to potentially show up later in the postorder traversal than they should. This PR reworks the implementation of `DominatorTree::compute` to produce an order where `block1` is always returned first, regardless of the branch order in the original program. Co-authored-by: Jamey Sharp <jsharp@fastly.com>	2023-02-17 20:39:04 +00:00
Berkus Decker	c8fa1b845f	Fix typo (#5814)	2023-02-17 15:08:07 +00:00
Alex Crichton	453330b2db	x64: Add rudimentary support for some AVX instructions (#5795) * x64: Add rudimentary support for some AVX instructions I was poking around Spidermonkey's wasm backend and saw that the various assembler functions used are all `v`-prefixed, which look like they're intended for use with AVX instructions. I looked at Cranelift and it currently doesn't have support for many AVX-based instructions, so I figured I'd take a crack at it! The support added here is a bit of a mishmash when viewed alone, but my general goal was to take a single instruction from the SIMD proposal for WebAssembly and migrate all of its component instructions to AVX. By random chance, I picked a pretty complicated instruction, `f32x4.min`. This wasm instruction is implemented on x64 with 4 unique SSE instructions and ended up being a pretty good candidate. Further digging into AVX-vs-SSE shows that there should be two major benefits to using AVX over SSE: * Primarily, AVX instructions largely use a three-operand form where two input registers are operated on and an output register is also specified. This is in contrast to SSE's predominant one-register-is-input-but-also-output pattern. This should help free up the register allocator a bit and additionally remove the need for movement between registers. * As #4767 notes, the memory-based operations of VEX-encoded instructions (aka AVX instructions) do not have strict alignment requirements, which means we would be able to sink loads and stores into individual instructions instead of having separate instructions. So I set out on my journey to implement the instructions used by `f32x4.min`. The first few were fairly easy. The machinst backends are already of the shape "take these inputs and compute the output" where the x86 requirement of a register being both input and output is postprocessed in. This means that the `inst.isle` creation helpers for SSE instructions were already of the correct form to use AVX. 
I chose to add new `rule` branches for the instruction creation helpers, for example `x64_andnps`. The new `rule` conditionally only runs if AVX is enabled and emits an AVX instruction instead of an SSE instruction for achieving the same goal. This means that no lowerings of clif instructions were modified; instead, just new instructions are being generated. The VEX encoding was previously not heavily used in Cranelift. The only current users are the FMA-style instructions that Cranelift has at this time. These FMA instructions have one more operand than `vandnps`, for example, so I split the existing `XmmRmRVex` into a few more variants to fit the shape of the instructions that needed generating for `f32x4.min`. This was then accompanied by more AVX opcode definitions, more emission support, etc. Upon implementing all of this, it turned out that the test suite was failing on my machine due to the memory-operand encodings of VEX instructions not being supported. I didn't explicitly add those in myself, but some preexisting RIP-relative addressing was leaking into the new instructions with existing tests. I opted to go ahead and fill out the memory addressing modes of VEX encoding to get the tests passing again. All-in-all this PR adds new instructions to the x64 backend for a number of AVX instructions, updates 5 existing instruction producers to use AVX instructions conditionally, implements VEX memory operands, and adds some simple tests for the new output of `f32x4.min`. The existing runtest for `f32x4.min` caught a few intermediate bugs along the way, and I additionally added a plain `target x86_64` to that runtest to ensure that it executes with and without AVX to test the various lowerings. I'll also note that this, and future support, should be well-fuzzed through Wasmtime's fuzzing, which may explicitly disable AVX support despite the machine having access to AVX, so non-AVX lowerings should be well-tested into the future. 
It's also worth mentioning that I am not an AVX or VEX or x64 expert. Implementing the memory operand part for VEX was the hardest part of this PR, and while I think it should be good, someone else should definitely double-check me. Additionally, I haven't added many instructions to the x64 backend yet, so I may have missed obvious places to test, and I'm happy to follow up with anything to be more thorough if necessary. Finally, I should note that this is just the tip of the iceberg when it comes to AVX. My hope is to get some of the idioms sorted out to make it easier for future PRs to add one-off instruction lowerings or such. * Review feedback	2023-02-17 01:29:55 +00:00
Trevor Elliott	d711872d63	Refactor collect_branches_and_targets to not need a smallvec (#5803) * Refactor collect_branches_and_targets to not need a smallvec Basic blocks are terminated by at most one branch instruction now, so we can use that assumption in `collect_branches_and_targets` to return the last instruction we saw instead. * Review comments	2023-02-16 21:30:17 +00:00
Chris Fallin	c7e2571866	egraphs: disable GVN of effectful idempotent ops (temporarily). (#5808) This is a short-term fix to the same bug that #5800 is addressing (#5796), but with less risk: it simply turns off GVN'ing of effectful but idempotent ops. Because we have an upcoming release, and this is a miscompile (albeit to do with trapping behavior), we would like to make the simplest possible fix that avoids the bug, and backport it. I will then rebase #5800 on top of a revert of this followed by the more complete fix.	2023-02-16 21:29:03 +00:00
Alex Crichton	cae3b26623	x64: Improve codegen for vectors with constant shift amounts (#5797) I stumbled across this while working on #5795 and figured this was a nice opportunity to improve the codegen here.	2023-02-16 20:47:59 +00:00
Trevor Elliott	80c147d9c0	Rework br_table to use BlockCall (#5731) Rework br_table to use BlockCall, allowing us to avoid adding new nodes during ssa construction to hold block arguments. Additionally, many places where we previously matched on InstructionData to extract branch destinations can be replaced with a use of branch_destination or branch_destination_mut.	2023-02-16 09:23:27 -08:00
Chris Fallin	c15c4ed23d	Cranelift: upgrade to regalloc2 0.6.1. (#5799) * Cranelift: upgrade to regalloc2 0.6.1. Fixes #5791 by pulling in bytecodealliance/regalloc2#113. * Add cargo-vet entry for regalloc2 0.6.1.	2023-02-16 03:22:58 +00:00
Trevor Elliott	cc073593a4	Fix block label printing in precise-output tests (#5798) As a follow-up to #5780, disassemble the regions identified by bb_starts, falling back on disassembling the whole buffer. This ensures that instructions like br_table that introduce a lot of constants don't throw off capstone for the remainder of the function. --------- Co-authored-by: Jamey Sharp <jamey@minilop.net>	2023-02-16 02:35:26 +00:00
Trevor Elliott	f04decc4a1	Use capstone to validate precise-output tests (#5780) Use the capstone library to disassemble precise-output tests, in addition to pretty-printing their vcode.	2023-02-15 16:35:10 -08:00
Afonso Bordado	eabd43a178	aarch64: Support GOT Relative relocations in PIC mode (#5550) * cranelift: Add `adrp` encoding to AArch64 backend * cranelift: Support GOT Symbol References in AArch64 * cranelift: Add MachO GOT relocations * cranelift: Do not mark the GOT PageOffset12 MachO relocation as relative	2023-02-15 15:19:18 -08:00
Trevor Elliott	aba239e9b8	Fix handling of jumps in bugpoint (#5794) Fixes #5792	2023-02-15 15:07:03 -08:00
Afonso Bordado	76539ef9f2	cranelift: Optimize `select+icmp` into `{s,u}{min,max}` (#5546) * cranelift: Optimize `select+icmp` into `{s,u}{min,max}` * cranelift: Add generic egraph icmp reverse rule * cranelift: Optimize `vselect+icmp` into `{s,u}{min,max}` * cranelift: Optimize some `vselect+fcmp` into `f{min,max}_pseudo` * cranelift: Add inverted forms of min/max rules	2023-02-15 15:06:21 -08:00
Trevor Elliott	f0137c2618	x64: Fix the formatting for `andn` (#5789) * Print AluRmRVex instructions with the destination last * Update andn tests	2023-02-15 11:16:59 -08:00
Alex Crichton	0037b71b11	Use xmm_rm_r more frequently in x64 backend (#5787) This updates the signatures of the `xmm_rm_r` helper function and then updates existing users and migrates other users to the helper now that the type information is no longer required.	2023-02-15 09:03:19 -08:00
Ulrich Weigand	305000d14b	s390x: Fix instruction encoding and disassembly format bugs (#5786) - Fix encoding of the AHY instruction. - Fix disassembly format of FIEBR, FIDBR, and LEDBRA instructions.	2023-02-15 08:36:44 -08:00
Ulrich Weigand	e10094dcd6	s390x: Support scalar min/max clif instructions (#5762) We don't have ISA instructions for that, so simply expand them to icmp + select. Also enable fuzzing for those clif instructions now.	2023-02-15 11:45:09 +00:00
Alphyr	cb150d37ce	Update dependencies (#5513)	2023-02-14 19:45:15 +00:00
Nick Fitzgerald	6df3bbbe60	Cranelift: Collapse double extends into a single extend (#5772)	2023-02-13 22:43:17 +00:00
Trevor Elliott	19f337e29b	Move the default block to the front of the underlying jump table storage (#5770) The new api on JumpTableData makes it easy to keep the default label first, and that shrinks the diff in #5731 a bit.	2023-02-13 20:50:29 +00:00
Alex Crichton	a0a97f5e8f	Add (bnot (bxor x y)) lowerings for s390x/aarch64 (#5763) * Add (bnot (bxor x y)) lowerings for s390x/aarch64 I originally thought that s390x's original lowering in #5709 was buggy, but as was rightfully pointed out, `(bnot (bxor x y))` is equivalent to `(bxor x (bnot y))`, so the special lowering for one should apply as a special lowering for the other. For the s390x and aarch64 backends, which already have a fused lowering of bxor/bnot, additionally add a lowering for the bnot/bxor combination. * Add bnot(bxor(..)) tests for s390x 128-bit sizes	2023-02-13 15:41:18 +00:00
Trevor Elliott	d99783fc91	Move default blocks into jump tables (#5756) Move the default block off of the br_table instruction, and into the JumpTable that it references.	2023-02-10 08:53:30 -08:00
Trevor Elliott	15fe9c7c93	Inline jump tables in parsed br_table instructions (#5755) As jump tables are used by at most one br_table instruction, inline their definition in those instructions instead of requiring them to be declared as function-level metadata.	2023-02-09 14:24:04 -08:00
bjorn3	202d3af16a	Remove the unused sigid argument purpose (#5753)	2023-02-09 09:18:39 -08:00
Amanieu d'Antras	a2d356d45e	Add `JITBuilder::with_flags` constructor (#5751) This allows custom flags to be set (e.g. `opt-level`) while still leaving most of the boilerplate to select the native target to the `JITBuilder`.	2023-02-09 02:49:17 +00:00
Ayomide Bamidele	9637840b4b	Update AMD and generic x86 CPU presets to match LLVM (#5575) * Add x86 AMD and generic presets * Fix typos * Add zen2 definition	2023-02-08 15:56:45 -08:00
Trevor Elliott	34ec4b4e44	Reuse `inst_predicates::visit_block_succs` in more places (#5750) Following up from #5730, replace some explicit matching over branch instructions with a use of inst_predicates::visit_block_succs.	2023-02-08 15:42:24 -08:00
Trevor Elliott	b0b3f67cb0	Move jump tables to the DataFlowGraph (#5745) Move the storage for jump tables off of FunctionStencil and onto DataFlowGraph. This change is in service of #5731, making it easier to access the jump table data in the context of helpers like inst_values.	2023-02-07 21:21:35 -08:00
Jamey Sharp	7bf89683e9	Generalize and/or/xor optimizations (#5744) * Generalize `n ^ !n` optimization to more types * Generalize `x & -1` optimization to more types Also mark the `x & x` rewrite to `subsume`. * Cranelift: Optimize x|!x and x&!x to constants These cases are much like the existing x^!x rules.	2023-02-08 02:18:36 +00:00
Trevor Elliott	d71c9458dc	Make `DataFlowGraph::blocks` public (#5740) Similar to when we exposed the DataFlowGraph::insts field through a restrictive newtype, expose DataFlowGraph::blocks through an interface that allows a restrictive set of operations. This field being public now allows us to avoid a rematch in ssa construction, and simplifies the implementation of adding a block argument to a block referenced by a br_table instruction.	2023-02-07 17:11:14 -08:00
Jamey Sharp	f3b408d5e2	Algebraic opts: Reuse `iconst 0` from LHS (#5724) We don't need to spend time going through the GVN map to dedup a newly-constructed `iconst 0` when we already matched that value on the left-hand side of these rules. Also, mark these rules as subsuming any others since we can't do better than reducing an expression to a constant.	2023-02-08 00:11:07 +00:00
Trevor Elliott	116e5a665f	Bump regalloc2 to 0.6.0 (#5742) * Bump regalloc2 * Certify regalloc2 0.6.0	2023-02-07 15:57:49 -08:00
Trevor Elliott	3343cf80e9	Add assertions for matches that used to use analyze_branch (#5733) Following up from #5730, add debug assertions to ensure that new branch instructions don't slip through matches that used to use analyze_branch.	2023-02-07 14:51:18 -08:00
Trevor Elliott	2c8425998b	Refactor matches that used to consume BranchInfo (#5734) Explicitly borrow the instruction data, and use a mutable borrow to avoid rematch.	2023-02-07 13:29:42 -08:00
Raekye	fdd4a778fc	Fix ABI of jitted function in cranelift-jit example. (#5736)	2023-02-07 21:16:05 +00:00
Alex Crichton	72962c9f08	Add some minor souper-harvested optimizations (#5735) I was playing around with souper recently on some wasms I had lying around and these are some optimization opportunities that popped out which seemed easy-enough to add to the egraph-based optimizations.	2023-02-07 14:06:24 -06:00
Chris Fallin	673b448cfe	Cranelift DFG: make inst clone deep-clone varargs lists. (#5727) When investigating #5716, I found that rematerialization of a `call`, in addition to blowing up for other reasons, caused aliasing of the varargs list (the `EntityList` in the `ListPool`), such that editing the args of the second copy of the call instruction inadvertently updated the first as well. This PR modifies `DataFlowGraph::clone_inst` so that it always clones the varargs list if present. This shouldn't have any functional impact on Cranelift today, because we don't rematerialize any instructions with varargs; but it's important to get it right to avoid a bug later!	2023-02-07 01:21:09 +00:00
Trevor Elliott	c8a6adf825	Remove analyze_branch and BranchInfo (#5730) We don't have overlap in behavior for branch instructions anymore, so we can remove analyze_branch and instead match on the InstructionData directly. Co-authored-by: Jamey Sharp <jamey@minilop.net>	2023-02-06 17:06:57 -08:00
Chris Fallin	75ae976adc	egraphs: fix accidental remat of call. (#5726) In the provided test case in #5716, the result of a call was then added to 0. We have a rewrite rule that sets the remat-bit on any add of a value and a constant, because these frequently appear (e.g. from address offset calculations) and this can frequently reduce register pressure (one long-lived base vs. many long-lived base+offset values). Separately, we have an algebraic rule that `x+0` rewrites to `x`. The result of this was that we had an eclass with the remat bit set on the add, but the add was also union'd into the call. We pick the latter during extraction, because it's cheaper not to do the add at all; but we still get the remat bit, and try to remat a call (!), which blows up later. This PR fixes the logic to look up the "best value" for a value (i.e., whatever extraction determined), and look up the remat bit on that node, not the canonical node. (Why did the canonical node become the iadd and not the call? Because the former had a lower value-number, as an accident of IR construction; we don't impose any requirements on the input CLIF's value-number ordering, and I don't think this breaks any of the important acyclic properties, even though there is technically a dependence from a lower-numbered to a higher-numbered node. In essence one can think of them as having "virtual numbers" in any true topologically-sorted order, and the only place the actual integer indices matter should be in choosing the "canonical ID", which is just used for dedup'ing, modulo this bug.) Fixes #5716.	2023-02-06 23:36:16 +00:00
bjorn3	16afefdab1	Some refactorings to the ISLE parser (#5693) * Use is_ascii_digit and is_ascii_hexdigit in the ISLE lexer * Use range pattern in ISLE lexer * Use a couple of shorthands in the ISLE parser * Use parse_ident instead of symbol + str_to_ident * Introduce token eating api This is a non-fatal version of the take api * Rename take to expect and add expect_ prefixes to several methods * Review comments	2023-02-06 15:11:25 -08:00
Trevor Elliott	e9c05622c0	Keep reachable jump tables (#5721) Instead of identifying unused branch tables by looking for unused blocks inside of them, track used branch tables while traversing reachable blocks. This introduces an extra allocation of an EntitySet to track the used jump tables, but as those are few and this function runs once per ir::Function, the allocation seems reasonable.	2023-02-06 14:10:47 -08:00
Jamey Sharp	65c1f654f2	Cranelift: Only build iconst for ints <= 64 bits (#5723) I audited the egraph "algebraic" optimization rules for any which construct an `iconst` on the right-hand side of the rule. In these cases we need to constrain the type passed to `iconst` to be both `fits_in_64` and `ty_int`, because `iconst` is not defined on other types.	2023-02-06 14:10:29 -08:00
Alex Crichton	de0e0bea3f	Legalize `b{and,or,xor}_not` into component instructions (#5709) * Remove trailing whitespace in `lower.isle` files * Legalize the `band_not` instruction into simpler form This commit legalizes the `band_not` instruction into `band`-of-`bnot`, or two instructions. This is intended to assist with egraph-based optimizations where the `band_not` instruction doesn't have to be specifically included in other bit-operation-patterns. Lowerings of the `band_not` instruction have been moved to a specialization of the `band` instruction. * Legalize `bor_not` into components Same as prior commit, but for the `bor_not` instruction. * Legalize bxor_not into bxor-of-bnot Same as prior commits. I think this also ended up fixing a bug in the s390x backend where `bxor_not x y` was actually translated as `bnot (bxor x y)` by accident given the test update changes. * Simplify not-fused operands for riscv64 Looks like some delegated-to rules have special-cases for "if this feature is enabled use the fused instruction" so move the clause for testing the feature up to the lowering phase to help trigger other rules if the feature isn't enabled. This should make the riscv64 backend more consistent with how other backends are implemented. * Remove B{and,or,xor}Not from cost of egraph metrics These shouldn't ever reach egraphs now that they're legalized away. * Add an egraph optimization for `x^-1 => ~x` This adds a simplification node to translate xor-against-minus-1 to a `bnot` instruction. This helps trigger various other optimizations in the egraph implementation and also various backend lowering rules for instructions. This is chiefly useful as wasm doesn't have a `bnot` equivalent, so it's encoded as `x^-1`. * Add a wasm test for end-to-end bitwise lowerings Test that end-to-end various optimizations are being applied for input wasm modules. * Specifically don't self-update rustup on CI I forget why this was here originally, but this is failing on Windows CI. 
In general there's no need to update rustup, so leave it as-is. * Cleanup some aarch64 lowering rules Previously a 32/64 split was necessary due to the `ALUOp` being different, but that's been refactored away now, so there's no longer any need for duplicate rules. * Narrow an x64 lowering rule This previously made more sense when it was `band_not` and rarely used, but be more specific in the type-filter on this rule that it's only applicable to SIMD types with lanes. * Simplify xor-against-minus-1 rule No need to have the commutative version since constants are already shuffled right for egraphs * Optimize band-of-bnot when bnot is on the left Use some more rules in the egraph algebraic optimizations to canonicalize band/bor/bxor with a `bnot` operand to put the operand on the right. That way the lowerings in the backends only have to list the rule once, with the operand on the right, to optimize both styles of input. * Add commutative lowering rules * Update cranelift/codegen/src/isa/x64/lower.isle Co-authored-by: Jamey Sharp <jamey@minilop.net> --------- Co-authored-by: Jamey Sharp <jamey@minilop.net>	2023-02-06 13:53:40 -06:00
Afonso Bordado	99c3936616	fuzzgen: Enable `rotl` for riscv64 (#5715) It was fixed in #5611	2023-02-06 18:27:31 +00:00
Jamey Sharp	23e1d6b5e3	egraphs/cprop: Don't extend constants to `i128` (#5717) Fixes #5711.	2023-02-06 17:34:21 +00:00
wasmtime-publish	482f541101	Bump Wasmtime to 7.0.0 (#5712) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2023-02-06 09:10:19 -06:00
Jamey Sharp	97381792ac	Generalize u/sextend constant folding to all types (#5706) Also move these optimization rules to cprop.isle; it's where all the other similar rules are. Like the other cprop rules, these can subsume any other rules. We can't do better than reducing an expression to a constant. The new i64_sextend_imm64 and u64_uextend_imm64 constructors are useful helpers to clean up other code. I applied them to `imm64_icmp` while I was here, as well as using the existing `ty_mask` helper to clean up `imm64_masked`.	2023-02-03 17:29:21 -08:00
Trevor Elliott	6d8f2be9e1	Use `andn` for `band_not` when bmi1 is present (#5701) We can use the andn instruction for the lowering of band_not on x64 when bmi1 is available.	2023-02-03 16:23:18 -08:00


4327 Commits
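
Postorder sketch for a139ed6d56 (#5821). This is a minimal Rust illustration of the idea behind that fix. It is not Cranelift's `DominatorTree::compute`. Blocks are plain `usize` indices with successor lists, which is an assumption made for illustration. A node is marked seen when it is first entered rather than when it is pushed, so a successor pushed early can still be reached first through another path. On the `%foo2` example (block0 branches to block1 and block2; block2 jumps to block1), it emits block1 before block2 for either branch order.

```rust
/// Events on the explicit DFS stack.
#[derive(Clone, Copy)]
enum Event {
    Enter(usize),
    Exit(usize),
}

/// Iterative DFS postorder over a CFG given as successor lists.
/// A node is marked seen when it is first entered (popped as `Enter`),
/// not when it is pushed, so a stale `Enter` for an already-visited node
/// is simply skipped.
pub fn postorder(succs: &[Vec<usize>], entry: usize) -> Vec<usize> {
    let mut seen = vec![false; succs.len()];
    let mut order = Vec::with_capacity(succs.len());
    let mut stack = vec![Event::Enter(entry)];

    while let Some(event) = stack.pop() {
        match event {
            Event::Enter(node) => {
                if seen[node] {
                    continue;
                }
                seen[node] = true;
                // Pushed first, so it is popped only after every child finishes.
                stack.push(Event::Exit(node));
                // Reverse so successors are entered in their listed order.
                for &succ in succs[node].iter().rev() {
                    if !seen[succ] {
                        stack.push(Event::Enter(succ));
                    }
                }
            }
            Event::Exit(node) => order.push(node),
        }
    }
    order
}
```

For example, `postorder(&[vec![1, 2], vec![], vec![1]], 0)` and `postorder(&[vec![2, 1], vec![], vec![1]], 0)` both return `[1, 2, 0]`.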
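
SSE vs. AVX operand forms in 453330b2db (#5795). This is a hypothetical Rust model of the conditional `rule` idea described in that commit. The helper name `lower_andnps` and the string-based register representation are invented for illustration; this is not Cranelift's ISLE code. The SSE form overwrites its first source, so a copy is needed unless the destination already holds that source. The VEX form takes two read-only sources and a separate destination. Both forms compute `dst = !a & b` (Intel operand order).

```rust
/// Hypothetical lowering helper: `dst = !a & b` (Intel operand order).
pub fn lower_andnps(has_avx: bool, dst: &str, a: &str, b: &str) -> Vec<String> {
    if has_avx {
        // VEX three-operand form: both sources are read-only.
        vec![format!("vandnps {dst}, {a}, {b}")]
    } else {
        // SSE two-operand form: the first source is also the destination,
        // so `a` must be copied into `dst` first unless they already match.
        let mut insts = Vec::new();
        if dst != a {
            insts.push(format!("movaps {dst}, {a}"));
        }
        insts.push(format!("andnps {dst}, {b}"));
        insts
    }
}
```

The extra `movaps` in the SSE path is the register-to-register movement that the commit expects the three-operand form to remove.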
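
Min/max rewrites in 76539ef9f2 (#5546). This is a small exhaustive check over all 8-bit pairs that the `select`-of-`icmp` shapes agree with the matching min/max, including one inverted form. It models the rewrites' semantics in plain Rust, not the egraph rules themselves. The signed and unsigned variants must stay distinct: for the pair (-1, 1), `smax` is 1, but the unsigned view of -1 is 255, which is the `umax`.

```rust
/// Exhaustively checks, over every pair of 8-bit values, that the
/// `select(icmp cc x y, x, y)` shapes equal the matching min/max.
pub fn select_icmp_matches_min_max() -> bool {
    for x in i8::MIN..=i8::MAX {
        for y in i8::MIN..=i8::MAX {
            let (ux, uy) = (x as u8, y as u8);
            let ok =
                // select(icmp sgt x y, x, y) => smax x y
                (if x > y { x } else { y }) == x.max(y)
                // select(icmp slt x y, x, y) => smin x y
                && (if x < y { x } else { y }) == x.min(y)
                // select(icmp ugt x y, x, y) => umax x y
                && (if ux > uy { ux } else { uy }) == ux.max(uy)
                // select(icmp ult x y, x, y) => umin x y
                && (if ux < uy { ux } else { uy }) == ux.min(uy)
                // inverted: select(icmp sgt x y, y, x) => smin x y
                && (if x > y { y } else { x }) == x.min(y);
            if !ok {
                return false;
            }
        }
    }
    true
}
```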
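
Double extends in 6df3bbbe60 (#5772). This sketches the identity behind collapsing two extends into one. It assumes both extends are of the same kind; the commit title does not say which combinations are handled. Rust's `From` conversions zero-extend unsigned integers and sign-extend signed ones, so they stand in for CLIF `uextend` and `sextend` here.

```rust
/// Two zero-extensions (u8 -> u32 -> u64) versus one (u8 -> u64).
pub fn uextend_twice(x: u8) -> u64 {
    u64::from(u32::from(x))
}

pub fn uextend_once(x: u8) -> u64 {
    u64::from(x)
}

/// Two sign-extensions (i8 -> i32 -> i64) versus one (i8 -> i64).
pub fn sextend_twice(x: i8) -> i64 {
    i64::from(i32::from(x))
}

pub fn sextend_once(x: i8) -> i64 {
    i64::from(x)
}
```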
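
Jump-table layout in 19f337e29b (#5770) and d99783fc91 (#5756). This is a hypothetical sketch of storage that keeps the default block at the front. The `JumpTable` type and its method names are illustrative assumptions, not a copy of Cranelift's `JumpTableData`. Out-of-range indices fall back to the default, matching `br_table` semantics. A CFG traversal can see every successor, default included, through one contiguous slice.

```rust
/// Hypothetical jump-table storage: `blocks[0]` is the default target and
/// the remaining entries are the indexed targets. Block ids are plain `u32`s.
pub struct JumpTable {
    blocks: Vec<u32>,
}

impl JumpTable {
    pub fn new(default: u32, targets: &[u32]) -> Self {
        let mut blocks = Vec::with_capacity(targets.len() + 1);
        blocks.push(default);
        blocks.extend_from_slice(targets);
        JumpTable { blocks }
    }

    pub fn default_block(&self) -> u32 {
        self.blocks[0]
    }

    /// Indexed targets only, without the default.
    pub fn as_slice(&self) -> &[u32] {
        &self.blocks[1..]
    }

    /// Every successor, default first, as one contiguous slice.
    pub fn all_branches(&self) -> &[u32] {
        &self.blocks
    }

    /// `br_table` semantics: an out-of-range index selects the default.
    pub fn target(&self, index: u32) -> u32 {
        self.as_slice()
            .get(index as usize)
            .copied()
            .unwrap_or(self.default_block())
    }
}
```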
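
XNOR equivalence in a0a97f5e8f (#5763). This is an exhaustive 8-bit check of the identity that lets a single fused lowering serve both `(bnot (bxor x y))` and `(bxor x (bnot y))`. On aarch64, for instance, `eon` computes `x ^ !y`. This only checks the algebra; it does not model either backend.

```rust
/// Exhaustive 8-bit check: !(x ^ y) == x ^ !y == !x ^ y.
pub fn xnor_forms_agree() -> bool {
    (0..=u8::MAX).all(|x| {
        (0..=u8::MAX).all(|y| {
            let bnot_bxor = !(x ^ y); // (bnot (bxor x y))
            let bxor_bnot = x ^ !y; // (bxor x (bnot y))
            bnot_bxor == bxor_bnot && bnot_bxor == (!x ^ y)
        })
    })
}
```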
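
Complement identities in 7bf89683e9 (#5744). This is an exhaustive `u8` check of the identities behind the generalized rules. Each identity holds bit by bit, so its truth does not depend on the integer width. That is what makes extending the rules to more types sound, provided `-1` means all ones at the value's own width.

```rust
/// Exhaustive `u8` check of the and/or/xor identities. Each holds bit by
/// bit, so the result does not depend on the integer width.
pub fn complement_identities_hold() -> bool {
    (0..=u8::MAX).all(|x| {
        (x ^ !x) == u8::MAX // x ^ !x => -1 (all ones)
            && (x | !x) == u8::MAX // x | !x => -1
            && (x & !x) == 0 // x & !x => 0
            && (x & u8::MAX) == x // x & -1 => x
            && (x & x) == x // x & x => x
    })
}
```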
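
Deep-cloning varargs in 673b448cfe (#5727). This is a simplified, hypothetical model of the aliasing bug. Instructions hold a handle into a shared pool of lists, loosely analogous to an `EntityList` in a `ListPool`. The `Call`, `ListPool`, and `clone_inst_*` names are invented for this sketch. Copying only the handle makes two instructions share one list, while a deep clone allocates a fresh copy.

```rust
/// Hypothetical instruction holding a handle into a shared list pool.
#[derive(Clone, Debug)]
pub struct Call {
    pub args: usize,
}

/// Hypothetical pool of value lists, indexed by handle.
#[derive(Default)]
pub struct ListPool {
    lists: Vec<Vec<u32>>,
}

impl ListPool {
    pub fn alloc(&mut self, values: Vec<u32>) -> usize {
        self.lists.push(values);
        self.lists.len() - 1
    }

    pub fn get(&self, handle: usize) -> &[u32] {
        &self.lists[handle]
    }

    pub fn get_mut(&mut self, handle: usize) -> &mut Vec<u32> {
        &mut self.lists[handle]
    }
}

/// Shallow clone: copies only the handle, so both calls share one list.
pub fn clone_inst_shallow(inst: &Call) -> Call {
    inst.clone()
}

/// Deep clone: gives the new call its own copy of the argument list.
pub fn clone_inst_deep(inst: &Call, pool: &mut ListPool) -> Call {
    let copy = pool.get(inst.args).to_vec();
    Call { args: pool.alloc(copy) }
}
```

Editing the arguments of a shallow clone also changes the original; editing a deep clone does not.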
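
Token eating in 16afefdab1 (#5693). This is a hypothetical sketch of a non-fatal `eat` alongside a fatal `expect`, plus a lexer-style helper that uses `is_ascii_digit` and a range pattern. The `Token` and `Parser` types are illustrative, not the ISLE parser's definitions. The commit describes the same split: the token-eating API is the non-fatal counterpart of what became `expect`.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    LParen,
    RParen,
    Int(i64),
}

pub struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    pub fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, pos: 0 }
    }

    fn peek(&self) -> Option<&Token> {
        self.tokens.get(self.pos)
    }

    /// Non-fatal: consumes the next token only if it matches `tok`.
    pub fn eat(&mut self, tok: &Token) -> bool {
        if self.peek() == Some(tok) {
            self.pos += 1;
            true
        } else {
            false
        }
    }

    /// Fatal: like `eat`, but a mismatch becomes an error.
    pub fn expect(&mut self, tok: &Token) -> Result<(), String> {
        if self.eat(tok) {
            Ok(())
        } else {
            Err(format!("expected {:?}, found {:?}", tok, self.peek()))
        }
    }
}

/// Lexer-style classification using `is_ascii_digit` and a range pattern.
pub fn classify(c: char) -> &'static str {
    match c {
        c if c.is_ascii_digit() => "digit",
        'a'..='z' | 'A'..='Z' | '_' => "ident-start",
        '(' | ')' => "paren",
        _ => "other",
    }
}
```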
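
Reachable jump tables in e9c05622c0 (#5721). This sketches recording used jump tables during a walk over reachable blocks, instead of inferring unused tables from their contents. `Terminator` is a hypothetical stand-in for CLIF terminators, and `HashSet` stands in for Cranelift's `EntitySet`.

```rust
use std::collections::HashSet;

/// Hypothetical block terminators; `BrTable` refers to a jump table by index.
pub enum Terminator {
    Jumps(Vec<usize>),
    BrTable(usize),
    Return,
}

/// Walks the blocks reachable from `entry`, recording each jump table
/// referenced by a reachable `br_table`.
pub fn used_jump_tables(
    terms: &[Terminator],
    tables: &[Vec<usize>],
    entry: usize,
) -> HashSet<usize> {
    let mut seen = vec![false; terms.len()];
    let mut used = HashSet::new();
    let mut stack = vec![entry];
    while let Some(block) = stack.pop() {
        // `replace` returns the old flag, so each block is processed once.
        if std::mem::replace(&mut seen[block], true) {
            continue;
        }
        match &terms[block] {
            Terminator::Jumps(succs) => stack.extend(succs.iter().copied()),
            Terminator::BrTable(jt) => {
                used.insert(*jt);
                stack.extend(tables[*jt].iter().copied());
            }
            Terminator::Return => {}
        }
    }
    used
}
```

In the usage below, block 3 is unreachable, so its table 1 is never recorded.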
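
Rewrites in de0e0bea3f (#5709). This is a toy bottom-up expression rewriter, not Cranelift's egraph or ISLE, that applies three of the steps described in the commit:

- legalize `band_not x y` into `band x (bnot y)`;
- turn `x ^ -1` into `bnot x`;
- move a `bnot` operand of `band` to the right.

The `Expr` type and `simplify` function are hypothetical, and a real egraph keeps every equivalent form instead of replacing one.

```rust
/// Hypothetical expression type for a toy rewriter.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Var(&'static str),
    Const(i64),
    Bnot(Box<Expr>),
    Band(Box<Expr>, Box<Expr>),
    Bxor(Box<Expr>, Box<Expr>),
    BandNot(Box<Expr>, Box<Expr>),
}

/// Bottom-up rewrites mirroring three steps described in the commit.
pub fn simplify(e: Expr) -> Expr {
    use Expr::*;
    match e {
        // Legalize `band_not x y` into `band x (bnot y)`.
        BandNot(x, y) => simplify(Band(x, Box::new(Bnot(y)))),
        // `x ^ -1` => `bnot x`.
        Bxor(x, y) => {
            let (x, y) = (simplify(*x), simplify(*y));
            if y == Const(-1) {
                Bnot(Box::new(x))
            } else {
                Bxor(Box::new(x), Box::new(y))
            }
        }
        // Canonicalize a `bnot` operand of `band` onto the right.
        Band(x, y) => match (simplify(*x), simplify(*y)) {
            (Bnot(a), b) if !matches!(b, Bnot(_)) => Band(Box::new(b), Box::new(Bnot(a))),
            (x, y) => Band(Box::new(x), Box::new(y)),
        },
        Bnot(x) => Bnot(Box::new(simplify(*x))),
        leaf => leaf,
    }
}
```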
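
Constant folding of extends in 97381792ac (#5706). This sketches what helpers like `i64_sextend_imm64` and `u64_uextend_imm64` need to compute. The functions below are hypothetical Rust stand-ins; their signatures, which take the source type's bit width directly, are assumptions. Sign extension moves the source's top bit into the i64 sign position and shifts back arithmetically. Zero extension masks off every bit above the width.

```rust
/// Hypothetical helper: sign-extends the low `bits` bits of `imm` to i64.
pub fn sextend_imm(bits: u32, imm: i64) -> i64 {
    assert!((1..=64).contains(&bits));
    let shift = 64 - bits;
    // Move the source sign bit to bit 63, then shift back arithmetically.
    (imm << shift) >> shift
}

/// Hypothetical helper: zero-extends the low `bits` bits of `imm` to u64.
pub fn uextend_imm(bits: u32, imm: i64) -> u64 {
    assert!((1..=64).contains(&bits));
    let mask = if bits == 64 { u64::MAX } else { (1u64 << bits) - 1 };
    (imm as u64) & mask
}
```

Folding `sextend` of an 8-bit constant `0xFF` would give `sextend_imm(8, 0xFF) == -1`, while `uextend` gives `uextend_imm(8, 0xFF) == 255`.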
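
`andn` operand order in 6d8f2be9e1 (#5701). BMI1 `andn dst, src1, src2` computes `!src1 & src2`, while `band_not x y` computes `x & !y`, so a lowering has to swap the operands. The functions below only model the semantics; they are not the emitted code.

```rust
/// Semantics of BMI1 `andn dst, src1, src2`: dst = !src1 & src2.
pub fn andn(src1: u64, src2: u64) -> u64 {
    !src1 & src2
}

/// `band_not x y` is `x & !y`, so it maps to `andn` with swapped operands.
pub fn band_not(x: u64, y: u64) -> u64 {
    andn(y, x)
}
```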