wasmtime

Author	SHA1	Message	Date
Afonso Bordado	2fb76be2e4	x64: Add `bmask` implementation (#5148 )	2022-10-28 17:17:22 -07:00
Afonso Bordado	879b52825f	cranelift: Implement `ineg.i128` for everyone (#5129 ) * cranelift: Add `ineg` runtests * aarch64: Implement `ineg.i128` * x64: Implement `ineg.i128` * riscv: Implement `ineg.i128` * fuzzgen: Enable `ineg.i128`	2022-10-28 16:10:00 -07:00
Afonso Bordado	3cbd490d52	fuzzgen: Add more opcodes (#5124 ) * fuzzgen: Add a few more opcodes * fuzzgen: Add more bmask variations	2022-10-27 11:01:03 -07:00
Afonso Bordado	e8f3d03bbe	cranelift: Mask high bits on `bmask` for types smaller than a register (#5118 ) * aarch64: Fix incorrect masking for small types on bmask `bmask` was accidentally relying on the uppermost bits of the register for small types. This was found by fuzzgen, when it generated a shift left followed by a bmask, the shift left shifted the bits out of the range of the input type (i8), however these are not automatically cleared since they remained inside the 32 bits of the register. That caused issues when the bmask tried to compare the whole register instead of just the bottom bits. The solution here is to mask the upper bits for small types. * aarch64: Emit 32bit cmp on bmask This fixes an issue where bmask was accidentally comparing the upper bits of the register by always using a 64bit cmp. * riscv: Mask high bits in bmask * riscv: Add compile tests for br{z,nz} * riscv: Use shifts to mask 32bit values This produces less code than the AND since that version needs to load an immediate constant from memory. * cranelift: Update test input to hexadecimal values This makes it a bit more clear what is being tested. * riscv: Use addiw for masking 32 bit values Co-authored-by: Trevor Elliott <telliott@fastly.com> * aarch64: Update bmask rule priority Co-authored-by: Trevor Elliott <telliott@fastly.com>	2022-10-27 09:45:39 -07:00
Trevor Elliott	02620441c3	Add uadd_overflow_trap (#5123 ) Add a new instruction uadd_overflow_trap, which is a fused version of iadd_ifcout and trapif. Adding this instruction removes a dependency on the iflags type, and would allow us to move closer to removing it entirely. The instruction is defined for the i32 and i64 types only, and is currently only used in the legalization of heap_addr.	2022-10-27 09:43:15 -07:00
Jamey Sharp	e079195322	Simplify overlap checking after removing Rayon (#5131 ) Now that we aren't trying to do overlap checking in parallel, we can fuse the loop that generates a list of rule pairs with the loop that checks those pairs. Removing the intermediate vector of pairs should save a little time and memory. But it also means we're no longer borrowing from the `by_term` HashMap, so we can use `into_iter` instead of `values` to move ownership out of the map. That in turn means that we can use `into_iter` on each vector of rules as well, which turns out to offer a slightly nicer idiom for looping over all pairs, and also means we drop allocations as soon as possible. I also pushed grouping by priority earlier, so the O(n^2) all-pairs loop runs over smaller lists. If we later find we want to know about overlaps across different priorities, the definition of the map key is an easy place to make that change.	2022-10-26 19:49:08 +00:00
Alex Crichton	bc3285e845	Update wasm-tools crates (#5130 ) * Update wasm-tools crates Mostly just a hygienic update, nothing major here * Fix fuzz compile * Fix test expectations	2022-10-26 18:29:10 +00:00
Afonso Bordado	4867813f77	cranelift: Remove `copy` instruction (#5125 )	2022-10-25 17:27:33 -07:00
Chris Fallin	b3333bf9ea	Cranelift: disable egraphs in fuzzing for now. (#5128 ) * Cranelift: disable egraphs in fuzzing for now. As per [this comment], with a few recent discussions it's become clear that we want to refactor egraphs in a way that will subsume, or make irrelevant, some of the recent fuzzbugs that have arisen (and likely lead to others, which we'll want to fix!). Rather than chase these down then refactor later, it probably makes sense not to spend the human time or fuzzing time doing so. This PR turns off egraphs support in fuzzing configurations for now, to be re-enabled later. [this comment]: https://github.com/bytecodealliance/wasmtime/issues/5126#issuecomment-1291222515 * Disable in cranelift-fuzzgen as well.	2022-10-25 23:51:55 +00:00
Ulrich Weigand	b61e678309	s390x: Fix more regalloc checker errors (#5121 ) For VecInsertLane[Undef] and VecExtractLane, if lane_reg is zero_reg(), the instruction does not actually use any register value. Fixes https://github.com/bytecodealliance/wasmtime/issues/5090	2022-10-25 18:04:31 +00:00
Ulrich Weigand	39b3b1d772	s390x: Fix handling of sret arguments (#5116 ) Skip synthetic StructReturn entries in the return value list. Fixes https://github.com/bytecodealliance/wasmtime/issues/5089	2022-10-25 10:40:10 -07:00
bjorn3	441401f9d6	Fix zero init sequence for i128 in cranelift-frontend (#5115 ) iconst.i128 is no longer allowed, so we have to use iconst.i64 + uextend instead.	2022-10-25 10:03:48 -07:00
Afonso Bordado	ba7b874ca3	cranelift: Add RISC-V disassembly capabilities to clif-util (#5117 ) This just correctly maps our RISC-V ISA to capstone.	2022-10-25 10:03:04 -07:00
Chris Fallin	e62e530b7c	egraphs: fix fill-in-the-types logic for multiple projections of one value. (#5112 ) In particular, this was found to happen in #5099 because a `Result` projection node was not deduplicating across two separate `isplit`s that created it. (This is a separate issue we should also fix; `needs_dedup` is I think overly conservative because `Result` can project out a single value from a pure or impure node, but the projection itself should be treated like any other pure operator.) In any case, if we have a value `v0` and two separate `Result { value: v0, result: N, ty }` nodes, each of these will fill in the type `ty` for the `N`th output of `v0`, and the second will idempotently overwrite the first; we should loosen the assert so that it allows this case. Fixes #5099. Fixes #5100.	2022-10-25 05:22:28 +00:00
Nick Fitzgerald	097d1087e0	Cranelift: Avoid calling `ensure_struct_return_pointer_is_returned` and cloning sigs for every call (#5113 ) * Cranelift: pass iterators to `ABIMachineSpec::compute_arg_locs` Instead of slices. This gives us more flexibility to pass custom sequences without needing to allocate a `Vec` to hold them and pass in as a slice. * Cranelift: Avoid cloning `ir::Signature`s in `SigData::from_func_sig` This avoids two heap allocations per signature that are unnecessary 99% of the time. * fix typo * Simplify condition in `missing_struct_return`	2022-10-24 17:21:34 -07:00
Trevor Elliott	ec12415b1f	cranelift: Remove redundant branch and select instructions (#5097 ) As discussed in the 2022/10/19 meeting, this PR removes many of the branch and select instructions that used iflags, in favor if using brz/brnz and select in their place. Additionally, it reworks selectif_spectre_guard to take an i8 input instead of an iflags input. For reference, the removed instructions are: br_icmp, brif, brff, trueif, trueff, and selectif.	2022-10-24 16:14:35 -07:00
Afonso Bordado	c8791073d6	cranelift: Remove `iconst.i128` (#5075 ) * cranelift: Remove iconst.i128 * bugpoint: Report Changed when only one instruction is mutated * cranelift: Fix egraph bxor rule * cranelift: Remove some simple_preopt opts for i128	2022-10-24 12:43:28 -07:00
Ulrich Weigand	bfcf6616fe	s390x: clean up remnants of non-SSA code generation (#5096 ) Eliminate a few remaining instances of non-SSA code. Remove infrastructure previously used for non-SSA code emission. Related cleanup around flags handling.	2022-10-24 12:40:50 -07:00
bjorn3	470070ab71	Remove rayon dependency of cranelift-isle (#5101 ) Using rayon adds a lot of dependencies to Cranelift. The total unparallelized time the code that uses rayon takes is less than half a second and it runs at compile time, so there is pretty much no benefit to parallelizing it.	2022-10-23 15:13:14 -07:00
Nick Fitzgerald	442f9fa01b	Cranelift: pass iterators to `ABIMachineSpec::compute_arg_locs` (#5095 ) Instead of slices. This gives us more flexibility to pass custom sequences without needing to allocate a `Vec` to hold them and pass in as a slice.	2022-10-21 16:08:09 -07:00
Nick Fitzgerald	5c5fa192f7	Cranelift: use `.enumerate()` to avoid indexing in s390x backend (#5094 ) This can help rustc/llvm avoid bounds checks, but more importantly I will have future changes here that remove indexing of params, and instead hand them out as an iterator.	2022-10-21 13:08:56 -07:00
Nick Fitzgerald	4a66c3b855	Cranelift: Remove duplicate IR signature legalizations (#5093 ) The `SigData::from_func_sig` constructor will already ensure that the struct return pointer is returned, so this is a purely unnecessary call. Note that this is not a performance speed up, since `ensure_struct_return_ptr_is_returned` doesn't do any significant work if the signature is already legalized.	2022-10-21 13:08:44 -07:00
Ulrich Weigand	9dadba60a0	s390x: use constraints for call arguments and return values (#5092 ) Use the regalloc constraint-based CallArgList / CallRetList mechanism instead of directly using physregs in instructions.	2022-10-21 11:01:22 -07:00
Chris Fallin	86e77953f8	Fix some egraph-related issues. (#5088 ) This fixes #5086 by addressing two separate issues: - The `ValueDataPacked::set_type()` helper had an embarrassing bitfield-manipulation bug that would mangle the rest of a `ValueDef` when setting its type. This is not normally used, only when the egraph elaboration fills in types after-the-fact on a multi-value node. - The lowering rules for `isplit` on aarch64 and s390x were dispatching on the first output type, rather than the input type. When only the second output is used (as in the example in #5086), the first output type actually remains `INVALID` (and this is fine because it's never used).	2022-10-21 10:24:48 -07:00
Trevor Elliott	d9753fac2b	Remove uses of `reg_mod` from s390x (#5073 ) Remove uses of reg_mod from the s390x backend. This required moving away from using r0/r1 as the result registers from a few different pseudo instructions, standardizing instead on r2/r3. That change was necessary as regalloc2 will not correctly allocate registers that aren't listed in the allocatable set, which r0/r1 are not. Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Co-authored-by: Chris Fallin <chris@cfallin.org>	2022-10-21 09:22:16 -07:00
Afonso Bordado	51d8734235	fuzzgen: Generate compiler flags (#5020 ) * fuzzgen: Test compiler flags * cranelift: Generate `all()` function for all enum flags This allows a user to iterate all flags that exist. * fuzzgen: Minimize regalloc_checker compiles * fuzzgen: Limit the amount of test case inputs * fuzzgen: Add egraphs flag It's finally here! 🥳 * cranelift: Add fuzzing comment to settings * fuzzgen: Add riscv64 * fuzzgen: Unconditionally enable some flags	2022-10-20 16:40:50 -07:00
Chris Fallin	c392e461a3	egraphs: a few miscellaneous compile-time optimizations. (#5072 ) * egraphs: a few miscellaneous compile-time optimizations. These optimizations together are worth about a 2% compile-time reduction, as measured on one core with spidermonkey.wasm as an input, using `hyperfine` on `wasmtime compile`. The changes included are: - Some better pre-allocation (blockparams and side-effects concatenated list vecs); - Avoiding the indirection of storing list-of-types for every Pure and Inst node, when almost all nodes produce only a single result; instead, store arity and single type if it exists, and allow result projection nodes to fill in types otherwise; - Pack the `MemoryState` enum into one `u32` (this together with the above removal of the type slice allows `Node` to shrink from 48 bytes to 32 bytes); - always-inline an accessor (`entry` on `CtxHash`) that wasn't (`always(inline)` appears to be load-bearing, rather than just `inline`); - Split the update-analysis path into two hotpaths, one for the union case and one for the new-node case (and the former can avoid recomputing for the contained node when replacing a node with node-and-child eclass entry). * Review feedback. * Fix test build. * Fix to lowering when unused output with invalid type is present.	2022-10-19 11:05:00 -07:00
bjorn3	0667a412d7	Export a couple of types from cranelift_module that were meant to be exported (#5074 )	2022-10-19 08:52:24 -07:00
Trevor Elliott	32a7593c94	cranelift: Remove booleans (#5031 ) Remove the boolean types from cranelift, and the associated instructions breduce, bextend, bconst, and bint. Standardize on using 1/0 for the return value from instructions that produce scalar boolean results, and -1/0 for boolean vector elements. Fixes #3205 Co-authored-by: Afonso Bordado <afonso360@users.noreply.github.com> Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Co-authored-by: Chris Fallin <chris@cfallin.org>	2022-10-17 16:00:27 -07:00
Afonso Bordado	766ecb561e	fuzzgen: Always generate reachable blocks (#5034 ) * fuzzgen: Always reachable blocks * fuzzgen: Rename BlockTerminator * fuzzgen: Rename `finalize_block` * fuzzgen: Use `cloned` instead of map clone Thanks @jameysharp! Co-authored-by: Jamey Sharp <jamey@minilop.net> * fuzzgen: `rustfmt` * fuzzgen: Document paramless targets * fuzzgen: Add `BlockTerminatorKind` * fuzzen: Update BrTable/Switch comment * fuzzgen: Minor cleanup Co-authored-by: Jamey Sharp <jamey@minilop.net>	2022-10-17 12:51:20 -07:00
Chris Fallin	1aaea279e5	egraph opts: fix uextend-of-i32. (#5061 ) This is a simple error in the const-prop rules: uextend was not masking iconst's u64 immediate when extending from i32 to i64. Arguably an iconst.i32 should not have nonzero bits in the upper 32 of its immediate, but that's a separate design question. For now, if our invariant is that the upper bits are ignored, then it is required to mask the bits when const-evaling a `uextend`. Fixes #5047.	2022-10-17 12:45:49 -07:00
Afonso Bordado	4639e85c4e	Flush Icache on AArch64 Windows (#4997 ) * cranelift: Add FlushInstructionCache for AArch64 on Windows This was previously done on #3426 for linux. * wasmtime: Add FlushInstructionCache for AArch64 on Windows This was previously done on #3426 for linux. * cranelift: Add MemoryUse flag to JIT Memory Manager This allows us to keep the icache flushing code self-contained and not leak implementation details. This also changes the windows icache flushing code to only flush pages that were previously unflushed. * Add jit-icache-coherence crate * cranelift: Use `jit-icache-coherence` * wasmtime: Use `jit-icache-coherence` * jit-icache-coherence: Make rustix feature additive Mutually exclusive features cause issues. * wasmtime: Remove rustix from wasmtime-jit We now use it via jit-icache-coherence * Rename wasmtime-jit-icache-coherency crate * Use cfg-if in wasmtime-jit-icache-coherency crate * Use inline instead of inline(always) * Add unsafe marker to clear_cache * Conditionally compile all rustix operations membarrier does not exist on MacOS * Publish `wasmtime-jit-icache-coherence` * Remove explicit windows check This is implied by the target_os = "windows" above * cranelift: Remove len != 0 check This is redundant as it is done in non_protected_allocations_iter * Comment cleanups Thanks @akirilov-arm! * Make clear_cache safe * Rename pipeline_flush to pipeline_flush_mt * Revert "Make clear_cache safe" This reverts commit 21165d81c9030ed9b291a1021a367214d2942c90. * More docs! * Fix pipeline_flush reference on clear_cache * Update more docs! * Move pipeline flush after `mprotect` calls Technically the `clear_cache` operation is a lie in AArch64, so move the pipeline flush after the `mprotect` calls so that it benefits from the implicit cache cleaning done by it. 
* wasmtime: Remove rustix backend from icache crate * wasmtime: Use libc for macos * wasmtime: Flush icache on all arch's for windows * wasmtime: Add flags to membarrier call	2022-10-12 11:15:38 -07:00
Nick Fitzgerald	03d77d4d6b	Cranelift: Derive `Copy` for `InstructionData` (#5043 ) * Cranelift: Derive `Copy` for `InstructionData` And update `clone` calls to be copies. * Add a test for `InstructionData`'s size	2022-10-12 07:58:27 -07:00
Afonso Bordado	1d8f982fe5	fuzzgen: Add bitops (#5040 ) * cranelift: Implement some bitops for i128 values * fuzzgen: Add bitops	2022-10-12 05:52:48 -07:00
Chris Fallin	2be12a5167	egraph-based midend: draw the rest of the owl (productionized). (#4953 ) * egraph-based midend: draw the rest of the owl. * Rename `egg` submodule of cranelift-codegen to `egraph`. * Apply some feedback from @jsharp during code walkthrough. * Remove recursion from find_best_node by doing a single pass. Rather than recursively computing the lowest-cost node for a given eclass and memoizing the answer at each eclass node, we can do a single forward pass; because every eclass node refers only to earlier nodes, this is sufficient. The behavior may slightly differ from the earlier behavior because we cannot short-circuit costs to zero once a node is elaborated; but in practice this should not matter. * Make elaboration non-recursive. Use an explicit stack instead (with `ElabStackEntry` entries, alongside a result stack). * Make elaboration traversal of the domtree non-recursive/stack-safe. * Work analysis logic in Cranelift-side egraph glue into a general analysis framework in cranelift-egraph. * Apply static recursion limit to rule application. * Fix aarch64 wrt dynamic-vector support -- broken rebase. * Topo-sort cranelift-egraph before cranelift-codegen in publish script, like the comment instructs me to! * Fix multi-result call testcase. * Include `cranelift-egraph` in `PUBLISHED_CRATES`. * Fix atomic_rmw: not really a load. * Remove now-unnecessary PartialOrd/Ord derivations. * Address some code-review comments. * Review feedback. * Review feedback. * No overlap in mid-end rules, because we are defining a multi-constructor. * rustfmt * Review feedback. * Review feedback. * Review feedback. * Review feedback. * Remove redundant `mut`. * Add comment noting what rules can do. * Review feedback. * Clarify comment wording. * Update `has_memory_fence_semantics`. * Apply @jameysharp's improved loop-level computation. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix suggestion commit. * Fix off-by-one in new loop-nest analysis. 
* Review feedback. * Review feedback. * Review feedback. * Use `Default`, not `std::default::Default`, as per @fitzgen Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> * Apply @fitzgen's comment elaboration to a doc-comment. Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> * Add stat for hitting the rewrite-depth limit. * Some code motion in split prelude to make the diff a little clearer wrt `main`. * Take @jameysharp's suggested `try_into()` usage for blockparam indices. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Take @jameysharp's suggestion to avoid double-match on load op. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix suggestion (add import). * Review feedback. * Fix stack_load handling. * Remove redundant can_store case. * Take @jameysharp's suggested improvement to FuncEGraph::build() logic Co-authored-by: Jamey Sharp <jamey@minilop.net> * Tweaks to FuncEGraph::build() on top of suggestion. * Take @jameysharp's suggested clarified condition Co-authored-by: Jamey Sharp <jamey@minilop.net> * Clean up after suggestion (unused variable). * Fix loop analysis. * loop level asserts * Revert constant-space loop analysis -- edge cases were incorrect, so let's go with the simple thing for now. * Take @jameysharp's suggestion re: result_tys Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix up after suggestion * Take @jameysharp's suggestion to use fold rather than reduce Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fixup after suggestion * Take @jameysharp's suggestion to remove elaborate_eclass_use's return value. * Clarifying comment in terminator insts. Co-authored-by: Jamey Sharp <jamey@minilop.net> Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>	2022-10-11 18:15:53 -07:00
Nick Fitzgerald	e2f1ced0b6	Cranelift: Make `Opcode` represented as a `u8` instead of `u16` and remove vestigial conversion impls (#5042 ) * Cranelift: Make `Opcode` represented as a `u8` instead of `u16` * Cranelift: Remove unused conversion impls for `Opcode` These are vestigial, left over from Peepmatic.	2022-10-11 12:57:12 -07:00
Afonso Bordado	86331b9b37	cranelift: Native feature detection for RISC-V (#5044 ) * cranelift: Native feature detection for RISC-V * cranelift: Typo fix Thanks @cfallin	2022-10-11 19:29:03 +00:00
Benjamin Bouvier	d68ca3711b	Upgrade sha2 to 0.10.2 in wasmtime (#4749 )	2022-10-10 09:40:40 +00:00
Jun Ryung Ju	39fbff92c3	cranelift: Added fp and, or, xor, not ops to interpreter. (#4999 ) * cranelift: Added fp and, or, xor, not ops to interpreter. * Formatting. * Removed archtecture dependent test on float-bitops.	2022-10-06 18:24:45 -07:00
Chris Fallin	e95ffe4413	Fix StructReturn handling: properly mark the clobber, and offset actual rets. (#5023 ) * Fix StructReturn handling: properly mark the clobber, and offset actual rets. The legalization of `StructReturn` was causing issues in the new call-handling code: the `StructReturn` ret was included in the `SigData` as if it were an actual CLIF-level return value, but it is not. Prior to using regalloc constraints for return values, we unconditionally included rax (or the architecture's usual return register) as a def, so it would be properly handled as "clobbered" by the regalloc. With the new scheme, we include defs on the call only for CLIF-level outputs. Callees with `StructReturn` args were thus not known to clobber the return-value register, and values might be corrupted. This PR updates the code to include a `StructReturn` ret as a clobber rather than a returned value in the relevant spots. I observed it causing saves/restores of rax in some CLIF that @bjorn3 provided me, but I was having difficulty minimizing this into a test-case that I would be comfortable including as a precise-output case (including the whole thing verbatim would lock down a bunch of other irrelevant details and cause test-update noise later). If we can find a more minimized example I'm happy to include it as a filetest. Fixes #5018.	2022-10-07 00:14:38 +00:00
Jamey Sharp	04b30acad9	Misc cleanups (#5014 ) * Replace resize+copy_from_slice with extend_from_slice Vec::resize initializes the new space, which is wasted effort if we're just going to call `copy_from_slice` on it immediately afterward. Using `extend_from_slice` is simpler, and very slightly faster. If the new size were bigger than the buffer we're copying from, then it would make sense to initialize the excess. But it isn't: it's always exactly the same size. * Move helpers from Context to CompiledCode These methods only use information from Context::compiled_code, so they should live on CompiledCode instead. * Remove an unnecessary #[cfg_attr] There are other uses of `#[allow(clippy::too_many_arguments)]` in this file, so apparently it doesn't need to be guarded by the "cargo-clippy" feature. * Fix a few comments Two of these were wrong/misleading: - `FunctionBuilder::new` does not clear the provided func_ctx. It does debug-assert that the context is already clear, but I don't think that's worth a comment. - `switch_to_block` does not "create values for the arguments." That's done by the combination of `append_block_params_for_function_params` and `declare_wasm_parameters`. * wasmtime-cranelift: Misc cleanups The main change is to use the `CompiledCode` reference we already had instead of getting it out of `Context` repeatedly. This removes a bunch of `unwrap()` calls. * wasmtime-cranelift: Factor out uncached compile	2022-10-05 10:35:59 -07:00
wasmtime-publish	a9be4a9b56	Bump Wasmtime to 3.0.0 (#5016 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-10-05 09:30:55 -05:00
Trevor Elliott	e63771f2d9	More consistent use of `add_inst` (#5012 ) Use the InstId returned by add_inst rather than creating it eagerly, when possible.	2022-10-04 15:59:30 -07:00
Trevor Elliott	a209cb63f5	ISLE: Enable the overlap checker (#5011 ) This PR turns the overlap checker on by default, requiring the use of priorities to resolve overlap between rules.	2022-10-04 21:56:49 +00:00
Alex Crichton	2607590d8c	Update the wasm-tools family of crates (#5010 ) * Update the wasm-tools family of crates Only minor updates here, mostly internal changes and no binary-related changes today. * Fix test expectation	2022-10-04 16:26:22 -05:00
yuyang	07584f6ac8	fix issue 4996. (#5003 )	2022-10-04 11:18:42 -07:00
Trevor Elliott	c9ff14e00b	Resolve overlap in the s390x backend (#5002 ) Resolve overlap in the s390x backend by adding rule priorities to disambiguate rule order.	2022-10-03 17:06:10 -07:00
Jamey Sharp	d35c508436	cranelift-frontend: Replace Vecs with ListPools (#5001 ) * Elide redundant sentinel values The `undef_variables` lists were a binding from Variable to Value, but the Values were always equal to a suffix of the block's parameters. So instead of storing another copy, we can just get them back from the block parameters. According to DHAT, this decreases total memory allocated and number of bytes written, and increases number of bytes read and instructions retired, but all by small fractions of a percent. According to hyperfine, main is "1.00 ± 0.01 times faster". * Use entity_impl for cranelift_frontend::Variable Instead of hand-coding essentially the same thing. * Keep undefined variables in a ListPool According to DHAT, this improves every measure of performance (instructions retired, total memory allocated, max heap size, bytes read, and bytes written), although by fractions of a percent. According to hyperfine the difference is nearly zero, but on Spidermonkey this branch is "1.01 ± 0.00 times faster" than main. * Elide redundant block IDs In a list of predecessors, we previously kept both the jump instruction that points to the current block, and the block where that instruction resides. But we can look up the block from the instruction as long as we have access to the current Layout, which we do everywhere that it was necessary. So don't store the block, just store the instruction. * Keep predecessor definitions in a ListPool * Make append_jump_argument independent of self This makes it easier to reason about borrow-checking issues. * Reuse `results` instead of re-doing variable lookup This eliminates three array lookups per predecessor by hanging on to the results of earlier steps a little longer. This only works now because I previously removed the need to borrow all of `self`, which otherwise prevented keeping a borrow of self.results alive. 
I had experimented with using `Vec::split_off` to copy the relevant chunk of results to a temporary heap allocation, but the extra allocation and copy was measurably slower. So it's important that this is just a borrow. * Cache single-predecessor block ID when sealing Of the code in cranelift_frontend, `use_var` is the second-hottest path, sitting close behind the `build` function that's used when inserting every new instruction. This makes sense given that the operands of a new instruction usually need to be looked up immediately before building the instruction. So making the single-predecessor loops in `find_var` and `use_var_local` do fewer memory accesses and execute fewer instructions turns out to have a measurable effect. It's still only a small fraction of a percent overall since cranelift-frontend is only a few percent of total runtime. This patch keeps a block ID in the SSABlockData, which is None unless both the block is sealed and it has exactly one predecessor. Doing so avoids two array lookups on each iteration of the two loops. According to DHAT, compared with main, at this point this PR uses 0.3% less memory at max heap, reads 0.6% fewer bytes, and writes 0.2% fewer bytes. According to Hyperfine, this PR is "1.01 ± 0.01 times faster" than main when compiling Spidermonkey. On the other hand, Sightglass says main is 1.01x faster than this PR on the same benchmark by CPU cycles. In short, actual effects are too small to measure reliably.	2022-10-03 14:29:12 -07:00
bjorn3	f1fce6c60d	Support writing riscv64 object files (#4995 )	2022-10-03 10:27:07 -07:00
Jamey Sharp	3fa545bd89	Cleanup cranelift-frontend (#4989 ) * cranelift-wasm: Assume block is reachable In handling the WebAssembly "end" operator, cranelift-wasm had logic to skip generating a jump instruction if the block was both unreachable and "pristine", meaning no instructions had been added. However, `translate_operator` checks first that `state.reachable` is true, so this logic only runs when cranelift-wasm believes that the current block _is_ reachable. Therefore the condition should always be true, whether the block is pristine or not. I've left a debug_assert in case `state.reachable` ever doesn't agree with `builder.is_unreachable()`, but the assert doesn't fail in any of the tests. We'll see if fuzzing finds something. Anyway, outside of cranelift-frontend, this eliminates the only use of `is_pristine()`, and there were no uses of `is_filled()`. So I've made both of those private. They're now only used in a nearby debug assert. * cranelift-frontend: Clarify pristine/filled states There was a comment here saying "A filled block cannot be pristine." Given that the intent was for those two states to be mutually exclusive, I've replaced the two booleans with a three-state enum. I also replaced all reads of these two flags with method calls. In all but one case these are only checked in debug assertions, so I don't even care whether they get inlined. They're easier to read, and this will make it easier to replace their implementations, which I hope to do soon. Finally, I replaced all assignments to either flag with an appropriate assignment of the corresponding enum state. Keep in mind this correspondence between the new enum and the old flags: - Empty: pristine true, filled false - Partial: pristine false, filled false - Filled: pristine false, filled true Every existing update to these flags could only move to a later state. (For example, Partial couldn't go back to Empty.) 
In the old flags that meant that pristine could only go from true to false, and filled could only go from false to true. `fill_current_block` was a weird case because at first glance it looks like it could allow both pristine and filled to be true at the same time. However, it's only called from `FuncInstBuilder::build`, which calls `ensure_inserted_block` before doing anything else, and _that_ cleared the pristine flag. Similarly, `handle_ssa_side_effects` looks like it could allow both pristine and filled to be true for anything in `split_blocks_created`. However, those blocks are created by SSABuilder, so their BlockData is not initialized by `create_block`, and instead uses BlockData::default. The `Default` implementation here previously set both flags false, while `create_block` would instead set pristine to true. So these split blocks were correctly set to the Filled state, and after this patch they are still set correctly. * cranelift-frontend: Separate SSA and user block params Previously there was a `user_param_count` field in BlockData, used purely to debug-assert that no user parameters are added to a block after `use_var` adds SSA parameters. Instead, this patch enforces a strict phase separation between the period after a block is created when user parameters can be added to it, and the period when `use_var` may be called and instructions may be added. I'm assuming that calls to `use_var` are _always_ followed by inserting one or more instructions into the block. (If you don't want to insert an instruction, why do you need to know where instructions in this block would get variable definitions from?) This patch has no visible effect for callers which follow that rule. However, it was previously legal to call `use_var`, then append a block parameter before adding instructions, so long as `use_var` didn't actually need to add a block parameter. That could only happen if the current block is sealed and has exactly one predecessor. 
So anyone who was counting on this behavior was playing a dangerous game anyway. * cranelift-frontend: Defer initializing block data Every reference to the func_ctx.status SecondaryMap will automatically create the appropriate entries on-demand, with the sole exception of `finalize`. In that function, debug assertions use SecondaryMap::keys to find out which blocks need to be checked. However, those assertions always succeed for blocks which never had any instructions added. So it's okay to skip them for blocks which aren't touched after `create_block`.	2022-09-30 14:11:19 -07:00
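
Note on 3fa545bd89 (#4989): the commit message describes replacing the `pristine` and `filled` booleans with a three-state enum whose states only ever move forward. The following is a minimal sketch of that idea, not the code from cranelift-frontend. The variant names `Empty`, `Partial`, and `Filled` and the method names `is_pristine` and `is_filled` come from the message; the type name `BlockStatus` and the `advance` helper are hypothetical, introduced only for illustration.

```rust
/// Hypothetical stand-in for the three-state enum described in the commit message.
/// Variants are declared in "earliest to latest" order so the derived `Ord`
/// can express the rule that a block only moves to a later state.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum BlockStatus {
    /// Old flags: pristine = true, filled = false.
    Empty,
    /// Old flags: pristine = false, filled = false.
    Partial,
    /// Old flags: pristine = false, filled = true.
    Filled,
}

impl BlockStatus {
    /// Equivalent of reading the old `pristine` flag.
    pub fn is_pristine(self) -> bool {
        self == BlockStatus::Empty
    }

    /// Equivalent of reading the old `filled` flag.
    pub fn is_filled(self) -> bool {
        self == BlockStatus::Filled
    }

    /// Hypothetical helper: move to `next`, checking in debug builds that
    /// the state never goes backwards (e.g. Partial cannot return to Empty).
    pub fn advance(&mut self, next: BlockStatus) {
        debug_assert!(next >= *self, "block status may only move forward");
        *self = next;
    }
}
```

With a single enum, the combination "pristine and filled at the same time" cannot be represented at all, which is the mutual exclusion that the old comment ("A filled block cannot be pristine.") could only state informally.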


4055 Commits