wasmtime

Author	SHA1	Message	Date
Trevor Elliott	80c147d9c0	Rework br_table to use BlockCall (#5731 ) Rework br_table to use BlockCall, allowing us to avoid adding new nodes during ssa construction to hold block arguments. Additionally, many places where we previously matched on InstructionData to extract branch destinations can be replaced with a use of branch_destination or branch_destination_mut.	2023-02-16 09:23:27 -08:00
Trevor Elliott	19f337e29b	Move the default block to the front of the underlying jump table storage (#5770 ) The new api on JumpTableData makese it easy to keep the default label first, and that shrinks the diff in #5731 a bit.	2023-02-13 20:50:29 +00:00
Trevor Elliott	d99783fc91	Move default blocks into jump tables (#5756 ) Move the default block off of the br_table instrution, and into the JumpTable that it references.	2023-02-10 08:53:30 -08:00
Trevor Elliott	15fe9c7c93	Inline jump tables in parsed br_table instructions (#5755 ) As jump tables are used by at most one br_table instruction, inline their definition in those instructions instead of requiring them to be declared as function-level metadata.	2023-02-09 14:24:04 -08:00
Trevor Elliott	b0b3f67cb0	Move jump tables to the DataFlowGraph (#5745 ) Move the storage for jump tables off of FunctionStencil and onto DataFlowGraph. This change is in service of #5731, making it easier to access the jump table data in the context of helpers like inst_values.	2023-02-07 21:21:35 -08:00
Trevor Elliott	d71c9458dc	Make `DataFlowGraph::blocks` public (#5740 ) Similar to when we exposed the DataFlowGraph::insts field through a restrictive newtype, expose DataFlowGraph::blocks through an interface that allows a restrictive set of operations. This field being public now allows us to avoid a rematch in ssa construction, and simplifies the implementation of adding a block argument to a block referenced by a br_table instruction.	2023-02-07 17:11:14 -08:00
Trevor Elliott	2c8425998b	Refactor matches that used to consume BranchInfo (#5734 ) Explicitly borrow the instruction data, and use a mutable borrow to avoid rematch.	2023-02-07 13:29:42 -08:00
Trevor Elliott	c8a6adf825	Remove analyze_branch and BranchInfo (#5730 ) We don't have overlap in behavior for branch instructions anymore, so we can remove analyze_branch and instead match on the InstructionData directly. Co-authored-by: Jamey Sharp <jamey@minilop.net>	2023-02-06 17:06:57 -08:00
Trevor Elliott	a5698cedf8	cranelift: Remove brz and brnz (#5630 ) Remove the brz and brnz instructions, as their behavior is now redundant with brif.	2023-01-30 20:34:56 +00:00
Trevor Elliott	b58a197d33	cranelift: Add a conditional branch instruction with two targets (#5446 ) Add a conditional branch instruction with two targets: brif. This instruction will eventually replace brz and brnz, as it encompasses the behavior of both. This PR also changes the InstructionData layout for instruction formats that hold BlockCall values, taking the same approach we use for Value arguments. This allows branch_destination to return a slice to the BlockCall values held in the instruction, rather than requiring that we pattern match on InstructionData to fetch the then/else blocks. Function generation for fuzzing has been updated to generate uses of brif, and I've run the cranelift-fuzzgen target locally for hours without triggering any new failures.	2023-01-24 14:37:16 -08:00
Trevor Elliott	1e6c13d83e	cranelift: Rework block instructions to use BlockCall (#5464 ) Add a new type BlockCall that represents the pair of a block name with arguments to be passed to it. (The mnemonic here is that it looks a bit like a function call.) Rework the implementation of jump, brz, and brnz to use BlockCall instead of storing the block arguments as varargs in the instruction's ValueList. To ensure that we're processing block arguments from BlockCall values in instructions, three new functions have been introduced on DataFlowGraph that both sets of arguments: inst_values - returns an iterator that traverses values in the instruction and block arguments map_inst_values - applies a function to each value in the instruction and block arguments overwrite_inst_values - overwrite all values in an instruction and block arguments with values from the iterator Co-authored-by: Jamey Sharp <jamey@minilop.net>	2023-01-17 16:31:15 -08:00
Trevor Elliott	25bf8e0e67	Make DataFlowGraph::insts public, but restricted (#5450 ) We have some operations defined on DataFlowGraph purely to work around borrow-checker issues with InstructionData and other data on DataFlowGraph. Part of the problem is that indexing the DFG directly hides the fact that we're only indexing the insts field of the DFG. This PR makes the insts field of the DFG public, but wraps it in a newtype that only allows indexing. This means that the borrow checker is better able to tell when operations on memory held by the DFG won't conflict, which comes up frequently when mutating ValueLists held by InstructionData.	2022-12-16 10:46:09 -08:00
bjorn3	441401f9d6	Fix zero init sequence for i128 in cranelift-frontend (#5115 ) iconst.i128 is no longer allowed, so we have to use iconst.i64 + uextend instead.	2022-10-25 10:03:48 -07:00
Trevor Elliott	32a7593c94	cranelift: Remove booleans (#5031 ) Remove the boolean types from cranelift, and the associated instructions breduce, bextend, bconst, and bint. Standardize on using 1/0 for the return value from instructions that produce scalar boolean results, and -1/0 for boolean vector elements. Fixes #3205 Co-authored-by: Afonso Bordado <afonso360@users.noreply.github.com> Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Co-authored-by: Chris Fallin <chris@cfallin.org>	2022-10-17 16:00:27 -07:00
Jamey Sharp	d35c508436	cranelift-frontend: Replace Vecs with ListPools (#5001 ) * Elide redundant sentinel values The `undef_variables` lists were a binding from Variable to Value, but the Values were always equal to a suffix of the block's parameters. So instead of storing another copy, we can just get them back from the block parameters. According to DHAT, this decreases total memory allocated and number of bytes written, and increases number of bytes read and instructions retired, but all by small fractions of a percent. According to hyperfine, main is "1.00 ± 0.01 times faster". * Use entity_impl for cranelift_frontend::Variable Instead of hand-coding essentially the same thing. * Keep undefined variables in a ListPool According to DHAT, this improves every measure of performance (instructions retired, total memory allocated, max heap size, bytes read, and bytes written), although by fractions of a percent. According to hyperfine the difference is nearly zero, but on Spidermonkey this branch is "1.01 ± 0.00 times faster" than main. * Elide redundant block IDs In a list of predecessors, we previously kept both the jump instruction that points to the current block, and the block where that instruction resides. But we can look up the block from the instruction as long as we have access to the current Layout, which we do everywhere that it was necessary. So don't store the block, just store the instruction. * Keep predecessor definitions in a ListPool * Make append_jump_argument independent of self This makes it easier to reason about borrow-checking issues. * Reuse `results` instead of re-doing variable lookup This eliminates three array lookups per predecessor by hanging on to the results of earlier steps a little longer. This only works now because I previously removed the need to borrow all of `self`, which otherwise prevented keeping a borrow of self.results alive. I had experimented with using `Vec::split_off` to copy the relevant chunk of results to a temporary heap allocation, but the extra allocation and copy was measurably slower. So it's important that this is just a borrow. * Cache single-predecessor block ID when sealing Of the code in cranelift_frontend, `use_var` is the second-hottest path, sitting close behind the `build` function that's used when inserting every new instruction. This makes sense given that the operands of a new instruction usually need to be looked up immediately before building the instruction. So making the single-predecessor loops in `find_var` and `use_var_local` do fewer memory accesses and execute fewer instructions turns out to have a measurable effect. It's still only a small fraction of a percent overall since cranelift-frontend is only a few percent of total runtime. This patch keeps a block ID in the SSABlockData, which is None unless both the block is sealed and it has exactly one predecessor. Doing so avoids two array lookups on each iteration of the two loops. According to DHAT, compared with main, at this point this PR uses 0.3% less memory at max heap, reads 0.6% fewer bytes, and writes 0.2% fewer bytes. According to Hyperfine, this PR is "1.01 ± 0.01 times faster" than main when compiling Spidermonkey. On the other hand, Sightglass says main is 1.01x faster than this PR on the same benchmark by CPU cycles. In short, actual effects are too small to measure reliably.	2022-10-03 14:29:12 -07:00
Jamey Sharp	77ab99d3b0	cranelift-frontend: SSA-building cleanup (#4984 ) * Cleanups to cranelift-frontend SSA construction * Encode sealed/undef_variables relationship in type A block can't have any undef_variables if it is sealed. It's useful to make that fact explicit in the types so that any time either value is used, it's clear that we should think about the other one too. In addition, encoding this fact in an enum type lets Rust apply an optimization that reduces the size of SSABlockData by 8 bytes, making it fit in a 64-byte cache line. I haven't taken the extra step of making SSABlockData be 64-byte aligned because 1) it doesn't seem to have a performance impact and b) doing so makes other structures quite a bit bigger. * Simplify finish_predecessors_lookup Using Vec::drain is more concise than a combination of iter().rev().take() followed by Vec::truncate. And in this case it doesn't matter what order we examine the results in, because we just want to know if they're all equal, so we might as well iterate forward instead of in reverse. There's no need for the ZeroOneOrMore enum. Instead, there are only two cases: either we have a single value to use for the variable (possibly synthesized as a constant zero), or we need to add a block parameter in every predecessor. Pre-filtering the results iterator to eliminate the sentinel makes it easy to identify how many distinct definitions this variable has. iter.next() indicates if there are any definitions at all, and then iter.all() is a clear way to express that we want to know if the remaining definitions are the same as the first one. * Simplify append_jump_argument * Avoid assigning default() into SecondaryMap This eliminates some redundant reads and writes. * cranelift-frontend: Construct with default() This eliminates a bunch of boilerplate in favor of a built in `derive` macro. Also I'm deleting an import that had the comment "FIXME: Remove in edition2021", which we've been using everywhere since April. * Fix tests	2022-09-29 16:59:47 -07:00
Jamey Sharp	6c8620b688	Bypass state machine for single-predecessor chains (#4955 ) In the common case where there is a chain of sealed blocks that each have exactly one predecessor, we can keep track of any sub-sequence of those blocks in O(1) space. So there's no need to use the state machine stack to propagate variable definitions back along the chain. Instead, we can do one loop to find which block to stop at, then either get the variable definition from that block or introduce a block parameter there, and finally do one more loop to update variable definitions in all the intervening blocks. The existing implementation already had to do a graph traversal to propagate variable definitions correctly, so this doesn't visit any more blocks than before. However, this change also makes it possible to integrate cycle detection with the graph traversal. That eliminates the need for the in_predecessor_cycle flags, and any possibility of spiky performance profiles in maintaining those flags. As far as performance goes, this is all pretty much a wash: Changes to CPU time and CPU cycles are within noise, according to hyperfine and Sightglass/perf. But it's a substantially simpler implementation, with fewer invisible interactions between functions.	2022-09-28 17:05:08 -07:00
Jamey Sharp	9715d91c50	Resolve aliases before checking for unique values (#4966 ) At control-flow join points, cranelift-frontend's SSA builder currently checks to see if only one definition of a variable reaches the current block. If so, it can eliminate the corresponding block parameter and use the original def directly. It implements this by turning the block parameter into an alias for the original value. However, it didn't resolve aliases during this check, except after it had already determined that there was only one definition. Resolving aliases first instead allows it to detect that more block parameters are redundant. And as more block parameters get converted to aliases, later blocks can see common definitions from further away, so this has a compounding effect. This also merges a special case, where there's exactly one unique non-sentinel definition but it's actually an alias for the sentinel, into the general case where all definitions are from the sentinel. As a result there's only one case that has to introduce a definition of the variable to zero. According to `valgrind --tool=dhat`, this is a significant memory savings. On the pulldown-cmark benchmark from Sightglass: - 15.3% (1.9MiB) less memory allocated at maximum heap - 4.1% (6.7MiB) less memory allocated in total - 9.8% (57MiB) fewer bytes read - 12.6% (36MiB) fewer bytes written - 5.4% fewer instructions retired - 1.04x faster by instructions retired (per Sightglass/perf) - 1.03x to 1.04x faster by CPU cycles (per Sightglass/perf) - 1.03 ± 0.01 times faster by CPU time (per hyperfine) - 1.04x faster by cache accesses (per Sightglass/perf) On the bz2 benchmark: - 1.06x faster by instructions retired (per Sightglass/perf) - 1.05x faster by CPU cycles (per Sightglass/perf) - 1.04 ± 0.01 times faster by CPU time (per hyperfine) - 1.02x to 1.03x faster by cache accesses (per Sightglass/perf) Even on the largest benchmark in Sightglass (spidermonkey.wasm), this is a measurable improvement: - 1.03x faster by instructions retired (per Sightglass/perf) - 1.02x faster by CPU cycles (per Sightglass/perf) - 1.02 ± 0.00 times faster by CPU time (per hyperfine) There was no significant difference in cache misses for any benchmark, according to Sightglass/perf.	2022-09-27 13:59:37 -07:00
Jamey Sharp	6e76e925f4	Avoid quadratic behavior in `can_optimize_var_lookup` (#4939 ) * cranelift-frontend: Avoid quadratic behavior Fixes #4923. * Improve comments and debug assertions * Improve comments One thing that's especially neat about this PR is that, unlike the `can_optimize_var_lookup` graph traversal, `update_predecessor_cycle` doesn't need to keep track of all the blocks it has visited in order to detect cycles. However, the reasons why are subtle and need careful documentation. Also neat: We've previously tried keeping either a HashSet or a SecondaryMap around to re-use the same heap allocation for the `visited` set, which needs space linear in the number of blocks. After this PR, we're still using space that's linear in the number of blocks to store the `in_predecessor_cycle` flag, but that flag fits inside existing padding in `SSABlockData`, so it's a net savings in memory consumption. * Avoid quadratic behavior in `update_predecessor_cycle` So far I hadn't really eliminated the quadratic behavior from `can_optimize_var_lookup`. I just moved it to happen when the CFG is modified instead, and switched to indexing directly into the vector of blocks instead of going through a HashSet. I suspect the latter change is always a win, but the former is only an improvement assuming that `use_var` is called more often than `declare_block_predecessor`. But @cfallin pointed out that it feels like we should be able to do better by taking advantage of the knowledge that once a block is sealed, its predecessors can't change any more. That's not completely trivial to do because changes to the property we care about propagate toward successors, and we're only keeping pointers to predecessors. Still, as long as frontends follow the existing recommendation to seal blocks as soon as possible, maintaining a conservative approximation using only local information works fine in practice. This significantly limits the situations where this graph traversal could visit a lot of the CFG. * Review comments	2022-09-23 16:41:22 +00:00
Nick Fitzgerald	b2d13ebd46	Revert "Memoize `can_optimize_var_lookup` (#4924 )" (#4937 ) This reverts commit `562bb25360`.	2022-09-21 10:30:18 -07:00
Adam Bratschi-Kaye	562bb25360	Memoize `can_optimize_var_lookup` (#4924 ) * Memoize `can_optimize_var_lookup` `can_optimize_var_lookup` can have quadratic behavior if there is a chain of blocks each containing a `local.get` instruction because each run can walk up the entire chain. This change memoizes the results of `can_optimize_var_lookup` so that we can stop following the chain of predecessors when we hit a block that has previously been handled (making the operation linear again).	2022-09-19 10:18:11 -07:00
Nick Fitzgerald	50b9195882	cranelift-frontend: Reuse visited block sets in `SSABuilder::can_optimize_var_lookup` (#4536 ) First, we switch from a `BTreeSet` to a `HashSet` because clearing a `BTreeSet` will deallocate the btree's nodes but clearing a `HashSet` will not deallocate the backing hash table, saving the space to reuse for future insertions. Then, we reuse the same set (and therefore the same allocation) across every call to `can_optimize_var_lookup`. This results in a 1.22x to 1.32x speed up on various Sightglass benchmarks: ``` compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 39478181.76 ± 3441880.32 (confidence = 99%) main.so is 0.75x to 0.79x faster than reuse-set.so! reuse-set.so is 1.27x to 1.32x faster than main.so! [160128343 172174751.09 213325968] main.so [115055695 132696569.33 200782128] reuse-set.so compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm Δ = 22576954.88 ± 1830771.68 (confidence = 99%) main.so is 0.77x to 0.81x faster than reuse-set.so! reuse-set.so is 1.25x to 1.29x faster than main.so! [100449245 106820149.65 118628066] main.so [77039172 84243194.77 128168647] reuse-set.so compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm Δ = 664533554.97 ± 22109170.05 (confidence = 99%) main.so is 0.81x to 0.82x faster than reuse-set.so! reuse-set.so is 1.22x to 1.23x faster than main.so! [3549762523 3640587103.35 3798662501] main.so [2793335181 2976053548.38 3192950484] reuse-set.so ```	2022-07-26 18:38:24 -07:00
Jamey Sharp	f242975c49	cranelift-frontend: Allow jump table reuse (#4429 ) * Allow using jump-tables multiple times (fixes #3347) If there are multiple `br_table` instructions using the same jump table, then `append_jump_argument` must not modify the jump table in-place. When this function is called, we don't know if more `br_table` instructions might be added later. So this patch conservatively assumes that all jump tables might be reused. If Cranelift needs to add a block argument to a block that's the target of some jump table, then the jump table will be unconditionally cloned. I'm not sure if having duplicated and unused jump tables will turn out to be a compile-time performance issue. If it is, there's discussion in issue #3347 about ways to determine that there can't be any more uses of a jump table, so that it's safe to modify in-place. * Re-enable cranelift-fuzzgen fuzz target I've been running this fuzz target for an hour without finding new bugs. Let's see if oss-fuzz finds anything now.	2022-07-11 15:09:51 -05:00
Afonso Bordado	6a9378e244	cranelift: Prevent infinite loops in ssa frontend with unreachable code. Perform a search over block predecessors trying to find loops of unreachable predecessors. We do this by iterating on predecessors and marking them as visited, stopping if we find a previously visited block or if we find a block with multiple predecessors. This issue was found by the CLIF fuzzer in #3094.	2021-09-01 11:19:23 +01:00
Alex Crichton	30d9164b6e	Fix a number of warnings cropping up on nightly Rust (#2767 ) Various small issues here and there, nothing major	2021-03-25 13:19:37 -05:00
Chris Fallin	8cd64e3ec6	Fix warnings (causing CI failures) with new Rust beta. - Panic messages must now be string literals (we used `format!()` in many places; `panic!()` can take format strings directly). - Some dead enum options with EVEX encoding stuff in old x86 backend. This will go away soon and/or be moved to the new backend anyway, so let's silence the warning for now. - A few other misc warnings.	2021-02-16 14:10:05 -08:00
teapotd	11497a5207	Ignore already sealed blocks in seal_all_blocks	2020-04-29 11:42:41 -07:00
bjorn3	bc462404b3	[cranelift-frontend] Fix emit_zero for vectors (#1428 ) * Fix emit_zero for vectors * Add undef_vector_vars frontend test * Remove unused import	2020-03-30 13:01:14 -07:00
Y-Nak	bcddce5fe0	Simplify ssa builder (#1340 ) * Simplify SSABuilder with basic block * Simplify FunctionBuilder with basic block * Update SSABuilder test * Update SSABuilder doc	2020-03-20 11:54:44 -07:00
Ryan Hunt	07f335dca6	Rename 'an block' to 'a block' Missed this in the automatic rename of 'Ebb' to 'Block'.	2020-03-03 13:21:13 -06:00
Ryan Hunt	832666c45e	Mass rename Ebb and relatives to Block (#1365 ) * Manually rename BasicBlock to BlockPredecessor BasicBlock is a pair of (Ebb, Inst) that is used to represent the basic block subcomponent of an Ebb that is a predecessor to an Ebb. Eventually we will be able to remove this struct, but for now it makes sense to give it a non-conflicting name so that we can start to transition Ebb to represent a basic block. I have not updated any comments that refer to BasicBlock, as eventually we will remove BlockPredecessor and replace with Block, which is a basic block, so the comments will become correct. * Manually rename SSABuilder block types to avoid conflict SSABuilder has its own Block and BlockData types. These along with associated identifier will cause conflicts in a later commit, so they are renamed to be more verbose here. * Automatically rename 'Ebb' to 'Block' in .rs Automatically rename 'EBB' to 'block' in .rs Automatically rename 'ebb' to 'block' in .rs Automatically rename 'extended basic block' to 'basic block' in .rs Automatically rename 'an basic block' to 'a basic block' in .rs Manually update comment for `Block` `Block`'s wikipedia article required an update. * Automatically rename 'an `Block`' to 'a `Block`' in .rs Automatically rename 'extended_basic_block' to 'basic_block' in .rs Automatically rename 'ebb' to 'block' in .clif Manually rename clif constant that contains 'ebb' as substring to avoid conflict * Automatically rename filecheck uses of 'EBB' to 'BB' 'regex: EBB' -> 'regex: BB' '$EBB' -> '$BB' * Automatically rename 'EBB' 'Ebb' to 'block' in .clif Automatically rename 'an block' to 'a block' in .clif Fix broken testcase when function name length increases Test function names are limited to 16 characters. This causes the new longer name to be truncated and fail a filecheck test. An outdated comment was also fixed.	2020-02-07 10:46:47 -06:00
Peter Huene	9f506692c2	Fix clippy warnings. This commit fixes the current set of (stable) clippy warnings in the repo.	2019-10-24 17:20:12 -07:00
Sean Stangl	46b44ad82d	Increase legibility of the SSABuilder (#1142 )	2019-10-17 11:30:38 -06:00
bjorn3	bb8fa40ef0	Rustfmt	2019-10-02 11:50:44 -07:00
bjorn3	1a99ac6b4a	Always use extern crate std in cranelift-frontend	2019-10-02 11:50:44 -07:00
Erin Power	5426e42a27	Revert "Remove FunctionBuilderContext from API, and change FunctionBuilder API" This reverts commit 39e638af99dbe6537bc935bfb1a74669b62877b3.	2019-09-17 08:58:46 +02:00
Julian Seward	c6a4c60a0f	EbbHeaderBlockData::predecessors: use SmallVec instead of Vec Allocations associated with pushes to EbbHeaderBlockData::predecessors account for 4.9% of all heap allocation (calls) in CL. This change avoids almost all of them by changing it to be a SmallVec<[PredBlock; 4]>. Dynamic instruction count falls by 0.15%.	2019-09-09 11:37:04 +02:00
Aaron Power	8fd1128990	Remove FunctionBuilderContext from API, and change FunctionBuilder API	2019-09-07 14:43:07 -07:00
Carmen Kwan	19257f80c1	Add reference types R32 and R64 -Add resumable_trap, safepoint, isnull, and null instructions -Add Stackmap struct and StackmapSink trait Co-authored-by: Mir Ahmed <mirahmed753@gmail.com> Co-authored-by: Dan Gohman <sunfish@mozilla.com>	2019-08-16 11:35:16 -07:00
lazypassion	747ad3c4c5	moved crates in lib/ to src/, renamed crates, modified some files' text (#660 ) moved crates in lib/ to src/, renamed crates, modified some files' text (#660)	2019-01-28 15:56:54 -08:00

40 Commits