Add a new type BlockCall that represents the pair of a block name with arguments to be passed to it. (The mnemonic here is that it looks a bit like a function call.) Rework the implementation of jump, brz, and brnz to use BlockCall instead of storing the block arguments as varargs in the instruction's ValueList.
To ensure that we're processing block arguments from BlockCall values in instructions, three new functions have been introduced on DataFlowGraph that cover both sets of arguments:
inst_values - returns an iterator that traverses values in the instruction and block arguments
map_inst_values - applies a function to each value in the instruction and block arguments
overwrite_inst_values - overwrites all values in the instruction and block arguments with values from an iterator
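As a rough sketch of how these helpers compose (signatures approximated; `replace_value` is a hypothetical caller, not part of the API):
```
use cranelift_codegen::ir::{DataFlowGraph, Inst, Value};

// Sketch: replace `old` with `new` in every value slot of `inst`,
// including arguments carried by BlockCall values, which a plain
// ValueList traversal would previously have missed.
fn replace_value(dfg: &mut DataFlowGraph, inst: Inst, old: Value, new: Value) {
    dfg.map_inst_values(inst, |v| if v == old { new } else { v });
}
```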
Co-authored-by: Jamey Sharp <jamey@minilop.net>
We have some operations defined on DataFlowGraph purely to work around borrow-checker issues with InstructionData and other data on DataFlowGraph. Part of the problem is that indexing the DFG directly hides the fact that we're only indexing the insts field of the DFG.
This PR makes the insts field of the DFG public, but wraps it in a newtype that only allows indexing. This means that the borrow checker is better able to tell when operations on memory held by the DFG won't conflict, which comes up frequently when mutating ValueLists held by InstructionData.
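A minimal sketch of the indexing-only newtype, with simplified stand-in types (the real field wraps Cranelift's entity map, not a `Vec`):
```
use std::ops::{Index, IndexMut};

// Stand-ins for the real entity types.
#[derive(Clone, Copy)]
pub struct Inst(u32);
pub struct InstructionData;

/// Newtype over instruction storage that exposes only indexing.
/// Borrowing `dfg.insts[inst]` then borrows just this field, so the
/// borrow checker can see it doesn't conflict with, say, a mutable
/// borrow of a ValueList held elsewhere in the DFG.
pub struct Insts(Vec<InstructionData>);

impl Index<Inst> for Insts {
    type Output = InstructionData;
    fn index(&self, inst: Inst) -> &InstructionData {
        &self.0[inst.0 as usize]
    }
}

impl IndexMut<Inst> for Insts {
    fn index_mut(&mut self, inst: Inst) -> &mut InstructionData {
        &mut self.0[inst.0 as usize]
    }
}
```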
* cranelift-wasm: translate Wasm loads into lower-level CLIF operations
Rather than using `heap_{load,store,addr}`.
* cranelift: Remove the `heap_{addr,load,store}` instructions
These are now legalized in the `cranelift-wasm` frontend.
* cranelift: Remove the `ir::Heap` entity from CLIF
* Port basic memory operation tests to .wat filetests
* Remove test for verifying CLIF heaps
* Remove `heap_addr` from replace_branching_instructions_and_cfg_predecessors.clif test
* Remove `heap_addr` from readonly.clif test
* Remove `heap_addr` from `table_addr.clif` test
* Remove `heap_addr` from the simd-fvpromote_low.clif test
* Remove `heap_addr` from simd-fvdemote.clif test
* Remove `heap_addr` from the load-op-store.clif test
* Remove the CLIF heap runtest
* Remove `heap_addr` from the global_value.clif test
* Remove `heap_addr` from fpromote.clif runtests
* Remove `heap_addr` from fdemote.clif runtests
* Remove `heap_addr` from memory.clif parser test
* Remove `heap_addr` from reject_load_readonly.clif test
* Remove `heap_addr` from reject_load_notrap.clif test
* Remove `heap_addr` from load_readonly_notrap.clif test
* Remove `static-heap-without-guard-pages.clif` test
Will be subsumed when we port `make-heap-load-store-tests.sh` to generating
`.wat` tests.
* Remove `static-heap-with-guard-pages.clif` test
Will be subsumed when we port `make-heap-load-store-tests.sh` over to `.wat`
tests.
* Remove more heap tests
These will be subsumed by porting `make-heap-load-store-tests.sh` over to `.wat`
tests.
* Remove `heap_addr` from `simple-alias.clif` test
* Remove `heap_addr` from partial-redundancy.clif test
* Remove `heap_addr` from multiple-blocks.clif test
* Remove `heap_addr` from fence.clif test
* Remove `heap_addr` from extends.clif test
* Remove runtests that rely on heaps
Heaps are not a thing in CLIF or the interpreter anymore.
* Add generated load/store `.wat` tests
* Enable memory-related wasm features in `.wat` tests
* Remove CLIF heap from fcmp-mem-bug.clif test
* Add a mode for compiling `.wat` all the way to assembly in filetests
* Also generate WAT to assembly tests in `make-load-store-tests.sh`
* cargo fmt
* Reinstate `f{de,pro}mote.clif` tests without the heap bits
* Remove undefined doc link
* Remove outdated SVG and dot file from docs
* Add docs about `None` returns for base address computation helpers
* Factor out `env.heap_access_spectre_mitigation()` to a local
* Expand docs for `FuncEnvironment::heaps` trait method
* Restore f{de,pro}mote+load clif runtests with stack memory
* cranelift: Remove iconst.i128
* bugpoint: Report Changed when only one instruction is mutated
* cranelift: Fix egraph bxor rule
* cranelift: Remove some simple_preopt opts for i128
Remove the boolean types from cranelift, and the associated instructions breduce, bextend, bconst, and bint. Standardize on using 1/0 for the return value from instructions that produce scalar boolean results, and -1/0 for boolean vector elements.
Fixes #3205
Co-authored-by: Afonso Bordado <afonso360@users.noreply.github.com>
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
* Elide redundant sentinel values
The `undef_variables` lists were a binding from Variable to Value, but
the Values were always equal to a suffix of the block's parameters. So
instead of storing another copy, we can just get them back from the
block parameters.
According to DHAT, this decreases total memory allocated and number of
bytes written, and increases number of bytes read and instructions
retired, but all by small fractions of a percent. According to
hyperfine, main is "1.00 ± 0.01 times faster".
* Use entity_impl for cranelift_frontend::Variable
Instead of hand-coding essentially the same thing.
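Roughly (a sketch; the exact derives may differ from the real definition):
```
use cranelift_entity::entity_impl;

/// An opaque variable handle backed by a dense u32 index.
#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
pub struct Variable(u32);
// Generates the index conversions and the "var0", "var1", ...
// Display impl that were previously hand-written.
entity_impl!(Variable, "var");
```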
* Keep undefined variables in a ListPool
According to DHAT, this improves every measure of performance
(instructions retired, total memory allocated, max heap size, bytes
read, and bytes written), although by fractions of a percent. According
to hyperfine the difference is nearly zero, but on Spidermonkey this
branch is "1.01 ± 0.00 times faster" than main.
* Elide redundant block IDs
In a list of predecessors, we previously kept both the jump instruction
that points to the current block, and the block where that instruction
resides. But we can look up the block from the instruction as long as we
have access to the current Layout, which we do everywhere that it was
necessary. So don't store the block, just store the instruction.
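For illustration, a hedged sketch of recovering the block through the layout (assuming `Layout::inst_block` as the lookup):
```
use cranelift_codegen::ir::{Block, Inst, Layout};

// Sketch: a predecessor record now stores only the branch instruction;
// its containing block is recovered from the layout on demand.
fn predecessor_block(layout: &Layout, branch: Inst) -> Block {
    layout
        .inst_block(branch)
        .expect("a predecessor's branch must be inserted in the layout")
}
```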
* Keep predecessor definitions in a ListPool
* Make append_jump_argument independent of self
This makes it easier to reason about borrow-checking issues.
* Reuse `results` instead of re-doing variable lookup
This eliminates three array lookups per predecessor by hanging on to the
results of earlier steps a little longer. This only works now because I
previously removed the need to borrow all of `self`, which otherwise
prevented keeping a borrow of self.results alive.
I had experimented with using `Vec::split_off` to copy the relevant
chunk of results to a temporary heap allocation, but the extra
allocation and copy was measurably slower. So it's important that this
is just a borrow.
* Cache single-predecessor block ID when sealing
Of the code in cranelift_frontend, `use_var` is the second-hottest path,
sitting close behind the `build` function that's used when inserting
every new instruction. This makes sense given that the operands of a new
instruction usually need to be looked up immediately before building the
instruction.
So making the single-predecessor loops in `find_var` and `use_var_local`
do fewer memory accesses and execute fewer instructions turns out to
have a measurable effect. It's still only a small fraction of a percent
overall since cranelift-frontend is only a few percent of total runtime.
This patch keeps a block ID in the SSABlockData, which is None unless
both the block is sealed and it has exactly one predecessor. Doing so
avoids two array lookups on each iteration of the two loops.
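Sketched as a field on the per-block data (field name assumed):
```
use cranelift_codegen::ir::Block;

// Sketch of the cached field described above.
struct SSABlockData {
    // ... existing fields elided ...
    /// `Some(pred)` only while this block is sealed *and* has exactly
    /// one predecessor; `None` otherwise. Lets the hot
    /// single-predecessor loops follow the chain without re-checking
    /// the sealed flag and the predecessor list on each iteration.
    single_predecessor: Option<Block>,
}
```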
According to DHAT, compared with main, at this point this PR uses 0.3%
less memory at max heap, reads 0.6% fewer bytes, and writes 0.2% fewer
bytes.
According to Hyperfine, this PR is "1.01 ± 0.01 times faster" than main
when compiling Spidermonkey. On the other hand, Sightglass says main is
1.01x faster than this PR on the same benchmark by CPU cycles. In short,
actual effects are too small to measure reliably.
* cranelift-wasm: Assume block is reachable
In handling the WebAssembly "end" operator, cranelift-wasm had logic to
skip generating a jump instruction if the block was both unreachable and
"pristine", meaning no instructions had been added.
However, `translate_operator` checks first that `state.reachable` is
true, so this logic only runs when cranelift-wasm believes that the
current block _is_ reachable. Therefore the condition should always be
true, whether the block is pristine or not.
I've left a debug_assert in case `state.reachable` ever doesn't agree
with `builder.is_unreachable()`, but the assert doesn't fail in any of
the tests. We'll see if fuzzing finds something.
Anyway, outside of cranelift-frontend, this eliminates the only use of
`is_pristine()`, and there were no uses of `is_filled()`. So I've made
both of those private. They're now only used in a nearby debug assert.
* cranelift-frontend: Clarify pristine/filled states
There was a comment here saying "A filled block cannot be pristine."
Given that the intent was for those two states to be mutually exclusive,
I've replaced the two booleans with a three-state enum.
I also replaced all reads of these two flags with method calls. In all
but one case these are only checked in debug assertions, so I don't even
care whether they get inlined. They're easier to read, and this will
make it easier to replace their implementations, which I hope to do
soon.
Finally, I replaced all assignments to either flag with an appropriate
assignment of the corresponding enum state. Keep in mind this
correspondence between the new enum and the old flags:
- Empty: pristine true, filled false
- Partial: pristine false, filled false
- Filled: pristine false, filled true
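As a sketch, with variant names matching the correspondence above:
```
/// Sketch of the three-state replacement for the pristine/filled
/// boolean pair; the states are mutually exclusive by construction.
enum BlockStatus {
    /// No instructions have been added yet (was: pristine).
    Empty,
    /// Some instructions added, but no terminator yet.
    Partial,
    /// The block is terminated and complete (was: filled).
    Filled,
}
```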
Every existing update to these flags could only move to a later state.
(For example, Partial couldn't go back to Empty.) In the old flags that
meant that pristine could only go from true to false, and filled could
only go from false to true.
`fill_current_block` was a weird case because at first glance it looks
like it could allow both pristine and filled to be true at the same
time. However, it's only called from `FuncInstBuilder::build`, which
calls `ensure_inserted_block` before doing anything else, and _that_
cleared the pristine flag.
Similarly, `handle_ssa_side_effects` looks like it could allow both
pristine and filled to be true for anything in `split_blocks_created`.
However, those blocks are created by SSABuilder, so their BlockData is
not initialized by `create_block`, and instead uses BlockData::default.
The `Default` implementation here previously set both flags false, while
`create_block` would instead set pristine to true. So these split blocks
were correctly set to the Filled state, and after this patch they are
still set correctly.
* cranelift-frontend: Separate SSA and user block params
Previously there was a `user_param_count` field in BlockData, used
purely to debug-assert that no user parameters are added to a block
after `use_var` adds SSA parameters.
Instead, this patch enforces a strict phase separation between the
period after a block is created when user parameters can be added to it,
and the period when `use_var` may be called and instructions may be
added.
I'm assuming that calls to `use_var` are _always_ followed by inserting
one or more instructions into the block. (If you don't want to insert an
instruction, why do you need to know where instructions in this block
would get variable definitions from?) This patch has no visible effect
for callers which follow that rule.
However, it was previously legal to call `use_var`, then append a block
parameter before adding instructions, so long as `use_var` didn't
actually need to add a block parameter. That could only happen if the
current block is sealed and has exactly one predecessor. So anyone who
was counting on this behavior was playing a dangerous game anyway.
* cranelift-frontend: Defer initializing block data
Every reference to the func_ctx.status SecondaryMap will automatically
create the appropriate entries on-demand, with the sole exception of
`finalize`. In that function, debug assertions use SecondaryMap::keys to
find out which blocks need to be checked.
However, those assertions always succeed for blocks which never had any
instructions added. So it's okay to skip them for blocks which aren't
touched after `create_block`.
* Cleanups to cranelift-frontend SSA construction
* Encode sealed/undef_variables relationship in type
A block can't have any undef_variables if it is sealed. It's useful to
make that fact explicit in the types so that any time either value is
used, it's clear that we should think about the other one too.
In addition, encoding this fact in an enum type lets Rust apply an
optimization that reduces the size of SSABlockData by 8 bytes, making it
fit in a 64-byte cache line. I haven't taken the extra step of making
SSABlockData be 64-byte aligned because 1) it doesn't seem to have a
performance impact and 2) doing so makes other structures quite a bit
bigger.
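A sketch of that encoding (`Vec` stands in for the `ListPool`-backed list from the earlier commit):
```
use cranelift_frontend::Variable;

// Sketch: a sealed block cannot have pending undefined variables, so
// make that state unrepresentable instead of storing a bool alongside
// a list that must then be empty.
enum Sealed {
    Yes,
    No {
        /// Variables used before all predecessors were known; their
        /// values are a suffix of the block's parameters.
        undef_variables: Vec<Variable>,
    },
}
```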
* Simplify finish_predecessors_lookup
Using Vec::drain is more concise than a combination of
iter().rev().take() followed by Vec::truncate. And in this case it
doesn't matter what order we examine the results in, because we just
want to know if they're all equal, so we might as well iterate forward
instead of in reverse.
There's no need for the ZeroOneOrMore enum. Instead, there are only two
cases: either we have a single value to use for the variable (possibly
synthesized as a constant zero), or we need to add a block parameter in
every predecessor.
Pre-filtering the results iterator to eliminate the sentinel makes it
easy to identify how many distinct definitions this variable has.
iter.next() indicates if there are any definitions at all, and then
iter.all() is a clear way to express that we want to know if the
remaining definitions are the same as the first one.
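Putting these pieces together, a hedged sketch of the whole check, with `u32` standing in for `Value`:
```
// Sketch: either every real definition agrees (use it, or synthesize
// zero if there were none), or a block parameter is needed.
enum Lookup {
    Single(Option<u32>), // None means "synthesize a constant zero"
    NeedBlockParam,
}

fn finish_lookup(results: &mut Vec<u32>, count: usize, sentinel: u32) -> Lookup {
    let start = results.len() - count;
    // Vec::drain is more concise than iter().rev().take() + truncate,
    // and order is irrelevant when we only test for equality.
    let mut defs = results.drain(start..).filter(|&v| v != sentinel);
    match defs.next() {
        // No definitions besides the sentinel.
        None => Lookup::Single(None),
        Some(first) => {
            if defs.all(|v| v == first) {
                Lookup::Single(Some(first))
            } else {
                Lookup::NeedBlockParam
            }
        }
    }
}
```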
* Simplify append_jump_argument
* Avoid assigning default() into SecondaryMap
This eliminates some redundant reads and writes.
* cranelift-frontend: Construct with default()
This eliminates a bunch of boilerplate in favor of a built in `derive`
macro.
Also I'm deleting an import that had the comment "FIXME: Remove in
edition2021", which we've been using everywhere since April.
* Fix tests
In the common case where there is a chain of sealed blocks that each
have exactly one predecessor, we can keep track of any sub-sequence of
those blocks in O(1) space. So there's no need to use the state machine
stack to propagate variable definitions back along the chain.
Instead, we can do one loop to find which block to stop at, then either
get the variable definition from that block or introduce a block
parameter there, and finally do one more loop to update variable
definitions in all the intervening blocks.
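Schematically, and with cycle detection elided (plain indices stand in for blocks, `u32` for values):
```
// Sketch of the two-pass walk: follow the sealed single-predecessor
// chain to a block that defines the variable (or the chain's end),
// then write the answer back into every intervening block, using
// O(1) extra space.
fn use_var_on_chain(
    single_pred: &[Option<usize>], // set only for sealed, single-pred blocks
    defs: &mut [Option<u32>],      // this variable's per-block definition
    start: usize,
) -> u32 {
    // Pass 1: find the block to stop at.
    let mut stop = start;
    while defs[stop].is_none() {
        match single_pred[stop] {
            Some(p) => stop = p,
            None => break, // introduce a block parameter here instead
        }
    }
    let def = defs[stop].unwrap_or(0); // 0 stands in for a new block param
    defs[stop] = Some(def);
    // Pass 2: record the definition along the chain we just walked.
    let mut b = start;
    while b != stop {
        defs[b] = Some(def);
        b = single_pred[b].unwrap();
    }
    def
}
```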
The existing implementation already had to do a graph traversal to
propagate variable definitions correctly, so this doesn't visit any more
blocks than before. However, this change also makes it possible to
integrate cycle detection with the graph traversal. That eliminates the
need for the in_predecessor_cycle flags, and any possibility of spiky
performance profiles in maintaining those flags.
As far as performance goes, this is all pretty much a wash: Changes to
CPU time and CPU cycles are within noise, according to hyperfine and
Sightglass/perf. But it's a substantially simpler implementation, with
fewer invisible interactions between functions.
At control-flow join points, cranelift-frontend's SSA builder currently
checks to see if only one definition of a variable reaches the current
block. If so, it can eliminate the corresponding block parameter and use
the original def directly. It implements this by turning the block
parameter into an alias for the original value.
However, it didn't resolve aliases during this check, except after it
had already determined that there was only one definition.
Resolving aliases first instead allows it to detect that more block
parameters are redundant. And as more block parameters get converted to
aliases, later blocks can see common definitions from further away, so
this has a compounding effect.
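A hedged sketch of the equality check with aliases resolved first (assuming `DataFlowGraph::resolve_aliases`; `single_def` is a hypothetical helper):
```
use cranelift_codegen::ir::{DataFlowGraph, Value};

// Sketch: resolve aliases *before* comparing definitions, so block
// params that were turned into aliases of the same root value are
// recognized as a single definition.
fn single_def(dfg: &DataFlowGraph, defs: impl Iterator<Item = Value>) -> Option<Value> {
    let mut defs = defs.map(|v| dfg.resolve_aliases(v));
    let first = defs.next()?;
    defs.all(|v| v == first).then_some(first)
}
```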
This also merges a special case, where there's exactly one unique
non-sentinel definition but it's actually an alias for the sentinel,
into the general case where all definitions are from the sentinel. As a
result there's only one case that has to introduce a definition of the
variable to zero.
According to `valgrind --tool=dhat`, this is a significant memory
savings. On the pulldown-cmark benchmark from Sightglass:
- 15.3% (1.9MiB) less memory allocated at maximum heap
- 4.1% (6.7MiB) less memory allocated in total
- 9.8% (57MiB) fewer bytes read
- 12.6% (36MiB) fewer bytes written
- 5.4% fewer instructions retired
- 1.04x faster by instructions retired (per Sightglass/perf)
- 1.03x to 1.04x faster by CPU cycles (per Sightglass/perf)
- 1.03 ± 0.01 times faster by CPU time (per hyperfine)
- 1.04x faster by cache accesses (per Sightglass/perf)
On the bz2 benchmark:
- 1.06x faster by instructions retired (per Sightglass/perf)
- 1.05x faster by CPU cycles (per Sightglass/perf)
- 1.04 ± 0.01 times faster by CPU time (per hyperfine)
- 1.02x to 1.03x faster by cache accesses (per Sightglass/perf)
Even on the largest benchmark in Sightglass (spidermonkey.wasm), this is
a measurable improvement:
- 1.03x faster by instructions retired (per Sightglass/perf)
- 1.02x faster by CPU cycles (per Sightglass/perf)
- 1.02 ± 0.00 times faster by CPU time (per hyperfine)
There was no significant difference in cache misses for any benchmark,
according to Sightglass/perf.
* Leverage Cargo's workspace inheritance feature
This commit is an attempt to reduce the complexity of the Cargo
manifests in this repository with Cargo's workspace-inheritance feature
becoming stable in Rust 1.64.0. This feature allows specifying fields in
the root workspace `Cargo.toml` which are then reused throughout the
workspace. For example this PR shares definitions such as:
* All of the Wasmtime-family of crates now use `version.workspace =
true` to have a single location which defines the version number.
* All crates use `edition.workspace = true` to have one default edition
for the entire workspace.
* Common dependencies are listed in `[workspace.dependencies]` to avoid
typing the same version number in a lot of different places (e.g. the
`wasmparser = "0.89.0"` version is now in just one spot).
Currently the workspace-inheritance feature doesn't allow having two
different versions to inherit, so all of the Cranelift-family of crates
still manually specify their version. The inter-crate dependencies,
however, are shared amongst the root workspace.
This feature can be seen as a method of "preprocessing" of sorts for
Cargo manifests. This will help us develop Wasmtime but shouldn't have
any actual impact on the published artifacts -- every crate's dependency
list is still the same.
* Fix wasi-crypto tests
We weren't using the "union" cargo feature for the smallvec crate, which
reduces the size of a SmallVec by one machine word. This feature
requires Rust 1.49 but we already require much newer versions.
When using Wasmtime to compile pulldown-cmark from Sightglass, this
saves a decent amount of memory allocations and writes. According to
`valgrind --tool=dhat`:
- 6.2MiB (3.69%) less memory allocated over the program's lifetime
- 0.5MiB (4.13%) less memory allocated at maximum heap size
- 5.5MiB (1.88%) fewer bytes written to
- 0.44% fewer instructions executed
Sightglass reports a statistically significant runtime improvement too:
```
compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm
Δ = 24379323.60 ± 20051394.04 (confidence = 99%)
shrink-abiarg-0406da67c.so is 1.01x to 1.13x faster than main-be690a468.so!
[227506364 355007998.78 423280514] main-be690a468.so
[227686018 330628675.18 406025344] shrink-abiarg-0406da67c.so
compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
Δ = 360151622.56 ± 278294316.90 (confidence = 99%)
shrink-abiarg-0406da67c.so is 1.01x to 1.07x faster than main-be690a468.so!
[8709162212 8911001926.44 9535111576] main-be690a468.so
[5058015392 8550850303.88 9282148438] shrink-abiarg-0406da67c.so
compilation :: cycles :: benchmarks/bz2/benchmark.wasm
Δ = 6936570.28 ± 6897696.38 (confidence = 99%)
shrink-abiarg-0406da67c.so is 1.00x to 1.08x faster than main-be690a468.so!
[155810934 175260571.20 234737344] main-be690a468.so
[119128240 168324000.92 257451074] shrink-abiarg-0406da67c.so
```
* cranelift-frontend: Avoid quadratic behavior
Fixes #4923.
* Improve comments and debug assertions
* Improve comments
One thing that's especially neat about this PR is that, unlike the
`can_optimize_var_lookup` graph traversal, `update_predecessor_cycle`
doesn't need to keep track of all the blocks it has visited in order to
detect cycles. However, the reasons why are subtle and need careful
documentation.
Also neat: We've previously tried keeping either a HashSet or a
SecondaryMap around to re-use the same heap allocation for the `visited`
set, which needs space linear in the number of blocks. After this PR,
we're still using space that's linear in the number of blocks to store
the `in_predecessor_cycle` flag, but that flag fits inside existing
padding in `SSABlockData`, so it's a net savings in memory consumption.
* Avoid quadratic behavior in `update_predecessor_cycle`
So far I hadn't really eliminated the quadratic behavior from
`can_optimize_var_lookup`. I just moved it to happen when the CFG is
modified instead, and switched to indexing directly into the vector of
blocks instead of going through a HashSet. I suspect the latter change
is always a win, but the former is only an improvement assuming that
`use_var` is called more often than `declare_block_predecessor`.
But @cfallin pointed out that it feels like we should be able to do
better by taking advantage of the knowledge that once a block is sealed,
its predecessors can't change any more.
That's not completely trivial to do because changes to the property we
care about propagate toward successors, and we're only keeping pointers
to predecessors. Still, as long as frontends follow the existing
recommendation to seal blocks as soon as possible, maintaining a
conservative approximation using only local information works fine in
practice.
This significantly limits the situations where this graph traversal
could visit a lot of the CFG.
* Review comments
* Memoize `can_optimize_var_lookup`
`can_optimize_var_lookup` can have quadratic behavior if there is a chain
of blocks each containing a `local.get` instruction because each run can
walk up the entire chain. This change memoizes the results of
`can_optimize_var_lookup` so that we can stop following the chain of
predecessors when we hit a block that has previously been handled
(making the operation linear again).
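A generic sketch of the memoization idea (block indices instead of real entities; cycle handling elided):
```
// Sketch: cache each block's answer so a later lookup stops at the
// first block already decided, making repeated walks linear overall.
fn can_optimize(start: usize, pred: &[Option<usize>], memo: &mut [Option<bool>]) -> bool {
    let mut visited = Vec::new();
    let mut cur = start;
    let answer = loop {
        if let Some(cached) = memo[cur] {
            break cached; // previously handled: stop following the chain
        }
        visited.push(cur);
        match pred[cur] {
            Some(p) => cur = p,
            None => break true, // reached the entry block
        }
    };
    // Record the answer for every block seen on this walk.
    for b in visited {
        memo[b] = Some(answer);
    }
    answer
}
```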
* cranelift: Remove of/nof overflow flags from icmp
Neither Wasmtime nor cg-clif uses these flags under any circumstances.
From discussion on #3060 I see it's long been unclear what purpose these
flags served.
Fixes #3060, fixes #4406, and fixes #4875... by deleting all the code
that could have been buggy.
This changes the cranelift-fuzzgen input format by removing some IntCC
options, so I've gone ahead and enabled I128 icmp tests at the same
time. Since only the of/nof cases were failing before, I expect these to
work.
* Restore trapif tests
It's still useful to validate that iadd_ifcout's iflags result can be
forwarded correctly to trapif, and for that purpose it doesn't really
matter what condition code is checked.
This commit replaces #4869 and represents the actual version bump that
should have happened had I remembered to bump the in-tree version of
Wasmtime to 1.0.0 prior to the branch-cut date. Alas!
When trying to read generated CLIF, it's nice to be able to see at a
glance that some of the operands are defined by `iconst` and similar
instructions, without having to go find each operand's definition
manually.
This fixes two problems: minimum symbol alignment for the LARL
instruction, and alignment requirements for LRL/LGRL etc.
The first problem is that the LARL instruction used to load a
symbol address (PC relative) requires that the target symbol
is at least 2-byte aligned. This is always guaranteed for code
symbols (all instructions must be 2-aligned anyway), but not
necessarily for data symbols.
Other s390x compilers fix this problem by ensuring that all
global symbols are always emitted with a minimum 2-byte
alignment. This patch introduces an equivalent mechanism
for cranelift:
- Add a symbol_alignment routine to TargetIsa, similar to the
existing code_section_alignment routine.
- Respect symbol_alignment as minimum alignment for all symbols
emitted in the object backend (code and data).
The second problem is that PC-relative instructions that
directly *access* data (like LRL/LGRL, STRL/STGRL etc.)
not only have the 2-byte requirement like LARL, but actually
require that their memory operand is *naturally* aligned
(i.e. alignment is at least the size of the access).
This property (natural alignment for memory accesses) is
supposed to be provided by the "aligned" flag in MemFlags;
however, this is not implemented correctly at the moment.
To fix this, this patch:
- Only emits PC-relative memory access instructions if the
"aligned" flag is set in the associated MemFlags.
- Fixes a bug in emit_small_memory_copy and emit_small_memset
which currently set the aligned flag unconditionally, ignoring
the actual alignment info passed by their caller.
Tested with wasmtime and cg_clif.
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and a trait object to provide a backend for an all-included experience in Wasmtime.
After the suggestion of Chris, `Function` has been split into mostly two parts:
- on the one hand, `FunctionStencil` contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`.
- on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on.
Most changes are here to accommodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache:
- most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location; instead, it's automatically detected the first time a non-default `SourceLoc` is set.
- user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`.
- some refactorings have been made for function names:
- `ExternalName` was used as the type for a `Function`'s name; while it thus allowed `ExternalName::Libcall` in this place, this would have been quite confusing to use it there. Instead, a new enum `UserFuncName` is introduced for this name, that's either a user-defined function name (the above `UserExternalName`) or a test case name.
- The future of `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with.
The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.
A basic fuzz target has been introduced that tries to do the bare minimum:
- check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function.
- check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache.
- This last check is less efficient and less likely to happen, so probably should be rethought a bit.
Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.
Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement.
Fixes #4155.
* Add `try_use_var` method to `cranelift-frontend`.
- Unlike `use_var`, this method does not panic if the variable has not been defined
before use
* Add `try_declare_var` and `try_def_var`.
- Also implement Error for error enums.
* Use `write!` macro.
* Add `write!` use I missed.
First, we switch from a `BTreeSet` to a `HashSet` because clearing a `BTreeSet`
will deallocate the btree's nodes but clearing a `HashSet` will not deallocate
the backing hash table, saving the space to reuse for future insertions.
Then, we reuse the same set (and therefore the same allocation) across every
call to `can_optimize_var_lookup`.
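In outline (a sketch; the predecessor lookup is abstracted as a closure):
```
use std::collections::HashSet;

/// Sketch: keep one HashSet alive across calls. `clear` retains the
/// backing table (unlike dropping a BTreeSet's nodes), so steady-state
/// calls allocate nothing.
struct VarLookupCache {
    visited: HashSet<u32>,
}

impl VarLookupCache {
    fn can_optimize(&mut self, start: u32, single_pred: impl Fn(u32) -> Option<u32>) -> bool {
        self.visited.clear(); // reuse the allocation from the last call
        let mut cur = start;
        while let Some(p) = single_pred(cur) {
            if !self.visited.insert(cur) {
                return false; // cycle: the walk revisited a block
            }
            cur = p;
        }
        true
    }
}
```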
This results in a 1.22x to 1.32x speed up on various Sightglass benchmarks:
```
compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm
Δ = 39478181.76 ± 3441880.32 (confidence = 99%)
main.so is 0.75x to 0.79x faster than reuse-set.so!
reuse-set.so is 1.27x to 1.32x faster than main.so!
[160128343 172174751.09 213325968] main.so
[115055695 132696569.33 200782128] reuse-set.so
compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm
Δ = 22576954.88 ± 1830771.68 (confidence = 99%)
main.so is 0.77x to 0.81x faster than reuse-set.so!
reuse-set.so is 1.25x to 1.29x faster than main.so!
[100449245 106820149.65 118628066] main.so
[77039172 84243194.77 128168647] reuse-set.so
compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm
Δ = 664533554.97 ± 22109170.05 (confidence = 99%)
main.so is 0.81x to 0.82x faster than reuse-set.so!
reuse-set.so is 1.22x to 1.23x faster than main.so!
[3549762523 3640587103.35 3798662501] main.so
[2793335181 2976053548.38 3192950484] reuse-set.so
```
* cranelift: Restrict `br_table` to `i32` indices
In #4498 it was proposed that we should only accept `i32` indices
to `br_table`. The rationale is that larger types give users a false
sense of flexibility (since we don't support jump tables larger than
u32's), and narrower types are poorly tested paths that are safer
removed.
* cranelift: Reduce directly from i128 to i32 in Switch
In #4502 we discovered a bug in the switch api where it would emit
`icmp_imm`'s with types that were not able to fully represent the
destination index.
We now reject these inputs. The index value must always have a
type that is capable of addressing the entire range of inputs.
* Allow using jump-tables multiple times (fixes #3347)
If there are multiple `br_table` instructions using the same jump table,
then `append_jump_argument` must not modify the jump table in-place.
When this function is called, we don't know if more `br_table`
instructions might be added later. So this patch conservatively assumes
that all jump tables might be reused. If Cranelift needs to add a block
argument to a block that's the target of some jump table, then the jump
table will be unconditionally cloned.
I'm not sure if having duplicated and unused jump tables will turn out
to be a compile-time performance issue. If it is, there's discussion in
issue #3347 about ways to determine that there can't be any more uses of
a jump table, so that it's safe to modify in-place.
* Re-enable cranelift-fuzzgen fuzz target
I've been running this fuzz target for an hour without finding new bugs.
Let's see if oss-fuzz finds anything now.
Introduce a new concept in the IR that allows a producer to create
dynamic vector types. An IR function can now contain global value(s)
that represent a dynamic scaling factor, for a given fixed-width
vector type. A dynamic type is then created by 'multiplying' the
corresponding global value with a fixed-width type. These new types
can be used just like the existing types and the type system has a
set of hard-coded dynamic types, such as I32X4XN, which the
user-defined types map onto. The dynamic types are also used explicitly
to create dynamic stack slots, which have no set size like their
existing counterparts. New IR instructions are added to access these
new stack entities.
Currently, during codegen, the dynamic scaling factor has to be
lowered to a constant so the dynamic slots do eventually have a
compile-time known size, as do spill slots.
The current lowering for aarch64 just targets Neon, using a dynamic
scale of 1.
Copyright (c) 2022, Arm Limited.
* Run a `cargo update` over our dependencies
This'll notably fix a `cargo audit` error where we have a pinned version
of the `regex` crate which has a CVE assigned to it.
* Update to `object` and `hashbrown` crates
Prune some duplicate versions showing up from the previous `cargo update`
* Upgrade all crates to the Rust 2021 edition
I've personally started using the new format strings for things like
`panic!("some message {foo}")` or similar and have been upgrading crates
on a case-by-case basis, but I think it probably makes more sense to go
ahead and blanket upgrade everything so 2021 features are always
available.
* Fix compile of the C API
* Fix a warning
* Fix another warning
* Bump to 0.36.0
* Add a two-week delay to Wasmtime's release process
This commit is a proposal to update Wasmtime's release process with a
two-week delay from branching a release until it's actually officially
released. We've had two issues lately that came up which led to this proposal:
* In #3915 it was realized that changes just before the 0.35.0 release
weren't enough for an embedding use case, but the PR didn't meet the
expectations for a full patch release.
* At Fastly we were about to start rolling out a new version of Wasmtime
when over the weekend the fuzz bug #3951 was found. This led to the
desire internally to have a "must have been fuzzed for this long"
period of time for Wasmtime changes which we felt were better
reflected in the release process itself rather than something about
Fastly's own integration with Wasmtime.
This commit updates the automation for releases to unconditionally
create a `release-X.Y.Z` branch on the 5th of every month. The actual
release from this branch is then performed on the 20th of every month,
roughly two weeks later. This should provide a period of time to ensure
that all changes in a release are fuzzed for at least two weeks and
avoid any further surprises. This should also help with any last-minute
changes made just before a release if they need tweaking, since
backporting to a not-yet-released branch is much easier.
Overall there are some new properties about Wasmtime with this proposal
as well:
* The `main` branch will always have a section in `RELEASES.md` which is
listed as "Unreleased" for us to fill out.
* The `main` branch will always be a version ahead of the latest
release. For example it will be bumped pre-emptively as part of the
release process on the 5th: if `release-2.0.0` was created, then the
`main` branch will move to Wasmtime 3.0.0.
* Dates for major versions are automatically updated in the
`RELEASES.md` notes.
The associated documentation for our release process is updated and the
various scripts should all be updated now as well with this commit.
* Add notes on a security patch
* Clarify security fixes shouldn't be previewed early on CI
Addresses #3809: when we are asked to create a Cranelift backend with
shared flags that indicate support for SIMD, we should check that the
ISA level needed for our SIMD lowerings is present.
This PR adds a flag to each block, settable via the frontend/builder
interface, that indicates the block will not be frequently
executed. As such, the compiler backend should place the block "out of
line" in the final machine code, so that the ordinary, more frequent
execution path that excludes the block does not have to jump around it.
This is useful for adding handlers for exceptional conditions
(slow-paths, guard violations) in a way that minimizes performance cost.
Fixes #2747.
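For illustration, marking a slow path cold from the builder might look like this sketch (assuming a `set_cold_block` method on `FunctionBuilder`; the exact API may differ):
```
use cranelift_codegen::ir::Block;
use cranelift_frontend::FunctionBuilder;

// Sketch: create a slow-path block and mark it cold so the backend
// places it out of the hot code path.
fn make_guard_handler(builder: &mut FunctionBuilder<'_>) -> Block {
    let slow_path = builder.create_block();
    builder.set_cold_block(slow_path);
    // ... later: builder.switch_to_block(slow_path) and emit the
    // trap or fixup code for the exceptional condition ...
    slow_path
}
```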