* Fix some warnings on nightly Rust
Cargo is warning about the usage of workspace dependencies where the
workspace declaration does not mention `default-features` but the
dependency mentions `default-features`, so this explicitly turns off
default features for `cranelift-codegen` at the workspace level and
removes the explicit `default-features = false` at the manifest levels.
* Explicitly enable default feature in wasmtime
* Enable another feature
Nothing major pulled in here, but wanted to update to the latest
versions which enable tail calls by default. When used in Wasmtime,
however, the feature is disabled without the possibility of being
enabled since it's not implemented.
We can sometimes statically determine that a given load will unconditionally
trap. When this happens, we emit an unconditional trap, and we need to stop
adding new instructions to the block. This commit introduces a `Reachability<T>`
type that is like `Option<T>` but specifically for communicating reachability
and is marked `must_use` so that receivers are forced to handle transitions
from the reachable to the unreachable state.
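A minimal sketch of such a type (the names and the helper here are
illustrative, not the exact in-tree definition):

    #[must_use]
    pub enum Reachability<T> {
        /// Still reachable: here is the translated value.
        Reachable(T),
        /// An unconditional trap was emitted; stop adding instructions.
        Unreachable,
    }

    impl<T> Reachability<T> {
        /// Map over the translated value, propagating unreachability.
        pub fn map<U>(self, f: impl FnOnce(T) -> U) -> Reachability<U> {
            match self {
                Reachability::Reachable(t) => Reachability::Reachable(f(t)),
                Reachability::Unreachable => Reachability::Unreachable,
            }
        }
    }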
Additionally, this adds handling of reachable -> unreachable state transitions
to some SIMD op translations that weren't checking for them.
Fixes #5455
Fixes #5456
* Account for fuel before unconditionally trapping Wasm accesses
Fixes #5445
* Add a test for fuel accounting and unconditionally trapping memory accesses
* cranelift-wasm: translate Wasm loads into lower-level CLIF operations
Rather than using `heap_{load,store,addr}`.
* cranelift: Remove the `heap_{addr,load,store}` instructions
These are now legalized in the `cranelift-wasm` frontend.
* cranelift: Remove the `ir::Heap` entity from CLIF
* Port basic memory operation tests to .wat filetests
* Remove test for verifying CLIF heaps
* Remove `heap_addr` from replace_branching_instructions_and_cfg_predecessors.clif test
* Remove `heap_addr` from readonly.clif test
* Remove `heap_addr` from `table_addr.clif` test
* Remove `heap_addr` from the simd-fvpromote_low.clif test
* Remove `heap_addr` from simd-fvdemote.clif test
* Remove `heap_addr` from the load-op-store.clif test
* Remove the CLIF heap runtest
* Remove `heap_addr` from the global_value.clif test
* Remove `heap_addr` from fpromote.clif runtests
* Remove `heap_addr` from fdemote.clif runtests
* Remove `heap_addr` from memory.clif parser test
* Remove `heap_addr` from reject_load_readonly.clif test
* Remove `heap_addr` from reject_load_notrap.clif test
* Remove `heap_addr` from load_readonly_notrap.clif test
* Remove `static-heap-without-guard-pages.clif` test
Will be subsumed when we port `make-heap-load-store-tests.sh` over to
generating `.wat` tests.
* Remove `static-heap-with-guard-pages.clif` test
Will be subsumed when we port `make-heap-load-store-tests.sh` over to `.wat`
tests.
* Remove more heap tests
These will be subsumed by porting `make-heap-load-store-tests.sh` over to `.wat`
tests.
* Remove `heap_addr` from `simple-alias.clif` test
* Remove `heap_addr` from partial-redundancy.clif test
* Remove `heap_addr` from multiple-blocks.clif test
* Remove `heap_addr` from fence.clif test
* Remove `heap_addr` from extends.clif test
* Remove runtests that rely on heaps
Heaps are not a thing in CLIF or the interpreter anymore.
* Add generated load/store `.wat` tests
* Enable memory-related wasm features in `.wat` tests
* Remove CLIF heap from fcmp-mem-bug.clif test
* Add a mode for compiling `.wat` all the way to assembly in filetests
* Also generate WAT to assembly tests in `make-load-store-tests.sh`
* cargo fmt
* Reinstate `f{de,pro}mote.clif` tests without the heap bits
* Remove undefined doc link
* Remove outdated SVG and dot file from docs
* Add docs about `None` returns for base address computation helpers
* Factor out `env.heap_access_spectre_mitigation()` to a local
* Expand docs for `FuncEnvironment::heaps` trait method
* Restore f{de,pro}mote+load clif runtests with stack memory
This adds support for `.wat` tests in `cranelift-filetest`. The test runner
translates the WAT to Wasm and then uses `cranelift-wasm` to translate the Wasm
to CLIF.
These tests are always precise output tests. The test expectations can be
updated by running tests with the `CRANELIFT_TEST_BLESS=1` environment variable
set, similar to our compile precise output tests. The test's expected output is
contained in the last comment in the test file.
The tests allow for configuring the kinds of heaps used to implement Wasm linear
memory via TOML in a `;;!` comment at the start of the test.
To get ISA and Cranelift flags parsing available in the filetests crate, I had
to move the `parse_sets_and_triple` helper from the `cranelift-tools` binary
crate to the `cranelift-reader` crate, where I think it logically
fits.
Additionally, I had to make some more bits of `cranelift-wasm`'s dummy
environment `pub` so that I could properly wrap and compose it with the
environment used for the `.wat` tests. I don't think this is a big deal, but if
we eventually want to clean this stuff up, we can probably remove the dummy
environments completely, remove `translate_module`, and fold them into these new
test environments and test runner (since Wasmtime isn't using those things
anyways).
* cranelift-wasm: Remove `ModuleTranslationState`
We were putting data into it, but never reading data out of it. Can be removed.
* cranelift-wasm: move `func_state.rs` submodule to `state.rs`
Since it is the only submodule of `state`, it can just be the whole `state`
module.
Before, we would do a `heap_addr` to translate the given Wasm memory address
into a native memory address and pass it into the libcall that implemented the
atomic operation, which would then treat the address as a Wasm memory address
and pass it to `validate_atomic_addr` to be bounds checked a second time. This
is a bit nonsensical, as we are validating a native memory address as if it were
a Wasm memory address.
Now, we no longer do a `heap_addr` to translate the Wasm memory address to a
native memory address. Instead, we pass the Wasm memory address to the libcall,
and the libcall is responsible for doing the bounds check (by calling
`validate_atomic_addr` with the correct type of memory address now).
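A rough sketch of the libcall-side check, with simplified stand-in types
rather than Wasmtime's actual definitions:

    struct Memory {
        base: *mut u8,
        len: usize,
    }

    enum Trap {
        MemoryOutOfBounds,
    }

    /// Bounds check the *Wasm* address inside the libcall, and only then
    /// translate it into a native address for the atomic operation.
    fn validate_atomic_addr(
        mem: &Memory,
        wasm_addr: usize,
        access_size: usize,
    ) -> Result<*mut u8, Trap> {
        match wasm_addr.checked_add(access_size) {
            Some(end) if end <= mem.len => Ok(unsafe { mem.base.add(wasm_addr) }),
            _ => Err(Trap::MemoryOutOfBounds),
        }
    }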
* Cranelift: Make `heap_addr` return calculated `base + index + offset`
Rather than returning just `base + index`.
(Note: I've chosen to use the nomenclature "index" for the dynamic operand and
"offset" for the static immediate.)
This moves the addition of the `offset` into `heap_addr`, instead of leaving it
for the subsequent memory operation, so that we can Spectre-guard the full
address, and not allow speculative execution to read the first 4GiB of memory.
Before this commit, we were effectively doing

    load(spectre_guard(base + index) + offset)

Now we are effectively doing

    load(spectre_guard(base + index + offset))
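As a sketch of what the guarded computation amounts to (the real lowering uses
a branchless select rather than an `if`, and the names here are illustrative):

    /// Clamp the fully computed address to null when the access would be out
    /// of bounds, so a misspeculated access cannot reach past the guard.
    fn guarded_addr(base: u64, index: u64, offset: u64, size: u64, bound: u64) -> u64 {
        let addr = base.wrapping_add(index).wrapping_add(offset);
        let oob = index
            .checked_add(offset)
            .and_then(|n| n.checked_add(size))
            .map_or(true, |end| end > bound);
        if oob { 0 } else { addr }
    }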
Finally, this also corrects `heap_addr`'s documented semantics to say that it
returns an address that will trap on access if `index + offset + access_size` is
out of bounds for the given heap, rather than saying that the `heap_addr` itself
will trap. This matches the implemented behavior for static memories, and after
https://github.com/bytecodealliance/wasmtime/pull/5190 lands (which is blocked
on this commit) will also match the implemented behavior for dynamic memories.
* Update heap_addr docs
* Factor out `offset + size` to a helper
Add a MemFlags operand to the bitcast instruction, where only the
`big` and `little` flags are accepted. These define the lane order
to be used when casting between types of different lane counts.
Update all users to pass an appropriate MemFlags argument.
Implement lane swaps where necessary in the s390x back-end.
This is the final part necessary to fix
https://github.com/bytecodealliance/wasmtime/issues/4566.
This branch removes the `trapif` and `trapff` instructions in favor of using an explicit comparison and `trapnz`. This moves us closer to removing `iflags` and `fflags`, but introduces the need to implement instructions like `iadd_cout` in the x64 and aarch64 backends.
- Allow bitcast for vectors with differing lane widths
- Remove raw_bitcast IR instruction
- Change all users of raw_bitcast to bitcast
- Implement support for no-op bitcast cases across backends
This implements the second step of the plan outlined here:
https://github.com/bytecodealliance/wasmtime/issues/4566#issuecomment-1234819394
Remove the boolean types from Cranelift, and the associated instructions `breduce`, `bextend`, `bconst`, and `bint`. Standardize on using 1/0 for the return value from instructions that produce scalar boolean results, and -1/0 for boolean vector elements.
Fixes #3205
Co-authored-by: Afonso Bordado <afonso360@users.noreply.github.com>
Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
* Replace resize+copy_from_slice with extend_from_slice
Vec::resize initializes the new space, which is wasted effort if we're
just going to call `copy_from_slice` on it immediately afterward. Using
`extend_from_slice` is simpler, and very slightly faster.
If the new size were bigger than the buffer we're copying from, then it
would make sense to initialize the excess. But it isn't: it's always
exactly the same size.
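In sketch form (illustrative, not the exact call site):

    // Before: `resize` zero-initializes the new tail, which is then
    // immediately overwritten by `copy_from_slice`.
    fn append_before(buf: &mut Vec<u8>, src: &[u8]) {
        let start = buf.len();
        buf.resize(start + src.len(), 0);
        buf[start..].copy_from_slice(src);
    }

    // After: append directly, skipping the redundant initialization.
    fn append_after(buf: &mut Vec<u8>, src: &[u8]) {
        buf.extend_from_slice(src);
    }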
* Move helpers from Context to CompiledCode
These methods only use information from Context::compiled_code, so they
should live on CompiledCode instead.
* Remove an unnecessary #[cfg_attr]
There are other uses of `#[allow(clippy::too_many_arguments)]` in this
file, so apparently it doesn't need to be guarded by the "cargo-clippy"
feature.
* Fix a few comments
Two of these were wrong/misleading:
- `FunctionBuilder::new` does not clear the provided func_ctx. It does
debug-assert that the context is already clear, but I don't think
that's worth a comment.
- `switch_to_block` does not "create values for the arguments." That's
done by the combination of `append_block_params_for_function_params`
and `declare_wasm_parameters`.
* wasmtime-cranelift: Misc cleanups
The main change is to use the `CompiledCode` reference we already had
instead of getting it out of `Context` repeatedly. This removes a bunch
of `unwrap()` calls.
* wasmtime-cranelift: Factor out uncached compile
* Elide redundant sentinel values
The `undef_variables` lists held bindings from Variable to Value, but the
the Values were always equal to a suffix of the block's parameters. So
instead of storing another copy, we can just get them back from the
block parameters.
According to DHAT, this decreases total memory allocated and number of
bytes written, and increases number of bytes read and instructions
retired, but all by small fractions of a percent. According to
hyperfine, main is "1.00 ± 0.01 times faster".
* Use entity_impl for cranelift_frontend::Variable
Instead of hand-coding essentially the same thing.
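Roughly, the hand-rolled newtype boilerplate becomes a macro invocation like
this (sketch):

    use cranelift_entity::entity_impl;

    /// An opaque reference to a variable.
    #[derive(Copy, Clone, PartialEq, Eq, Hash)]
    pub struct Variable(u32);
    entity_impl!(Variable, "var");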
* Keep undefined variables in a ListPool
According to DHAT, this improves every measure of performance
(instructions retired, total memory allocated, max heap size, bytes
read, and bytes written), although by fractions of a percent. According
to hyperfine the difference is nearly zero, but on Spidermonkey this
branch is "1.01 ± 0.00 times faster" than main.
* Elide redundant block IDs
In a list of predecessors, we previously kept both the jump instruction
that points to the current block, and the block where that instruction
resides. But we can look up the block from the instruction as long as we
have access to the current Layout, which we do everywhere that it was
necessary. So don't store the block, just store the instruction.
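In sketch form, the predecessor block is recovered on demand from the stored
instruction (hypothetical helper):

    use cranelift_codegen::ir::{Block, Function, Inst};

    /// Look up the block containing a predecessor's branch instruction,
    /// instead of storing that block redundantly next to the instruction.
    fn pred_block_of(func: &Function, branch_inst: Inst) -> Block {
        func.layout
            .inst_block(branch_inst)
            .expect("branch instruction is inserted in the layout")
    }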
* Keep predecessor definitions in a ListPool
* Make append_jump_argument independent of self
This makes it easier to reason about borrow-checking issues.
* Reuse `results` instead of re-doing variable lookup
This eliminates three array lookups per predecessor by hanging on to the
results of earlier steps a little longer. This only works now because I
previously removed the need to borrow all of `self`, which otherwise
prevented keeping a borrow of self.results alive.
I had experimented with using `Vec::split_off` to copy the relevant
chunk of results to a temporary heap allocation, but the extra
allocation and copy was measurably slower. So it's important that this
is just a borrow.
* Cache single-predecessor block ID when sealing
Of the code in cranelift_frontend, `use_var` is the second-hottest path,
sitting close behind the `build` function that's used when inserting
every new instruction. This makes sense given that the operands of a new
instruction usually need to be looked up immediately before building the
instruction.
So making the single-predecessor loops in `find_var` and `use_var_local`
do fewer memory accesses and execute fewer instructions turns out to
have a measurable effect. It's still only a small fraction of a percent
overall since cranelift-frontend is only a few percent of total runtime.
This patch keeps a block ID in the SSABlockData, which is `None` unless
the block is sealed and has exactly one predecessor. Doing so
avoids two array lookups on each iteration of the two loops.
According to DHAT, compared with main, at this point this PR uses 0.3%
less memory at max heap, reads 0.6% fewer bytes, and writes 0.2% fewer
bytes.
According to Hyperfine, this PR is "1.01 ± 0.01 times faster" than main
when compiling Spidermonkey. On the other hand, Sightglass says main is
1.01x faster than this PR on the same benchmark by CPU cycles. In short,
actual effects are too small to measure reliably.
* cranelift-wasm: Assume block is reachable
In handling the WebAssembly "end" operator, cranelift-wasm had logic to
skip generating a jump instruction if the block was both unreachable and
"pristine", meaning no instructions had been added.
However, `translate_operator` checks first that `state.reachable` is
true, so this logic only runs when cranelift-wasm believes that the
current block _is_ reachable. Therefore the condition should always be
true, whether the block is pristine or not.
I've left a debug_assert in case `state.reachable` ever doesn't agree
with `builder.is_unreachable()`, but the assert doesn't fail in any of
the tests. We'll see if fuzzing finds something.
Anyway, outside of cranelift-frontend, this eliminates the only use of
`is_pristine()`, and there were no uses of `is_filled()`. So I've made
both of those private. They're now only used in a nearby debug assert.
* cranelift-frontend: Clarify pristine/filled states
There was a comment here saying "A filled block cannot be pristine."
Given that the intent was for those two states to be mutually exclusive,
I've replaced the two booleans with a three-state enum.
I also replaced all reads of these two flags with method calls. In all
but one case these are only checked in debug assertions, so I don't even
care whether they get inlined. They're easier to read, and this will
make it easier to replace their implementations, which I hope to do
soon.
Finally, I replaced all assignments to either flag with an appropriate
assignment of the corresponding enum state. Keep in mind this
correspondence between the new enum and the old flags:
- Empty: pristine true, filled false
- Partial: pristine false, filled false
- Filled: pristine false, filled true
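As a sketch (the exact names are assumptions):

    enum BlockStatus {
        /// No instructions have been added yet.
        Empty,   // was: pristine = true,  filled = false
        /// Some instructions added, but no terminator yet.
        Partial, // was: pristine = false, filled = false
        /// Terminated; no further instructions may be added.
        Filled,  // was: pristine = false, filled = true
    }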
Every existing update to these flags could only move to a later state.
(For example, Partial couldn't go back to Empty.) In the old flags that
meant that pristine could only go from true to false, and filled could
only go from false to true.
`fill_current_block` was a weird case because at first glance it looks
like it could allow both pristine and filled to be true at the same
time. However, it's only called from `FuncInstBuilder::build`, which
calls `ensure_inserted_block` before doing anything else, and _that_
cleared the pristine flag.
Similarly, `handle_ssa_side_effects` looks like it could allow both
pristine and filled to be true for anything in `split_blocks_created`.
However, those blocks are created by SSABuilder, so their BlockData is
not initialized by `create_block`, and instead uses BlockData::default.
The `Default` implementation here previously set both flags false, while
`create_block` would instead set pristine to true. So these split blocks
were correctly set to the Filled state, and after this patch they are
still set correctly.
* cranelift-frontend: Separate SSA and user block params
Previously there was a `user_param_count` field in BlockData, used
purely to debug-assert that no user parameters are added to a block
after `use_var` adds SSA parameters.
Instead, this patch enforces a strict phase separation between the
period after a block is created when user parameters can be added to it,
and the period when `use_var` may be called and instructions may be
added.
I'm assuming that calls to `use_var` are _always_ followed by inserting
one or more instructions into the block. (If you don't want to insert an
instruction, why do you need to know where instructions in this block
would get variable definitions from?) This patch has no visible effect
for callers which follow that rule.
However, it was previously legal to call `use_var`, then append a block
parameter before adding instructions, so long as `use_var` didn't
actually need to add a block parameter. That could only happen if the
current block is sealed and has exactly one predecessor. So anyone who
was counting on this behavior was playing a dangerous game anyway.
* cranelift-frontend: Defer initializing block data
Every reference to the func_ctx.status SecondaryMap will automatically
create the appropriate entries on-demand, with the sole exception of
`finalize`. In that function, debug assertions use SecondaryMap::keys to
find out which blocks need to be checked.
However, those assertions always succeed for blocks which never had any
instructions added. So it's okay to skip them for blocks which aren't
touched after `create_block`.
* Update wasm-tools dependencies
This update brings in a number of features such as:
* The component model binary format and AST have been slightly adjusted
in a few locations. Names are now dropped from parameters/results in
the internal representation since they were not used anyway. At this
time the ability to bind a multi-return function has not been exposed.
* The `wasmparser` validator pass will now share allocations with prior
functions, providing what's probably a very minor speedup for Wasmtime
itself.
* The text format for many component-related tests now requires named
parameters.
* Some new relaxed-simd instructions are now ignored.
I hope to have a follow-up to expose the multi-return ability to the
embedding API of components.
* Update audit information for new crates
* Leverage Cargo's workspace inheritance feature
This commit is an attempt to reduce the complexity of the Cargo
manifests in this repository with Cargo's workspace-inheritance feature
becoming stable in Rust 1.64.0. This feature allows specifying fields in
the root workspace `Cargo.toml` which are then reused throughout the
workspace. For example this PR shares definitions such as:
* All of the Wasmtime-family of crates now use `version.workspace =
true` to have a single location which defines the version number.
* All crates use `edition.workspace = true` to have one default edition
for the entire workspace.
* Common dependencies are listed in `[workspace.dependencies]` to avoid
typing the same version number in a lot of different places (e.g.
`wasmparser = "0.89.0"` now lives in just one spot).
Currently the workspace-inheritance feature doesn't allow having two
different versions to inherit, so all of the Cranelift-family of crates
still manually specify their version. The inter-crate dependencies,
however, are shared amongst the root workspace.
This feature can be seen as a method of "preprocessing" of sorts for
Cargo manifests. This will help us develop Wasmtime but shouldn't have
any actual impact on the published artifacts -- everything's dependency
lists are still the same.
* Fix wasi-crypto tests
We weren't using the "union" cargo feature for the smallvec crate, which
reduces the size of a `SmallVec` by one machine word. This feature
requires Rust 1.49, but we already require a much newer version.
When using Wasmtime to compile pulldown-cmark from Sightglass, this
saves a decent amount of memory allocations and writes. According to
`valgrind --tool=dhat`:
- 6.2MiB (3.69%) less memory allocated over the program's lifetime
- 0.5MiB (4.13%) less memory allocated at maximum heap size
- 5.5MiB (1.88%) fewer bytes written to
- 0.44% fewer instructions executed
Sightglass reports a statistically significant runtime improvement too:
compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm
Δ = 24379323.60 ± 20051394.04 (confidence = 99%)
shrink-abiarg-0406da67c.so is 1.01x to 1.13x faster than main-be690a468.so!
[227506364 355007998.78 423280514] main-be690a468.so
[227686018 330628675.18 406025344] shrink-abiarg-0406da67c.so
compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
Δ = 360151622.56 ± 278294316.90 (confidence = 99%)
shrink-abiarg-0406da67c.so is 1.01x to 1.07x faster than main-be690a468.so!
[8709162212 8911001926.44 9535111576] main-be690a468.so
[5058015392 8550850303.88 9282148438] shrink-abiarg-0406da67c.so
compilation :: cycles :: benchmarks/bz2/benchmark.wasm
Δ = 6936570.28 ± 6897696.38 (confidence = 99%)
shrink-abiarg-0406da67c.so is 1.00x to 1.08x faster than main-be690a468.so!
[155810934 175260571.20 234737344] main-be690a468.so
[119128240 168324000.92 257451074] shrink-abiarg-0406da67c.so
This commit replaces #4869 and represents the actual version bump that
should have happened had I remembered to bump the in-tree version of
Wasmtime to 1.0.0 prior to the branch-cut date. Alas!
* Upgrade wasm-tools crates, namely the component model
This commit pulls in the latest versions of all of the `wasm-tools`
family of crates. There were two major changes that happened in
`wasm-tools` in the meantime:
* bytecodealliance/wasm-tools#697 - this commit introduced a new API for
more efficiently reading binary operators from a wasm binary. The old
`Operator`-based reading was left in place, however, and continues to
be what Wasmtime uses. I hope to update Wasmtime in a future PR to use
this new API, but for now the biggest change is...
* bytecodealliance/wasm-tools#703 - this commit was a major update to
the component model AST. This commit almost entirely deals with the
fallout of this change.
The changes made to the component model were:
1. The `unit` type no longer exists. This was generally a simple change
where the `Unit` case in a few different locations were all removed.
2. The `expected` type was renamed to `result`. This similarly was
relatively lightweight and mostly just a renaming on the surface. I
took this opportunity to rename `val::Result` to `val::ResultVal` and
`types::Result` to `types::ResultType` to avoid clashing with the
standard library types. The `Option`-based types were handled with
this as well.
3. The payload type of `variant` and `result` types are now optional.
This affected many locations that calculate flat type
representations, ABI information, etc. The `#[derive(ComponentType)]`
macro now specifically maps Rust-defined `enum` types which have
no payload to the equivalent in the component model.
4. Functions can now return multiple results. This changed the
signature of invoking component functions because the return value is
now bound by `ComponentNamedList` (renamed from `ComponentParams`).
This had a large effect in the tests, fuzz test case generation, etc.
5. Function types with 2-or-more parameters/results must uniquely name
all parameters/results. This mostly affected the text format used
throughout the tests.
I haven't added specifically new tests for multi-return but I changed a
number of tests to use it. Additionally I've updated the fuzzers to all
exercise multi-return as well so I think we should get some good
coverage with that.
* Update version numbers
* Use crates.io
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and a trait object to provide a backend for an all-inclusive experience in Wasmtime.
Following Chris's suggestion, `Function` has been split into two main parts:
- on the one hand, `FunctionStencil` contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`.
- on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on.
Most changes are here to accommodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache:
- most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location; instead, it's automatically detected the first time a non-default `SourceLoc` is set.
- user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`.
- some refactorings have been made for function names:
- `ExternalName` was used as the type for a `Function`'s name; while this allowed `ExternalName::Libcall` in that position, actually using it there would have been quite confusing. Instead, a new enum `UserFuncName` is introduced for this name, which is either a user-defined function name (the above `UserExternalName`) or a test case name.
- `ExternalName` is likely to eventually become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and it is not trivial as it implies touching ISLE, which I'm less familiar with.
The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.
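In outline, the split and the cache key computation look something like the
following sketch; the field choices are illustrative only, and a `Hasher`
stands in for the real sha256 hash:

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    /// Everything that affects compilation; hashing this yields the cache key.
    #[derive(Hash)]
    struct FunctionStencil {
        signature: Vec<u8>, // stand-in for the real signature
        body: Vec<u8>,      // stand-in for the CLIF body, with RelSourceLocs
    }

    /// Applied when finalizing a `CompiledCodeBase<Stencil>` into a
    /// `CompiledCode`, e.g. the name table and the base source location.
    struct FunctionParameters {
        base_srcloc: u32,              // base for RelSourceLoc -> SourceLoc
        user_named_funcs: Vec<String>, // UserExternalNameRef -> UserExternalName
    }

    fn cache_key(stencil: &FunctionStencil) -> u64 {
        let mut h = DefaultHasher::new();
        stencil.hash(&mut h);
        h.finish()
    }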
A basic fuzz target has been introduced that tries to do the bare minimum:
- check that a function successfully compiled and cached will also be successfully reloaded from the cache, and returns the exact same function.
- check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache.
- This last check is less efficient and less likely to happen, so probably should be rethought a bit.
Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.
Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement.
Fixes #4155.
* Cranelift: remove Baldrdash support and related features.
As noted in Mozilla's bugzilla bug 1781425 [1], the SpiderMonkey team
has recently determined that their current form of integration with
Cranelift is too hard to maintain, and they have chosen to remove it
from their codebase. If and when they decide to build updated support
for Cranelift, they will adopt different approaches to several details
of the integration.
In the meantime, after discussion with the SpiderMonkey folks, they
agree that it makes sense to remove the bits of Cranelift that exist
to support the integration ("Baldrdash"), as they will not need
them. Many of these bits are difficult-to-maintain special cases that
are not actually tested in Cranelift proper: for example, the
Baldrdash integration required Cranelift to emit function bodies
without prologues/epilogues, and instead communicate very precise
information about the expected frame size and layout, then stitched
together something post-facto. This was brittle and caused a lot of
incidental complexity ("fallthrough returns", the resulting special
logic in block-ordering); this is just one example. As another
example, one particular Baldrdash ABI variant processed stack args in
reverse order, so our ABI code had to support both traversal
orders. We had a number of other Baldrdash-specific settings as well
that did various special things.
This PR removes Baldrdash ABI support, the `fallthrough_return`
instruction, and pulls some threads to remove now-unused bits as a
result of those two, with the understanding that the SpiderMonkey folks
will build new functionality as needed in the future and we can perhaps
find cleaner abstractions to make it all work.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1781425
* Review feedback.
* Fix (?) DWARF debug tests: add `--disable-cache` to wasmtime invocations.
The debugger tests invoke `wasmtime` from within each test case under
the control of a debugger (gdb or lldb). Some of these tests started to
inexplicably fail in CI with unrelated changes, and the failures were
only inconsistently reproducible locally. It seems to be cache related:
if we disable cached compilation on the nested `wasmtime` invocations,
the tests consistently pass.
* Review feedback.
For wasm programs using SIMD vector types, the type known at function
entry or exit may not match the type used within the body of the
function, so we have to bitcast them. We have to check all calls and
returns for this condition, which involves comparing a subset of a
function's signature with the CLIF types we're trying to use. Currently,
this check heap-allocates a short-lived Vec for the appropriate subset
of the signature.
But most of the time none of the values need a bitcast. So this patch
avoids allocating unless at least one bitcast is actually required, and
only saves the pointers to values that need fixing up. I leaned heavily
on iterators to keep space usage constant until we discover allocation
is necessary after all.
I don't think it's possible to eliminate the allocation entirely,
because the function signature we're examining is borrowed from the
FuncBuilder, but we need to mutably borrow the FuncBuilder to insert the
bitcast instructions. Fortunately, the FromIterator implementation for
Vec doesn't reserve any heap space if the iterator is empty, so we can
unconditionally collect into a Vec and still avoid unnecessary
allocations.
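The shape of the trick, as a standalone sketch with simplified types (not the
actual `cranelift-wasm` code):

    /// Pair each value with its expected and actual types, keeping only the
    /// mismatches. Collecting an empty iterator into a `Vec` performs no heap
    /// allocation, so the common no-bitcast case stays allocation-free.
    fn values_needing_bitcast(
        expected: &[u32],
        actual: &[u32],
        values: &[u64],
    ) -> Vec<u64> {
        expected
            .iter()
            .zip(actual)
            .zip(values)
            .filter(|&((want, got), _)| want != got)
            .map(|(_, &v)| v)
            .collect()
    }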
Since the relationship between a function signature and a list of CLIF
values is somewhat complicated, I refactored all the uses of
`bitcast_arguments` and `wasm_param_types`. Instead there's
`bitcast_wasm_params` and `bitcast_wasm_returns` which share a helper
that combines the previous pair of functions into one.
According to DHAT, when compiling the Sightglass Spidermonkey benchmark,
this avoids 52k allocations averaging about 9 bytes each, which are
freed on average within 2k instructions.
Most Sightglass benchmarks, including Spidermonkey, show no performance
difference with this change. Only one has a slowdown, and it's small:
compilation :: nanoseconds :: benchmarks/shootout-matrix/benchmark.wasm
Δ = 689373.34 ± 593720.78 (confidence = 99%)
lazy-bitcast.so is 0.94x to 1.00x faster than main-e121c209f.so!
main-e121c209f.so is 1.00x to 1.06x faster than lazy-bitcast.so!
[19741713 21375844.46 32976047] lazy-bitcast.so
[19345471 20686471.12 30872267] main-e121c209f.so
But several Sightglass benchmarks have notable speed-ups, with smaller
improvements for shootout-ed25519, meshoptimizer, and pulldown-cmark,
and larger ones as follows:
compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm
Δ = 20071545.47 ± 2950014.92 (confidence = 99%)
lazy-bitcast.so is 1.26x to 1.36x faster than main-e121c209f.so!
main-e121c209f.so is 0.73x to 0.80x faster than lazy-bitcast.so!
[55995164 64849257.68 89083031] lazy-bitcast.so
[79382460 84920803.15 98016185] main-e121c209f.so
compilation :: nanoseconds :: benchmarks/blake3-scalar/benchmark.wasm
Δ = 16620780.61 ± 5395162.13 (confidence = 99%)
lazy-bitcast.so is 1.14x to 1.28x faster than main-e121c209f.so!
main-e121c209f.so is 0.77x to 0.88x faster than lazy-bitcast.so!
[54604352 79877776.35 103666598] lazy-bitcast.so
[94011835 96498556.96 106200091] main-e121c209f.so
compilation :: nanoseconds :: benchmarks/intgemm-simd/benchmark.wasm
Δ = 36891254.34 ± 12403663.50 (confidence = 99%)
lazy-bitcast.so is 1.12x to 1.24x faster than main-e121c209f.so!
main-e121c209f.so is 0.79x to 0.90x faster than lazy-bitcast.so!
[131610845 201289587.88 247341883] lazy-bitcast.so
[232065032 238180842.22 250957563] main-e121c209f.so
This commit builds on bytecodealliance/wasm-tools#690 to add support to
testing of the component model to execute functions when running
`*.wast` files. This support is all built on #4442 as functions are
invoked through a "dynamic" API. Right now the testing and integration
is fairly crude but I'm hoping that we can try to improve it over time
as necessary. For now this should provide a hopefully more convenient
syntax for unit tests and the like.
* cranelift: Reorganize test suite
Group some SIMD operations by instruction.
* cranelift: Deduplicate some shift tests
Also adds new tests with the mod behaviour.
* aarch64: Lower shifts with mod behaviour
* x64: Lower shifts with mod behaviour
* wasmtime: Don't mask SIMD shifts
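For context, Wasm shift operators take the shift amount modulo the lane width,
which is the behaviour the backends now implement directly instead of Wasmtime
masking the amount up front. A scalar sketch:

    /// Wasm's `i32.shl` semantics: the shift amount is taken modulo 32.
    fn wasm_shl32(x: u32, y: u32) -> u32 {
        // `wrapping_shl` masks the amount to `y % 32`, matching Wasm.
        x.wrapping_shl(y)
    }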
* Bump versions of wasm-tools crates
Note that this leaves new features in the component model, such as
outer type aliases for core wasm types, unimplemented for now.
* Move to crates.io-based versions of tools
This commit updates the wasm-tools family of crates, notably pulling in
the refactorings and updates from bytecodealliance/wasm-tools#621 for
the latest iteration of the component model. This commit additionally
updates all support for the component model for these changes, notably:
* Many bits and pieces of type information were refactored. Many
`FooTypeIndex` namings are now `TypeFooIndex`. Additionally there is
now `TypeIndex` as well as `ComponentTypeIndex` for the two type index
spaces in a component.
* A number of new sections are now processed to handle the core and
component variants.
* Internal maps were split such as the `funcs` map into
`component_funcs` and `funcs` (same for `instances`).
* Canonical options are now processed individually instead of as one
bulk `into` definition.
Overall this was not a major update to the internals of handling the
component model in Wasmtime. Instead this was mostly a surface-level
refactoring to make sure that everything lines up with the new binary
format for components.
* All text syntax used in tests was updated to the new syntax.
This PR adds a basic *alias analysis*, and optimizations that use it.
This is a "mid-end optimization": it operates on CLIF, the
machine-independent IR, before lowering occurs.
The alias analysis (or maybe more properly, a sort of memory-value
analysis) determines when it can prove that the value at a particular
memory location is equal to a given SSA value; when it can, it replaces
loads of that location with that value.
This subsumes two common optimizations:
* Redundant load elimination: when the same memory address is loaded two
times, and it can be proven that no intervening operations will write
to that memory, then the second load is *redundant* and its result
must be the same as the first. We can use the first load's result and
remove the second load.
* Store-to-load forwarding: when a load can be proven to access exactly
the memory written by a preceding store, we can replace the load's
result with the store's data operand, and remove the load.
Both of these optimizations rely on a "last store" analysis that is a
sort of coloring mechanism, split across disjoint categories of abstract
state. The basic idea is that every memory-accessing operation is put
into one of N disjoint categories; it is disallowed for memory to ever
be accessed by an op in one category and later accessed by an op in
another category. (The frontend must ensure this.)
Then, given this, we scan the code and determine, for each
memory-accessing op, when a single prior instruction is a store to the
same category. This "colors" the instruction: it is, in a sense, a
static name for that version of memory.
This analysis provides an important invariant: if two operations access
memory with the same last-store, then *no other store can alias* in the
time between that last store and these operations. This must-not-alias
property, together with checks that the accessed address is *exactly
the same* (same SSA value and offset) and that other attributes of the
access (type, extension mode) are the same, lets us prove that the
results are the same.
Given last-store info, we scan the instructions and build a table from
"memory location" key (last store, address, offset, type, extension) to
known SSA value stored in that location. A store inserts a new mapping.
A load may also insert a new mapping, if we didn't already have one.
Then when a load occurs and an entry already exists for its "location",
we can reuse the value. This will be either RLE or St-to-Ld depending on
where the value came from.
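Schematically, the key and the table might look like this (stand-in types, not
the real implementation):

    use std::collections::HashMap;

    type Inst = u32;  // instruction id (stand-in)
    type Value = u32; // SSA value id (stand-in)

    /// The "memory location" key: every field must match exactly for a later
    /// access to reuse a known value.
    #[derive(PartialEq, Eq, Hash)]
    struct MemoryLoc {
        last_store: Option<Inst>, // the "color" from the last-store analysis
        address: Value,           // the *same* SSA value, not merely equal
        offset: i32,
        ty: u8,        // access type (stand-in)
        extension: u8, // extension mode (stand-in)
    }

    /// Location -> SSA value known to be stored there.
    type KnownValues = HashMap<MemoryLoc, Value>;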
Note that this *does* work across basic blocks: the last-store analysis
is a full iterative dataflow pass, and we are careful to check dominance
of a previously-defined value before aliasing to it at a potentially
redundant load. So we will do the right thing if we only have a
"partially redundant" load (loaded already but only in one predecessor
block), but we will also correctly reuse a value if there is a store or
load above a loop and a redundant load of that value within the loop, as
long as no potentially-aliasing stores happen within the loop.