wasmtime

Author	SHA1	Message	Date
Chris Fallin	c15c4ed23d	Cranelift: upgrade to regalloc2 0.6.1. (#5799 ) * Cranelift: upgrade to regalloc2 0.6.1. Fixes #5791 by pulling in bytecodealliance/regalloc2#113. * Add cargo-vet entry for regalloc2 0.6.1.	2023-02-16 03:22:58 +00:00
Trevor Elliott	f04decc4a1	Use capstone to validate precise-output tests (#5780 ) Use the capstone library to disassemble precise-output tests, in addition to pretty-printing their vcode.	2023-02-15 16:35:10 -08:00
Alphyr	cb150d37ce	Update dependencies (#5513 )	2023-02-14 19:45:15 +00:00
Trevor Elliott	116e5a665f	Bump regalloc2 to 0.6.0 (#5742 ) * Bump regalloc2 * Certify regalloc2 0.6.0	2023-02-07 15:57:49 -08:00
wasmtime-publish	482f541101	Bump Wasmtime to 7.0.0 (#5712 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2023-02-06 09:10:19 -06:00
wasmtime-publish	7bfbec1b57	Bump Wasmtime to 6.0.0 (#5521 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2023-01-05 09:46:01 -06:00
Nick Fitzgerald	c0b587ac5f	Remove heaps from core Cranelift, push them into `cranelift-wasm` (#5386 ) * cranelift-wasm: translate Wasm loads into lower-level CLIF operations Rather than using `heap_{load,store,addr}`. * cranelift: Remove the `heap_{addr,load,store}` instructions These are now legalized in the `cranelift-wasm` frontend. * cranelift: Remove the `ir::Heap` entity from CLIF * Port basic memory operation tests to .wat filetests * Remove test for verifying CLIF heaps * Remove `heap_addr` from replace_branching_instructions_and_cfg_predecessors.clif test * Remove `heap_addr` from readonly.clif test * Remove `heap_addr` from `table_addr.clif` test * Remove `heap_addr` from the simd-fvpromote_low.clif test * Remove `heap_addr` from simd-fvdemote.clif test * Remove `heap_addr` from the load-op-store.clif test * Remove the CLIF heap runtest * Remove `heap_addr` from the global_value.clif test * Remove `heap_addr` from fpromote.clif runtests * Remove `heap_addr` from fdemote.clif runtests * Remove `heap_addr` from memory.clif parser test * Remove `heap_addr` from reject_load_readonly.clif test * Remove `heap_addr` from reject_load_notrap.clif test * Remove `heap_addr` from load_readonly_notrap.clif test * Remove `static-heap-without-guard-pages.clif` test Will be subsumed when we port `make-heap-load-store-tests.sh` to generating `.wat` tests. * Remove `static-heap-with-guard-pages.clif` test Will be subsumed when we port `make-heap-load-store-tests.sh` over to `.wat` tests. * Remove more heap tests These will be subsumed by porting `make-heap-load-store-tests.sh` over to `.wat` tests. * Remove `heap_addr` from `simple-alias.clif` test * Remove `heap_addr` from partial-redundancy.clif test * Remove `heap_addr` from multiple-blocks.clif test * Remove `heap_addr` from fence.clif test * Remove `heap_addr` from extends.clif test * Remove runtests that rely on heaps Heaps are not a thing in CLIF or the interpreter anymore * Add generated load/store `.wat` tests * Enable memory-related wasm features in `.wat` tests * Remove CLIF heap from fcmp-mem-bug.clif test * Add a mode for compiling `.wat` all the way to assembly in filetests * Also generate WAT to assembly tests in `make-load-store-tests.sh` * cargo fmt * Reinstate `f{de,pro}mote.clif` tests without the heap bits * Remove undefined doc link * Remove outdated SVG and dot file from docs * Add docs about `None` returns for base address computation helpers * Factor out `env.heap_access_spectre_mitigation()` to a local * Expand docs for `FuncEnvironment::heaps` trait method * Restore f{de,pro}mote+load clif runtests with stack memory	2022-12-15 00:26:45 +00:00
Trevor Elliott	ab6c8e1a1a	Bump regalloc2 to version 0.5.1 (#5387 ) Bump regalloc2 to version 0.5.1.	2022-12-06 15:38:03 -08:00
Chris Fallin	f980defe17	egraph support: rewrite to work in terms of CLIF data structures. (#5382 ) * egraph support: rewrite to work in terms of CLIF data structures. This work rewrites the "egraph"-based optimization framework in Cranelift to operate on aegraphs (acyclic egraphs) represented in the CLIF itself rather than as a separate data structure to which and from which we translate the CLIF. The basic idea is to add a new kind of value, a "union", that is like an alias but refers to two other values rather than one. This allows us to represent an eclass of enodes (values) as a tree. The union node allows for a value to have multiple representations: either constituent value could be used, and (in well-formed CLIF produced by correct optimization rules) they must be equivalent. Like the old egraph infrastructure, we take advantage of acyclicity and eager rule application to do optimization in a single pass. Like before, we integrate GVN (during the optimization pass) and LICM (during elaboration). Unlike the old egraph infrastructure, everything stays in the DataFlowGraph. "Pure" enodes are represented as instructions that have values attached, but that are not placed into the function layout. When entering "egraph" form, we remove them from the layout while optimizing. When leaving "egraph" form, during elaboration, we can place an instruction back into the layout the first time we elaborate the enode; if we elaborate it more than once, we clone the instruction. The implementation performs two passes overall: - One, a forward pass in RPO (to see defs before uses), that (i) removes "pure" instructions from the layout and (ii) optimizes as it goes. As before, we eagerly optimize, so we form the entire union of optimized forms of a value before we see any uses of that value. This lets us rewrite uses to use the most "up-to-date" form of the value and canonicalize and optimize that form. The eager rewriting and acyclic representation make each other work (we could not eagerly rewrite if there were cycles; and acyclicity does not miss optimization opportunities only because the first time we introduce a value, we immediately produce its "best" form). This design choice is also what allows us to avoid the "parent pointers" and fixpoint loop of traditional egraphs. This forward optimization pass keeps a scoped hashmap to "intern" nodes (thus performing GVN), and also interleaves on a per-instruction level with alias analysis. The interleaving with alias analysis allows alias analysis to see the most optimized form of each address (so it can see equivalences), and allows the next value to see any equivalences (reuses of loads or stored values) that alias analysis uncovers. - Two, a forward pass in domtree preorder, that "elaborates" pure enodes back into the layout, possibly in multiple places if needed. This tracks the loop nest and hoists nodes as needed, performing LICM as it goes. Note that by doing this in forward order, we avoid the "fixpoint" that traditional LICM needs: we hoist a def before its uses, so when we place a node, we place it in the right place the first time rather than moving later. This PR replaces the old (a)egraph implementation. It removes both the cranelift-egraph crate and the logic in cranelift-codegen that uses it. On `spidermonkey.wasm` running a simple recursive Fibonacci microbenchmark, this work shows 5.5% compile-time reduction and 7.7% runtime improvement (speedup). Most of this implementation was done in (very productive) pair programming sessions with Jamey Sharp, thus: Co-authored-by: Jamey Sharp <jsharp@fastly.com> * Review feedback. * Review feedback. * Review feedback. * Bugfix: cprop rule: `(x + k1) - k2` becomes `x - (k2 - k1)`, not `x - (k1 - k2)`. Co-authored-by: Jamey Sharp <jsharp@fastly.com>	2022-12-06 14:58:57 -08:00
wasmtime-publish	a28d4d3c89	Bump Wasmtime to 5.0.0 (#5372 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-12-05 08:38:57 -06:00
Trevor Elliott	f138fc0ed3	Bump regalloc2 to 0.5.0 (#5345 ) * Bump the regalloc2 dependency to 0.5.0 * Replace preg_set_from_machine_env with PRegSet::from * Vet the regalloc2 update	2022-11-29 11:25:35 -08:00
Jamey Sharp	ff5abfd993	cranelift-isle: Minor error-handling cleanups (#5338 ) - Remove remaining references to Miette - Borrow implementation of `line_starts` from codespan-reporting - Clean up a use of `Result` that no longer conflicts with a local definition - When printing plain errors, add a blank line between errors for readability	2022-11-29 03:07:05 +00:00
Jamey Sharp	044b57f334	cranelift-isle: Rewrite error reporting (#5318 ) There were several issues with ISLE's existing error reporting implementation. - When using Miette for more readable error reports, it would panic if errors were reported from multiple files in the same run. - Miette is pretty heavy-weight for what we're doing, with a lot of dependencies. - The `Error::Errors` enum variant led to normalization steps in many places, to avoid using that variant to represent a single error. This commit: - replaces Miette with codespan-reporting - gets rid of a bunch of cargo-vet exemptions - replaces the `Error::Errors` variant with a new `Errors` type - removes source info from `Error` variants so they're easy to construct - adds source info only when formatting `Errors` - formats `Errors` with a custom `Debug` impl - shares common code between ISLE's callers, islec and cranelift-codegen - includes a source snippet even with fancy-errors disabled I tried to make this a series of smaller commits but I couldn't find any good split points; everything was too entangled with everything else.	2022-11-23 14:20:48 -08:00
wasmtime-publish	08ef518c95	Bump Wasmtime to 4.0.0 (#5209 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-11-06 13:32:34 -06:00
Nick Fitzgerald	3c496d8cdc	Update `regalloc2` to v0.4.2 (#5169 )	2022-11-01 11:18:19 -07:00
Chris Fallin	2be12a5167	egraph-based midend: draw the rest of the owl (productionized). (#4953 ) * egraph-based midend: draw the rest of the owl. * Rename `egg` submodule of cranelift-codegen to `egraph`. * Apply some feedback from @jsharp during code walkthrough. * Remove recursion from find_best_node by doing a single pass. Rather than recursively computing the lowest-cost node for a given eclass and memoizing the answer at each eclass node, we can do a single forward pass; because every eclass node refers only to earlier nodes, this is sufficient. The behavior may slightly differ from the earlier behavior because we cannot short-circuit costs to zero once a node is elaborated; but in practice this should not matter. * Make elaboration non-recursive. Use an explicit stack instead (with `ElabStackEntry` entries, alongside a result stack). * Make elaboration traversal of the domtree non-recursive/stack-safe. * Work analysis logic in Cranelift-side egraph glue into a general analysis framework in cranelift-egraph. * Apply static recursion limit to rule application. * Fix aarch64 wrt dynamic-vector support -- broken rebase. * Topo-sort cranelift-egraph before cranelift-codegen in publish script, like the comment instructs me to! * Fix multi-result call testcase. * Include `cranelift-egraph` in `PUBLISHED_CRATES`. * Fix atomic_rmw: not really a load. * Remove now-unnecessary PartialOrd/Ord derivations. * Address some code-review comments. * Review feedback. * Review feedback. * No overlap in mid-end rules, because we are defining a multi-constructor. * rustfmt * Review feedback. * Review feedback. * Review feedback. * Review feedback. * Remove redundant `mut`. * Add comment noting what rules can do. * Review feedback. * Clarify comment wording. * Update `has_memory_fence_semantics`. * Apply @jameysharp's improved loop-level computation. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix suggestion commit. * Fix off-by-one in new loop-nest analysis. * Review feedback. * Review feedback. * Review feedback. * Use `Default`, not `std::default::Default`, as per @fitzgen Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> * Apply @fitzgen's comment elaboration to a doc-comment. Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> * Add stat for hitting the rewrite-depth limit. * Some code motion in split prelude to make the diff a little clearer wrt `main`. * Take @jameysharp's suggested `try_into()` usage for blockparam indices. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Take @jameysharp's suggestion to avoid double-match on load op. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix suggestion (add import). * Review feedback. * Fix stack_load handling. * Remove redundant can_store case. * Take @jameysharp's suggested improvement to FuncEGraph::build() logic Co-authored-by: Jamey Sharp <jamey@minilop.net> * Tweaks to FuncEGraph::build() on top of suggestion. * Take @jameysharp's suggested clarified condition Co-authored-by: Jamey Sharp <jamey@minilop.net> * Clean up after suggestion (unused variable). * Fix loop analysis. * loop level asserts * Revert constant-space loop analysis -- edge cases were incorrect, so let's go with the simple thing for now. * Take @jameysharp's suggestion re: result_tys Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix up after suggestion * Take @jameysharp's suggestion to use fold rather than reduce Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fixup after suggestion * Take @jameysharp's suggestion to remove elaborate_eclass_use's return value. * Clarifying comment in terminator insts. Co-authored-by: Jamey Sharp <jamey@minilop.net> Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>	2022-10-11 18:15:53 -07:00
Benjamin Bouvier	d68ca3711b	Upgrade sha2 to 0.10.2 in wasmtime (#4749 )	2022-10-10 09:40:40 +00:00
wasmtime-publish	a9be4a9b56	Bump Wasmtime to 3.0.0 (#5016 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-10-05 09:30:55 -05:00
yuyang-ok	cdecc858b4	add riscv64 backend for cranelift. (#4271 ) Add a RISC-V 64 (`riscv64`, RV64GC) backend. Co-authored-by: yuyang <756445638@qq.com> Co-authored-by: Chris Fallin <chris@cfallin.org> Co-authored-by: Afonso Bordado <afonsobordado@az8.co>	2022-09-27 17:30:31 -07:00
Alex Crichton	7b311004b5	Leverage Cargo's workspace inheritance feature (#4905 ) * Leverage Cargo's workspace inheritance feature This commit is an attempt to reduce the complexity of the Cargo manifests in this repository with Cargo's workspace-inheritance feature becoming stable in Rust 1.64.0. This feature allows specifying fields in the root workspace `Cargo.toml` which are then reused throughout the workspace. For example this PR shares definitions such as: * All of the Wasmtime-family of crates now use `version.workspace = true` to have a single location which defines the version number. * All crates use `edition.workspace = true` to have one default edition for the entire workspace. * Common dependencies are listed in `[workspace.dependencies]` to avoid typing the same version number in a lot of different places (e.g. the `wasmparser = "0.89.0"` is now in just one spot. Currently the workspace-inheritance feature doesn't allow having two different versions to inherit, so all of the Cranelift-family of crates still manually specify their version. The inter-crate dependencies, however, are shared amongst the root workspace. This feature can be seen as a method of "preprocessing" of sorts for Cargo manifests. This will help us develop Wasmtime but shouldn't have any actual impact on the published artifacts -- everything's dependency lists are still the same. * Fix wasi-crypto tests	2022-09-26 11:30:01 -05:00
Jamey Sharp	bd870a9d6c	Shrink all SmallVecs by 8 bytes (#4951 ) We weren't using the "union" cargo feature for the smallvec crate, which reduces the size of a SmallVec by one machine word. This feature requires Rust 1.49 but we already require much newer versions. When using Wasmtime to compile pulldown-cmark from Sightglass, this saves a decent amount of memory allocations and writes. According to `valgrind --tool=dhat`: - 6.2MiB (3.69%) less memory allocated over the program's lifetime - 0.5MiB (4.13%) less memory allocated at maximum heap size - 5.5MiB (1.88%) fewer bytes written to - 0.44% fewer instructions executed Sightglass reports a statistically significant runtime improvement too: compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 24379323.60 ± 20051394.04 (confidence = 99%) shrink-abiarg-0406da67c.so is 1.01x to 1.13x faster than main-be690a468.so! [227506364 355007998.78 423280514] main-be690a468.so [227686018 330628675.18 406025344] shrink-abiarg-0406da67c.so compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm Δ = 360151622.56 ± 278294316.90 (confidence = 99%) shrink-abiarg-0406da67c.so is 1.01x to 1.07x faster than main-be690a468.so! [8709162212 8911001926.44 9535111576] main-be690a468.so [5058015392 8550850303.88 9282148438] shrink-abiarg-0406da67c.so compilation :: cycles :: benchmarks/bz2/benchmark.wasm Δ = 6936570.28 ± 6897696.38 (confidence = 99%) shrink-abiarg-0406da67c.so is 1.00x to 1.08x faster than main-be690a468.so! [155810934 175260571.20 234737344] main-be690a468.so [119128240 168324000.92 257451074] shrink-abiarg-0406da67c.so	2022-09-23 16:32:13 -07:00
Chris Fallin	19bd8687ac	Upgrade to regalloc2 0.4.1. (#4945 ) * Upgrade to regalloc2 0.4.1. Incorporates bytecodealliance/regalloc2#85, which fixes a fuzzbug related to constraints and liverange splits. * Add audit of regalloc2 upgrade.	2022-09-23 00:00:06 +00:00
Chris Fallin	05cbd667c7	Cranelift: use regalloc2 constraints on caller side of ABI code. (#4892 ) * Cranelift: use regalloc2 constraints on caller side of ABI code. This PR updates the shared ABI code and backends to use register-operand constraints rather than explicit pinned-vreg moves for register arguments and return values. The s390x backend was not updated, because it has its own implementation of ABI code. Ideally we could converge back to the code shared by x64 and aarch64 (which didn't exist when s390x ported calls to ISLE, so the current situation is underestandable, to be clear!). I'll leave this for future work. This PR exposed several places where regalloc2 needed to be a bit more flexible with constraints; it requires regalloc2#74 to be merged and pulled in. * Update to regalloc2 0.3.3. In addition to version bump, this required removing two asserts as `SpillSlot`s no longer carry their class (so we can't assert that they have the correct class). * Review comments. * Filetest updates. * Add cargo-vet audit for regalloc2 0.3.2 -> 0.3.3 upgrade. * Update to regalloc2 0.4.0.	2022-09-21 01:17:04 +00:00
Alex Crichton	65930640f8	Bump Wasmtime to 2.0.0 (#4874 ) This commit replaces #4869 and represents the actual version bump that should have happened had I remembered to bump the in-tree version of Wasmtime to 1.0.0 prior to the branch-cut date. Alas!	2022-09-06 13:49:56 -05:00
Nick Fitzgerald	ae7688059d	Cranelift: Use bump allocation in `remove_constant_phis` pass (#4710 ) * Cranelift: Use bump allocation in `remove_constant_phis` pass This makes compilation 2-6% faster for Sightglass's bz2 benchmark: ``` compilation :: cycles :: benchmarks/bz2/benchmark.wasm Δ = 7290648.36 ± 4245152.07 (confidence = 99%) bump.so is 1.02x to 1.06x faster than main.so! [166388177 183238542.98 214732518] bump.so [172836648 190529191.34 217514271] main.so compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [182220055 225793551.12 277857575] bump.so [193212613 227784078.61 277175335] main.so compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [3848442474 4295214144.37 4665127241] bump.so [3969505457 4262415290.10 4563869974] main.so ``` * Add audit for `bumpalo` * Add an audit of `arrayvec` version 0.7.2 * Remove unnecessary `collect` into `Vec` I wasn't able to measure any perf difference here, but its nice to do anyways. * Use a `SecondaryMap` for keeping track of summaries	2022-08-15 21:36:01 +00:00
Benjamin Bouvier	8a9b1a9025	Implement an incremental compilation cache for Cranelift (#4551 ) This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime. After the suggestion of Chris, `Function` has been split into mostly two parts: - on the one hand, `FunctionStencil` contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`. - on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on. Most changes are here to accomodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache: - most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location, it's automatically detected instead the first time a non-default `SourceLoc` is set. - user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`. - some refactorings have been made for function names: - `ExternalName` was used as the type for a `Function`'s name; while it thus allowed `ExternalName::Libcall` in this place, this would have been quite confusing to use it there. Instead, a new enum `UserFuncName` is introduced for this name, that's either a user-defined function name (the above `UserExternalName`) or a test case name. - The future of `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with. The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions. A basic fuzz target has been introduced that tries to do the bare minimum: - check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function. - check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache. - This last check is less efficient and less likely to happen, so probably should be rethought a bit. Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip. Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement. Fixes #4155.	2022-08-12 16:47:43 +00:00
wasmtime-publish	412fa04911	Bump Wasmtime to 0.41.0 (#4620 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-08-04 20:02:19 -05:00
Jamey Sharp	f69acd6187	Upgrade regalloc2 -> 0.3.2 (#4603 ) Includes a modest improvement in memory usage and performance by removing analysis that was only used during fuzzing.	2022-08-04 00:06:13 +00:00
Benjamin Bouvier	8d0224341c	cranelift: Introduce a feature to enable `trace` logs (#4484 ) * Don't use `log::trace` directly but a feature-enabled `trace` macro * Don't emit disassembly based on the log level	2022-08-01 11:19:15 +02:00
Chris Fallin	9c72a0566e	Upgrade to regalloc2 0.3.1. (#4483 ) This includes some changes from @bnjbvr to the trace-logging/annotation to reduce overhead when logging is enabled but only non-RA2 subsystems are at `Trace` level.	2022-07-21 01:22:39 +00:00
Alex Crichton	b9e63fe77a	Update miette dependency to 5.1 (#4412 ) Just some dependency gardening, no other external motivation.	2022-07-07 22:20:09 +00:00
Alex Crichton	9ae060a12a	Update some dependency versions used by Wasmtime (#4405 ) No major motivation here, mostly just dependency gardening.	2022-07-07 18:47:39 +00:00
wasmtime-publish	7c428bbd62	Bump Wasmtime to 0.40.0 (#4378 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-07-05 09:10:52 -05:00
Chris Fallin	b2e28b917a	Cranelift: update to latest regalloc2: (#4324 ) - Handle call instructions' clobbers with the clobbers API, using RA2's clobbers bitmask (bytecodealliance/regalloc2#58) rather than clobbers list; - Pull in changes from bytecodealliance/regalloc2#59 for much more sane edge-case behavior w.r.t. liverange splitting.	2022-06-28 09:01:59 -07:00
Chris Fallin	0d829a57ee	Upgrade to regalloc2 v0.2.3 to get bugfix from bytecodealliance/regalloc2#60. (#4335 ) * Upgrade to regalloc2 v0.2.3 to get bugfix from bytecodealliance/regalloc2#60. * Update RELEASES.md. * Update two compile tests based on slightly shifting regalloc output.	2022-06-27 15:58:54 -05:00
wasmtime-publish	55946704cb	Bump Wasmtime to 0.39.0 (#4225 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-06-06 09:12:47 -05:00
Chris Fallin	ae2c84205f	Upgrade to regalloc2 v0.2.2. (#4222 ) Pulls in an improvement to spillslot allocation (bytecodealliance/regalloc2#56).	2022-06-03 17:12:32 -07:00
Chris Fallin	8f61eb9341	Upgrade to regalloc2 version 0.2.1. (#4199 ) This resolves an edge-case where mul.i128 with an input that continues to be live after the instruction could cause an invalid regalloc constraint (basically, the regalloc did not previously support an instruction use and def both being constrained to the same physical reg; and the "mul" variant used for mul.i128 on x64 was the only instance of such operands in Cranelift). Causes two extra move instructions in the mul.i128 filetest, but that's the price to pay for the slightly more general (works in all cases) handling of the constraints.	2022-06-01 13:26:20 -07:00
Chris Fallin	b830c3cf93	Pull in regalloc2 v0.2.0, with no more separate scratch registers. (#4182 ) RA2 recently removed the need for a dedicated scratch register for cyclic moves (bytecodealliance/regalloc2#51). This has moderate positive performance impact on function bodies that were register-constrained, as it means that one more register is available. In Sightglass, I measured +5-8% on `blake3-scalar`, at least among current benchmarks.	2022-05-23 12:51:04 -07:00
Chris Fallin	02d5edc591	Upgrade to regalloc2 0.1.3. (#4157 ) * Upgrade to regalloc2 0.1.3. This pulls in bytecodealliance/regalloc2#49, which slightly improves codegen in some cases where a safepoint (for reference-typed values) occurs in the same liverange as a register-constrained use. For example, in bytecodealliance/wasmtime#3785, an extra move instruction appeared and a callee-save register was used (necessitating a more expensive prologue) because of suboptimal splitting heuristics, which this PR fixes. The updated RA2 heuristics appear to have no measured downsides in existing benchmarks and improve the manually-observed codegen issue. * Update filetests where regalloc2 improvement altered behavior with reftypes.	2022-05-18 11:48:40 -07:00
Chris Fallin	5d671952ee	Cranelift: do not check in generated ISLE code; regenerate on every compile. (#4143 ) This PR fixes #4066: it modifies the Cranelift `build.rs` workflow to invoke the ISLE DSL compiler on every compilation, rather than only when the user specifies a special "rebuild ISLE" feature. The main benefit of this change is that it vastly simplifies the mental model required of developers, and removes a bunch of failure modes we have tried to work around in other ways. There is now just one "source of truth", the ISLE source itself, in the repository, and so there is no need to understand a special "rebuild" step and how to handle merge errors. There is no special process needed to develop the compiler when modifying the DSL. And there is no "noise" in the git history produced by constantly-regenerated files. The two main downsides we discussed in #4066 are: - Compile time could increase, by adding more to the "meta" step before the main build; - It becomes less obvious where the source definitions are (everything becomes more "magic"), which makes exploration and debugging harder. This PR addresses each of these concerns: 1. To maintain reasonable compile time, it includes work to cut down the dependencies of the `cranelift-isle` crate to nothing (only the Rust stdlib), in the default build. It does this by putting the error-reporting bits (`miette` crate) under an optional feature, and the logging (`log` crate) under a feature-controlled macro, and manually writing an `Error` impl rather than using `thiserror`. This completely avoids proc macros and the `syn` build slowness. The user can still get nice errors out of `miette`: this is enabled by specifying a Cargo feature `--features isle-errors`. 2. To allow the user to optionally inspect the generated source, which nominally lives in a hard-to-find path inside `target/` now, this PR adds a feature `isle-in-source-tree` that, as implied by the name, moves the target for ISLE generated source into the source tree, at `cranelift/codegen/isle_generated_source/`. It seems reasonable to do this when an explicit feature (opt-in) is specified because this is how ISLE regeneration currently works as well. To prevent surprises, if the feature is not specified, the build fails if this directory exists.	2022-05-11 22:25:24 -07:00
wasmtime-publish	9a6854456d	Bump Wasmtime to 0.38.0 (#4103 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-05-05 13:43:02 -05:00
Alex Crichton	871a9d93f2	Update some dependencies in `Cargo.lock` (#4081 ) * Run a `cargo update` over our dependencies This'll notably fix a `cargo audit` error where we have a pinned version of the `regex` crate which has a CVE assigned to it. * Update to `object` and `hashbrown` crates Prune some duplicate versions showing up from the previous `cargo update`	2022-04-28 11:12:58 -05:00
Chris Fallin	0af8737ec3	Add support for running the regalloc2 checker. (#4043 ) With these fixes, all this PR has to do is instantiate and run the checker on the `regalloc2::Output`. This is off by default, and is enabled by setting the `regalloc_checker` Cranelift option. This restores the old functionality provided by e.g. the `backtracking_checked` regalloc algorithm setting rather than `backtracking` when we were still on regalloc.rs.	2022-04-18 14:06:07 -07:00
Chris Fallin	a0318f36f0	Switch Cranelift over to regalloc2. (#3989 ) This PR switches Cranelift over to the new register allocator, regalloc2. See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801) for a summary of the design changes. This switchover has implications for core VCode/MachInst types and the lowering pass. Overall, this change brings improvements to both compile time and speed of generated code (runtime), as reported in #3942: ``` Benchmark Compilation (wallclock) Execution (wallclock) blake3-scalar 25% faster 28% faster blake3-simd no diff no diff meshoptimizer 19% faster 17% faster pulldown-cmark 17% faster no diff bz2 15% faster no diff SpiderMonkey, 21% faster 2% faster fib(30) clang.wasm 42% faster N/A ```	2022-04-14 10:28:21 -07:00
wasmtime-publish	78a595ac88	Bump Wasmtime to 0.37.0 (#3994 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-04-05 09:24:28 -05:00
Alex Crichton	7b5176baea	Upgrade all crates to the Rust 2021 edition (#3991 ) * Upgrade all crates to the Rust 2021 edition I've personally started using the new format strings for things like `panic!("some message {foo}")` or similar and have been upgrading crates on a case-by-case basis, but I think it probably makes more sense to go ahead and blanket upgrade everything so 2021 features are always available. * Fix compile of the C API * Fix a warning * Fix another warning	2022-04-04 12:27:12 -05:00
Alex Crichton	c89dc55108	Add a two-week delay to Wasmtime's release process (#3955 ) * Bump to 0.36.0 * Add a two-week delay to Wasmtime's release process This commit is a proposal to update Wasmtime's release process with a two-week delay from branching a release until it's actually officially released. We've had two issues lately that came up which led to this proposal: * In #3915 it was realized that changes just before the 0.35.0 release weren't enough for an embedding use case, but the PR didn't meet the expectations for a full patch release. * At Fastly we were about to start rolling out a new version of Wasmtime when over the weekend the fuzz bug #3951 was found. This led to the desire internally to have a "must have been fuzzed for this long" period of time for Wasmtime changes which we felt were better reflected in the release process itself rather than something about Fastly's own integration with Wasmtime. This commit updates the automation for releases to unconditionally create a `release-X.Y.Z` branch on the 5th of every month. The actual release from this branch is then performed on the 20th of every month, roughly two weeks later. This should provide a period of time to ensure that all changes in a release are fuzzed for at least two weeks and avoid any further surprises. This should also help with any last-minute changes made just before a release if they need tweaking since backporting to a not-yet-released branch is much easier. Overall there are some new properties about Wasmtime with this proposal as well: * The `main` branch will always have a section in `RELEASES.md` which is listed as "Unreleased" for us to fill out. * The `main` branch will always be a version ahead of the latest release. For example it will be bump pre-emptively as part of the release process on the 5th where if `release-2.0.0` was created then the `main` branch will have 3.0.0 Wasmtime. * Dates for major versions are automatically updated in the `RELEASES.md` notes. The associated documentation for our release process is updated and the various scripts should all be updated now as well with this commit. * Add notes on a security patch * Clarify security fixes shouldn't be previewed early on CI	2022-04-01 13:11:10 -05:00
wasmtime-publish	9137b4a50e	Bump Wasmtime to 0.35.0 (#3885 ) [automatically-tag-and-release-this-commit] Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-03-07 15:18:34 -06:00
Chris Fallin	ca0e8d0a1d	Remove incomplete/unmaintained ARM32 backend (for now). (#3799 ) In #3721, we have been discussing what to do about the ARM32 backend in Cranelift. Currently, this backend supports only 32-bit types, which is insufficient for full Wasm-MVP; it's missing other critical bits, like floating-point support; and it has only ever been exercised, AFAIK, via the filetests for the individual CLIF instructions that are implemented. We were very very thankful for the original contribution of this backend, even in its partial state, and we had hoped at the time that we could eventually mature it in-tree until it supported e.g. Wasm and other use-cases. But that hasn't yet happened -- to the blame of no-one, to be clear, we just haven't had a contributor with sufficient time. Unfortunately, the existence of the backend and lack of active maintainer now potentially pose a bit of a burden as we hope to make continuing changes to the backend framework. For example, the ISLE migration, and the use of regalloc2 that it will allow, would need all of the existing lowering patterns in the hand-written ARM32 backend to be rewritten as ISLE rules. Given that we don't currently have the resources to do this, we think it's probably best if we, sadly, for now remove this partial backend. This is not in any way a statement of what we might accept in the future, though. If, in the future, an ARM32 backend updated to our latest codebase with an active maintainer were to appear, we'd be happy to merge it (and likewise for any other architecture!). But for now, this is probably the best path. Thanks again to the original contributor @jmkrauz and we hope that this work can eventually be brought back and reused if someone has the time to do so!	2022-02-14 15:03:52 -08:00

1 2 3 4 5

208 Commits