wasmtime

Author	SHA1	Message	Date
Chris Fallin	7b8854f803	egraphs: fix handling of effectful-but-idempotent ops and GVN. (#5800 ) * Revert "egraphs: disable GVN of effectful idempotent ops (temporarily). (#5808)" This reverts commit `c7e2571866`. * egraphs: fix handling of effectful-but-idempotent ops and GVN. This PR addresses #5796: currently, ops that are effectful, i.e., remain in the side-effecting skeleton (which we keep in the `Layout` while the egraph exists), but are idempotent and thus mergeable by a GVN pass, are not handled properly. GVN is still possible on effectful but idempotent ops precisely because our GVN does not create partial redundancies: it removes an instruction only when it is dominated by an identical instruction. An isntruction will not be "hoisted" to a point where it could execute in the optimized code but not in the original. However, there are really two parts to the egraph implementation that produce this effect: the deduplication on insertion into the egraph, and the elaboration with a scoped hashmap. The deduplication lets us give a single name (value ID) to all copies of an identical instruction, and then elaboration will re-create duplicates if GVN should not hoist or merge some of them. Because deduplication need not worry about dominance or scopes, we use a simple (non-scoped) hashmap to dedup/intern ops as "egraph nodes". When we added support for GVN'ing effectful but idempotent ops (#5594), we kept the use of this simple dedup'ing hashmap, but these ops do not get elaborated; instead they stay in the side-effecting skeleton. Thus, we inadvertently created potential for weird code-motion effects. The proposal in #5796 would solve this in a clean way by treating these ops as pure again, and keeping them out of the skeleton, instead putting "force" pseudo-ops in the skeleton. However, this is a little more complex than I would like, and I've realized that @jameysharp's earlier suggestion is much simpler: we can keep an actual scoped hashmap separately just for the effectful-but-idempotent ops, and use it to GVN while we build the egraph. In effect, we're fusing a separate GVN pass with the egraph pass (but letting it interact corecursively with egraph rewrites. This is in principle similar to how we keep a separate map for loads and fuse this pass with the egraph rewrite pass as well. Note that we can use a `ScopedHashMap` here without the "context" (as needed by `CtxHashMap`) because, as noted by @jameysharp, in practice the ops we want to GVN have all their args inline. Equality on the `InstructinoData` itself is conservative: two insts whose struct contents compare shallowly equal are definitely identical, but identical insts in a deep-equality sense may not compare shallowly equal, due to list indirection. This is fine for GVN, because it is still sound to skip any given GVN opportunity (and keep the original instructions). Fixes #5796. * Add comments from review.	2023-03-02 02:10:42 +00:00
Jamey Sharp	5cfb461945	Only emit ISLE/egraph terms for single-value insts (#5848 ) For instructions with no results (such as branches and stores) or instructions with multiple results (such as add with carry), we have assertions checking that an optimization rule doesn't try to match on or construct such instructions. When we generate terms for matching or constructing instructions, the terms for these instructions are guaranteed to panic if they're ever used. So let's just not generate them. In the future we may wish to generate terms with different types for these instructions, to make them usable in ISLE rules for optimization that fall outside our current egraph constraints.	2023-02-24 15:38:48 +00:00
Chris Fallin	673b448cfe	Cranelift DFG: make inst clone deep-clone varargs lists. (#5727 ) When investigating #5716, I found that rematerialization of a `call`, in addition to blowing up for other reasons, caused aliasing of the varargs list (the `EntityList` in the `ListPool`), such that editing the args of the second copy of the call instruction inadvertently updated the first as well. This PR modifies `DataFlowGraph::clone_inst` so that it always clones the varargs list if present. This shouldn't have any functional impact on Cranelift today, because we don't rematerialize any instructions with varargs; but it's important to get it right to avoid a bug later!	2023-02-07 01:21:09 +00:00
Trevor Elliott	b58a197d33	cranelift: Add a conditional branch instruction with two targets (#5446 ) Add a conditional branch instruction with two targets: brif. This instruction will eventually replace brz and brnz, as it encompasses the behavior of both. This PR also changes the InstructionData layout for instruction formats that hold BlockCall values, taking the same approach we use for Value arguments. This allows branch_destination to return a slice to the BlockCall values held in the instruction, rather than requiring that we pattern match on InstructionData to fetch the then/else blocks. Function generation for fuzzing has been updated to generate uses of brif, and I've run the cranelift-fuzzgen target locally for hours without triggering any new failures.	2023-01-24 14:37:16 -08:00
Trevor Elliott	1e6c13d83e	cranelift: Rework block instructions to use BlockCall (#5464 ) Add a new type BlockCall that represents the pair of a block name with arguments to be passed to it. (The mnemonic here is that it looks a bit like a function call.) Rework the implementation of jump, brz, and brnz to use BlockCall instead of storing the block arguments as varargs in the instruction's ValueList. To ensure that we're processing block arguments from BlockCall values in instructions, three new functions have been introduced on DataFlowGraph that both sets of arguments: inst_values - returns an iterator that traverses values in the instruction and block arguments map_inst_values - applies a function to each value in the instruction and block arguments overwrite_inst_values - overwrite all values in an instruction and block arguments with values from the iterator Co-authored-by: Jamey Sharp <jamey@minilop.net>	2023-01-17 16:31:15 -08:00
Nick Fitzgerald	f4a2d5337a	Cranelift: GVN `uadd_overflow_trap` (#5520 ) * Switch duplicate loads w/ dynamic memories test to `min_size = 0` This test was accidentally hitting a special case for bounds checks for when we know that `offset + access_size < min_size` and can skip some steps. This commit changes the `min_size` of the memory to zero so that we are forced to do fully general bounds checks. * Cranelift: Mark `uadd_overflow_trap` as okay for GVN Although this improves our test sequence for duplicate loads with dynamic memories, it unfortunately doesn't have any effect on sightglass benchmarks: ``` instantiation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [34448 35607.23 37158] gvn_uadd_overflow_trap.so [34566 35734.05 36585] main.so instantiation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [44101 60449.62 92712] gvn_uadd_overflow_trap.so [44011 60436.37 92690] main.so instantiation :: instructions-retired :: benchmarks/bz2/benchmark.wasm No difference in performance. [35595 36675.72 38153] gvn_uadd_overflow_trap.so [35440 36670.42 37993] main.so compilation :: instructions-retired :: benchmarks/bz2/benchmark.wasm No difference in performance. [17370195 17405125.62 17471222] gvn_uadd_overflow_trap.so [17369324 17404859.43 17470725] main.so execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [7055720520 7055886880.32 7056265930] gvn_uadd_overflow_trap.so [7055719554 7055843809.33 7056193289] main.so compilation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [683589861 683767276.00 684098366] gvn_uadd_overflow_trap.so [683590024 683767998.02 684097885] main.so execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [46436883 46437135.10 46437823] gvn_uadd_overflow_trap.so [46436883 46437087.67 46437785] main.so compilation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [126522461 126565812.58 126647044] gvn_uadd_overflow_trap.so [126522176 126565757.75 126647522] main.so execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm No difference in performance. [653010531 653010533.03 653010544] gvn_uadd_overflow_trap.so [653010531 653010533.18 653010537] main.so ``` * cranelift-codegen-meta: Rename `side_effects_okay_for_gvn` to `side_effects_idempotent` * cranelift-filetests: Ensure there is a trailing newline for blessed Wasm tests	2023-01-04 22:03:16 -08:00
Nick Fitzgerald	937601c7c3	Cranelift: GVN spectre guards and run redundant load elimination twice (#5517 ) * Cranelift: Make spectre guards GVN-able While these instructions have a side effect that is otherwise invisible to the optimizer, the side effect in question is idempotent, so it can be de-duplicated by GVN. * Cranelift: Run redundant load replacement and GVN twice This allows us to actually replace redundant Wasm loads with dynamic memories. While this improves our hand-crafted test sequences, it doesn't seem to have any improvement on sightglass benchmarks run with dynamic memories, however it also isn't a hit to compilation times, so seems generally good to land anyways: ``` $ cargo run --release -- benchmark -e ~/scratch/once.so -e ~/scratch/twice.so -m insts-retired --processes 20 --iterations-per-process 3 --engine-flags="--static-memory-maximum-size 0" -- benchmarks/default.suite compilation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [683595240 683768610.53 684097577] once.so [683597068 700115966.83 1664907164] twice.so instantiation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [44107 60411.07 92785] once.so [44138 59552.32 92097] twice.so compilation :: instructions-retired :: benchmarks/bz2/benchmark.wasm No difference in performance. [17369916 17404839.78 17471458] once.so [17369935 17625713.87 30700150] twice.so compilation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [126523640 126566170.80 126648265] once.so [126523076 127174580.30 163145149] twice.so instantiation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [34569 35686.25 36513] once.so [34651 35749.97 36953] twice.so instantiation :: instructions-retired :: benchmarks/bz2/benchmark.wasm No difference in performance. [35146 36639.10 37707] once.so [34472 36580.82 38431] twice.so execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [7055720115 7055841324.82 7056180024] once.so [7055717681 7055877095.85 7056225217] twice.so execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [46436881 46437081.28 46437691] once.so [46436883 46437127.68 46437766] twice.so execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm No difference in performance. [653010530 653010533.27 653010539] once.so [653010531 653010532.95 653010538] twice.so ```	2023-01-04 20:05:43 +00:00
Trevor Elliott	25bf8e0e67	Make DataFlowGraph::insts public, but restricted (#5450 ) We have some operations defined on DataFlowGraph purely to work around borrow-checker issues with InstructionData and other data on DataFlowGraph. Part of the problem is that indexing the DFG directly hides the fact that we're only indexing the insts field of the DFG. This PR makes the insts field of the DFG public, but wraps it in a newtype that only allows indexing. This means that the borrow checker is better able to tell when operations on memory held by the DFG won't conflict, which comes up frequently when mutating ValueLists held by InstructionData.	2022-12-16 10:46:09 -08:00
Ulrich Weigand	e913cf3647	Remove IFLAGS/FFLAGS types (#5406 ) All instructions using the CPU flags types (IFLAGS/FFLAGS) were already removed. This patch completes the cleanup by removing all remaining instructions that define values of CPU flags types, as well as the types themselves. Specifically, the following features are removed: - The IFLAGS and FFLAGS types and the SpecialType category. - Special handling of IFLAGS and FFLAGS in machinst/isle.rs and machinst/lower.rs. - The ifcmp, ifcmp_imm, ffcmp, iadd_ifcin, iadd_ifcout, iadd_ifcarry, isub_ifbin, isub_ifbout, and isub_ifborrow instructions. - The writes_cpu_flags instruction property. - The flags verifier pass. - Flags handling in the interpreter. All of these features are currently unused; no functional change intended by this patch. This addresses https://github.com/bytecodealliance/wasmtime/issues/3249.	2022-12-09 13:42:03 -08:00
Chris Fallin	f980defe17	egraph support: rewrite to work in terms of CLIF data structures. (#5382 ) * egraph support: rewrite to work in terms of CLIF data structures. This work rewrites the "egraph"-based optimization framework in Cranelift to operate on aegraphs (acyclic egraphs) represented in the CLIF itself rather than as a separate data structure to which and from which we translate the CLIF. The basic idea is to add a new kind of value, a "union", that is like an alias but refers to two other values rather than one. This allows us to represent an eclass of enodes (values) as a tree. The union node allows for a value to have multiple representations: either constituent value could be used, and (in well-formed CLIF produced by correct optimization rules) they must be equivalent. Like the old egraph infrastructure, we take advantage of acyclicity and eager rule application to do optimization in a single pass. Like before, we integrate GVN (during the optimization pass) and LICM (during elaboration). Unlike the old egraph infrastructure, everything stays in the DataFlowGraph. "Pure" enodes are represented as instructions that have values attached, but that are not placed into the function layout. When entering "egraph" form, we remove them from the layout while optimizing. When leaving "egraph" form, during elaboration, we can place an instruction back into the layout the first time we elaborate the enode; if we elaborate it more than once, we clone the instruction. The implementation performs two passes overall: - One, a forward pass in RPO (to see defs before uses), that (i) removes "pure" instructions from the layout and (ii) optimizes as it goes. As before, we eagerly optimize, so we form the entire union of optimized forms of a value before we see any uses of that value. This lets us rewrite uses to use the most "up-to-date" form of the value and canonicalize and optimize that form. The eager rewriting and acyclic representation make each other work (we could not eagerly rewrite if there were cycles; and acyclicity does not miss optimization opportunities only because the first time we introduce a value, we immediately produce its "best" form). This design choice is also what allows us to avoid the "parent pointers" and fixpoint loop of traditional egraphs. This forward optimization pass keeps a scoped hashmap to "intern" nodes (thus performing GVN), and also interleaves on a per-instruction level with alias analysis. The interleaving with alias analysis allows alias analysis to see the most optimized form of each address (so it can see equivalences), and allows the next value to see any equivalences (reuses of loads or stored values) that alias analysis uncovers. - Two, a forward pass in domtree preorder, that "elaborates" pure enodes back into the layout, possibly in multiple places if needed. This tracks the loop nest and hoists nodes as needed, performing LICM as it goes. Note that by doing this in forward order, we avoid the "fixpoint" that traditional LICM needs: we hoist a def before its uses, so when we place a node, we place it in the right place the first time rather than moving later. This PR replaces the old (a)egraph implementation. It removes both the cranelift-egraph crate and the logic in cranelift-codegen that uses it. On `spidermonkey.wasm` running a simple recursive Fibonacci microbenchmark, this work shows 5.5% compile-time reduction and 7.7% runtime improvement (speedup). Most of this implementation was done in (very productive) pair programming sessions with Jamey Sharp, thus: Co-authored-by: Jamey Sharp <jsharp@fastly.com> * Review feedback. * Review feedback. * Review feedback. * Bugfix: cprop rule: `(x + k1) - k2` becomes `x - (k2 - k1)`, not `x - (k1 - k2)`. Co-authored-by: Jamey Sharp <jsharp@fastly.com>	2022-12-06 14:58:57 -08:00
Nick Fitzgerald	fd7b903f33	Cranelift: Use a custom enum instead of boolean for the ISLE target (#5228 ) Easier to read and doesn't require `/* is_lower = */`-style comments at call sites.	2022-11-08 21:44:02 +00:00
Trevor Elliott	32a7593c94	cranelift: Remove booleans (#5031 ) Remove the boolean types from cranelift, and the associated instructions breduce, bextend, bconst, and bint. Standardize on using 1/0 for the return value from instructions that produce scalar boolean results, and -1/0 for boolean vector elements. Fixes #3205 Co-authored-by: Afonso Bordado <afonso360@users.noreply.github.com> Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Co-authored-by: Chris Fallin <chris@cfallin.org>	2022-10-17 16:00:27 -07:00
Nick Fitzgerald	03d77d4d6b	Cranelift: Derive `Copy` for `InstructionData` (#5043 ) * Cranelift: Derive `Copy` for `InstructionData` And update `clone` calls to be copies. * Add a test for `InstructionData`'s size	2022-10-12 07:58:27 -07:00
Chris Fallin	2be12a5167	egraph-based midend: draw the rest of the owl (productionized). (#4953 ) * egraph-based midend: draw the rest of the owl. * Rename `egg` submodule of cranelift-codegen to `egraph`. * Apply some feedback from @jsharp during code walkthrough. * Remove recursion from find_best_node by doing a single pass. Rather than recursively computing the lowest-cost node for a given eclass and memoizing the answer at each eclass node, we can do a single forward pass; because every eclass node refers only to earlier nodes, this is sufficient. The behavior may slightly differ from the earlier behavior because we cannot short-circuit costs to zero once a node is elaborated; but in practice this should not matter. * Make elaboration non-recursive. Use an explicit stack instead (with `ElabStackEntry` entries, alongside a result stack). * Make elaboration traversal of the domtree non-recursive/stack-safe. * Work analysis logic in Cranelift-side egraph glue into a general analysis framework in cranelift-egraph. * Apply static recursion limit to rule application. * Fix aarch64 wrt dynamic-vector support -- broken rebase. * Topo-sort cranelift-egraph before cranelift-codegen in publish script, like the comment instructs me to! * Fix multi-result call testcase. * Include `cranelift-egraph` in `PUBLISHED_CRATES`. * Fix atomic_rmw: not really a load. * Remove now-unnecessary PartialOrd/Ord derivations. * Address some code-review comments. * Review feedback. * Review feedback. * No overlap in mid-end rules, because we are defining a multi-constructor. * rustfmt * Review feedback. * Review feedback. * Review feedback. * Review feedback. * Remove redundant `mut`. * Add comment noting what rules can do. * Review feedback. * Clarify comment wording. * Update `has_memory_fence_semantics`. * Apply @jameysharp's improved loop-level computation. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix suggestion commit. * Fix off-by-one in new loop-nest analysis. * Review feedback. * Review feedback. * Review feedback. * Use `Default`, not `std::default::Default`, as per @fitzgen Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> * Apply @fitzgen's comment elaboration to a doc-comment. Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> * Add stat for hitting the rewrite-depth limit. * Some code motion in split prelude to make the diff a little clearer wrt `main`. * Take @jameysharp's suggested `try_into()` usage for blockparam indices. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Take @jameysharp's suggestion to avoid double-match on load op. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix suggestion (add import). * Review feedback. * Fix stack_load handling. * Remove redundant can_store case. * Take @jameysharp's suggested improvement to FuncEGraph::build() logic Co-authored-by: Jamey Sharp <jamey@minilop.net> * Tweaks to FuncEGraph::build() on top of suggestion. * Take @jameysharp's suggested clarified condition Co-authored-by: Jamey Sharp <jamey@minilop.net> * Clean up after suggestion (unused variable). * Fix loop analysis. * loop level asserts * Revert constant-space loop analysis -- edge cases were incorrect, so let's go with the simple thing for now. * Take @jameysharp's suggestion re: result_tys Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix up after suggestion * Take @jameysharp's suggestion to use fold rather than reduce Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fixup after suggestion * Take @jameysharp's suggestion to remove elaborate_eclass_use's return value. * Clarifying comment in terminator insts. Co-authored-by: Jamey Sharp <jamey@minilop.net> Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>	2022-10-11 18:15:53 -07:00
Nick Fitzgerald	e2f1ced0b6	Cranelift: Make `Opcode` represented as a `u8` instead of `u16` and remove vestigial conversion impls (#5042 ) * Cranelift: Make `Opcode` represented as a `u8` instead of `u16` * Cranelift: Remove unused conversion impls for `Opcode` These are vestigial, left over from Peepmatic.	2022-10-11 12:57:12 -07:00
Afonso Bordado	d620705a32	Fix Invalid Instruction format in fuzzgen (#4738 ) * cranelift: Add assert to prevent wrong InstFormat being used for the wrong opcode * cranelift: Use correct instruction format when inserting opcodes in fuzzgen (fixes #4733) * cranelift: Use debug assert on InstFormat assert	2022-08-20 00:49:54 +00:00
Benjamin Bouvier	8a9b1a9025	Implement an incremental compilation cache for Cranelift (#4551 ) This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime. After the suggestion of Chris, `Function` has been split into mostly two parts: - on the one hand, `FunctionStencil` contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`. - on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on. Most changes are here to accomodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache: - most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location, it's automatically detected instead the first time a non-default `SourceLoc` is set. - user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`. - some refactorings have been made for function names: - `ExternalName` was used as the type for a `Function`'s name; while it thus allowed `ExternalName::Libcall` in this place, this would have been quite confusing to use it there. Instead, a new enum `UserFuncName` is introduced for this name, that's either a user-defined function name (the above `UserExternalName`) or a test case name. - The future of `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with. The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions. A basic fuzz target has been introduced that tries to do the bare minimum: - check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function. - check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache. - This last check is less efficient and less likely to happen, so probably should be rethought a bit. Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip. Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement. Fixes #4155.	2022-08-12 16:47:43 +00:00
Sam Parker	9c43749dfe	[RFC] Dynamic Vector Support (#4200 ) Introduce a new concept in the IR that allows a producer to create dynamic vector types. An IR function can now contain global value(s) that represent a dynamic scaling factor, for a given fixed-width vector type. A dynamic type is then created by 'multiplying' the corresponding global value with a fixed-width type. These new types can be used just like the existing types and the type system has a set of hard-coded dynamic types, such as I32X4XN, which the user defined types map onto. The dynamic types are also used explicitly to create dynamic stack slots, which have no set size like their existing counterparts. New IR instructions are added to access these new stack entities. Currently, during codegen, the dynamic scaling factor has to be lowered to a constant so the dynamic slots do eventually have a compile-time known size, as do spill slots. The current lowering for aarch64 just targets Neon, using a dynamic scale of 1. Copyright (c) 2022, Arm Limited.	2022-07-07 12:54:39 -07:00
Chris Fallin	5d671952ee	Cranelift: do not check in generated ISLE code; regenerate on every compile. (#4143 ) This PR fixes #4066: it modifies the Cranelift `build.rs` workflow to invoke the ISLE DSL compiler on every compilation, rather than only when the user specifies a special "rebuild ISLE" feature. The main benefit of this change is that it vastly simplifies the mental model required of developers, and removes a bunch of failure modes we have tried to work around in other ways. There is now just one "source of truth", the ISLE source itself, in the repository, and so there is no need to understand a special "rebuild" step and how to handle merge errors. There is no special process needed to develop the compiler when modifying the DSL. And there is no "noise" in the git history produced by constantly-regenerated files. The two main downsides we discussed in #4066 are: - Compile time could increase, by adding more to the "meta" step before the main build; - It becomes less obvious where the source definitions are (everything becomes more "magic"), which makes exploration and debugging harder. This PR addresses each of these concerns: 1. To maintain reasonable compile time, it includes work to cut down the dependencies of the `cranelift-isle` crate to nothing (only the Rust stdlib), in the default build. It does this by putting the error-reporting bits (`miette` crate) under an optional feature, and the logging (`log` crate) under a feature-controlled macro, and manually writing an `Error` impl rather than using `thiserror`. This completely avoids proc macros and the `syn` build slowness. The user can still get nice errors out of `miette`: this is enabled by specifying a Cargo feature `--features isle-errors`. 2. To allow the user to optionally inspect the generated source, which nominally lives in a hard-to-find path inside `target/` now, this PR adds a feature `isle-in-source-tree` that, as implied by the name, moves the target for ISLE generated source into the source tree, at `cranelift/codegen/isle_generated_source/`. It seems reasonable to do this when an explicit feature (opt-in) is specified because this is how ISLE regeneration currently works as well. To prevent surprises, if the feature is not specified, the build fails if this directory exists.	2022-05-11 22:25:24 -07:00
Ulrich Weigand	071d3a68d0	ISLE: Fix clif.isle InstructionData entries Attempt to match a Jump instruction in ISLE will currently lead to the generated files not compiling. This is because the definition of the InstructionData enum in clif.isle does not match the actual type used in Rust code. Specifically, clif.isle erroneously omits the ValueList variable-length argument entry if the format does not use a typevar operand. This is the case for Jump and a few other formats. The problem is caused by a bug in the gen_isle routine in meta/src/gen_inst.rs.	2022-01-24 12:54:16 +01:00
Andrew Brown	86611d3bbc	isle: expand enums in ISLE (#3586 ) * x64: expand FloatCC enum in ISLE * isle: regenerate manifests * isle: generate all enum fields in `clif.isle` This expands the `gen_isle` function to write all of the immediate `enum`s out explicitly in `clif.isle`. Non-`enum` immediates are still `extern primitive`. * Only compile `enum_values` with `rebuild-isle` feature * Only compile `gen_enum_isle` with `rebuild-isle` feature	2021-12-12 18:31:42 -08:00
Nick Fitzgerald	d2d0a0f36b	Remove Peepmatic!!! Peepmatic was an early attempt at a DSL for peephole optimizations, with the idea that maybe sometime in the future we could user it for instruction selection as well. It didn't really pan out, however: * Peepmatic wasn't quite flexible enough, and adding new operators or snippets of code implemented externally in Rust was a bit of a pain. * The performance was never competitive with the hand-written peephole optimizers. It was very size efficient, but that came at the cost of run-time efficiency. Everything was table-based and interpreted, rather than generating any Rust code. Ultimately, because of these reasons, we never turned Peepmatic on by default. These days, we just landed the ISLE domain-specific language, and it is better suited than Peepmatic for all the things that Peepmatic was originally designed to do. It is more flexible and easy to integrate with external Rust code. It is has better time efficiency, meeting or even beating hand-written code. I think a small part of the reason why ISLE excels in these things is because its design was informed by Peepmatic's failures. I still plan on continuing Peepmatic's mission to make Cranelift's peephole optimizer passes generated from DSL rewrite rules, but using ISLE instead of Peepmatic. Thank you Peepmatic, rest in peace!	2021-11-17 13:04:17 -08:00
Alex Crichton	1c13f62189	isle: Move immediates to the end of extractors Otherwise I was getting type errors trying to match `insertlane` instructions, so I think that this was the intended order.	2021-11-16 09:07:24 -08:00
Nick Fitzgerald	d377b665c6	Initial ISLE integration with the x64 backend On the build side, this commit introduces two things: 1. The automatic generation of various ISLE definitions for working with CLIF. Specifically, it generates extern type definitions for clif opcodes and the clif instruction data `enum`, as well as extractors for matching each clif instructions. This happens inside the `cranelift-codegen-meta` crate. 2. The compilation of ISLE DSL sources to Rust code, that can be included in the main `cranelift-codegen` compilation. Next, this commit introduces the integration glue code required to get ISLE-generated Rust code hooked up in clif-to-x64 lowering. When lowering a clif instruction, we first try to use the ISLE code path. If it succeeds, then we are done lowering this instruction. If it fails, then we proceed along the existing hand-written code path for lowering. Finally, this commit ports many lowering rules over from hand-written, open-coded Rust to ISLE. In the process of supporting ISLE, this commit also makes the x64 `Inst` capable of expressing SSA by supporting 3-operand forms for all of the existing instructions that only have a 2-operand form encoding: dst = src1 op src2 Rather than only the typical x86-64 2-operand form: dst = dst op src This allows `MachInst` to be in SSA form, since `dst` and `src1` are disentangled. ("3-operand" and "2-operand" are a little bit of a misnomer since not all operations are binary operations, but we do the same thing for, e.g., unary operations by disentangling the sole operand from the result.) There are two motivations for this change: 1. To allow ISLE lowering code to have value-equivalence semantics. We want ISLE lowering to translate a CLIF expression that evaluates to some value into a `MachInst` expression that evaluates to the same value. We want both the lowering itself and the resulting `MachInst` to be pure and referentially transparent. This is both a nice paradigm for compiler writers that are authoring and maintaining lowering rules and is a prerequisite to any sort of formal verification of our lowering rules in the future. 2. Better align `MachInst` with `regalloc2`'s API, which requires that the input be in SSA form.	2021-10-12 17:11:58 -07:00
bjorn3	93e9bb02e4	Review comments	2021-11-01 18:17:57 +01:00
bjorn3	74261ccd79	Never use the first vararg as typevar operand If an instruction only takes varargs as values, it may have no arguments at all.	2021-10-31 19:57:04 +01:00
bjorn3	91d4f36970	Move generate_table from cranelift-codegen-shared to cranelift-codegen-meta	2021-10-29 14:43:09 +02:00
Chris Fallin	14cde24377	Merge pull request #3447 from bjorn3/remove_unused_inst_flags Remove various unused things from the meta crate	2021-10-13 11:31:31 -07:00
bjorn3	466a446f8c	Remove OpcodeNumber	2021-10-12 15:12:26 +02:00
bjorn3	99114547be	Remove clobbers_all_regs	2021-10-12 15:12:26 +02:00
bjorn3	e8b18b58a1	Remove is_ghost	2021-10-12 15:12:26 +02:00
bjorn3	5b24e117ee	Remove instructions used by old br_table legalization	2021-10-12 14:18:52 +02:00
bjorn3	2fc964ea35	Add serde serialization support for the full clif ir	2021-02-18 11:27:02 +01:00
Nick Fitzgerald	ee5982fd16	peepmatic: Be generic over the operator type This lets us avoid the cost of `cranelift_codegen::ir::Opcode` to `peepmatic_runtime::Operator` conversion overhead, and paves the way for allowing Peepmatic to support non-clif optimizations (e.g. vcode optimizations). Rather than defining our own `peepmatic::Operator` type like we used to, now the whole `peepmatic` crate is effectively generic over a `TOperator` type parameter. For the Cranelift integration, we use `cranelift_codegen::ir::Opcode` as the concrete type for our `TOperator` type parameter. For testing, we also define a `TestOperator` type, so that we can test Peepmatic code without building all of Cranelift, and we can keep them somewhat isolated from each other. The methods that `peepmatic::Operator` had are now translated into trait bounds on the `TOperator` type. These traits need to be shared between all of `peepmatic`, `peepmatic-runtime`, and `cranelift-codegen`'s Peepmatic integration. Therefore, these new traits live in a new crate: `peepmatic-traits`. This crate acts as a header file of sorts for shared trait/type/macro definitions. Additionally, the `peepmatic-runtime` crate no longer depends on the `peepmatic-macro` procedural macro crate, which should lead to faster build times for Cranelift when it is using pre-built peephole optimizers.	2020-07-17 16:16:49 -07:00
Nick Fitzgerald	9b867b09c7	cranelift: Sign extend `Imm64` immediates When an instruction has an `Imm64` immediate, but operates on values of a narrower width, we need to sign extend the value. Fixes #1095	2020-05-12 15:44:48 -07:00
Benjamin Bouvier	1d5a678124	Fixes #1240 : Add a new accessor to indicate that an opcode requires spilling all registers;	2020-03-23 12:19:28 +01:00
Ryan Hunt	832666c45e	Mass rename Ebb and relatives to Block (#1365 ) * Manually rename BasicBlock to BlockPredecessor BasicBlock is a pair of (Ebb, Inst) that is used to represent the basic block subcomponent of an Ebb that is a predecessor to an Ebb. Eventually we will be able to remove this struct, but for now it makes sense to give it a non-conflicting name so that we can start to transition Ebb to represent a basic block. I have not updated any comments that refer to BasicBlock, as eventually we will remove BlockPredecessor and replace with Block, which is a basic block, so the comments will become correct. * Manually rename SSABuilder block types to avoid conflict SSABuilder has its own Block and BlockData types. These along with associated identifier will cause conflicts in a later commit, so they are renamed to be more verbose here. * Automatically rename 'Ebb' to 'Block' in .rs Automatically rename 'EBB' to 'block' in .rs Automatically rename 'ebb' to 'block' in .rs Automatically rename 'extended basic block' to 'basic block' in .rs Automatically rename 'an basic block' to 'a basic block' in .rs Manually update comment for `Block` `Block`'s wikipedia article required an update. * Automatically rename 'an `Block`' to 'a `Block`' in .rs Automatically rename 'extended_basic_block' to 'basic_block' in .rs Automatically rename 'ebb' to 'block' in .clif Manually rename clif constant that contains 'ebb' as substring to avoid conflict * Automatically rename filecheck uses of 'EBB' to 'BB' 'regex: EBB' -> 'regex: BB' '$EBB' -> '$BB' * Automatically rename 'EBB' 'Ebb' to 'block' in .clif Automatically rename 'an block' to 'a block' in .clif Fix broken testcase when function name length increases Test function names are limited to 16 characters. This causes the new longer name to be truncated and fail a filecheck test. An outdated comment was also fixed.	2020-02-07 10:46:47 -06:00
Andrew Brown	f19456640c	Add documentation for top-level items in cranelift-codegen/meta	2019-10-31 09:35:08 -07:00
Benjamin Bouvier	d5e990220e	[meta] Remove OperandKind::name field and explicitly pass rust_field_name/rust_type; (fixes #1177 )	2019-10-30 18:39:20 +01:00
Benjamin Bouvier	2bebc40c16	[meta] Move the doc() default values in the Operand/OperandKind;	2019-10-30 18:39:20 +01:00
Benjamin Bouvier	0b8a579943	Fixes #851 : Document instructions' input operands in InstBuilder;	2019-10-30 18:39:20 +01:00
Benjamin Bouvier	e8c03fbd09	[meta] Remove unused bit-vector functionalty; It can be resurrected if needed in the future. It was used only for the semantics descriptions, which went away with the transition of the meta-language to Rust.	2019-10-29 14:23:10 +01:00
Benjamin Bouvier	5889dd2c64	[meta] Add more pub(crate) definitions.	2019-10-29 14:23:10 +01:00
Benjamin Bouvier	06b1817d89	[meta] Rename Operand::is_pure_immediate into is_immediate;	2019-10-29 14:23:10 +01:00
Peter Huene	9f506692c2	Fix clippy warnings. This commit fixes the current set of (stable) clippy warnings in the repo.	2019-10-24 17:20:12 -07:00
Benjamin Bouvier	0243b642e3	[meta] Remove name lookups in formats; This does a lot at once, since there was no clear way to split the three commits: - Instruction need to be passed an explicit InstructionFormat, - InstructionFormat deduplication is checked once all entities have been defined;	2019-10-22 14:05:12 +02:00
Benjamin Bouvier	9e9a7626d7	[meta] Use a ref-counted pointer to an InstructionFormat in instructions; This avoids a lot of dereferences, and InstructionFormat are immutable once they're created. It removes a lot of code that was keeping the FormatRegistry around, just in case we needed the format. This is more in line with the way we create Instructions, and make it easy to reference InstructionFormats in general.	2019-10-22 14:05:12 +02:00
Benjamin Bouvier	d404368dea	Share constant_hash code between the meta and codegen crates;	2019-10-10 16:45:48 +02:00
Anthony Ramine	178241625c	Use slice::from_ref and slice::from_mut	2019-09-23 10:36:03 +02:00
Benjamin Bouvier	d1d2e790b9	[meta] Morph a few pub into pub(crate), and remove dead code;	2019-09-06 15:47:20 +02:00

1 2

57 Commits