wasmtime

Author	SHA1	Message	Date
Afonso Bordado	c8791073d6	cranelift: Remove `iconst.i128` (#5075 ) * cranelift: Remove iconst.i128 * bugpoint: Report Changed when only one instruction is mutated * cranelift: Fix egraph bxor rule * cranelift: Remove some simple_preopt opts for i128	2022-10-24 12:43:28 -07:00
Ulrich Weigand	bfcf6616fe	s390x: clean up remnants of non-SSA code generation (#5096 ) Eliminate a few remaining instances of non-SSA code. Remove infrastructure previously used for non-SSA code emission. Related cleanup around flags handling.	2022-10-24 12:40:50 -07:00
Nick Fitzgerald	442f9fa01b	Cranelift: pass iterators to `ABIMachineSpec::compute_arg_locs` (#5095 ) Instead of slices. This gives us more flexibility to pass custom sequences without needing to allocate a `Vec` to hold them and pass in as a slice.	2022-10-21 16:08:09 -07:00
Nick Fitzgerald	5c5fa192f7	Cranelift: use `.enumerate()` to avoid indexing in s390x backend (#5094 ) This can help rustc/llvm avoid bounds checks, but more importantly I will have future changes here that remove indexing of params, and instead hand them out as an iterator.	2022-10-21 13:08:56 -07:00
Nick Fitzgerald	4a66c3b855	Cranelift: Remove duplicate IR signature legalizations (#5093 ) The `SigData::from_func_sig` constructor will already ensure that the struct return pointer is returned, so this is a purely unnecessary call. Note that this is not a performance speed up, since `ensure_struct_return_ptr_is_returned` doesn't do any significant work if the signature is already legalized.	2022-10-21 13:08:44 -07:00
Ulrich Weigand	9dadba60a0	s390x: use constraints for call arguments and return values (#5092 ) Use the regalloc constraint-based CallArgList / CallRetList mechanism instead of directly using physregs in instructions.	2022-10-21 11:01:22 -07:00
Chris Fallin	86e77953f8	Fix some egraph-related issues. (#5088 ) This fixes #5086 by addressing two separate issues: - The `ValueDataPacked::set_type()` helper had an embarrassing bitfield-manipulation bug that would mangle the rest of a `ValueDef` when setting its type. This is not normally used, only when the egraph elaboration fills in types after-the-fact on a multi-value node. - The lowering rules for `isplit` on aarch64 and s390x were dispatching on the first output type, rather than the input type. When only the second output is used (as in the example in #5086), the first output type actually remains `INVALID` (and this is fine because it's never used).	2022-10-21 10:24:48 -07:00
Trevor Elliott	d9753fac2b	Remove uses of `reg_mod` from s390x (#5073 ) Remove uses of reg_mod from the s390x backend. This required moving away from using r0/r1 as the result registers from a few different pseudo instructions, standardizing instead on r2/r3. That change was necessary as regalloc2 will not correctly allocate registers that aren't listed in the allocatable set, which r0/r1 are not. Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Co-authored-by: Chris Fallin <chris@cfallin.org>	2022-10-21 09:22:16 -07:00
Afonso Bordado	51d8734235	fuzzgen: Generate compiler flags (#5020 ) * fuzzgen: Test compiler flags * cranelift: Generate `all()` function for all enum flags This allows a user to iterate all flags that exist. * fuzzgen: Minimize regalloc_checker compiles * fuzzgen: Limit the amount of test case inputs * fuzzgen: Add egraphs flag It's finally here! 🥳 * cranelift: Add fuzzing comment to settings * fuzzgen: Add riscv64 * fuzzgen: Unconditionally enable some flags	2022-10-20 16:40:50 -07:00
Chris Fallin	c392e461a3	egraphs: a few miscellaneous compile-time optimizations. (#5072 ) * egraphs: a few miscellaneous compile-time optimizations. These optimizations together are worth about a 2% compile-time reduction, as measured on one core with spidermonkey.wasm as an input, using `hyperfine` on `wasmtime compile`. The changes included are: - Some better pre-allocation (blockparams and side-effects concatenated list vecs); - Avoiding the indirection of storing list-of-types for every Pure and Inst node, when almost all nodes produce only a single result; instead, store arity and single type if it exists, and allow result projection nodes to fill in types otherwise; - Pack the `MemoryState` enum into one `u32` (this together with the above removal of the type slice allows `Node` to shrink from 48 bytes to 32 bytes); - always-inline an accessor (`entry` on `CtxHash`) that wasn't (`always(inline)` appears to be load-bearing, rather than just `inline`); - Split the update-analysis path into two hotpaths, one for the union case and one for the new-node case (and the former can avoid recomputing for the contained node when replacing a node with node-and-child eclass entry). * Review feedback. * Fix test build. * Fix to lowering when unused output with invalid type is present.	2022-10-19 11:05:00 -07:00
Trevor Elliott	32a7593c94	cranelift: Remove booleans (#5031 ) Remove the boolean types from cranelift, and the associated instructions breduce, bextend, bconst, and bint. Standardize on using 1/0 for the return value from instructions that produce scalar boolean results, and -1/0 for boolean vector elements. Fixes #3205 Co-authored-by: Afonso Bordado <afonso360@users.noreply.github.com> Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Co-authored-by: Chris Fallin <chris@cfallin.org>	2022-10-17 16:00:27 -07:00
Afonso Bordado	766ecb561e	fuzzgen: Always generate reachable blocks (#5034 ) * fuzzgen: Always reachable blocks * fuzzgen: Rename BlockTerminator * fuzzgen: Rename `finalize_block` * fuzzgen: Use `cloned` instead of map clone Thanks @jameysharp! Co-authored-by: Jamey Sharp <jamey@minilop.net> * fuzzgen: `rustfmt` * fuzzgen: Document paramless targets * fuzzgen: Add `BlockTerminatorKind` * fuzzen: Update BrTable/Switch comment * fuzzgen: Minor cleanup Co-authored-by: Jamey Sharp <jamey@minilop.net>	2022-10-17 12:51:20 -07:00
Chris Fallin	1aaea279e5	egraph opts: fix uextend-of-i32. (#5061 ) This is a simple error in the const-prop rules: uextend was not masking iconst's u64 immediate when extending from i32 to i64. Arguably an iconst.i32 should not have nonzero bits in the upper 32 of its immediate, but that's a separate design question. For now, if our invariant is that the upper bits are ignored, then it is required to mask the bits when const-evaling a `uextend`. Fixes #5047.	2022-10-17 12:45:49 -07:00
Nick Fitzgerald	03d77d4d6b	Cranelift: Derive `Copy` for `InstructionData` (#5043 ) * Cranelift: Derive `Copy` for `InstructionData` And update `clone` calls to be copies. * Add a test for `InstructionData`'s size	2022-10-12 07:58:27 -07:00
Chris Fallin	2be12a5167	egraph-based midend: draw the rest of the owl (productionized). (#4953 ) * egraph-based midend: draw the rest of the owl. * Rename `egg` submodule of cranelift-codegen to `egraph`. * Apply some feedback from @jsharp during code walkthrough. * Remove recursion from find_best_node by doing a single pass. Rather than recursively computing the lowest-cost node for a given eclass and memoizing the answer at each eclass node, we can do a single forward pass; because every eclass node refers only to earlier nodes, this is sufficient. The behavior may slightly differ from the earlier behavior because we cannot short-circuit costs to zero once a node is elaborated; but in practice this should not matter. * Make elaboration non-recursive. Use an explicit stack instead (with `ElabStackEntry` entries, alongside a result stack). * Make elaboration traversal of the domtree non-recursive/stack-safe. * Work analysis logic in Cranelift-side egraph glue into a general analysis framework in cranelift-egraph. * Apply static recursion limit to rule application. * Fix aarch64 wrt dynamic-vector support -- broken rebase. * Topo-sort cranelift-egraph before cranelift-codegen in publish script, like the comment instructs me to! * Fix multi-result call testcase. * Include `cranelift-egraph` in `PUBLISHED_CRATES`. * Fix atomic_rmw: not really a load. * Remove now-unnecessary PartialOrd/Ord derivations. * Address some code-review comments. * Review feedback. * Review feedback. * No overlap in mid-end rules, because we are defining a multi-constructor. * rustfmt * Review feedback. * Review feedback. * Review feedback. * Review feedback. * Remove redundant `mut`. * Add comment noting what rules can do. * Review feedback. * Clarify comment wording. * Update `has_memory_fence_semantics`. * Apply @jameysharp's improved loop-level computation. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix suggestion commit. * Fix off-by-one in new loop-nest analysis. * Review feedback. * Review feedback. * Review feedback. * Use `Default`, not `std::default::Default`, as per @fitzgen Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> * Apply @fitzgen's comment elaboration to a doc-comment. Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> * Add stat for hitting the rewrite-depth limit. * Some code motion in split prelude to make the diff a little clearer wrt `main`. * Take @jameysharp's suggested `try_into()` usage for blockparam indices. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Take @jameysharp's suggestion to avoid double-match on load op. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix suggestion (add import). * Review feedback. * Fix stack_load handling. * Remove redundant can_store case. * Take @jameysharp's suggested improvement to FuncEGraph::build() logic Co-authored-by: Jamey Sharp <jamey@minilop.net> * Tweaks to FuncEGraph::build() on top of suggestion. * Take @jameysharp's suggested clarified condition Co-authored-by: Jamey Sharp <jamey@minilop.net> * Clean up after suggestion (unused variable). * Fix loop analysis. * loop level asserts * Revert constant-space loop analysis -- edge cases were incorrect, so let's go with the simple thing for now. * Take @jameysharp's suggestion re: result_tys Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix up after suggestion * Take @jameysharp's suggestion to use fold rather than reduce Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fixup after suggestion * Take @jameysharp's suggestion to remove elaborate_eclass_use's return value. * Clarifying comment in terminator insts. Co-authored-by: Jamey Sharp <jamey@minilop.net> Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>	2022-10-11 18:15:53 -07:00
Nick Fitzgerald	e2f1ced0b6	Cranelift: Make `Opcode` represented as a `u8` instead of `u16` and remove vestigial conversion impls (#5042 ) * Cranelift: Make `Opcode` represented as a `u8` instead of `u16` * Cranelift: Remove unused conversion impls for `Opcode` These are vestigial, left over from Peepmatic.	2022-10-11 12:57:12 -07:00
Benjamin Bouvier	d68ca3711b	Upgrade sha2 to 0.10.2 in wasmtime (#4749 )	2022-10-10 09:40:40 +00:00
Jun Ryung Ju	39fbff92c3	cranelift: Added fp and, or, xor, not ops to interpreter. (#4999 ) * cranelift: Added fp and, or, xor, not ops to interpreter. * Formatting. * Removed archtecture dependent test on float-bitops.	2022-10-06 18:24:45 -07:00
Chris Fallin	e95ffe4413	Fix StructReturn handling: properly mark the clobber, and offset actual rets. (#5023 ) * Fix StructReturn handling: properly mark the clobber, and offset actual rets. The legalization of `StructReturn` was causing issues in the new call-handling code: the `StructReturn` ret was included in the `SigData` as if it were an actual CLIF-level return value, but it is not. Prior to using regalloc constraints for return values, we unconditionally included rax (or the architecture's usual return register) as a def, so it would be properly handled as "clobbered" by the regalloc. With the new scheme, we include defs on the call only for CLIF-level outputs. Callees with `StructReturn` args were thus not known to clobber the return-value register, and values might be corrupted. This PR updates the code to include a `StructReturn` ret as a clobber rather than a returned value in the relevant spots. I observed it causing saves/restores of rax in some CLIF that @bjorn3 provided me, but I was having difficulty minimizing this into a test-case that I would be comfortable including as a precise-output case (including the whole thing verbatim would lock down a bunch of other irrelevant details and cause test-update noise later). If we can find a more minimized example I'm happy to include it as a filetest. Fixes #5018.	2022-10-07 00:14:38 +00:00
Jamey Sharp	04b30acad9	Misc cleanups (#5014 ) * Replace resize+copy_from_slice with extend_from_slice Vec::resize initializes the new space, which is wasted effort if we're just going to call `copy_from_slice` on it immediately afterward. Using `extend_from_slice` is simpler, and very slightly faster. If the new size were bigger than the buffer we're copying from, then it would make sense to initialize the excess. But it isn't: it's always exactly the same size. * Move helpers from Context to CompiledCode These methods only use information from Context::compiled_code, so they should live on CompiledCode instead. * Remove an unnecessary #[cfg_attr] There are other uses of `#[allow(clippy::too_many_arguments)]` in this file, so apparently it doesn't need to be guarded by the "cargo-clippy" feature. * Fix a few comments Two of these were wrong/misleading: - `FunctionBuilder::new` does not clear the provided func_ctx. It does debug-assert that the context is already clear, but I don't think that's worth a comment. - `switch_to_block` does not "create values for the arguments." That's done by the combination of `append_block_params_for_function_params` and `declare_wasm_parameters`. * wasmtime-cranelift: Misc cleanups The main change is to use the `CompiledCode` reference we already had instead of getting it out of `Context` repeatedly. This removes a bunch of `unwrap()` calls. * wasmtime-cranelift: Factor out uncached compile	2022-10-05 10:35:59 -07:00
wasmtime-publish	a9be4a9b56	Bump Wasmtime to 3.0.0 (#5016 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-10-05 09:30:55 -05:00
Trevor Elliott	a209cb63f5	ISLE: Enable the overlap checker (#5011 ) This PR turns the overlap checker on by default, requiring the use of priorities to resolve overlap between rules.	2022-10-04 21:56:49 +00:00
yuyang	07584f6ac8	fix issue 4996. (#5003 )	2022-10-04 11:18:42 -07:00
Trevor Elliott	c9ff14e00b	Resolve overlap in the s390x backend (#5002 ) Resolve overlap in the s390x backend by adding rule priorities to disambiguate rule order.	2022-10-03 17:06:10 -07:00
Trevor Elliott	ab4be2bdd1	ISLE: Resolve overlaps in the aarch64 backend (#4988 )	2022-09-30 12:57:50 -07:00
Trevor Elliott	c1d6ca48a7	ISLE: Resolve overlap in the riscv64 backend (#4982 ) Resolve overlap in the RiscV64 backend by adding priorities to rules. Additionally, one test updated as a result of this work, as a peephole optimization for addition with immediates fires now.	2022-09-29 17:22:25 -07:00
Trevor Elliott	46e42601eb	ISLE: Always default the priority to 0 (#4983 ) * Always default priorities to 0 in the ISLE IR * Fix a hidden overlap in the x64 backend	2022-09-29 15:32:29 -07:00
Trevor Elliott	7d5b2b5bb2	ISLE: Add the overlap_errors pragma (#4981 ) * Add the overlap_errors pragma to ISLE * Enable overlap errors in the x64 backend	2022-09-29 12:40:39 -07:00
Trevor Elliott	db06e4e622	ISLE: Resolve remaining x64 overlap errors (#4977 ) Resolve overlap errors with the x64 backend.	2022-09-29 10:09:37 -07:00
Trevor Elliott	faf31f6216	ISLE: Resolve overlap in prelude.isle and x64/inst.isle (#4941 ) Resolve overlap in the ISLE prelude and the x64 inst module by introducing new types that allow better sharing of extractor resuls, or falling back on priorities.	2022-09-28 10:54:39 -07:00
Ulrich Weigand	83bb6fd4c8	s390x: Fix regalloc checker error (#4973 ) For ShiftRR and VecShiftRR, if shift_reg is zero_reg(), the instruction does not actually use any register value. Fixes #4969	2022-09-28 08:25:23 -07:00
yuyang-ok	cdecc858b4	add riscv64 backend for cranelift. (#4271 ) Add a RISC-V 64 (`riscv64`, RV64GC) backend. Co-authored-by: yuyang <756445638@qq.com> Co-authored-by: Chris Fallin <chris@cfallin.org> Co-authored-by: Afonso Bordado <afonsobordado@az8.co>	2022-09-27 17:30:31 -07:00
Afonso Bordado	65a3af72c7	fuzzgen: Statistics framework (#4868 ) * cranelift: Add non user trap codes function * cranelift: Add Fuzzgen stats * cranelift: Use `once_cell` and cleanup some stuff * fuzzgen: Remove total_inputs metric * fuzzgen: Filter empty trap codes	2022-09-27 16:04:57 +00:00
Alex Crichton	7b311004b5	Leverage Cargo's workspace inheritance feature (#4905 ) * Leverage Cargo's workspace inheritance feature This commit is an attempt to reduce the complexity of the Cargo manifests in this repository with Cargo's workspace-inheritance feature becoming stable in Rust 1.64.0. This feature allows specifying fields in the root workspace `Cargo.toml` which are then reused throughout the workspace. For example this PR shares definitions such as: * All of the Wasmtime-family of crates now use `version.workspace = true` to have a single location which defines the version number. * All crates use `edition.workspace = true` to have one default edition for the entire workspace. * Common dependencies are listed in `[workspace.dependencies]` to avoid typing the same version number in a lot of different places (e.g. the `wasmparser = "0.89.0"` is now in just one spot. Currently the workspace-inheritance feature doesn't allow having two different versions to inherit, so all of the Cranelift-family of crates still manually specify their version. The inter-crate dependencies, however, are shared amongst the root workspace. This feature can be seen as a method of "preprocessing" of sorts for Cargo manifests. This will help us develop Wasmtime but shouldn't have any actual impact on the published artifacts -- everything's dependency lists are still the same. * Fix wasi-crypto tests	2022-09-26 11:30:01 -05:00
bjorn3	af226d37c2	[AArch64] Fix incorrect regalloc constraints for atomic_cas (#4959 ) * [AArch64] Fix incorrect regalloc constraints for atomic_cas * Update test for latest Cranelift changes	2022-09-26 16:05:57 +00:00
Damian Heaton	3a2b32bf4d	Port branches to ISLE (AArch64) (#4943 ) * Port branches to ISLE (AArch64) Ported the existing implementations of the following opcodes for AArch64 to ISLE: - `Brz` - `Brnz` - `Brif` - `Brff` - `BrIcmp` - `Jump` - `BrTable` Copyright (c) 2022 Arm Limited * Remove dead code Copyright (c) 2022 Arm Limited	2022-09-26 09:45:32 +01:00
Jamey Sharp	bd870a9d6c	Shrink all SmallVecs by 8 bytes (#4951 ) We weren't using the "union" cargo feature for the smallvec crate, which reduces the size of a SmallVec by one machine word. This feature requires Rust 1.49 but we already require much newer versions. When using Wasmtime to compile pulldown-cmark from Sightglass, this saves a decent amount of memory allocations and writes. According to `valgrind --tool=dhat`: - 6.2MiB (3.69%) less memory allocated over the program's lifetime - 0.5MiB (4.13%) less memory allocated at maximum heap size - 5.5MiB (1.88%) fewer bytes written to - 0.44% fewer instructions executed Sightglass reports a statistically significant runtime improvement too: compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 24379323.60 ± 20051394.04 (confidence = 99%) shrink-abiarg-0406da67c.so is 1.01x to 1.13x faster than main-be690a468.so! [227506364 355007998.78 423280514] main-be690a468.so [227686018 330628675.18 406025344] shrink-abiarg-0406da67c.so compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm Δ = 360151622.56 ± 278294316.90 (confidence = 99%) shrink-abiarg-0406da67c.so is 1.01x to 1.07x faster than main-be690a468.so! [8709162212 8911001926.44 9535111576] main-be690a468.so [5058015392 8550850303.88 9282148438] shrink-abiarg-0406da67c.so compilation :: cycles :: benchmarks/bz2/benchmark.wasm Δ = 6936570.28 ± 6897696.38 (confidence = 99%) shrink-abiarg-0406da67c.so is 1.00x to 1.08x faster than main-be690a468.so! [155810934 175260571.20 234737344] main-be690a468.so [119128240 168324000.92 257451074] shrink-abiarg-0406da67c.so	2022-09-23 16:32:13 -07:00
Chris Fallin	19bd8687ac	Upgrade to regalloc2 0.4.1. (#4945 ) * Upgrade to regalloc2 0.4.1. Incorporates bytecodealliance/regalloc2#85, which fixes a fuzzbug related to constraints and liverange splits. * Add audit of regalloc2 upgrade.	2022-09-23 00:00:06 +00:00
Damian Heaton	3f8cccfb59	Port flag-based ops to ISLE (AArch64) (#4942 ) Ported the existing implementations of the following opcodes for AArch64 to ISLE: - `Trueif` - `Trueff` - `Trapif` - `Trapff` - `Select` - `Selectif` - `SelectifSpectreGuard` Copyright (c) 2022 Arm Limited	2022-09-22 15:44:32 -07:00
Chris Fallin	b652ce2fb1	ISLE: add support for multi-extractors and multi-constructors. (#4908 ) * ISLE: add support for multi-extractors and multi-constructors. This support allows for rules that process multiple matching values per extractor call on the left-hand side, and as a result, can produce multiple values from the constructor whose body they define. This is useful in situations where we are matching on an input data structure that can have multiple "nodes" for a given value or ID, for example in an e-graph. * Review feedback: all multi-ctors and multi-etors return iterators; no `Vec` case. * Add additional warning suppressions to generated-code toplevels to be consistent with new islec output.	2022-09-21 23:36:50 +00:00
Damian Heaton	352c7595c6	Improve `fcvt_to_{u,s}int_sat` lowering (AArch64) (#4913 ) Improved the instruction lowering for the following opcodes on AArch64, and introduced support for converting to integers less than 32-bits wide as per the docs: - `FcvtToSintSat` - `FcvtToUintSat` Copyright (c) 2022 Arm Limited	2022-09-21 10:16:09 -07:00
Damian Heaton	e786bda002	Vector bitcast support (AArch64 & Interpreter) (#4820 ) * Vector bitcast support (AArch64 & Interpreter) Implemented support for `bitcast` on vector values for AArch64 and the interpreter. Also corrected the verifier to ensure that the size, in bits, of the input and output types match for a `bitcast`, per the docs. Copyright (c) 2022 Arm Limited * `I128` same-type bitcast support Copyright (c) 2022 Arm Limited * Directly return input for 64-bit GPR<=>GPR bitcast Copyright (c) 2022 Arm Limited	2022-09-21 09:20:28 -07:00
Chris Fallin	05cbd667c7	Cranelift: use regalloc2 constraints on caller side of ABI code. (#4892 ) * Cranelift: use regalloc2 constraints on caller side of ABI code. This PR updates the shared ABI code and backends to use register-operand constraints rather than explicit pinned-vreg moves for register arguments and return values. The s390x backend was not updated, because it has its own implementation of ABI code. Ideally we could converge back to the code shared by x64 and aarch64 (which didn't exist when s390x ported calls to ISLE, so the current situation is underestandable, to be clear!). I'll leave this for future work. This PR exposed several places where regalloc2 needed to be a bit more flexible with constraints; it requires regalloc2#74 to be merged and pulled in. * Update to regalloc2 0.3.3. In addition to version bump, this required removing two asserts as `SpillSlot`s no longer carry their class (so we can't assert that they have the correct class). * Review comments. * Filetest updates. * Add cargo-vet audit for regalloc2 0.3.2 -> 0.3.3 upgrade. * Update to regalloc2 0.4.0.	2022-09-21 01:17:04 +00:00
Damian Heaton	e9b08b856d	Port `icmp` to ISLE (AArch64) (#4898 ) * Port `icmp` to ISLE (AArch64) Ported the existing implementation of `icmp` (and, by extension, the `lower_icmp` function) to ISLE for AArch64. Copyright (c) 2022 Arm Limited * Allow 'producer chains', eliminating `Nop0`s Copyright (c) 2022 Arm Limited	2022-09-13 08:56:50 -07:00
Trevor Elliott	ad09c273c6	Don't merge loads for xmm registers (#4891 ) Do not merge loads for xmm registers, as alignment requirements currently aren't satisfied with clif lowered from wasm. Fixes #4890	2022-09-12 10:14:35 -07:00
Chris Fallin	96bfd4e8c0	s390x: update some regalloc metadata to remove use of `reg_mod`. (#4856 ) * s390x: update some regalloc metadata to remove use of `reg_mod`. This is a step toward ultimately removing modify-operands, which along with removal of pinned vregs, lets us move to a completely constraint-based and fully-SSA regalloc input and get some nice advantages eventually. There are still a few uses of `mod` operands and pinned vregs remaining, especially around the "regpair" abstraction. Those proved to be a bit trickier to update though, so will have to be done separately. * Review feedback: restore two-arg pretty-print form. * Review feedback.	2022-09-09 18:43:36 -05:00
Chris Fallin	2986f6b0ff	ABI: implement register arguments with constraints. (#4858 ) * ABI: implement register arguments with constraints. Currently, Cranelift's ABI code emits a sequence of moves from physical registers into vregs at the top of the function body, one for every register-carried argument. For a number of reasons, we want to move to operand constraints instead, and remove the use of explicitly-named "pinned vregs"; this allows for better regalloc in theory, as it removes the need to "reverse-engineer" the sequence of moves. This PR alters the ABI code so that it generates a single "args" pseudo-instruction as the first instruction in the function body. This pseudo-inst defs all register arguments, and constrains them to the appropriate registers at the def-point. Subsequently the regalloc can move them wherever it needs to. Some care was taken not to have this pseudo-inst show up in post-regalloc disassemblies, but the change did cause a general regalloc "shift" in many tests, so the precise-output updates are a bit noisy. Sorry about that! A subsequent PR will handle the other half of the ABI code, namely, the callsite case, with a similar preg-to-constraint conversion. * Update based on review feedback. * Review feedback.	2022-09-08 18:03:14 -07:00
Chris Fallin	13c7846815	Cranelift: add a vreg limit check to correctly return an error on too-large inputs. (#4882 ) Previously, Cranelift panicked (via a a panic in regalloc2) when the virtual-register limit of 2M (2^21) was reached. This resulted in a perplexing and unhelpful failure when the user provided a too-large input (such as the Wasm module in #4865). This PR adds an explicit check when allocating vregs that fails with a "code too large" error when the limit is hit, producing output such as (on the minimized testcase from #4865): ``` Error: failed to compile wasm function 3785 at offset 0xa3f3 Caused by: Compilation error: Code for function is too large ``` Fixes #4865.	2022-09-08 10:04:59 -07:00
Anton Kirilov	d8b290898c	Initial forward-edge CFI implementation (#3693 ) * Initial forward-edge CFI implementation Give the user the option to start all basic blocks that are targets of indirect branches with the BTI instruction introduced by the Branch Target Identification extension to the Arm instruction set architecture. Copyright (c) 2022, Arm Limited. * Refactor `from_artifacts` to avoid second `make_executable` (#1) This involves "parsing" twice but this is parsing just the header of an ELF file so it's not a very intensive operation and should be ok to do twice. * Address the code review feedback Copyright (c) 2022, Arm Limited. Co-authored-by: Alex Crichton <alex@alexcrichton.com>	2022-09-08 09:35:58 -05:00
Trevor Elliott	caad14826c	Rework the ISA flag checking extractors for x64 (#4878 ) Using fallible extractors that produce no values for flag checks means that it's not possible to pattern match cases where those flags are false. This change reworks the existing flag-checking extractors to be infallible, returning the flag's boolean value from the context instead.	2022-09-07 13:49:35 -07:00

1 2 3 4 5 ...

1975 Commits