wasmtime

Author	SHA1	Message	Date
Ulrich Weigand	137a8b710f	Move bitselect->vselect optimization to x64 back-end (#5191 ) The simplifier was performing an optimization to replace bitselect with vselect if all bytes of the condition mask could be shown to be all ones or all zeros. This optimization only ever made any difference in codegen on the x64 target. Therefore, move this optimization to the x64 back-end and perform it in ISLE instead. Resulting codegen should be unchanged, with slightly improved compile time. This also eliminates a few endian-dependent bitcast operations.	2022-11-03 20:17:36 +00:00
Afonso Bordado	3ef30b5b67	cranelift: Rename `i{min,max}` to `s{min,max}` (#5187 ) This brings these instructions in line with our general naming convention of signed instructions being prefixed with `s`.	2022-11-03 18:20:33 +00:00
Afonso Bordado	2c69b94744	cranelift: Add support for `bswap.i128` (#5186 ) * fuzzgen: Request only one variable for bswap This was included by accident. Bswap only has one input, instead of two. * cranelift: Add `bswap.i128` support Adds support only for x86, AArch64, S390X. RISCV does not yet have bswap.	2022-11-03 18:03:37 +00:00
Alex Crichton	22159848c5	Fix instruction size test for Rust 1.65.0 (#5188 ) Looks like Rust generously shrank our `enum` in 1.65.0, so update the test assertion to pass CI.	2022-11-03 16:53:51 +00:00
Trevor Elliott	aeceea28e2	Remove trapif and trapff (#5162 ) This branch removes the trapif and trapff instructions, in favor of using an explicit comparison and trapnz. This moves us closer to removing iflags and fflags, but introduces the need to implement instructions like iadd_cout in the x64 and aarch64 backends.	2022-11-03 09:25:11 -07:00
Ulrich Weigand	961107ec63	Merge raw_bitcast and bitcast (#5175 ) - Allow bitcast for vectors with differing lane widths - Remove raw_bitcast IR instruction - Change all users of raw_bitcast to bitcast - Implement support for no-op bitcast cases across backends This implements the second step of the plan outlined here: https://github.com/bytecodealliance/wasmtime/issues/4566#issuecomment-1234819394	2022-11-02 10:16:27 -07:00
Alex Crichton	cd53bed898	Implement AOT compilation for components (#5160 ) * Pull `Module` out of `ModuleTextBuilder` This commit is the first in what will likely be a number towards preparing for serializing a compiled component to bytes, a precompiled artifact. To that end my rough plan is to merge all of the compiled artifacts for a component into one large object file instead of having lots of separate object files and lots of separate mmaps to manage. To that end I plan on eventually using `ModuleTextBuilder` to build one large text section for all core wasm modules and trampolines, meaning that `ModuleTextBuilder` is no longer specific to one module. I've extracted out functionality such as function name calculation as well as relocation resolving (now a closure passed in) in preparation for this. For now this just keeps tests passing, and the trajectory for this should become more clear over the following commits. * Remove component-specific object emission This commit removes the `ComponentCompiler::emit_obj` function in favor of `Compiler::emit_obj`, now renamed `append_code`. This involved significantly refactoring code emission to take a flat list of functions into `append_code` and the caller is responsible for weaving together various "families" of functions and un-weaving them afterwards. * Consolidate ELF parsing in `CodeMemory` This commit moves the ELF file parsing and section iteration from `CompiledModule` into `CodeMemory` so one location keeps track of section ranges and such. This is in preparation for sharing much of this code with components which needs all the same sections to get tracked but won't be using `CompiledModule`. A small side benefit from this is that the section parsing done in `CodeMemory` and `CompiledModule` is no longer duplicated. 
* Remove separately tracked traps in components Previously components would generate an "always trapping" function and the metadata around which pc was allowed to trap was handled manually for components. With recent refactorings the Wasmtime-standard trap section in object files is now being generated for components as well which means that can be reused instead of custom-tracking this metadata. This commit removes the manual tracking for the `always_trap` functions and plumbs the necessary bits around to make components look more like modules. * Remove a now-unnecessary `Arc` in `Module` Not expected to have any measurable impact on performance, but complexity-wise this should make it a bit easier to understand the internals since there's no longer any need to store this somewhere else than its owner's location. * Merge compilation artifacts of components This commit is a large refactoring of the component compilation process to produce a single artifact instead of multiple binary artifacts. The core wasm compilation process is refactored as well to share as much code as necessary with the component compilation process. This method of representing a compiled component necessitated a few medium-sized changes internally within Wasmtime: * A new data structure was created, `CodeObject`, which represents metadata about a single compiled artifact. This is then stored as an `Arc` within a component and a module. For `Module` this is always uniquely owned and represents a shuffling around of data from one owner to another. For a `Component`, however, this is shared amongst all loaded modules and the top-level component. * The "module registry" which is used for symbolicating backtraces and for trap information has been updated to account for a single region of loaded code holding possibly multiple modules. This involved adding a second-level `BTreeMap` for now. 
This will likely slow down instantiation slightly but if it poses an issue in the future this should be able to be represented with a more clever data structure. This commit additionally solves a number of longstanding issues with components such as compiling only one host-to-wasm trampoline per signature instead of possibly once-per-module. Additionally the `SignatureCollection` registration now happens once-per-component instead of once-per-module-within-a-component. * Fix compile errors from prior commits * Support AOT-compiling components This commit adds support for AOT-compiled components in the same manner as `Module`, specifically adding: * `Engine::precompile_component` * `Component::serialize` * `Component::deserialize` * `Component::deserialize_file` Internally the support for components looks quite similar to `Module`. All the prior commits to this made adding the support here (unsurprisingly) easy. Components are represented as a single object file as are modules, and the functions for each module are all piled into the same object file next to each other (as are areas such as data sections). Support was also added here to quickly differentiate compiled components vs compiled modules via the `e_flags` field in the ELF header. * Prevent serializing exported modules on components The current representation of a module within a component means that the implementation of `Module::serialize` will not work if the module is exported from a component. The reason for this is that `serialize` doesn't actually do anything and simply returns the underlying mmap as a list of bytes. The mmap, however, has `.wasmtime.info` describing component metadata as opposed to this module's metadata. While rewriting this section could be implemented it's not so easy to do so and is otherwise seen as not super important of a feature right now anyway. 
* Fix windows build * Fix an unused function warning * Update crates/environ/src/compilation.rs Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>	2022-11-02 15:26:26 +00:00
Trevor Elliott	09d8df6fab	Switch to `x64_rbp` to avoid the use of a pinned register (#5168 ) Avoid a use of preg_rbp in the x64 backend, using x64_rbp instead.	2022-11-01 13:23:33 -07:00
Trevor Elliott	0c53941364	Remove the need for count_operands by restructuring emit in s390x (#5164 ) Remove the need for count_operands by restructuring emit in the s390x backend to instead take the AllocationConsumer as an argument.	2022-11-01 10:05:45 -07:00
Nick Fitzgerald	d0673ff7da	Cranelift: use iterators instead of indexing; clean up match expressions (#5161 )	2022-11-01 09:48:40 -07:00
Nick Fitzgerald	6d289723bd	Cranelift: Use a single, shared vector allocation for all `ABIArg`s (#5127 ) * Cranelift: Use a single, shared vector allocation for all `ABIArg`s Instead of two `SmallVec`s per `SigData`. * Remove `Deref` and `DerefMut` impls for `ArgsAccumulator`	2022-10-31 14:32:17 -07:00
Afonso Bordado	04fa8951ba	aarch64: Deduplicate `bmask` lowering rule (#5154 )	2022-10-31 12:44:35 -07:00
11evan	4ca9e82bd1	cranelift: Add Bswap instruction (#1092 ) (#5147 ) Adds Bswap to the Cranelift IR. Implements the Bswap instruction in the x64 and aarch64 codegen backends. Cranelift users can now: ``` builder.ins().bswap(value) ``` to get a native byteswap instruction. * x64: implements the 32- and 64-bit bswap instruction, following the pattern set by similar unary instructions (Neg and Not) - it only operates on a dst register, but is parameterized with both a src and dst which are expected to be the same register. As the x64 bswap instruction is only for 32- or 64-bit registers, the 16-bit swap is implemented as a rotate left by 8. Updated x64 RexFlags type to support emitting for single-operand instructions like bswap * aarch64: Bswap gets emitted as aarch64 rev16, rev32, or rev64 instruction as appropriate. * s390x: Bswap was already supported in backend, just had to add a bit of plumbing * For completeness, added bswap to the interpreter as well. * added filetests and runtests for each ISA * added bswap to fuzzgen, thanks to afonso360 for the code there * 128-bit swaps are not yet implemented, that can be done later	2022-10-31 19:30:00 +00:00
Afonso Bordado	2fb76be2e4	x64: Add `bmask` implementation (#5148 )	2022-10-28 17:17:22 -07:00
Afonso Bordado	879b52825f	cranelift: Implement `ineg.i128` for everyone (#5129 ) * cranelift: Add `ineg` runtests * aarch64: Implement `ineg.i128` * x64: Implement `ineg.i128` * riscv: Implement `ineg.i128` * fuzzgen: Enable `ineg.i128`	2022-10-28 16:10:00 -07:00
Afonso Bordado	e8f3d03bbe	cranelift: Mask high bits on `bmask` for types smaller than a register (#5118 ) * aarch64: Fix incorrect masking for small types on bmask `bmask` was accidentally relying on the uppermost bits of the register for small types. This was found by fuzzgen, when it generated a shift left followed by a bmask, the shift left shifted the bits out of the range of the input type (i8), however these are not automatically cleared since they remained inside the 32 bits of the register. That caused issues when the bmask tried to compare the whole register instead of just the bottom bits. The solution here is to mask the upper bits for small types. * aarch64: Emit 32bit cmp on bmask This fixes an issue where bmask was accidentally comparing the upper bits of the register by always using a 64bit cmp. * riscv: Mask high bits in bmask * riscv: Add compile tests for br{z,nz} * riscv: Use shifts to mask 32bit values This produces less code than the AND since that version needs to load an immediate constant from memory. * cranelift: Update test input to hexadecimal values This makes it a bit more clear what is being tested. * riscv: Use addiw for masking 32 bit values Co-authored-by: Trevor Elliott <telliott@fastly.com> * aarch64: Update bmask rule priority Co-authored-by: Trevor Elliott <telliott@fastly.com>	2022-10-27 09:45:39 -07:00
Trevor Elliott	02620441c3	Add uadd_overflow_trap (#5123 ) Add a new instruction uadd_overflow_trap, which is a fused version of iadd_ifcout and trapif. Adding this instruction removes a dependency on the iflags type, and would allow us to move closer to removing it entirely. The instruction is defined for the i32 and i64 types only, and is currently only used in the legalization of heap_addr.	2022-10-27 09:43:15 -07:00
Afonso Bordado	4867813f77	cranelift: Remove `copy` instruction (#5125 )	2022-10-25 17:27:33 -07:00
Ulrich Weigand	b61e678309	s390x: Fix more regalloc checker errors (#5121 ) For VecInsertLane[Undef] and VecExtractLane, if lane_reg is zero_reg(), the instruction does not actually use any register value. Fixes https://github.com/bytecodealliance/wasmtime/issues/5090	2022-10-25 18:04:31 +00:00
Ulrich Weigand	39b3b1d772	s390x: Fix handling of sret arguments (#5116 ) Skip synthetic StructReturn entries in the return value list. Fixes https://github.com/bytecodealliance/wasmtime/issues/5089	2022-10-25 10:40:10 -07:00
Trevor Elliott	ec12415b1f	cranelift: Remove redundant branch and select instructions (#5097 ) As discussed in the 2022/10/19 meeting, this PR removes many of the branch and select instructions that used iflags, in favor of using brz/brnz and select in their place. Additionally, it reworks selectif_spectre_guard to take an i8 input instead of an iflags input. For reference, the removed instructions are: br_icmp, brif, brff, trueif, trueff, and selectif.	2022-10-24 16:14:35 -07:00
Ulrich Weigand	bfcf6616fe	s390x: clean up remnants of non-SSA code generation (#5096 ) Eliminate a few remaining instances of non-SSA code. Remove infrastructure previously used for non-SSA code emission. Related cleanup around flags handling.	2022-10-24 12:40:50 -07:00
Nick Fitzgerald	442f9fa01b	Cranelift: pass iterators to `ABIMachineSpec::compute_arg_locs` (#5095 ) Instead of slices. This gives us more flexibility to pass custom sequences without needing to allocate a `Vec` to hold them and pass in as a slice.	2022-10-21 16:08:09 -07:00
Nick Fitzgerald	5c5fa192f7	Cranelift: use `.enumerate()` to avoid indexing in s390x backend (#5094 ) This can help rustc/llvm avoid bounds checks, but more importantly I will have future changes here that remove indexing of params, and instead hand them out as an iterator.	2022-10-21 13:08:56 -07:00
Ulrich Weigand	9dadba60a0	s390x: use constraints for call arguments and return values (#5092 ) Use the regalloc constraint-based CallArgList / CallRetList mechanism instead of directly using physregs in instructions.	2022-10-21 11:01:22 -07:00
Chris Fallin	86e77953f8	Fix some egraph-related issues. (#5088 ) This fixes #5086 by addressing two separate issues: - The `ValueDataPacked::set_type()` helper had an embarrassing bitfield-manipulation bug that would mangle the rest of a `ValueDef` when setting its type. This is not normally used, only when the egraph elaboration fills in types after-the-fact on a multi-value node. - The lowering rules for `isplit` on aarch64 and s390x were dispatching on the first output type, rather than the input type. When only the second output is used (as in the example in #5086), the first output type actually remains `INVALID` (and this is fine because it's never used).	2022-10-21 10:24:48 -07:00
Trevor Elliott	d9753fac2b	Remove uses of `reg_mod` from s390x (#5073 ) Remove uses of reg_mod from the s390x backend. This required moving away from using r0/r1 as the result registers from a few different pseudo instructions, standardizing instead on r2/r3. That change was necessary as regalloc2 will not correctly allocate registers that aren't listed in the allocatable set, which r0/r1 are not. Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Co-authored-by: Chris Fallin <chris@cfallin.org>	2022-10-21 09:22:16 -07:00
Trevor Elliott	32a7593c94	cranelift: Remove booleans (#5031 ) Remove the boolean types from cranelift, and the associated instructions breduce, bextend, bconst, and bint. Standardize on using 1/0 for the return value from instructions that produce scalar boolean results, and -1/0 for boolean vector elements. Fixes #3205 Co-authored-by: Afonso Bordado <afonso360@users.noreply.github.com> Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Co-authored-by: Chris Fallin <chris@cfallin.org>	2022-10-17 16:00:27 -07:00
Chris Fallin	2be12a5167	egraph-based midend: draw the rest of the owl (productionized). (#4953 ) * egraph-based midend: draw the rest of the owl. * Rename `egg` submodule of cranelift-codegen to `egraph`. * Apply some feedback from @jsharp during code walkthrough. * Remove recursion from find_best_node by doing a single pass. Rather than recursively computing the lowest-cost node for a given eclass and memoizing the answer at each eclass node, we can do a single forward pass; because every eclass node refers only to earlier nodes, this is sufficient. The behavior may slightly differ from the earlier behavior because we cannot short-circuit costs to zero once a node is elaborated; but in practice this should not matter. * Make elaboration non-recursive. Use an explicit stack instead (with `ElabStackEntry` entries, alongside a result stack). * Make elaboration traversal of the domtree non-recursive/stack-safe. * Work analysis logic in Cranelift-side egraph glue into a general analysis framework in cranelift-egraph. * Apply static recursion limit to rule application. * Fix aarch64 wrt dynamic-vector support -- broken rebase. * Topo-sort cranelift-egraph before cranelift-codegen in publish script, like the comment instructs me to! * Fix multi-result call testcase. * Include `cranelift-egraph` in `PUBLISHED_CRATES`. * Fix atomic_rmw: not really a load. * Remove now-unnecessary PartialOrd/Ord derivations. * Address some code-review comments. * Review feedback. * Review feedback. * No overlap in mid-end rules, because we are defining a multi-constructor. * rustfmt * Review feedback. * Review feedback. * Review feedback. * Review feedback. * Remove redundant `mut`. * Add comment noting what rules can do. * Review feedback. * Clarify comment wording. * Update `has_memory_fence_semantics`. * Apply @jameysharp's improved loop-level computation. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix suggestion commit. * Fix off-by-one in new loop-nest analysis. 
* Review feedback. * Review feedback. * Review feedback. * Use `Default`, not `std::default::Default`, as per @fitzgen Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> * Apply @fitzgen's comment elaboration to a doc-comment. Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com> * Add stat for hitting the rewrite-depth limit. * Some code motion in split prelude to make the diff a little clearer wrt `main`. * Take @jameysharp's suggested `try_into()` usage for blockparam indices. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Take @jameysharp's suggestion to avoid double-match on load op. Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix suggestion (add import). * Review feedback. * Fix stack_load handling. * Remove redundant can_store case. * Take @jameysharp's suggested improvement to FuncEGraph::build() logic Co-authored-by: Jamey Sharp <jamey@minilop.net> * Tweaks to FuncEGraph::build() on top of suggestion. * Take @jameysharp's suggested clarified condition Co-authored-by: Jamey Sharp <jamey@minilop.net> * Clean up after suggestion (unused variable). * Fix loop analysis. * loop level asserts * Revert constant-space loop analysis -- edge cases were incorrect, so let's go with the simple thing for now. * Take @jameysharp's suggestion re: result_tys Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fix up after suggestion * Take @jameysharp's suggestion to use fold rather than reduce Co-authored-by: Jamey Sharp <jamey@minilop.net> * Fixup after suggestion * Take @jameysharp's suggestion to remove elaborate_eclass_use's return value. * Clarifying comment in terminator insts. Co-authored-by: Jamey Sharp <jamey@minilop.net> Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>	2022-10-11 18:15:53 -07:00
Jamey Sharp	04b30acad9	Misc cleanups (#5014 ) * Replace resize+copy_from_slice with extend_from_slice Vec::resize initializes the new space, which is wasted effort if we're just going to call `copy_from_slice` on it immediately afterward. Using `extend_from_slice` is simpler, and very slightly faster. If the new size were bigger than the buffer we're copying from, then it would make sense to initialize the excess. But it isn't: it's always exactly the same size. * Move helpers from Context to CompiledCode These methods only use information from Context::compiled_code, so they should live on CompiledCode instead. * Remove an unnecessary #[cfg_attr] There are other uses of `#[allow(clippy::too_many_arguments)]` in this file, so apparently it doesn't need to be guarded by the "cargo-clippy" feature. * Fix a few comments Two of these were wrong/misleading: - `FunctionBuilder::new` does not clear the provided func_ctx. It does debug-assert that the context is already clear, but I don't think that's worth a comment. - `switch_to_block` does not "create values for the arguments." That's done by the combination of `append_block_params_for_function_params` and `declare_wasm_parameters`. * wasmtime-cranelift: Misc cleanups The main change is to use the `CompiledCode` reference we already had instead of getting it out of `Context` repeatedly. This removes a bunch of `unwrap()` calls. * wasmtime-cranelift: Factor out uncached compile	2022-10-05 10:35:59 -07:00
Trevor Elliott	a209cb63f5	ISLE: Enable the overlap checker (#5011 ) This PR turns the overlap checker on by default, requiring the use of priorities to resolve overlap between rules.	2022-10-04 21:56:49 +00:00
yuyang	07584f6ac8	fix issue 4996. (#5003 )	2022-10-04 11:18:42 -07:00
Trevor Elliott	c9ff14e00b	Resolve overlap in the s390x backend (#5002 ) Resolve overlap in the s390x backend by adding rule priorities to disambiguate rule order.	2022-10-03 17:06:10 -07:00
Trevor Elliott	ab4be2bdd1	ISLE: Resolve overlaps in the aarch64 backend (#4988 )	2022-09-30 12:57:50 -07:00
Trevor Elliott	c1d6ca48a7	ISLE: Resolve overlap in the riscv64 backend (#4982 ) Resolve overlap in the RiscV64 backend by adding priorities to rules. Additionally, one test was updated as a result of this work, as a peephole optimization for addition with immediates now fires.	2022-09-29 17:22:25 -07:00
Trevor Elliott	46e42601eb	ISLE: Always default the priority to 0 (#4983 ) * Always default priorities to 0 in the ISLE IR * Fix a hidden overlap in the x64 backend	2022-09-29 15:32:29 -07:00
Trevor Elliott	7d5b2b5bb2	ISLE: Add the overlap_errors pragma (#4981 ) * Add the overlap_errors pragma to ISLE * Enable overlap errors in the x64 backend	2022-09-29 12:40:39 -07:00
Trevor Elliott	db06e4e622	ISLE: Resolve remaining x64 overlap errors (#4977 ) Resolve overlap errors with the x64 backend.	2022-09-29 10:09:37 -07:00
Trevor Elliott	faf31f6216	ISLE: Resolve overlap in prelude.isle and x64/inst.isle (#4941 ) Resolve overlap in the ISLE prelude and the x64 inst module by introducing new types that allow better sharing of extractor results, or falling back on priorities.	2022-09-28 10:54:39 -07:00
Ulrich Weigand	83bb6fd4c8	s390x: Fix regalloc checker error (#4973 ) For ShiftRR and VecShiftRR, if shift_reg is zero_reg(), the instruction does not actually use any register value. Fixes #4969	2022-09-28 08:25:23 -07:00
yuyang-ok	cdecc858b4	add riscv64 backend for cranelift. (#4271 ) Add a RISC-V 64 (`riscv64`, RV64GC) backend. Co-authored-by: yuyang <756445638@qq.com> Co-authored-by: Chris Fallin <chris@cfallin.org> Co-authored-by: Afonso Bordado <afonsobordado@az8.co>	2022-09-27 17:30:31 -07:00
bjorn3	af226d37c2	[AArch64] Fix incorrect regalloc constraints for atomic_cas (#4959 ) * [AArch64] Fix incorrect regalloc constraints for atomic_cas * Update test for latest Cranelift changes	2022-09-26 16:05:57 +00:00
Damian Heaton	3a2b32bf4d	Port branches to ISLE (AArch64) (#4943 ) * Port branches to ISLE (AArch64) Ported the existing implementations of the following opcodes for AArch64 to ISLE: - `Brz` - `Brnz` - `Brif` - `Brff` - `BrIcmp` - `Jump` - `BrTable` Copyright (c) 2022 Arm Limited * Remove dead code Copyright (c) 2022 Arm Limited	2022-09-26 09:45:32 +01:00
Damian Heaton	3f8cccfb59	Port flag-based ops to ISLE (AArch64) (#4942 ) Ported the existing implementations of the following opcodes for AArch64 to ISLE: - `Trueif` - `Trueff` - `Trapif` - `Trapff` - `Select` - `Selectif` - `SelectifSpectreGuard` Copyright (c) 2022 Arm Limited	2022-09-22 15:44:32 -07:00
Chris Fallin	b652ce2fb1	ISLE: add support for multi-extractors and multi-constructors. (#4908 ) * ISLE: add support for multi-extractors and multi-constructors. This support allows for rules that process multiple matching values per extractor call on the left-hand side, and as a result, can produce multiple values from the constructor whose body they define. This is useful in situations where we are matching on an input data structure that can have multiple "nodes" for a given value or ID, for example in an e-graph. * Review feedback: all multi-ctors and multi-etors return iterators; no `Vec` case. * Add additional warning suppressions to generated-code toplevels to be consistent with new islec output.	2022-09-21 23:36:50 +00:00
Damian Heaton	352c7595c6	Improve `fcvt_to_{u,s}int_sat` lowering (AArch64) (#4913 ) Improved the instruction lowering for the following opcodes on AArch64, and introduced support for converting to integers less than 32-bits wide as per the docs: - `FcvtToSintSat` - `FcvtToUintSat` Copyright (c) 2022 Arm Limited	2022-09-21 10:16:09 -07:00
Damian Heaton	e786bda002	Vector bitcast support (AArch64 & Interpreter) (#4820 ) * Vector bitcast support (AArch64 & Interpreter) Implemented support for `bitcast` on vector values for AArch64 and the interpreter. Also corrected the verifier to ensure that the size, in bits, of the input and output types match for a `bitcast`, per the docs. Copyright (c) 2022 Arm Limited * `I128` same-type bitcast support Copyright (c) 2022 Arm Limited * Directly return input for 64-bit GPR<=>GPR bitcast Copyright (c) 2022 Arm Limited	2022-09-21 09:20:28 -07:00
Chris Fallin	05cbd667c7	Cranelift: use regalloc2 constraints on caller side of ABI code. (#4892 ) * Cranelift: use regalloc2 constraints on caller side of ABI code. This PR updates the shared ABI code and backends to use register-operand constraints rather than explicit pinned-vreg moves for register arguments and return values. The s390x backend was not updated, because it has its own implementation of ABI code. Ideally we could converge back to the code shared by x64 and aarch64 (which didn't exist when s390x ported calls to ISLE, so the current situation is understandable, to be clear!). I'll leave this for future work. This PR exposed several places where regalloc2 needed to be a bit more flexible with constraints; it requires regalloc2#74 to be merged and pulled in. * Update to regalloc2 0.3.3. In addition to version bump, this required removing two asserts as `SpillSlot`s no longer carry their class (so we can't assert that they have the correct class). * Review comments. * Filetest updates. * Add cargo-vet audit for regalloc2 0.3.2 -> 0.3.3 upgrade. * Update to regalloc2 0.4.0.	2022-09-21 01:17:04 +00:00
Damian Heaton	e9b08b856d	Port `icmp` to ISLE (AArch64) (#4898 ) * Port `icmp` to ISLE (AArch64) Ported the existing implementation of `icmp` (and, by extension, the `lower_icmp` function) to ISLE for AArch64. Copyright (c) 2022 Arm Limited * Allow 'producer chains', eliminating `Nop0`s Copyright (c) 2022 Arm Limited	2022-09-13 08:56:50 -07:00
Trevor Elliott	ad09c273c6	Don't merge loads for xmm registers (#4891 ) Do not merge loads for xmm registers, as alignment requirements currently aren't satisfied with clif lowered from wasm. Fixes #4890	2022-09-12 10:14:35 -07:00
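
The Bswap commit (4ca9e82bd1) notes that the x64 `bswap` instruction only exists for 32- and 64-bit registers, so the 16-bit swap is implemented as a rotate left by 8. The sketch below illustrates why that is equivalent, using plain Rust integers rather than Cranelift itself; the function name `bswap16_via_rotate` is hypothetical and does not appear in the repository.

```rust
/// Byte-swap a 16-bit value with a rotate, mirroring the idea behind the
/// x64 lowering described in the commit message (illustrative only).
fn bswap16_via_rotate(x: u16) -> u16 {
    // Rotating a 16-bit value by 8 exchanges its high and low bytes.
    x.rotate_left(8)
}

fn main() {
    // Check the rotate against the standard library's byte swap.
    for x in [0x0000u16, 0x00ff, 0x1234, 0xabcd, 0xffff] {
        assert_eq!(bswap16_via_rotate(x), x.swap_bytes());
    }
    println!("{:#06x}", bswap16_via_rotate(0x1234)); // prints 0x3412
}
```

A rotate by 8 in either direction gives the same result for a 16-bit value, so the choice of "left" is a convention rather than a requirement.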

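The `bmask` fix (e8f3d03bbe) describes a bug where a small type such as `i8` lives in a wider register, and a shift left leaves stale bits above bit 7 that a full-register compare then observes. The following is a minimal model of that situation in plain Rust, assuming a 32-bit register and using hypothetical helper names; it is not the backend code.

```rust
/// Buggy model: tests the whole 32-bit register, so stale upper bits
/// make a zero `i8` look non-zero.
fn bmask_i8_unmasked(reg: u32) -> i8 {
    if reg != 0 { -1 } else { 0 }
}

/// Fixed model: masks the register down to the low 8 bits before testing.
fn bmask_i8_masked(reg: u32) -> i8 {
    if reg & 0xff != 0 { -1 } else { 0 }
}

fn main() {
    // An i8 value 0x80 shifted left by 1 is 0 as an i8, but the bit
    // shifted out still sits in bit 8 of the 32-bit register.
    let reg: u32 = 0x80u32 << 1;
    assert_eq!(reg as u8, 0);

    assert_eq!(bmask_i8_unmasked(reg), -1); // wrong: sees the stale bit
    assert_eq!(bmask_i8_masked(reg), 0); // correct: only low 8 bits count
}
```

The commit also mentions alternatives to an AND mask on some targets (shifts or `addiw` on riscv, a 32-bit compare on aarch64); the AND here is just the simplest way to show the idea.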

1095 Commits