wasmtime

Author	SHA1	Message	Date
Chris Fallin	0af8737ec3	Add support for running the regalloc2 checker. (#4043 ) With these fixes, all this PR has to do is instantiate and run the checker on the `regalloc2::Output`. This is off by default, and is enabled by setting the `regalloc_checker` Cranelift option. This restores the old functionality provided by e.g. the `backtracking_checked` regalloc algorithm setting rather than `backtracking` when we were still on regalloc.rs.	2022-04-18 14:06:07 -07:00
Chris Fallin	5aa9bdc7eb	Cranelift: fix fuzzbug in critical-edge splitting. (#4044 ) regalloc2 is a bit pickier about critical edges than regalloc.rs was, because of how it inserts moves. In particular, if a branch has any arguments (e.g., a conditional branch or br_table), its successors must all have only one predecessor, so we can do edge moves at the top of successor blocks rather than at the end of this block. Otherwise, moves that semantically must come after the block's last uses (the branch's args) would be placed before it. This is almost always the case, because crit-edge splitting ensures that if we have more than one succ, all our succs will have only one pred. This is because branch kinds that take arguments (fixed args, not the blockparam args) tend to have more than one successor: conditionals and br_tables. However, a fuzzbug recently illuminated one corner case I had missed: a br_table can have one successor only, if it has a default target and an empty table. In this case, crit-edge splitting will happily skip a split and assume that we can insert edge moves at the end of the block with the br_table. But this will fail. regalloc2 explicitly checks this and bails with a panic, rather than continue, so no miscompilation is possible; but without this fix, we will get these panics on br_tables with empty tables.	2022-04-18 10:59:26 -07:00
Chris Fallin	5774e068b7	Cranelift: fix regalloc2 integration bug wrt blockparam branch args. (#4042 ) Previously, the block successor accumulation and the blockparam branch arg setup were decoupled. The lowering backend implicitly specified the order of successor edges via its `MachTerminator` enum on the last instruction in the block, while the `Lower` toplevel machine-independent driver set up blockparam branch args in the edge order seen in CLIF. In some cases, these orders did not match -- for example, when the conditional branch depended on an FP condition that was implemented by swapping taken/not-taken edges and inverting the condition code. This PR refactors the successor handling to be centralized in `Lower` rather than flow through the terminator `MachInst`, and adds a successor block and its blockparam args at the same time, ensuring the orders match.	2022-04-18 09:53:57 -07:00
Chris Fallin	7cf5f05830	Cranelift: remove slow invariant validation in cfg(fuzzing) from MachBuffer. (#4038 ) Following the merge of regalloc2 support, this became slower because we are stricter about the critical-edge invariant, generating a separate edge block for every out-edge even if two or more out-edges go to the same successor (this is significant in cases of `br_table` with many entries having the same target block, for example). Many of those edge blocks are empty and end up collapsed by the MachBuffer, which leads to a large set of aliased labels. The invariant validation will dutifully iterate over all the data structures at every step, validating all of our conditions. But this gets way slower in the new context, to the point that we'll probably have some fuzz timeouts. This was pointed out in [1] but I missed removing this in #3989. Given that `MachBuffer` has been around for nearly two years now, has been fuzzed continuously with the invariant validation for that time, and also has a correctness proof in the comments, it's probably reasonable to remove this high (recently increased) cost from the fuzzing-specific compilation configuration. [1] https://github.com/bytecodealliance/wasmtime/pull/3989#discussion_r847712263	2022-04-15 09:04:02 -05:00
Chris Fallin	a0318f36f0	Switch Cranelift over to regalloc2. (#3989 ) This PR switches Cranelift over to the new register allocator, regalloc2. See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801) for a summary of the design changes. This switchover has implications for core VCode/MachInst types and the lowering pass. Overall, this change brings improvements to both compile time and speed of generated code (runtime), as reported in #3942: ``` Benchmark Compilation (wallclock) Execution (wallclock) blake3-scalar 25% faster 28% faster blake3-simd no diff no diff meshoptimizer 19% faster 17% faster pulldown-cmark 17% faster no diff bz2 15% faster no diff SpiderMonkey, 21% faster 2% faster fib(30) clang.wasm 42% faster N/A ```	2022-04-14 10:28:21 -07:00
Andrew Brown	f62199da8c	x64: port `load` to ISLE (#3993 ) This change moves the majority of the lowerings for CLIF's `load` instruction over to ISLE. To do so, it also migrates the previous mechanism for creating an `Amode` (`lower_to_amode`) to several ISLE rules (see `to_amode`).	2022-04-07 18:31:22 -07:00
Chris Fallin	666c2554ea	Merge pull request from GHSA-gwc9-348x-qwv2 * Run the GC smoketest with epoch support enabled as well. * Handle safepoints in cold blocks properly. Currently, the way that we find safepoint slots for a given instruction relies on the instruction index order in the safepoint list matching the order of instruction emission. Previous to the introduction of cold-block support, this was trivially satisfied by sorting the safepoint list: we emit instructions 0, 1, 2, 3, 4, ..., and so if we have safepoints at instructions 1 and 4, we will encounter them in that order. However, cold blocks are supported by swizzling the emission order at the last moment (to avoid having to renumber instructions partway through the compilation pipeline), so we actually emit instructions out of index order when cold blocks are present. Reference-type support in Wasm in particular uses cold blocks for slowpaths, and has live refs and safepoints in these slowpaths, so we can reliably "skip" a safepoint (not emit any metadata for it) in the presence of reftype usage. This PR fixes the emission code by building a map from instruction index to safepoint index first, then doing lookups through this map, rather than following along in-order as it emits instructions.	2022-03-31 14:26:01 -07:00
Andrew Brown	bd6fe11ca9	cranelift: remove `load_complex` and `store_complex` (#3976 ) This change removes all variants of `load_complex` and `store_complex` from Cranelift; this is a breaking change to the instructions exposed by CLIF. The complete list of instructions removed is: `load_complex`, `store_complex`, `uload8_complex`, `sload8_complex`, `istore8_complex`, `sload8_complex`, `uload16_complex`, `sload16_complex`, `istore16_complex`, `uload32_complex`, `sload32_complex`, `istore32_complex`, `uload8x8_complex`, `sload8x8_complex`, `sload16x4_complex`, `uload16x4_complex`, `uload32x2_complex`, `sload32x2_complex`. The rationale for this removal is that the Cranelift backend now has the ability to pattern-match multiple upstream additions in order to calculate the address to access. Previously, this was not possible so the `*_complex` instructions were needed. Over time, these instructions have fallen out of use in this repository, making the additional overhead of maintaining them a chore.	2022-03-31 10:05:10 -07:00
Andrew Brown	5d8dd648d7	x64: port `fcmp` to ISLE (#3967 ) * x64: port scalar `fcmp` to ISLE Implement the CLIF lowering for the `fcmp` to ISLE. This adds a new type-matcher, `ty_scalar_float`, for detecting uses of `F32` and `F64`. * isle: rename `vec128` to `ty_vec12` This refactoring changes the name of the `vec128` matcher function to follow the `ty_` convention of the other type matchers. It also makes the helper an inline function call. x64: port vector `fcmp` to ISLE	2022-03-29 15:41:49 -07:00
Alex Crichton	4d404c90b4	Ensure functions are aligned properly on AArch64 (#3908 ) Previously (as in an hour ago) #3905 landed a new ability for fuzzing to arbitrarily insert padding between functions. Running some fuzzers locally though this instantly hit a lot of problems on AArch64 because the arbitrary padding isn't aligned to 4 bytes like all other functions are. To fix this issue appending functions now correctly aligns the output as appropriate for the platform. The alignment argument for appending was switched to `None` where `None` means "use the platform default" and otherwise and explicit alignment can be specified for inserting other data (like arbitrary padding or Windows unwind tables).	2022-03-09 15:45:30 -06:00
Chris Fallin	90a081a731	ISLE: port extend/reduce opcodes on x64. (#3849 )	2022-02-28 11:49:28 -08:00
Chris Fallin	24f145cd1e	Migrate clz, ctz, popcnt, bitrev, is_null, is_invalid on x64 to ISLE. (#3848 )	2022-02-28 09:45:13 -08:00
Ulrich Weigand	b064e60087	ISLE: Re-implement ValueSlice (#3784 ) The current definition of `ValueSlice` is not usable, since any call to a constructor returning a `ValueSlice` will extend the mutable borrow on the context taken by the constructor call, with the result that it cannot be passed to any other constructor ever. Re-implement `ValueSlice` as a pair of a `ValueList` identifer plus an offset into the list. This type can simply be copied without requiring a borrow on the context.	2022-02-24 15:24:40 -08:00
Ulrich Weigand	07d615d3f7	ISLE: Lowering of multi-output instructions (#3783 ) This changes the output of the `lower` constructor from a `ValueRegs` to a new `InstOutput` type, which is a vector of `ValueRegs`. Code in `lower_common` is updated to use this new type to handle instructions with multiple outputs. All back-ends are updated to use the new type.	2022-02-24 14:03:06 -08:00
Chris Fallin	ca0e8d0a1d	Remove incomplete/unmaintained ARM32 backend (for now). (#3799 ) In #3721, we have been discussing what to do about the ARM32 backend in Cranelift. Currently, this backend supports only 32-bit types, which is insufficient for full Wasm-MVP; it's missing other critical bits, like floating-point support; and it has only ever been exercised, AFAIK, via the filetests for the individual CLIF instructions that are implemented. We were very very thankful for the original contribution of this backend, even in its partial state, and we had hoped at the time that we could eventually mature it in-tree until it supported e.g. Wasm and other use-cases. But that hasn't yet happened -- to the blame of no-one, to be clear, we just haven't had a contributor with sufficient time. Unfortunately, the existence of the backend and lack of active maintainer now potentially pose a bit of a burden as we hope to make continuing changes to the backend framework. For example, the ISLE migration, and the use of regalloc2 that it will allow, would need all of the existing lowering patterns in the hand-written ARM32 backend to be rewritten as ISLE rules. Given that we don't currently have the resources to do this, we think it's probably best if we, sadly, for now remove this partial backend. This is not in any way a statement of what we might accept in the future, though. If, in the future, an ARM32 backend updated to our latest codebase with an active maintainer were to appear, we'd be happy to merge it (and likewise for any other architecture!). But for now, this is probably the best path. Thanks again to the original contributor @jmkrauz and we hope that this work can eventually be brought back and reused if someone has the time to do so!	2022-02-14 15:03:52 -08:00
Nick Fitzgerald	dc86e7a6dc	cranelift: Use GPR newtypes extensively in x64 lowering (#3798 ) We already defined the `Gpr` newtype and used it in a few places, and we already defined the `Xmm` newtype and used it extensively. This finishes the transition to using the newtypes extensively in lowering by making use of `Gpr` in more places. Fixes #3685	2022-02-14 12:54:41 -08:00
Ulrich Weigand	10198553c7	ISLE: Common accessors for some insn data fields (#3781 ) Add accessors to prelude.isle to access data fields of `func_addr` and `symbol_value` instructions. These are based on similar versions I had added to the s390x back-end, but are a bit more straightforward to use. - func_ref_data: Extract SigRef, ExternalName, and RelocDistance fields given a FuncRef. - symbol_value_data: Extract ExternalName, RelocDistance, and offset fields given a GlobalValue representing a Symbol. - reloc_distance_near: Test for RelocDistance::Near. The s390x back-end is changed to use these common versions. Note that this exposed a bug in common isle code: This extractor: (extractor (load_sym inst) (and inst (load _ (def_inst (symbol_value (symbol_value_data _ (reloc_distance_near) offset))) (i64_from_offset (memarg_symbol_offset_sum <offset _))))) would raise an assertion in sema.rs due to a supposed cycle in extractor definitions. But there was no actual cycle, it was simply that the extractor tree refers twice to the `insn_data` extractor (once via the `load` and once via the `symbol_value` extractor). Fixed by checking for pre-existing definitions only along one path in the tree, not across the whole tree.	2022-02-08 17:57:27 -08:00
Chris Fallin	d9d6469422	Cranelift: fix debuginfo wrt cold blocks and non-monotonic layout. The debuginfo analyses are written with the assumption that the order of instructions in the VCode is the order of instructions in the final machine ocde. This was previously a strong invariant, until we introduced support for cold blocks. Cold blocks are implemented by reordering during emission, because the VCode ordering has other requirements related to lowering (respecting def-use dependencies in the reverse pass), so it is much simpler to reorder instructions at the last moment. Unfortunately, this causes the breakage we now see. This commit fixes the issue by skipping all cold instructions when emitting value-label ranges (which are translated into debuginfo). This means that variables defined in cold blocks will not have DWARF metadata. But cold blocks are usually compiler-inserted slowpaths, not user code, so this is probably OK. Debuginfo is always best-effort, so in any case this does not violate any correctness constraints.	2022-02-04 23:15:04 -08:00
Nick Fitzgerald	795b0aaf9a	cranelift: Add newtype wrappers for x64 register classes This primary motivation of this large commit (apologies for its size!) is to introduce `Gpr` and `Xmm` newtypes over `Reg`. This should help catch difficult-to-diagnose register class mixup bugs in x64 lowerings. But having a newtype for `Gpr` and `Xmm` themselves isn't enough to catch all of our operand-with-wrong-register-class bugs, because about 50% of operands on x64 aren't just a register, but a register or memory address or even an immediate! So we have `{Gpr,Xmm}Mem[Imm]` newtypes as well. Unfortunately, `GprMem` et al can't be `enum`s and are therefore a little bit noisier to work with from ISLE. They need to maintain the invariant that their registers really are of the claimed register class, so they need to encapsulate the inner data. If they exposed the underlying `enum` variants, then anyone could just change register classes or construct a `GprMem` that holds an XMM register, defeating the whole point of these newtypes. So when working with these newtypes from ISLE, we rely on external constructors like `(gpr_to_gpr_mem my_gpr)` instead of `(GprMem.Gpr my_gpr)`. A bit of extra lines of code are included to add support for register mapping for all of these newtypes as well. Ultimately this is all a bit wordier than I'd hoped it would be when I first started authoring this commit, but I think it is all worth it nonetheless! In the process of adding these newtypes, I didn't want to have to update both the ISLE `extern` type definition of `MInst` and the Rust definition, so I move the definition fully into ISLE, similar as aarch64. Finally, this process isn't complete. I've introduced the newtypes here, and I've made most XMM-using instructions switch from `Reg` to `Xmm`, as well as register class-converting instructions, but I haven't moved all of the GPR-using instructions over to the newtypes yet. I figured this commit was big enough as it was, and I can continue the adoption of these newtypes in follow up commits. Part of #3685.	2022-02-03 14:08:08 -08:00
Ulrich Weigand	36369a6f35	s390x: Migrate branches and traps to ISLE In order to migrate branches to ISLE, we define a second entry point `lower_branch` which gets the list of branch targets as additional argument. This requires a small change to `lower_common`: the `isle_lower` callback argument is changed from a function pointer to a closure. This allows passing the extra argument via a closure. Traps make use of the recently added facility to emit safepoints from ISLE, but are otherwise straightforward.	2022-01-25 18:15:32 +01:00
Ulrich Weigand	906f6a35cf	ISLE: Allow emitting safepoint insns Change the implementation of emitted_insts in IsleContext from a plain vector of instructions into a vector of tuples, where the second element is a boolean that indicates whether this instruction should be emitted as a safepoint. This allows targets to emit safepoint insns via ISLE.	2022-01-25 14:21:41 +01:00
Chris Fallin	ef1b2d2fa8	Cranelift: Fix cold-blocks-related lowering bug. If a block is marked cold but has side-effect-free code that is only used by side-effectful code in non-cold blocks, we will erroneously fail to emit it, causing a regalloc failure. This is due to the interaction of block ordering and lowering: we rely on block ordering to visit uses before defs (except for backedges) so that we can effectively do an inline liveness analysis and skip lowering operations that are not used anywhere. This "inline DCE" is needed because instruction lowering can pattern-match and merge one instruction into another, removing the need to generate the source instruction. Unfortunately, the way that I added cold-block support in #3698 was oblivious to this -- it just changed the block sort order. For efficiency reasons, we generate code in its final order directly, so it would not be tenable to generate it in e.g. RPO first and then reorder cold blocks to the bottom; we really do want to visit in the same order as the final code. This PR fixes the bug by moving the point at which cold blocks are sunk to emission-time instead. This is cheaper than either trying to visit blocks during lowering in RPO but add to VCode out-of-order, or trying to do some expensive analysis to recover proper liveness. It's not clear that the latter would be possible anyway -- the need to lower some instructions depends on other instructions' isel results/merging success, so we really do need to visit in RPO, and we can't simply lower all instructions as side-effecting roots (some can't be toplevel nodes). The one downside of this approach is that the VCode itself still has cold blocks inline; so in the text format (and hence compile-tests) it's not possible to see the sinking. This PR adds a test for cold-block sinking that actually verifies the machine code. (The test also includes an add-instruction in the cold path that would have been incorrectly skipped prior to this fix.) Fortunately this bug would not have been triggered by the one current use of cold blocks in #3699, because there the only operation in the cold block was an (always effectful) call instruction. The worst-case effect of the bug in other code would be a regalloc panic; no silent miscompilations could result.	2022-01-21 10:47:49 -08:00
Ulrich Weigand	be60a19623	ISLE standard prelude: Additional types and helpers In preparing to move the s390x back-end to ISLE, I noticed a few missing pieces in the common prelude code. This patch: - Defines the reference types $R32 / $R64. - Provides a trap_code_bad_conversion_to_integer helper. - Provides an avoid_div_traps helper. This requires passing the generic flags in addition to the ISA-specifc flags into the ISLE lowering context.	2022-01-20 17:23:31 +01:00
Chris Fallin	ae476fde60	Merge pull request #3698 from cfallin/cold-blocks Cranelift: add support for cold blocks.	2022-01-19 12:58:33 -08:00
Chris Fallin	f489b83835	Cranelift: add support for cold blocks. This PR adds a flag to each block that can be set via the frontend/builder interface that indicates that the block will not be frequently executed. As such, the compiler backend should place the block "out of line" in the final machine code, so that the ordinary, more frequent execution path that excludes the block does not have to jump around it. This is useful for adding handlers for exceptional conditions (slow-paths, guard violations) in a way that minimizes performance cost. Fixes #2747.	2022-01-19 12:17:41 -08:00
Anton Kirilov	89919f4b1f	Pass the ISA-specific compilation flags to the ABI implementations Copyright (c) 2021, Arm Limited.	2022-01-14 14:18:01 +00:00
Nick Fitzgerald	658c5d33c1	cranelift: Port `trap` and `resumable_trap` lowering to ISLE on x64	2022-01-13 15:57:17 -08:00
Nick Fitzgerald	a7dba81c1d	cranelift: Port `ishl` SIMD lowerings to ISLE (#3686 )	2022-01-13 09:34:37 -06:00
Chris Fallin	13f17db297	Merge pull request #3680 from bjorn3/remove_code_sink Remove the CodeSink interface in favor of MachBufferFinalized	2022-01-12 10:47:23 -08:00
Nick Fitzgerald	7454f1f3af	cranelift: port `sshr` to ISLE on x64 (#3681 )	2022-01-12 09:13:58 -06:00
bjorn3	f0e821b9e0	Remove all Sink traits	2022-01-11 19:03:10 +01:00
bjorn3	55d722db05	Remove CodeSink	2022-01-11 17:10:37 +01:00
bjorn3	a48a60f958	Remove reloc_external from CodeSink And introduce MachBufferFinalized::relocs() in the place.	2022-01-11 16:54:27 +01:00
bjorn3	63e2360346	Remove trap from CodeSink And introduce MachBufferFinalized::traps() in the place.	2022-01-11 16:42:52 +01:00
bjorn3	38aaa6e1da	Remove add_call_site from CodeSink and RelocSink And introduce MachBufferFinalized::call_sites() in the place.	2022-01-11 16:32:57 +01:00
bjorn3	37598ad170	Remove end_codegen method from CodeSink	2022-01-11 14:52:04 +01:00
bjorn3	354c4f7bf8	Remove unused CodeSink methods	2022-01-11 14:52:04 +01:00
bjorn3	88baac4ca6	Move the TestCodeSink functionality to MachBufferFinalized	2022-01-11 14:40:53 +01:00
bjorn3	376c93bda0	Remove MachBackend It is identical to TargetIsa	2022-01-06 15:08:12 +01:00
bjorn3	58c25d9e24	Add text_section_builder method to TargetIsa	2022-01-06 14:39:50 +01:00
bjorn3	03dc74d8e7	Add emit_unwind_info method to TargetIsa	2022-01-06 14:39:50 +01:00
bjorn3	9eba87a6c8	Add compile_function method to TargetIsa	2022-01-06 14:39:50 +01:00
bjorn3	d50f27e8f9	Remove reg_universe method from MachBackend and MachInst	2022-01-06 14:39:50 +01:00
bjorn3	96b8879e4b	Take reg_universe as argument to machinst::compile	2022-01-06 14:39:50 +01:00
Chris Fallin	e2b37a57dc	Merge pull request #3639 from bjorn3/machinst_cleanups Various cleanups around machinst	2022-01-05 10:01:27 -08:00
Chris Fallin	833ebeed76	Fix spillslot size bug in SIMD by removing type-dependent spillslot allocation. This patch makes spillslot allocation, spilling and reloading all based on register class only. Hence when we have a 32- or 64-bit value in a 128-bit XMM register on x86-64 or vector register on aarch64, this results in larger spillslots and spills/restores. Why make this change, if it results in less efficient stack-frame usage? Simply put, it is safer: there is always a risk when allocating spillslots or spilling/reloading that we get the wrong type and make the spillslot or the store/load too small. This was one contributing factor to CVE-2021-32629, and is now the source of a fuzzbug in SIMD code that puns an arbitrary user-controlled vector constant over another stackslot. (If this were a pointer, that could result in RCE. SIMD is not yet on by default in a release, fortunately. In particular, we have not been particularly careful about using moves between values of different types, for example with `raw_bitcast` or with certain SIMD operations, and such moves indicate to regalloc.rs that vregs are in equivalence classes and some arbitrary vreg in the class is provided when allocating the spillslot or spilling/reloading. Since regalloc.rs does not track actual type, and since we haven't been careful about moves, we can't really trust this "arbitrary vreg in equivalence class" to provide accurate type information. In the fix to CVE-2021-32629 we fixed this for integer registers by always spilling/reloading 64 bits; this fix can be seen as the analogous change for FP/vector regs.	2022-01-04 13:24:40 -08:00
bjorn3	17c3c1813f	Remove MachInstEmitInfo	2022-01-04 18:06:01 +01:00
bjorn3	8d1fc75b6b	Make MachBackend::triple return &Triple This avoids an unnecessary clone	2022-01-04 18:06:01 +01:00
bjorn3	4915162230	Remove unnecessary fields from CodeInfo	2022-01-04 18:05:45 +01:00
bjorn3	e98a85e1e2	Make get_mach_backend non-optional	2022-01-04 15:48:19 +01:00

1 2 3 4 5

217 Commits