wasmtime

Author	SHA1	Message	Date
Chris Fallin	0824abbae4	Add a basic alias analysis with redundant-load elim and store-to-load forwarding opts. (#4163) This PR adds a basic alias analysis, and optimizations that use it. This is a "mid-end optimization": it operates on CLIF, the machine-independent IR, before lowering occurs. The alias analysis (or maybe more properly, a sort of memory-value analysis) determines when it can prove a particular memory location is equal to a given SSA value, and when it can, it replaces any loads of that location. This subsumes two common optimizations: * Redundant load elimination: when the same memory address is loaded two times, and it can be proven that no intervening operations will write to that memory, then the second load is redundant and its result must be the same as the first. We can use the first load's result and remove the second load. * Store-to-load forwarding: when a load can be proven to access exactly the memory written by a preceding store, we can replace the load's result with the store's data operand, and remove the load. Both of these optimizations rely on a "last store" analysis that is a sort of coloring mechanism, split across disjoint categories of abstract state. The basic idea is that every memory-accessing operation is put into one of N disjoint categories; it is disallowed for memory to ever be accessed by an op in one category and later accessed by an op in another category. (The frontend must ensure this.) Then, given this, we scan the code and determine, for each memory-accessing op, when a single prior instruction is a store to the same category. This "colors" the instruction: it is, in a sense, a static name for that version of memory. This analysis provides an important invariant: if two operations access memory with the same last-store, then no other store can alias in the time between that last store and these operations. 
This must-not-alias property, together with a check that the accessed address is exactly the same (same SSA value and offset), and other attributes of the access (type, extension mode) are the same, let us prove that the results are the same. Given last-store info, we scan the instructions and build a table from "memory location" key (last store, address, offset, type, extension) to known SSA value stored in that location. A store inserts a new mapping. A load may also insert a new mapping, if we didn't already have one. Then when a load occurs and an entry already exists for its "location", we can reuse the value. This will be either RLE or St-to-Ld depending on where the value came from. Note that this does work across basic blocks: the last-store analysis is a full iterative dataflow pass, and we are careful to check dominance of a previously-defined value before aliasing to it at a potentially redundant load. So we will do the right thing if we only have a "partially redundant" load (loaded already but only in one predecessor block), but we will also correctly reuse a value if there is a store or load above a loop and a redundant load of that value within the loop, as long as no potentially-aliasing stores happen within the loop.	2022-05-20 13:19:32 -07:00
Anton Kirilov	edf07a8da6	Cranelift AArch64: Migrate Bitselect and Vselect to ISLE (#4139 ) Copyright (c) 2022, Arm Limited.	2022-05-16 09:39:28 -07:00
Chris Fallin	eb435f3057	x64: use constant pool for u64 constants rather than `movabs`. (#4088 ) * Allow emitting u64 constants into constant pool. * Use constant pool for constants on x64 that do not fit in a simm32 and are needed as a RegMem or RegMemImm. * Fix rip-relative addressing bug in pinsrd emission.	2022-05-10 09:21:05 -07:00
Chris Fallin	f85047b084	Rework x64 addressing-mode lowering to be slightly more flexible. (#4080 ) This PR refactors the x64 backend address-mode lowering to use an incremental-build approach, where it considers each node in a tree of `iadd`s that feed into a load/store address and, at each step, builds the best possible `Amode`. It will combine an arbitrary number of constant offsets (an extension beyond the current rules), and can capture a left-shifted (scaled) index in any position of the tree (another extension). This doesn't have any measurable performance improvement on our Wasm benchmarks in Sightglass, unfortunately, because the IR lowered from wasm32 will do address computation in 32 bits and then `uextend` it to add to the 64-bit heap base. We can't quite lift the 32-bit adds to 64 bits because this loses the wraparound semantics. (We could label adds as "expected not to overflow", and allow those to be lifted to 64 bit operations; wasm32 heap address computation should fit this. This is `add nuw` (no unsigned wrap) in LLVM IR terms. That's likely my next step.) Nevertheless, (i) this generalizes the cases we can handle, which should be a good thing, all other things being equal (and in this case, no compile time impact was measured); and (ii) might benefit non-Wasm frontends.	2022-05-02 16:20:39 -07:00
Sam Parker	12b4374cd5	[AArch64] Port atomic rmw to ISLE (#4021 ) Also fix and extend the current implementation: - AtomicRMWOp::Clr != AtomicRmwOp::And, as the input needs to be inverted first. - Inputs to the cmp for the RMWLoop case are sign-extended when needed. - Lower Xchg to Swp. - Lower Sub to Add with a negated input. - Added more runtests. Copyright (c) 2022, Arm Limited.	2022-04-27 13:13:59 -07:00
Chris Fallin	164bfeaf7e	x64 backend: migrate stores, and remainder of loads (I128 case), to ISLE. (#4069 )	2022-04-26 09:50:46 -07:00
Chris Fallin	e4b7c8a737	Cranelift: fix #3953 : rework single/multiple-use logic in lowering. (#4061 ) * Cranelift: fix #3953: rework single/multiple-use logic in lowering. This PR addresses the longstanding issue with loads trying to merge into compares on x86-64, and more generally, with the lowering framework falsely recognizing "single uses" of one op by another (which would normally allow merging of side-effecting ops like loads) when there is indirect duplication. To fix this, we replace the direct `value_uses` count with a transitive notion of uniqueness (not unlike Rust's `&`/`&mut` and how a `&mut` downgrades to `&` when accessed through another `&`!). A value is used multiple times transitively if it has multiple direct uses, or is used by another op that is used multiple times transitively. The canonical example of badness is: ``` v1 := load v2 := ifcmp v1, ... v3 := selectif v2, ... v4 := selectif v2, ... ``` both `v3` and `v4` effectively merge the `ifcmp` (`v2`), so even though the use of `v1` is "unique", it is codegenned twice. This is why we ~~can't have nice things~~ can't merge loads into compares (#3953). There is quite a subtle and interesting design space around this problem and how we might solve it. See the long doc-comment on `ValueUseState` in this PR for more justification for the particular design here. In particular, this design deliberately simplifies a bit relative to an "optimal" solution: some uses can become unique depending on merging, but we don't design our data structures for such updates because that would require significant extra costly tracking (some sort of transitive refcounting). For example, in the above, if `selectif` somehow did not merge `ifcmp`, then we would only codegen the `ifcmp` once into its result register (and use that register twice); then the load is uniquely used, and could be merged. 
But that requires transitioning from "multiple use" back to "unique use" with careful tracking as we do pattern-matching, which I've chosen to make out-of-scope here for now. In practice, I don't think it will matter too much (and we can always improve later). With this PR, we can now re-enable load-op merging for compares. A subsequent commit does this. * Update x64 backend to allow load-op merging for `cmp`. * Update filetests. * Add test for cmp-mem merging on x64. * Comment fixes. * Rework ValueUseState analysis for better performance. * Update s390x filetest: iadd_ifcout cannot merge loads anymore because it has multiple outputs (ValueUseState limitation) * Address review comments.	2022-04-22 18:00:48 -07:00
Chris Fallin	0af8737ec3	Add support for running the regalloc2 checker. (#4043 ) With these fixes, all this PR has to do is instantiate and run the checker on the `regalloc2::Output`. This is off by default, and is enabled by setting the `regalloc_checker` Cranelift option. This restores the old functionality provided by e.g. the `backtracking_checked` regalloc algorithm setting rather than `backtracking` when we were still on regalloc.rs.	2022-04-18 14:06:07 -07:00
Chris Fallin	5aa9bdc7eb	Cranelift: fix fuzzbug in critical-edge splitting. (#4044 ) regalloc2 is a bit pickier about critical edges than regalloc.rs was, because of how it inserts moves. In particular, if a branch has any arguments (e.g., a conditional branch or br_table), its successors must all have only one predecessor, so we can do edge moves at the top of successor blocks rather than at the end of this block. Otherwise, moves that semantically must come after the block's last uses (the branch's args) would be placed before it. This is almost always the case, because crit-edge splitting ensures that if we have more than one succ, all our succs will have only one pred. This is because branch kinds that take arguments (fixed args, not the blockparam args) tend to have more than one successor: conditionals and br_tables. However, a fuzzbug recently illuminated one corner case I had missed: a br_table can have one successor only, if it has a default target and an empty table. In this case, crit-edge splitting will happily skip a split and assume that we can insert edge moves at the end of the block with the br_table. But this will fail. regalloc2 explicitly checks this and bails with a panic, rather than continue, so no miscompilation is possible; but without this fix, we will get these panics on br_tables with empty tables.	2022-04-18 10:59:26 -07:00
Chris Fallin	5774e068b7	Cranelift: fix regalloc2 integration bug wrt blockparam branch args. (#4042 ) Previously, the block successor accumulation and the blockparam branch arg setup were decoupled. The lowering backend implicitly specified the order of successor edges via its `MachTerminator` enum on the last instruction in the block, while the `Lower` toplevel machine-independent driver set up blockparam branch args in the edge order seen in CLIF. In some cases, these orders did not match -- for example, when the conditional branch depended on an FP condition that was implemented by swapping taken/not-taken edges and inverting the condition code. This PR refactors the successor handling to be centralized in `Lower` rather than flow through the terminator `MachInst`, and adds a successor block and its blockparam args at the same time, ensuring the orders match.	2022-04-18 09:53:57 -07:00
Chris Fallin	7cf5f05830	Cranelift: remove slow invariant validation in cfg(fuzzing) from MachBuffer. (#4038 ) Following the merge of regalloc2 support, this became slower because we are stricter about the critical-edge invariant, generating a separate edge block for every out-edge even if two or more out-edges go to the same successor (this is significant in cases of `br_table` with many entries having the same target block, for example). Many of those edge blocks are empty and end up collapsed by the MachBuffer, which leads to a large set of aliased labels. The invariant validation will dutifully iterate over all the data structures at every step, validating all of our conditions. But this gets way slower in the new context, to the point that we'll probably have some fuzz timeouts. This was pointed out in [1] but I missed removing this in #3989. Given that `MachBuffer` has been around for nearly two years now, has been fuzzed continuously with the invariant validation for that time, and also has a correctness proof in the comments, it's probably reasonable to remove this high (recently increased) cost from the fuzzing-specific compilation configuration. [1] https://github.com/bytecodealliance/wasmtime/pull/3989#discussion_r847712263	2022-04-15 09:04:02 -05:00
Chris Fallin	a0318f36f0	Switch Cranelift over to regalloc2. (#3989 ) This PR switches Cranelift over to the new register allocator, regalloc2. See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801) for a summary of the design changes. This switchover has implications for core VCode/MachInst types and the lowering pass. Overall, this change brings improvements to both compile time and speed of generated code (runtime), as reported in #3942: ``` Benchmark Compilation (wallclock) Execution (wallclock) blake3-scalar 25% faster 28% faster blake3-simd no diff no diff meshoptimizer 19% faster 17% faster pulldown-cmark 17% faster no diff bz2 15% faster no diff SpiderMonkey, 21% faster 2% faster fib(30) clang.wasm 42% faster N/A ```	2022-04-14 10:28:21 -07:00
Andrew Brown	f62199da8c	x64: port `load` to ISLE (#3993 ) This change moves the majority of the lowerings for CLIF's `load` instruction over to ISLE. To do so, it also migrates the previous mechanism for creating an `Amode` (`lower_to_amode`) to several ISLE rules (see `to_amode`).	2022-04-07 18:31:22 -07:00
Chris Fallin	666c2554ea	Merge pull request from GHSA-gwc9-348x-qwv2 * Run the GC smoketest with epoch support enabled as well. * Handle safepoints in cold blocks properly. Currently, the way that we find safepoint slots for a given instruction relies on the instruction index order in the safepoint list matching the order of instruction emission. Previous to the introduction of cold-block support, this was trivially satisfied by sorting the safepoint list: we emit instructions 0, 1, 2, 3, 4, ..., and so if we have safepoints at instructions 1 and 4, we will encounter them in that order. However, cold blocks are supported by swizzling the emission order at the last moment (to avoid having to renumber instructions partway through the compilation pipeline), so we actually emit instructions out of index order when cold blocks are present. Reference-type support in Wasm in particular uses cold blocks for slowpaths, and has live refs and safepoints in these slowpaths, so we can reliably "skip" a safepoint (not emit any metadata for it) in the presence of reftype usage. This PR fixes the emission code by building a map from instruction index to safepoint index first, then doing lookups through this map, rather than following along in-order as it emits instructions.	2022-03-31 14:26:01 -07:00
Andrew Brown	bd6fe11ca9	cranelift: remove `load_complex` and `store_complex` (#3976 ) This change removes all variants of `load_complex` and `store_complex` from Cranelift; this is a breaking change to the instructions exposed by CLIF. The complete list of instructions removed is: `load_complex`, `store_complex`, `uload8_complex`, `sload8_complex`, `istore8_complex`, `sload8_complex`, `uload16_complex`, `sload16_complex`, `istore16_complex`, `uload32_complex`, `sload32_complex`, `istore32_complex`, `uload8x8_complex`, `sload8x8_complex`, `sload16x4_complex`, `uload16x4_complex`, `uload32x2_complex`, `sload32x2_complex`. The rationale for this removal is that the Cranelift backend now has the ability to pattern-match multiple upstream additions in order to calculate the address to access. Previously, this was not possible so the `*_complex` instructions were needed. Over time, these instructions have fallen out of use in this repository, making the additional overhead of maintaining them a chore.	2022-03-31 10:05:10 -07:00
Andrew Brown	5d8dd648d7	x64: port `fcmp` to ISLE (#3967) * x64: port scalar `fcmp` to ISLE Implement the CLIF lowering for `fcmp` in ISLE. This adds a new type-matcher, `ty_scalar_float`, for detecting uses of `F32` and `F64`. * isle: rename `vec128` to `ty_vec128` This refactoring changes the name of the `vec128` matcher function to follow the `ty_` convention of the other type matchers. It also makes the helper an inline function call. * x64: port vector `fcmp` to ISLE	2022-03-29 15:41:49 -07:00
Alex Crichton	4d404c90b4	Ensure functions are aligned properly on AArch64 (#3908) Previously (as in an hour ago) #3905 landed a new ability for fuzzing to arbitrarily insert padding between functions. Running some fuzzers locally, though, this instantly hit a lot of problems on AArch64 because the arbitrary padding isn't aligned to 4 bytes like all other functions are. To fix this issue, appending functions now correctly aligns the output as appropriate for the platform. The alignment argument for appending was switched to `None` where `None` means "use the platform default" and otherwise an explicit alignment can be specified for inserting other data (like arbitrary padding or Windows unwind tables).	2022-03-09 15:45:30 -06:00
Chris Fallin	90a081a731	ISLE: port extend/reduce opcodes on x64. (#3849 )	2022-02-28 11:49:28 -08:00
Chris Fallin	24f145cd1e	Migrate clz, ctz, popcnt, bitrev, is_null, is_invalid on x64 to ISLE. (#3848 )	2022-02-28 09:45:13 -08:00
Ulrich Weigand	b064e60087	ISLE: Re-implement ValueSlice (#3784) The current definition of `ValueSlice` is not usable, since any call to a constructor returning a `ValueSlice` will extend the mutable borrow on the context taken by the constructor call, with the result that it cannot be passed to any other constructor ever. Re-implement `ValueSlice` as a pair of a `ValueList` identifier plus an offset into the list. This type can simply be copied without requiring a borrow on the context.	2022-02-24 15:24:40 -08:00
Ulrich Weigand	07d615d3f7	ISLE: Lowering of multi-output instructions (#3783 ) This changes the output of the `lower` constructor from a `ValueRegs` to a new `InstOutput` type, which is a vector of `ValueRegs`. Code in `lower_common` is updated to use this new type to handle instructions with multiple outputs. All back-ends are updated to use the new type.	2022-02-24 14:03:06 -08:00
Chris Fallin	ca0e8d0a1d	Remove incomplete/unmaintained ARM32 backend (for now). (#3799 ) In #3721, we have been discussing what to do about the ARM32 backend in Cranelift. Currently, this backend supports only 32-bit types, which is insufficient for full Wasm-MVP; it's missing other critical bits, like floating-point support; and it has only ever been exercised, AFAIK, via the filetests for the individual CLIF instructions that are implemented. We were very very thankful for the original contribution of this backend, even in its partial state, and we had hoped at the time that we could eventually mature it in-tree until it supported e.g. Wasm and other use-cases. But that hasn't yet happened -- to the blame of no-one, to be clear, we just haven't had a contributor with sufficient time. Unfortunately, the existence of the backend and lack of active maintainer now potentially pose a bit of a burden as we hope to make continuing changes to the backend framework. For example, the ISLE migration, and the use of regalloc2 that it will allow, would need all of the existing lowering patterns in the hand-written ARM32 backend to be rewritten as ISLE rules. Given that we don't currently have the resources to do this, we think it's probably best if we, sadly, for now remove this partial backend. This is not in any way a statement of what we might accept in the future, though. If, in the future, an ARM32 backend updated to our latest codebase with an active maintainer were to appear, we'd be happy to merge it (and likewise for any other architecture!). But for now, this is probably the best path. Thanks again to the original contributor @jmkrauz and we hope that this work can eventually be brought back and reused if someone has the time to do so!	2022-02-14 15:03:52 -08:00
Nick Fitzgerald	dc86e7a6dc	cranelift: Use GPR newtypes extensively in x64 lowering (#3798 ) We already defined the `Gpr` newtype and used it in a few places, and we already defined the `Xmm` newtype and used it extensively. This finishes the transition to using the newtypes extensively in lowering by making use of `Gpr` in more places. Fixes #3685	2022-02-14 12:54:41 -08:00
Ulrich Weigand	10198553c7	ISLE: Common accessors for some insn data fields (#3781 ) Add accessors to prelude.isle to access data fields of `func_addr` and `symbol_value` instructions. These are based on similar versions I had added to the s390x back-end, but are a bit more straightforward to use. - func_ref_data: Extract SigRef, ExternalName, and RelocDistance fields given a FuncRef. - symbol_value_data: Extract ExternalName, RelocDistance, and offset fields given a GlobalValue representing a Symbol. - reloc_distance_near: Test for RelocDistance::Near. The s390x back-end is changed to use these common versions. Note that this exposed a bug in common isle code: This extractor: (extractor (load_sym inst) (and inst (load _ (def_inst (symbol_value (symbol_value_data _ (reloc_distance_near) offset))) (i64_from_offset (memarg_symbol_offset_sum <offset _))))) would raise an assertion in sema.rs due to a supposed cycle in extractor definitions. But there was no actual cycle, it was simply that the extractor tree refers twice to the `insn_data` extractor (once via the `load` and once via the `symbol_value` extractor). Fixed by checking for pre-existing definitions only along one path in the tree, not across the whole tree.	2022-02-08 17:57:27 -08:00
Chris Fallin	d9d6469422	Cranelift: fix debuginfo wrt cold blocks and non-monotonic layout. The debuginfo analyses are written with the assumption that the order of instructions in the VCode is the order of instructions in the final machine code. This was previously a strong invariant, until we introduced support for cold blocks. Cold blocks are implemented by reordering during emission, because the VCode ordering has other requirements related to lowering (respecting def-use dependencies in the reverse pass), so it is much simpler to reorder instructions at the last moment. Unfortunately, this causes the breakage we now see. This commit fixes the issue by skipping all cold instructions when emitting value-label ranges (which are translated into debuginfo). This means that variables defined in cold blocks will not have DWARF metadata. But cold blocks are usually compiler-inserted slowpaths, not user code, so this is probably OK. Debuginfo is always best-effort, so in any case this does not violate any correctness constraints.	2022-02-04 23:15:04 -08:00
Nick Fitzgerald	795b0aaf9a	cranelift: Add newtype wrappers for x64 register classes The primary motivation of this large commit (apologies for its size!) is to introduce `Gpr` and `Xmm` newtypes over `Reg`. This should help catch difficult-to-diagnose register class mixup bugs in x64 lowerings. But having a newtype for `Gpr` and `Xmm` themselves isn't enough to catch all of our operand-with-wrong-register-class bugs, because about 50% of operands on x64 aren't just a register, but a register or memory address or even an immediate! So we have `{Gpr,Xmm}Mem[Imm]` newtypes as well. Unfortunately, `GprMem` et al can't be `enum`s and are therefore a little bit noisier to work with from ISLE. They need to maintain the invariant that their registers really are of the claimed register class, so they need to encapsulate the inner data. If they exposed the underlying `enum` variants, then anyone could just change register classes or construct a `GprMem` that holds an XMM register, defeating the whole point of these newtypes. So when working with these newtypes from ISLE, we rely on external constructors like `(gpr_to_gpr_mem my_gpr)` instead of `(GprMem.Gpr my_gpr)`. A few extra lines of code are included to add support for register mapping for all of these newtypes as well. Ultimately this is all a bit wordier than I'd hoped it would be when I first started authoring this commit, but I think it is all worth it nonetheless! In the process of adding these newtypes, I didn't want to have to update both the ISLE `extern` type definition of `MInst` and the Rust definition, so I moved the definition fully into ISLE, similar to aarch64. Finally, this process isn't complete. I've introduced the newtypes here, and I've made most XMM-using instructions switch from `Reg` to `Xmm`, as well as register class-converting instructions, but I haven't moved all of the GPR-using instructions over to the newtypes yet. 
I figured this commit was big enough as it was, and I can continue the adoption of these newtypes in follow up commits. Part of #3685.	2022-02-03 14:08:08 -08:00
Ulrich Weigand	36369a6f35	s390x: Migrate branches and traps to ISLE In order to migrate branches to ISLE, we define a second entry point `lower_branch` which gets the list of branch targets as additional argument. This requires a small change to `lower_common`: the `isle_lower` callback argument is changed from a function pointer to a closure. This allows passing the extra argument via a closure. Traps make use of the recently added facility to emit safepoints from ISLE, but are otherwise straightforward.	2022-01-25 18:15:32 +01:00
Ulrich Weigand	906f6a35cf	ISLE: Allow emitting safepoint insns Change the implementation of emitted_insts in IsleContext from a plain vector of instructions into a vector of tuples, where the second element is a boolean that indicates whether this instruction should be emitted as a safepoint. This allows targets to emit safepoint insns via ISLE.	2022-01-25 14:21:41 +01:00
Chris Fallin	ef1b2d2fa8	Cranelift: Fix cold-blocks-related lowering bug. If a block is marked cold but has side-effect-free code that is only used by side-effectful code in non-cold blocks, we will erroneously fail to emit it, causing a regalloc failure. This is due to the interaction of block ordering and lowering: we rely on block ordering to visit uses before defs (except for backedges) so that we can effectively do an inline liveness analysis and skip lowering operations that are not used anywhere. This "inline DCE" is needed because instruction lowering can pattern-match and merge one instruction into another, removing the need to generate the source instruction. Unfortunately, the way that I added cold-block support in #3698 was oblivious to this -- it just changed the block sort order. For efficiency reasons, we generate code in its final order directly, so it would not be tenable to generate it in e.g. RPO first and then reorder cold blocks to the bottom; we really do want to visit in the same order as the final code. This PR fixes the bug by moving the point at which cold blocks are sunk to emission-time instead. This is cheaper than either trying to visit blocks during lowering in RPO but add to VCode out-of-order, or trying to do some expensive analysis to recover proper liveness. It's not clear that the latter would be possible anyway -- the need to lower some instructions depends on other instructions' isel results/merging success, so we really do need to visit in RPO, and we can't simply lower all instructions as side-effecting roots (some can't be toplevel nodes). The one downside of this approach is that the VCode itself still has cold blocks inline; so in the text format (and hence compile-tests) it's not possible to see the sinking. This PR adds a test for cold-block sinking that actually verifies the machine code. (The test also includes an add-instruction in the cold path that would have been incorrectly skipped prior to this fix.) 
Fortunately this bug would not have been triggered by the one current use of cold blocks in #3699, because there the only operation in the cold block was an (always effectful) call instruction. The worst-case effect of the bug in other code would be a regalloc panic; no silent miscompilations could result.	2022-01-21 10:47:49 -08:00
Ulrich Weigand	be60a19623	ISLE standard prelude: Additional types and helpers In preparing to move the s390x back-end to ISLE, I noticed a few missing pieces in the common prelude code. This patch: - Defines the reference types $R32 / $R64. - Provides a trap_code_bad_conversion_to_integer helper. - Provides an avoid_div_traps helper. This requires passing the generic flags in addition to the ISA-specific flags into the ISLE lowering context.	2022-01-20 17:23:31 +01:00
Chris Fallin	ae476fde60	Merge pull request #3698 from cfallin/cold-blocks Cranelift: add support for cold blocks.	2022-01-19 12:58:33 -08:00
Chris Fallin	f489b83835	Cranelift: add support for cold blocks. This PR adds a flag to each block that can be set via the frontend/builder interface that indicates that the block will not be frequently executed. As such, the compiler backend should place the block "out of line" in the final machine code, so that the ordinary, more frequent execution path that excludes the block does not have to jump around it. This is useful for adding handlers for exceptional conditions (slow-paths, guard violations) in a way that minimizes performance cost. Fixes #2747.	2022-01-19 12:17:41 -08:00
Anton Kirilov	89919f4b1f	Pass the ISA-specific compilation flags to the ABI implementations Copyright (c) 2021, Arm Limited.	2022-01-14 14:18:01 +00:00
Nick Fitzgerald	658c5d33c1	cranelift: Port `trap` and `resumable_trap` lowering to ISLE on x64	2022-01-13 15:57:17 -08:00
Nick Fitzgerald	a7dba81c1d	cranelift: Port `ishl` SIMD lowerings to ISLE (#3686 )	2022-01-13 09:34:37 -06:00
Chris Fallin	13f17db297	Merge pull request #3680 from bjorn3/remove_code_sink Remove the CodeSink interface in favor of MachBufferFinalized	2022-01-12 10:47:23 -08:00
Nick Fitzgerald	7454f1f3af	cranelift: port `sshr` to ISLE on x64 (#3681 )	2022-01-12 09:13:58 -06:00
bjorn3	f0e821b9e0	Remove all Sink traits	2022-01-11 19:03:10 +01:00
bjorn3	55d722db05	Remove CodeSink	2022-01-11 17:10:37 +01:00
bjorn3	a48a60f958	Remove reloc_external from CodeSink And introduce MachBufferFinalized::relocs() in its place.	2022-01-11 16:54:27 +01:00
bjorn3	63e2360346	Remove trap from CodeSink And introduce MachBufferFinalized::traps() in its place.	2022-01-11 16:42:52 +01:00
bjorn3	38aaa6e1da	Remove add_call_site from CodeSink and RelocSink And introduce MachBufferFinalized::call_sites() in its place.	2022-01-11 16:32:57 +01:00
bjorn3	37598ad170	Remove end_codegen method from CodeSink	2022-01-11 14:52:04 +01:00
bjorn3	354c4f7bf8	Remove unused CodeSink methods	2022-01-11 14:52:04 +01:00
bjorn3	88baac4ca6	Move the TestCodeSink functionality to MachBufferFinalized	2022-01-11 14:40:53 +01:00
bjorn3	376c93bda0	Remove MachBackend It is identical to TargetIsa	2022-01-06 15:08:12 +01:00
bjorn3	58c25d9e24	Add text_section_builder method to TargetIsa	2022-01-06 14:39:50 +01:00
bjorn3	03dc74d8e7	Add emit_unwind_info method to TargetIsa	2022-01-06 14:39:50 +01:00
bjorn3	9eba87a6c8	Add compile_function method to TargetIsa	2022-01-06 14:39:50 +01:00
bjorn3	d50f27e8f9	Remove reg_universe method from MachBackend and MachInst	2022-01-06 14:39:50 +01:00
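
The memory-value table described in commit 0824abbae4 above (alias analysis, #4163) can be sketched roughly as follows. This is a minimal illustration rather than Cranelift's implementation: it handles only straight-line code with a single memory category, omits the extension mode and the dominance check, and every name in it (`Val`, `Inst`, `Loc`, `find_redundant_loads`) is hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical SSA value id; this is not Cranelift's `Value` type.
type Val = u32;

/// Toy straight-line instruction set, for illustration only.
#[derive(Clone, Copy, Debug)]
enum Inst {
    /// result = load.ty [addr + offset]
    Load { result: Val, addr: Val, offset: i32, ty: u8 },
    /// store.ty data, [addr + offset]
    Store { data: Val, addr: Val, offset: i32, ty: u8 },
}

/// "Memory location" key: (last store, address, offset, type).
type Loc = (Option<usize>, Val, i32, u8);

/// Returns `(load result, known value)` pairs for every load whose result
/// can be replaced by an already-known SSA value.
fn find_redundant_loads(insts: &[Inst]) -> Vec<(Val, Val)> {
    // Index of the most recent store: the "color" of the current memory state.
    let mut last_store: Option<usize> = None;
    let mut known: HashMap<Loc, Val> = HashMap::new();
    let mut aliases = Vec::new();

    for (i, inst) in insts.iter().enumerate() {
        match *inst {
            Inst::Store { data, addr, offset, ty } => {
                // A store starts a new memory version and records its data.
                last_store = Some(i);
                known.insert((last_store, addr, offset, ty), data);
            }
            Inst::Load { result, addr, offset, ty } => {
                let key = (last_store, addr, offset, ty);
                match known.get(&key) {
                    // Same location under the same last store: reuse the value.
                    Some(&value) => aliases.push((result, value)),
                    // First access to this location: remember the loaded value.
                    None => {
                        known.insert(key, result);
                    }
                }
            }
        }
    }
    aliases
}
```

In this sketch a reused value that came from an earlier load corresponds to redundant-load elimination, and one that came from a store corresponds to store-to-load forwarding. Because the last store is part of the key, any intervening store conservatively invalidates all earlier entries.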
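
The transitive notion of "used multiple times" from commit e4b7c8a737 above (#4061) can be illustrated with a small sketch. It assumes a toy representation in which instruction `i` defines value `i` and only consumes lower-numbered values; the `UseState` enum and `transitive_use_states` function here are illustrative stand-ins, not the `ValueUseState` code from the PR.

```rust
/// Hypothetical three-state use count for a value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum UseState {
    Unused,
    Once,
    Multiple,
}

/// `args[i]` lists the values consumed by the instruction defining value `i`.
/// Assumption: every argument index is smaller than `i` (defs precede uses).
fn transitive_use_states(args: &[Vec<usize>]) -> Vec<UseState> {
    let mut states = vec![UseState::Unused; args.len()];
    // Visit users before their operands so each user's state is final
    // by the time it is propagated to its arguments.
    for user in (0..args.len()).rev() {
        let user_state = states[user];
        for &arg in &args[user] {
            states[arg] = match (user_state, states[arg]) {
                // A multiply-used user makes its operands multiply-used too.
                (UseState::Multiple, _) => UseState::Multiple,
                // First direct use of this operand.
                (_, UseState::Unused) => UseState::Once,
                // Second (or later) direct use.
                (_, _) => UseState::Multiple,
            };
        }
    }
    states
}
```

For the commit's example (`v1 := load`, `v2 := ifcmp v1`, `v3 := selectif v2`, `v4 := selectif v2`), the `ifcmp` has two direct uses, so the load feeding it is also marked as multiply used even though it has only one direct use. That is the condition under which merging the load into the compare would be unsafe.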
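
The safepoint fix in commit 666c2554ea above replaces an in-order walk with a lookup map from instruction index to safepoint index. A minimal sketch of that idea, assuming instruction indices are plain `u32` values and using a hypothetical function name, might look like this:

```rust
use std::collections::HashMap;

/// For each instruction in emission order, returns the index of its safepoint
/// (its position in `safepoint_insts`), or `None` if it has no safepoint.
fn safepoints_in_emission_order(
    safepoint_insts: &[u32],
    emission_order: &[u32],
) -> Vec<Option<usize>> {
    // Build the instruction-index -> safepoint-index map up front.
    let by_inst: HashMap<u32, usize> = safepoint_insts
        .iter()
        .enumerate()
        .map(|(safepoint_idx, &inst)| (inst, safepoint_idx))
        .collect();

    // Lookups no longer depend on instructions being emitted in index order.
    emission_order
        .iter()
        .map(|inst| by_inst.get(inst).copied())
        .collect()
}
```

With safepoints at instructions 1 and 4 and a cold block that moves instruction 1 after instruction 4 in emission order, both safepoints are still found, whereas a cursor that only advances through a sorted safepoint list would skip one of them.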


224 Commits