wasmtime

Author	SHA1	Message	Date
Andrew Brown	f063082474	x64: remove `Inst::XmmLoadConst` (#4876 ) This is a cherry-pick of a long-ago commit, 2d46637. The original message reads: > Now that `SyntheticAmode` can refer to constants, there is no longer a > need for a separate instruction format--standard load instructions will > work. Since then, the transition to ISLE and the use of `XmmLoadConst` in many more places makes this change a larger diff than the original. The basic idea is the same, though: the extra indirection of `Inst::XMmLoadConst` is removed and replaced by a direct use of `VCodeConstant` as a `SyntheticAmode`. This has no effect on codegen, but the CLIF output is now clearer in that the actual instruction is displayed (e.g., `movdqu`) instead of a made-up instruction (`load_const`).	2022-09-07 12:52:13 -07:00
Jamey Sharp	3d6d49daba	cranelift: Remove of/nof overflow flags from icmp (#4879 ) * cranelift: Remove of/nof overflow flags from icmp Neither Wasmtime nor cg-clif use these flags under any circumstances. From discussion on #3060 I see it's long been unclear what purpose these flags served. Fixes #3060, fixes #4406, and fixes #4875... by deleting all the code that could have been buggy. This changes the cranelift-fuzzgen input format by removing some IntCC options, so I've gone ahead and enabled I128 icmp tests at the same time. Since only the of/nof cases were failing before, I expect these to work. * Restore trapif tests It's still useful to validate that iadd_ifcout's iflags result can be forwarded correctly to trapif, and for that purpose it doesn't really matter what condition code is checked.	2022-09-07 08:38:41 -07:00
Chris Fallin	385bd0cbf8	x64: fix CvtFloatToUintSeq: do not clobber src. (#4842 ) This slipped through the regalloc2 operand code update in #4811: the CvtFloatToUintSeq pseudo-instruction actually clobbers its source. It was marked as a "mod" operand in the original and I mistakenly converted it to a "use" as I had not seen the actual clobber. The instruction now takes an extra temp and makes a copy of `src` in the appropriate place. Fixes #4840.	2022-09-01 22:46:57 +00:00
Afonso Bordado	08e7a7f1a0	cranelift: Add inline stack probing for x64 (#4747 ) * cranelift: Add inline stack probe for x64 * cranelift: Cleanups comments Thanks @jameysharp!	2022-09-01 22:32:54 +00:00
Trevor Elliott	fb8b9838fe	Add MInst.XmmUnaryRmRImm to handle rounding instructions (#4823 ) Add a new pseudo-instruction, XmmUnaryRmRImm, to handle instructions like roundss that only use their first register argument for the instruction's result. This has the added benefit of allowing the isle wrappers for those instructions to take an XmmMem argument, allowing for more cases where loads may be merged.	2022-08-31 08:29:32 -07:00
Chris Fallin	186c7c3b89	x64: clean up regalloc-related semantics on several instructions. (#4811 ) * x64: clean up regalloc-related semantics on several instructions. This PR removes all uses of "modify" operands on instructions in the x64 backend, and also removes all uses of "pinned vregs", or vregs that are explicitly tied to particular physical registers. In place of both of these mechanisms, which are legacies of the old regalloc design and supported via compatibility code, the backend now uses operand constraints. This is more flexible as it allows the regalloc to see the liveranges and constraints without "reverse-engineering" move instructions. Eventually, after removing all such uses (including in other backends and by the ABI code), we can remove the compatibility code in regalloc2, significantly simplifying its liverange-construction frontend and thus allowing for higher confidence in correctness as well as possibly a bit more compilation speed. Curiously, there are a few extra move instructions now; they are likely poor splitting decisions and I can try to chase these down later. * Fix cranelift-codegen tests. * Review feedback.	2022-08-30 17:21:14 -07:00
Trevor Elliott	25d960f9c4	x64: Lower tlsvalue, sqmul_round_sat, and uunarrow in ISLE (#4793 ) Lower tlsvalue, sqmul_round_sat, and uunarrow in ISLE.	2022-08-26 16:33:48 -07:00
Trevor Elliott	c1f9736938	x64: Lower vany_true, vall_true, vhigh_bits, iconcat, and isplit in ISLE (#4787 ) Lower vany_true, vall_true, vhigh_bits, iconcat, and isplit in ISLE.	2022-08-26 09:07:22 -07:00
Trevor Elliott	9386409607	x64: Lower extractlane, scalar_to_vector, and splat in ISLE (#4780 ) Lower extractlane, scalar_to_vector and splat in ISLE. This PR also makes some changes to the SinkableLoad api * change the return type of sink_load to RegMem as there are more functions available for dealing with RegMem * add reg_mem_to_reg_mem_imm and register it as an automatic conversion	2022-08-25 09:38:03 -07:00
Trevor Elliott	b8b6f2781e	x64: Lower shuffle and swizzle in ISLE (#4772 ) Lower `shuffle` and `swizzle` in ISLE. This PR surfaced a bug with the lowering of `shuffle` when avx512vl and avx512vbmi are enabled: we use `vpermi2b` as the implementation, but panic if the immediate shuffle mask contains any out-of-bounds values. The behavior when the avx512 extensions are not present is that out-of-bounds values are turned into `0` in the result. I've resolved this by detecting when the shuffle immediate has out-of-bounds indices in the avx512-enabled lowering, and generating an additional mask to zero out the lanes where those indices occur. This brings the avx512 case into line with the semantics of the `shuffle` op: `94bcbe8446/cranelift/codegen/meta/src/shared/instructions.rs (L1495-L1498)`	2022-08-24 21:49:51 +00:00
Trevor Elliott	b5f1ab7780	x64: Lower stack_addr, udiv, sdiv, urem, srem, umulhi, smulhi in ISLE (#4741 ) Lower stack_addr, udiv, sdiv, urem, srem, umulhi, and smulhi in ISLE. For udiv, sdiv, urem, and srem I opted to move the original lowering into an extern constructor, as the interactions with rax and rdx for the div instruction didn't seem meaningful to implement in ISLE. However, I'm happy to revisit this choice and move more of the embedding into ISLE.	2022-08-23 11:22:49 -07:00
Trevor Elliott	cee4b209f3	x64: Lower fcopysign, ceil, floor, nearest, and trunc in ISLE (#4730 ) https://github.com/bytecodealliance/wasmtime/pull/4730	2022-08-22 13:57:36 -07:00
Trevor Elliott	80c77da334	x64: Lower bitcast, fabs, and fneg in ISLE (#4729 ) * Add tests for bitcast * Migrate bitcast to ISLE * Add tests for fabs * Lower fabs in ISLE * Add tests for fneg * Lower fneg in ISLE	2022-08-18 17:59:23 -07:00
Trevor Elliott	3c1490dd59	x64: Lower fcvt_to_{u,s}int{,_sat} in ISLE (#4704 ) https://github.com/bytecodealliance/wasmtime/pull/4704	2022-08-16 09:03:50 -07:00
Trevor Elliott	498e7156b4	Remove the handling of `cmpps` in `produces_const` (#4714 ) https://github.com/bytecodealliance/wasmtime/pull/4714	2022-08-15 15:48:01 -07:00
Trevor Elliott	1d0f6fa4fb	Fix a bug in produces_const (#4709 ) https://github.com/bytecodealliance/wasmtime/pull/4709	2022-08-15 19:00:33 +00:00
Nick Fitzgerald	f0c60f46a8	Cranelift: Remove `ABICallee` trait (#4701 ) * Cranelift: Remove `ABICallee` trait It has only one implementation: the `ABICalleeImpl` struct. By using that directly we can avoid unnecessary layers of generics and abstractions as well as a couple `Box`es that were previously putting the single implementation into a `Box<dyn>`. * Cranelift: Rename `ABICalleeImpl` to `AbiCallee` * Fix comments as per review * Rename `AbiCallee` to `Callee`	2022-08-15 18:27:05 +00:00
Benjamin Bouvier	8a9b1a9025	Implement an incremental compilation cache for Cranelift (#4551 ) This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime. After the suggestion of Chris, `Function` has been split into mostly two parts: - on the one hand, `FunctionStencil` contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`. - on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on. Most changes are here to accomodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache: - most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location, it's automatically detected instead the first time a non-default `SourceLoc` is set. - user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`. - some refactorings have been made for function names: - `ExternalName` was used as the type for a `Function`'s name; while it thus allowed `ExternalName::Libcall` in this place, this would have been quite confusing to use it there. Instead, a new enum `UserFuncName` is introduced for this name, that's either a user-defined function name (the above `UserExternalName`) or a test case name. - The future of `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with. The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions. A basic fuzz target has been introduced that tries to do the bare minimum: - check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function. - check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache. - This last check is less efficient and less likely to happen, so probably should be rethought a bit. Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip. Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement. Fixes #4155.	2022-08-12 16:47:43 +00:00
Afonso Bordado	3ea1813173	x64: Add native lowering for scalar `fma` (#4539 ) Use `vfmadd213{ss,sd}` for these lowerings.	2022-08-11 22:48:16 +00:00
Trevor Elliott	0c2e0494bd	x64: Lower fcvt_from_uint in ISLE (#4684 ) * Add a test for the existing behavior of fcvt_from_unit * Migrate the I8, I16, I32 cases of fcvt_from_uint * Implement the I64 case of fcvt_from_uint * Add a test for the existing behavior of fcvt_from_uint.f64x2 * Migrate fcvt_from_uint.f64x2 to ISLE * Lower the last case of `fcvt_from_uint` * Add a test for `fcvt_from_uint` * Finish lowering fcmp_from_uint * Format	2022-08-11 12:28:41 -07:00
Afonso Bordado	c5bc368cfe	cranelift: Add COFF TLS Support (#4546 ) * cranelift: Implement COFF TLS Relocations * cranelift: Emit SecRel relocations * cranelift: Handle _tls_index symbol in backend	2022-08-11 09:33:40 -07:00
Trevor Elliott	a25d52046b	x64: Migrate fcvt_from_sint and fcvt_low_from_sint to ISLE (#4650 ) https://github.com/bytecodealliance/wasmtime/pull/4650	2022-08-10 10:49:02 -07:00
Trevor Elliott	0c2a48f682	x64: Migrate selectif and selectif_spectre_guard to ISLE (#4619 ) https://github.com/bytecodealliance/wasmtime/pull/4619	2022-08-05 09:36:11 -07:00
Trevor Elliott	dc8362ceec	x64: Finish migrating brz and brnz to ISLE (#4614 ) https://github.com/bytecodealliance/wasmtime/pull/4614	2022-08-04 12:58:43 -07:00
Trevor Elliott	1fc11bbe51	x64: Migrate brff and I128 branching instructions to ISLE (#4599 ) https://github.com/bytecodealliance/wasmtime/pull/4599	2022-08-04 08:58:50 -07:00
Nick Fitzgerald	42bba452a6	Cranelift: Add instructions for getting the current stack/frame/return pointers (#4573 ) * Cranelift: Add instructions for getting the current stack/frame pointers and return address This is the initial part of https://github.com/bytecodealliance/wasmtime/issues/4535 * x64: Remove `Amode::RbpOffset` and use `Amode::ImmReg` instead We just special case getting operands from `Amode`s now. * Fix s390x `get_return_address`; require `preserve_frame_pointers=true` * Assert that `Amode::ImmRegRegShift` doesn't use rbp/rsp * Handle non-allocatable registers in Amode::with_allocs * Use "stack" instead of "r15" on s390x * r14 is an allocatable register on s390x, so it shouldn't be used with `MovPReg`	2022-08-02 14:37:17 -07:00
Chris Fallin	43f1765272	Cranellift: remove Baldrdash support and related features. (#4571 ) * Cranellift: remove Baldrdash support and related features. As noted in Mozilla's bugzilla bug 1781425 [1], the SpiderMonkey team has recently determined that their current form of integration with Cranelift is too hard to maintain, and they have chosen to remove it from their codebase. If and when they decide to build updated support for Cranelift, they will adopt different approaches to several details of the integration. In the meantime, after discussion with the SpiderMonkey folks, they agree that it makes sense to remove the bits of Cranelift that exist to support the integration ("Baldrdash"), as they will not need them. Many of these bits are difficult-to-maintain special cases that are not actually tested in Cranelift proper: for example, the Baldrdash integration required Cranelift to emit function bodies without prologues/epilogues, and instead communicate very precise information about the expected frame size and layout, then stitched together something post-facto. This was brittle and caused a lot of incidental complexity ("fallthrough returns", the resulting special logic in block-ordering); this is just one example. As another example, one particular Baldrdash ABI variant processed stack args in reverse order, so our ABI code had to support both traversal orders. We had a number of other Baldrdash-specific settings as well that did various special things. This PR removes Baldrdash ABI support, the `fallthrough_return` instruction, and pulls some threads to remove now-unused bits as a result of those two, with the understanding that the SpiderMonkey folks will build new functionality as needed in the future and we can perhaps find cleaner abstractions to make it all work. [1] https://bugzilla.mozilla.org/show_bug.cgi?id=1781425 * Review feedback. * Fix (?) DWARF debug tests: add `--disable-cache` to wasmtime invocations. The debugger tests invoke `wasmtime` from within each test case under the control of a debugger (gdb or lldb). Some of these tests started to inexplicably fail in CI with unrelated changes, and the failures were only inconsistently reproducible locally. It seems to be cache related: if we disable cached compilation on the nested `wasmtime` invocations, the tests consistently pass. * Review feedback.	2022-08-02 19:37:56 +00:00
Trevor Elliott	25782b527e	x64: Migrate trapif and trapff to ISLE (#4545 ) https://github.com/bytecodealliance/wasmtime/pull/4545	2022-08-01 11:24:11 -07:00
Benjamin Bouvier	8d0224341c	cranelift: Introduce a feature to enable `trace` logs (#4484 ) * Don't use `log::trace` directly but a feature-enabled `trace` macro * Don't emit disassembly based on the log level	2022-08-01 11:19:15 +02:00
Trevor Elliott	7ac6134894	x64: Shrink Inst from 72 to 48 bytes (#4514 ) https://github.com/bytecodealliance/wasmtime/pull/4514	2022-07-27 10:39:22 -07:00
Afonso Bordado	02c3b47db2	x64: Implement SIMD `fma` (#4474 ) * x64: Add VEX Instruction Encoder This uses a similar builder pattern to the EVEX Encoder. Does not yet support memory accesses. * x64: Add FMA Flag * x64: Implement SIMD `fma` * x64: Use 4 register Vex Inst * x64: Reorder VEX pretty print args	2022-07-25 22:01:02 +00:00
Trevor Elliott	06407dd337	Add a test to prevent x64 Inst size slipping further (#4489 ) * Add a test to prevent x64 Inst size slipping further * Enable the test for 64-bit pointers only	2022-07-21 00:01:33 +00:00
Sam Parker	9c43749dfe	[RFC] Dynamic Vector Support (#4200 ) Introduce a new concept in the IR that allows a producer to create dynamic vector types. An IR function can now contain global value(s) that represent a dynamic scaling factor, for a given fixed-width vector type. A dynamic type is then created by 'multiplying' the corresponding global value with a fixed-width type. These new types can be used just like the existing types and the type system has a set of hard-coded dynamic types, such as I32X4XN, which the user defined types map onto. The dynamic types are also used explicitly to create dynamic stack slots, which have no set size like their existing counterparts. New IR instructions are added to access these new stack entities. Currently, during codegen, the dynamic scaling factor has to be lowered to a constant so the dynamic slots do eventually have a compile-time known size, as do spill slots. The current lowering for aarch64 just targets Neon, using a dynamic scale of 1. Copyright (c) 2022, Arm Limited.	2022-07-07 12:54:39 -07:00
Andrew Brown	8629cbc6a4	x64: port `atomic_rmw` to ISLE (#4389 ) * x64: port `atomic_rmw` to ISLE This change ports `atomic_rmw` to ISLE for the x64 backend. It does not change the lowering in any way, though it seems possible that the fixed regs need not be as fixed and that there are opportunities for single instruction lowerings. It does rename `inst_common::AtomicRmwOp` to `MachAtomicRmwOp` to disambiguate with the IR enum with the same name. * x64: remove remaining hardcoded register constraints for `atomic_rmw` * x64: use `SyntheticAmode` in `AtomicRmwSeq` * review: add missing reg collector for amode * review: collect memory registers in the 'late' phase	2022-07-06 23:58:59 +00:00
bjorn3	d1446f767d	Mark return value as define instead of clobber for TLS pseudoinstructions (#4357 )	2022-06-30 10:44:51 -07:00
Chris Fallin	b2e28b917a	Cranelift: update to latest regalloc2: (#4324 ) - Handle call instructions' clobbers with the clobbers API, using RA2's clobbers bitmask (bytecodealliance/regalloc2#58) rather than clobbers list; - Pull in changes from bytecodealliance/regalloc2#59 for much more sane edge-case behavior w.r.t. liverange splitting.	2022-06-28 09:01:59 -07:00
Chris Fallin	5c2c285dd7	Cranelift/x64: fix register allocator metadata for 8-bit divides. (#4332 ) `idiv` on x86-64 only reads `rdx`/`edx`/`dx`/`dl` for divides with width greater than 8 bits; for an 8-bit divide, it reads the whole 16-bit divisor from `ax`, as our CISC ancestors intended. This PR fixes the metadata to avoid a regalloc panic (due to undefined `rdx`) in this case. Does not affect Wasmtime or other Wasm-frontend embedders.	2022-06-27 12:31:06 -07:00
Alex Crichton	8bb07523e2	x64: Fix codegen for the `select` instruction with v128 (#4317 ) This commit fixes a bug in the previous codegen for the `select` instruction when the operations of the `select` were of the `v128` type. Previously teh `XmmCmove` instruction only stored an `OperandSize` of 32 or 64 for a 64 or 32-bit move, but this was also used for these 128-bit types which meant that when used the wrong move instruction was generated. The fix applied here is to store the whole `Type` being moved so the 128-bit variant can be selected as well.	2022-06-27 11:02:40 -07:00
Andrew Brown	816aae6aca	x64: port some atomics to ISLE (#4212 ) * x64: port `fence` to ISLE * x64: port `atomic_load` to ISLE * x64: port `atomic_store` to ISLE	2022-06-02 14:13:10 -07:00
Chris Fallin	b830c3cf93	Pull in regalloc2 v0.2.0, with no more separate scratch registers. (#4182 ) RA2 recently removed the need for a dedicated scratch register for cyclic moves (bytecodealliance/regalloc2#51). This has moderate positive performance impact on function bodies that were register-constrained, as it means that one more register is available. In Sightglass, I measured +5-8% on `blake3-scalar`, at least among current benchmarks.	2022-05-23 12:51:04 -07:00
Benjamin Bouvier	6e828df632	Remove unused `SourceLoc` in many `Mach` data structures (#4180 ) * Remove unused srcloc in MachReloc * Remove unused srcloc in MachTrap * Use `into_iter` on array in bench code to suppress a warning * Remove unused srcloc in MachCallSite	2022-05-23 09:27:28 -07:00
Chris Fallin	eb435f3057	x64: use constant pool for u64 constants rather than `movabs`. (#4088 ) * Allow emitting u64 constants into constant pool. * Use constant pool for constants on x64 that do not fit in a simm32 and are needed as a RegMem or RegMemImm. * Fix rip-relative addressing bug in pinsrd emission.	2022-05-10 09:21:05 -07:00
Chris Fallin	019ebf47b1	x64: fix pretty-printing argument order for XmmRmR instructions. (#4094 ) The pretty-printing had swapped dst and src2; this was introduced when we moved to RA2 (sorry about that! IMHO we should do something to automate the mapping between regalloc arg collection and pretty printing/emission). `src2` comes at the end because it has a variable number of register mentions; this is in line with how many of the other inst formats work. Actual emitted code was never incorrect, just the pretty-printing. Updated test golden outputs look correct to me now, including the one that we saw was incorrect in #3945.	2022-05-03 10:12:58 -07:00
Chris Fallin	f85047b084	Rework x64 addressing-mode lowering to be slightly more flexible. (#4080 ) This PR refactors the x64 backend address-mode lowering to use an incremental-build approach, where it considers each node in a tree of `iadd`s that feed into a load/store address and, at each step, builds the best possible `Amode`. It will combine an arbitrary number of constant offsets (an extension beyond the current rules), and can capture a left-shifted (scaled) index in any position of the tree (another extension). This doesn't have any measurable performance improvement on our Wasm benchmarks in Sightglass, unfortunately, because the IR lowered from wasm32 will do address computation in 32 bits and then `uextend` it to add to the 64-bit heap base. We can't quite lift the 32-bit adds to 64 bits because this loses the wraparound semantics. (We could label adds as "expected not to overflow", and allow those to be lifted to 64 bit operations; wasm32 heap address computation should fit this. This is `add nuw` (no unsigned wrap) in LLVM IR terms. That's likely my next step.) Nevertheless, (i) this generalizes the cases we can handle, which should be a good thing, all other things being equal (and in this case, no compile time impact was measured); and (ii) might benefit non-Wasm frontends.	2022-05-02 16:20:39 -07:00
Chris Fallin	61dc38c065	Implement Spectre mitigations for table accesses and br_tables. (#4092 ) Currently, we have partial Spectre mitigation: we protect heap accesses with dynamic bounds checks. Specifically, we guard against errant accesses on the misspeculated path beyond the bounds-check conditional branch by adding a conditional move that is also dependent on the bounds-check condition. This data dependency on the condition is not speculated and thus will always pick the "safe" value (in the heap case, a NULL address) on the misspeculated path, until the pipeline flushes and recovers onto the correct path. This PR uses the same technique both for table accesses -- used to implement Wasm tables -- and for jumptables, used to implement Wasm `br_table` instructions. In the case of Wasm tables, the cmove picks the table base address on the misspeculated path. This is equivalent to reading the first table entry. This prevents loads of arbitrary data addresses on the misspeculated path. In the case of `br_table`, the cmove picks index 0 on the misspeculated path. This is safer than allowing a branch to an address loaded from an index under misspeculation (i.e., it preserves control-flow integrity even under misspeculation). The table mitigation is controlled by a Cranelift setting, on by default. The br_table mitigation is always on, because it is part of the single lowering pseudoinstruction. In both cases, the impact should be minimal: a single extra cmove in a (relatively) rarely-used operation. The table mitigation is architecture-independent (happens during legalization); the br_table mitigation has been implemented for both x64 and aarch64. (I don't know enough about s390x to implement this confidently there, but would happily review a PR to do the same on that platform.)	2022-05-02 11:19:16 -07:00
Chris Fallin	dd45f44511	x64 backend: add lowerings with load-op-store fusion. (#4071 ) x64 backend: add lowerings with load-op-store fusion. These lowerings use the `OP [mem], reg` forms (or in AT&T syntax, `OP %reg, (mem)`) -- i.e., x86 instructions that load from memory, perform an ALU operation, and store the result, all in one instruction. Using these instruction forms, we can merge three CLIF ops together: a load, an arithmetic operation, and a store.	2022-04-26 18:58:26 -07:00
Chris Fallin	5774e068b7	Cranelift: fix regalloc2 integration bug wrt blockparam branch args. (#4042 ) Previously, the block successor accumulation and the blockparam branch arg setup were decoupled. The lowering backend implicitly specified the order of successor edges via its `MachTerminator` enum on the last instruction in the block, while the `Lower` toplevel machine-independent driver set up blockparam branch args in the edge order seen in CLIF. In some cases, these orders did not match -- for example, when the conditional branch depended on an FP condition that was implemented by swapping taken/not-taken edges and inverting the condition code. This PR refactors the successor handling to be centralized in `Lower` rather than flow through the terminator `MachInst`, and adds a successor block and its blockparam args at the same time, ensuring the orders match.	2022-04-18 09:53:57 -07:00
Chris Fallin	a0318f36f0	Switch Cranelift over to regalloc2. (#3989 ) This PR switches Cranelift over to the new register allocator, regalloc2. See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801) for a summary of the design changes. This switchover has implications for core VCode/MachInst types and the lowering pass. Overall, this change brings improvements to both compile time and speed of generated code (runtime), as reported in #3942: ``` Benchmark Compilation (wallclock) Execution (wallclock) blake3-scalar 25% faster 28% faster blake3-simd no diff no diff meshoptimizer 19% faster 17% faster pulldown-cmark 17% faster no diff bz2 15% faster no diff SpiderMonkey, 21% faster 2% faster fib(30) clang.wasm 42% faster N/A ```	2022-04-14 10:28:21 -07:00
Chris Fallin	cd173cfe8e	ISLE: port fmin, fmax, fmin_pseudo, fmax_pseudo on x64. (#3856 )	2022-02-28 14:40:26 -08:00
Chris Fallin	24f145cd1e	Migrate clz, ctz, popcnt, bitrev, is_null, is_invalid on x64 to ISLE. (#3848 )	2022-02-28 09:45:13 -08:00

1 2 3 4 5 ...

318 Commits