wasmtime

Author	SHA1	Message	Date
Chris Fallin	61dc38c065	Implement Spectre mitigations for table accesses and br_tables. (#4092 ) Currently, we have partial Spectre mitigation: we protect heap accesses with dynamic bounds checks. Specifically, we guard against errant accesses on the misspeculated path beyond the bounds-check conditional branch by adding a conditional move that is also dependent on the bounds-check condition. This data dependency on the condition is not speculated and thus will always pick the "safe" value (in the heap case, a NULL address) on the misspeculated path, until the pipeline flushes and recovers onto the correct path. This PR uses the same technique both for table accesses -- used to implement Wasm tables -- and for jumptables, used to implement Wasm `br_table` instructions. In the case of Wasm tables, the cmove picks the table base address on the misspeculated path. This is equivalent to reading the first table entry. This prevents loads of arbitrary data addresses on the misspeculated path. In the case of `br_table`, the cmove picks index 0 on the misspeculated path. This is safer than allowing a branch to an address loaded from an index under misspeculation (i.e., it preserves control-flow integrity even under misspeculation). The table mitigation is controlled by a Cranelift setting, on by default. The br_table mitigation is always on, because it is part of the single lowering pseudoinstruction. In both cases, the impact should be minimal: a single extra cmove in a (relatively) rarely-used operation. The table mitigation is architecture-independent (happens during legalization); the br_table mitigation has been implemented for both x64 and aarch64. (I don't know enough about s390x to implement this confidently there, but would happily review a PR to do the same on that platform.)	2022-05-02 11:19:16 -07:00
Chris Fallin	dd45f44511	x64 backend: add lowerings with load-op-store fusion. (#4071 ) x64 backend: add lowerings with load-op-store fusion. These lowerings use the `OP [mem], reg` forms (or in AT&T syntax, `OP %reg, (mem)`) -- i.e., x86 instructions that load from memory, perform an ALU operation, and store the result, all in one instruction. Using these instruction forms, we can merge three CLIF ops together: a load, an arithmetic operation, and a store.	2022-04-26 18:58:26 -07:00
Chris Fallin	a0318f36f0	Switch Cranelift over to regalloc2. (#3989 ) This PR switches Cranelift over to the new register allocator, regalloc2. See [this document](https://gist.github.com/cfallin/08553421a91f150254fe878f67301801) for a summary of the design changes. This switchover has implications for core VCode/MachInst types and the lowering pass. Overall, this change brings improvements to both compile time and speed of generated code (runtime), as reported in #3942: ``` Benchmark Compilation (wallclock) Execution (wallclock) blake3-scalar 25% faster 28% faster blake3-simd no diff no diff meshoptimizer 19% faster 17% faster pulldown-cmark 17% faster no diff bz2 15% faster no diff SpiderMonkey, 21% faster 2% faster fib(30) clang.wasm 42% faster N/A ```	2022-04-14 10:28:21 -07:00
Chris Fallin	cd173cfe8e	ISLE: port fmin, fmax, fmin_pseudo, fmax_pseudo on x64. (#3856 )	2022-02-28 14:40:26 -08:00
Chris Fallin	24f145cd1e	Migrate clz, ctz, popcnt, bitrev, is_null, is_invalid on x64 to ISLE. (#3848 )	2022-02-28 09:45:13 -08:00
Andrew Brown	f87c61176a	x64: port select to ISLE (#3682 ) * x64: port `select` using an FP comparison to ISLE This change includes quite a few interlocking parts, required mainly by the current x64 conventions in ISLE: - it adds a way to emit a `cmove` with multiple OR-ing conditions; because x64 ISLE cannot currently safely emit a comparison followed by several jumps, this adds `MachInst::CmoveOr` and `MachInst::XmmCmoveOr` macro instructions. Unfortunately, these macro instructions hide the multi-instruction sequence in `lower.isle` - to properly keep track of what instructions consume and produce flags, @cfallin added a way to pass around variants of `ConsumesFlags` and `ProducesFlags`--these changes affect all backends - then, to lower the `fcmp + select` CLIF, this change adds several `cmove*_from_values` helpers that perform all of the awkward conversions between `Value`, `ValueReg`, `Reg`, and `Gpr/Xmm`; one upside is that now these lowerings have much-improved documentation explaining why the various `FloatCC` and `CC` choices are made the way they are. Co-authored-by: Chris Fallin <chris@cfallin.org>	2022-02-23 10:03:16 -08:00
Nick Fitzgerald	dc86e7a6dc	cranelift: Use GPR newtypes extensively in x64 lowering (#3798 ) We already defined the `Gpr` newtype and used it in a few places, and we already defined the `Xmm` newtype and used it extensively. This finishes the transition to using the newtypes extensively in lowering by making use of `Gpr` in more places. Fixes #3685	2022-02-14 12:54:41 -08:00
Nick Fitzgerald	795b0aaf9a	cranelift: Add newtype wrappers for x64 register classes The primary motivation of this large commit (apologies for its size!) is to introduce `Gpr` and `Xmm` newtypes over `Reg`. This should help catch difficult-to-diagnose register class mixup bugs in x64 lowerings. But having a newtype for `Gpr` and `Xmm` themselves isn't enough to catch all of our operand-with-wrong-register-class bugs, because about 50% of operands on x64 aren't just a register, but a register or memory address or even an immediate! So we have `{Gpr,Xmm}Mem[Imm]` newtypes as well. Unfortunately, `GprMem` et al can't be `enum`s and are therefore a little bit noisier to work with from ISLE. They need to maintain the invariant that their registers really are of the claimed register class, so they need to encapsulate the inner data. If they exposed the underlying `enum` variants, then anyone could just change register classes or construct a `GprMem` that holds an XMM register, defeating the whole point of these newtypes. So when working with these newtypes from ISLE, we rely on external constructors like `(gpr_to_gpr_mem my_gpr)` instead of `(GprMem.Gpr my_gpr)`. A bit of extra lines of code are included to add support for register mapping for all of these newtypes as well. Ultimately this is all a bit wordier than I'd hoped it would be when I first started authoring this commit, but I think it is all worth it nonetheless! In the process of adding these newtypes, I didn't want to have to update both the ISLE `extern` type definition of `MInst` and the Rust definition, so I move the definition fully into ISLE, similar to aarch64. Finally, this process isn't complete. I've introduced the newtypes here, and I've made most XMM-using instructions switch from `Reg` to `Xmm`, as well as register class-converting instructions, but I haven't moved all of the GPR-using instructions over to the newtypes yet. 
I figured this commit was big enough as it was, and I can continue the adoption of these newtypes in follow up commits. Part of #3685.	2022-02-03 14:08:08 -08:00
bjorn3	17c3c1813f	Remove MachInstEmitInfo	2022-01-04 18:06:01 +01:00
Nick Fitzgerald	d377b665c6	Initial ISLE integration with the x64 backend On the build side, this commit introduces two things: 1. The automatic generation of various ISLE definitions for working with CLIF. Specifically, it generates extern type definitions for clif opcodes and the clif instruction data `enum`, as well as extractors for matching each clif instruction. This happens inside the `cranelift-codegen-meta` crate. 2. The compilation of ISLE DSL sources to Rust code, that can be included in the main `cranelift-codegen` compilation. Next, this commit introduces the integration glue code required to get ISLE-generated Rust code hooked up in clif-to-x64 lowering. When lowering a clif instruction, we first try to use the ISLE code path. If it succeeds, then we are done lowering this instruction. If it fails, then we proceed along the existing hand-written code path for lowering. Finally, this commit ports many lowering rules over from hand-written, open-coded Rust to ISLE. In the process of supporting ISLE, this commit also makes the x64 `Inst` capable of expressing SSA by supporting 3-operand forms for all of the existing instructions that only have a 2-operand form encoding: dst = src1 op src2 Rather than only the typical x86-64 2-operand form: dst = dst op src This allows `MachInst` to be in SSA form, since `dst` and `src1` are disentangled. ("3-operand" and "2-operand" are a little bit of a misnomer since not all operations are binary operations, but we do the same thing for, e.g., unary operations by disentangling the sole operand from the result.) There are two motivations for this change: 1. To allow ISLE lowering code to have value-equivalence semantics. We want ISLE lowering to translate a CLIF expression that evaluates to some value into a `MachInst` expression that evaluates to the same value. We want both the lowering itself and the resulting `MachInst` to be pure and referentially transparent. 
This is both a nice paradigm for compiler writers that are authoring and maintaining lowering rules and is a prerequisite to any sort of formal verification of our lowering rules in the future. 2. Better align `MachInst` with `regalloc2`'s API, which requires that the input be in SSA form.	2021-10-12 17:11:58 -07:00
Johnnie Birch	e373ddfe1b	Add extend-add-pairwise instructions x64	2021-07-30 15:06:58 -07:00
Johnnie Birch	500f530322	Add support for i32x4_trunc_sat_f64x2_s for x64	2021-07-26 22:24:30 -07:00
Johnnie Birch	23290f0450	Add support for i32x4_trunc_sat_f64x2_u for x64	2021-07-26 22:24:30 -07:00
Johnnie Birch	5deda27977	Add support for Saturating Rounding Q-format Multiplication for x64	2021-07-26 20:32:46 -07:00
Nick Fitzgerald	4283d2116d	cranelift: Move most debug-level logs to the trace level Cranelift crates have historically been much more verbose with debug-level logging than most other crates in the Rust ecosystem. We log things like how many parameters a basic block has, the color of virtual registers during regalloc, etc. Even for Cranelift hackers, these things are largely only useful when hacking specifically on Cranelift and looking at a particular test case, not even when using some Cranelift embedding (such as Wasmtime). Most of the time, when people want logging for their Rust programs, they do something like: RUST_LOG=debug cargo run This means that they get all that mostly not useful debug logging out of Cranelift. So they might want to disable logging for Cranelift, or change it to a higher log level: RUST_LOG=debug,cranelift=info cargo run The problem is that this is already more annoying to type than `RUST_LOG=debug`, and that Cranelift isn't one single crate, so you actually have to play whack-a-mole with naming all the Cranelift crates off the top of your head, something more like this: RUST_LOG=debug,cranelift=info,cranelift_codegen=info,cranelift_wasm=info,... Therefore, we're changing most of the `debug!` logs into `trace!` logs: anything that is very Cranelift-internal, unlikely to be useful/meaningful to the "average" Cranelift embedder, or prints a message for each instruction visited during a pass. On the other hand, things that just report a one line statistic for a whole pass, for example, are left as `debug!`. The more verbose the log messages are, the higher the bar they must clear to be `debug!` rather than `trace!`.	2021-07-26 11:50:16 -07:00
Johnnie Birch	6fbe0b72bd	Add simd_extmul_* support for x64	2021-07-15 01:07:52 -07:00
Johnnie Birch	2d676d838f	Implements f64x2.convert_low_i32x4_u for x64	2021-07-09 10:39:05 -07:00
Johnnie Birch	1770880e19	x64: add support for packed promote and demote (#2783 ) * Add support for x64 packed promote low * Add support for x64 packed floating point demote * Update vector promote low and demote by adding constraints Also does some renaming and minor refactoring	2021-06-04 15:59:20 -07:00
Andrew Brown	8dc4cc9fe3	x64: fix AVX512 flag checks Previously, the multiple flags for certain AVX512 instructions were checked using `OR`: e.g., if the CPU has AVX512VL `OR` AVX512DQ, emit `VPMULLQ`. This is incorrect--the logic should be `AND`. The Intel Software Developer Manual, vol. 1, sec. 15.4, has more information on this (notable there is the suggestion to check with `XGETBV` that the OS is allowing the use of the XMM registers--but that is a separate issue). This change switches to `AND` logic in the new backend.	2021-06-01 11:41:16 -07:00
Andrew Brown	2a9f458ea3	x64: lower i8x16.shuffle to VPERMI2B when possible When shuffling values from two different registers, the x64 lowering for `i8x16.shuffle` must first shuffle each register separately and then OR the results with SSE instructions. With `VPERMI2B`, available in AVX512VL + AVX512VBMI, this can be done in a single instruction after the shuffle mask has been moved into the destination register. This change uses `VPERMI2B` for that case when the CPU supports it.	2021-06-01 11:40:53 -07:00
Andrew Brown	459fce3467	x64: lower i8x16.popcnt to VPOPCNTB when possible When AVX512VL or AVX512BITALG are available, Wasm SIMD's `popcnt` instruction can be lowered to a single x64 instruction, `VPOPCNTB`, instead of 8+ instructions.	2021-05-25 12:16:25 -07:00
Andrew Brown	54b45d28a3	x64: lower fcvt_from_uint to VCVTUDQ2PS when possible When AVX512VL and AVX512F are available, use a single instruction (`VCVTUDQ2PS`) instead of a lengthy 9-instruction sequence. This optimization is a port from the legacy x86 backend.	2021-05-19 12:20:11 -07:00
Andrew Brown	7ef3ae2903	x64: implement vselect with variable blend instructions This change implements `vselect` using SSE4.1's `BLENDVPS`, `BLENDVPD`, and `PBLENDVB`. `vselect` is a lane-selecting instruction that is used by [simple_preopt.rs](`fa1faf5d22/cranelift/codegen/src/simple_preopt.rs (L947-L999)`) to lower `bitselect` to a single x86 instruction when the condition mask is known to be boolean (all 1s or 0s, e.g., from a conversion). This is better than `bitselect` in general, which lowers to 4-5 instructions. The old backend had the `vselect` lowering; this simply introduces it to the new backend.	2021-05-17 11:23:33 -07:00
Andrew Brown	e676589b0c	x64: lower i64x2.imul to VPMULLQ when possible This adds the machinery to encode the VPMULLQ instruction which is available in AVX512VL and AVX512DQ. When these feature sets are available, we use this instruction instead of a lengthy 12-instruction sequence.	2021-05-13 20:14:05 -07:00
Andrew Brown	02796fc670	x64: move encodings to a separate module In order to benchmark the encoding code with criterion, the functions and structures must be public. Moving this code to its own module (instead of keeping as a submodule to `inst`), allows `inst` to remain private. This avoids having to expose and document (or ignore documenting) the numerous instruction variants in `inst` while allowing access to the encoding code. This commit changes no functionality.	2021-05-13 10:46:08 -07:00
Andrew Brown	0acc1451ea	x64: lower iabs.i64x2 using a single AVX512 instruction when possible (#2819 ) * x64: add EVEX encoding mechanism Also, includes an empty stub module for the VEX encoding. * x64: lower abs.i64x2 to VPABSQ when available * x64: refactor EVEX encodings to use `EvexInstruction` This change replaces the `encode_evex` function with a builder-style struct, `EvexInstruction`. This approach clarifies the code, adds documentation, and results in slight speedups when benchmarked. * x64: rename encoding CodeSink to ByteSink	2021-04-15 11:53:58 -07:00
bjorn3	b272d4b7da	Fix srem.{i8,i16}	2021-04-13 21:28:27 +02:00
Andrew Brown	8e495ac79d	x64: match multiple ISA requirements before emitting Because there are instructions that are present in more than one ISA feature set, we need to see if any of the ISA requirements match before emitting. This change includes the `VPABSQ` instruction as an example, which is present in both `AVX512F` and `AVX512VL`.	2021-04-08 10:30:39 -07:00
Andrew Brown	d32501c554	x64: refactor REX-specific encoding machinery to its own module In preparation for adding new encoding modes to the x64 backend (e.g. VEX, EVEX), this change moves all of the current instruction encoding functions to `encodings::rex`. This refactor does not change any logic.	2021-04-02 11:17:39 -07:00
Johnnie Birch	31d3db1ec2	Implements convert low signed integer to float for x64 simd	2021-03-26 12:13:29 -07:00
Chris Fallin	b429f77ee9	Handle `srem` properly when `avoid_div_traps` is false. The codegen for div/rem ops has two modes, depending on the `avoid_div_traps` flag: it can either do all checks for trapping conditions explicitly, and use explicit trap instructions, then use a hardware divide instruction that will not trap (`avoid_div_traps == true`); or it can run in a mode where a hardware FP fault on the divide instruction implies a Wasm trap (`avoid_div_traps == false`). Wasmtime uses the former while Lucet (for example) uses the latter. It turns out that because we run all our spec tests under Wasmtime, we missed a spec corner case that fails in the latter: INT_MIN % -1 == 0 per the spec, but causes a trap with the x86 signed divide/remainder instruction. Hence, in Lucet, this specific remainder computation would incorrectly result in a Wasm trap. This PR fixes the issue by just forcing use of the explicit-checks implementation for `srem` even when `avoid_div_traps` is false.	2021-03-24 22:30:07 -07:00
Chris Fallin	2d5db92a9e	Rework/simplify unwind infrastructure and implement Windows unwind. Our previous implementation of unwind infrastructure was somewhat complex and brittle: it parsed generated instructions in order to reverse-engineer unwind info from prologues. It also relied on some fragile linkage to communicate instruction-layout information that VCode was not designed to provide. A much simpler, more reliable, and easier-to-reason-about approach is to embed unwind directives as pseudo-instructions in the prologue as we generate it. That way, we can say what we mean and just emit it directly. The usual reasoning that leads to the reverse-engineering approach is that metadata is hard to keep in sync across optimization passes; but here, (i) prologues are generated at the very end of the pipeline, and (ii) if we ever do a post-prologue-gen optimization, we can treat unwind directives as black boxes with unknown side-effects, just as we do for some other pseudo-instructions today. It turns out that it was easier to just build this for both x64 and aarch64 (since they share a factored-out ABI implementation), and wire up the platform-specific unwind-info generation for Windows and SystemV. Now we have simpler unwind on all platforms and we can delete the old unwind infra as soon as we remove the old backend. There were a few consequences to supporting Fastcall unwind in particular that led to a refactor of the common ABI. Windows only supports naming clobbered-register save locations within 240 bytes of the frame-pointer register, whatever one chooses that to be (RSP or RBP). We had previously saved clobbers below the fixed frame (and below nominal-SP). The 240-byte range has to include the old RBP too, so we're forced to place clobbers at the top of the frame, just below saved RBP/RIP. This is fine; we always keep a frame pointer anyway because we use it to refer to stack args. 
It does mean that offsets of fixed-frame slots (spillslots, stackslots) from RBP are no longer known before we do regalloc, so if we ever want to index these off of RBP rather than nominal-SP because we add support for `alloca` (dynamic frame growth), then we'll need a "nominal-BP" mode that is resolved after regalloc and clobber-save code is generated. I added a comment to this effect in `abi_impl.rs`. The above refactor touched both x64 and aarch64 because of shared code. This had a further effect in that the old aarch64 prologue generation subtracted from `sp` once to allocate space, then used stores to `[sp, offset]` to save clobbers. Unfortunately the offset only has 7-bit range, so if there are enough clobbered registers (and there can be -- aarch64 has 384 bytes of registers; at least one unit test hits this) the stores/loads will be out-of-range. I really don't want to synthesize large-offset sequences here; better to go back to the simpler pre-index/post-index `stp r1, r2, [sp, #-16]` form that works just like a "push". It's likely not much worse microarchitecturally (dependence chain on SP, but oh well) and it actually saves an instruction if there's no other frame to allocate. As a further advantage, it's much simpler to understand; simpler is usually better. This PR adds the new backend on Windows to CI as well.	2021-03-11 20:03:52 -08:00
Andrew Brown	508f8fa5a9	[x64] Add i64x2.abs This instruction has a single instruction lowering in AVX512F/VL and a three instruction lowering in AVX but neither is currently supported in the x64 backend. To implement this, we instead subtract the vector from 0 and use a blending instruction to pick the lanes containing the absolute value.	2021-03-02 12:30:02 -08:00
Chris Fallin	cdb60ec5a9	Merge pull request #2682 from cfallin/shift-bugs Fix some `i128` shift-related bugs in x64 backend.	2021-02-26 15:13:08 -08:00
Chris Fallin	40db4de44a	Fix incomplete trap metadata due to multiple traps at one address. If an instruction has more than one trap record associated with it (for example: a divide instruction that has participated in load-op fusion, so we have both a heap-out-of-bounds trap record due to its load and a divide-by-zero trap record due to its divide op), the current MachBuffer code would emit only one of the trap records to the sink. Separately, divide instructions probably shouldn't merge loads, because the two separate possible traps at one location might be confusing for some embedders (certainly in Lucet). Divide seems to be the only case in our current codegen where such merging might occur. This PR changes the lowering to always force the divisor into a register. Finally, while working out why trap records were not appearing, I had noticed that `isa::x64::emit_std_enc_mem()` was only emitting heap-OOB trap metadata for loads/stores when it had a srcloc. This PR ensures that the metadata is emitted even when the srcloc is empty. Note that none of the above presents a security or correctness problem; trap metadata only affects the status that we return to the embedder when a Wasm program terminates with a trap.	2021-02-24 15:13:45 -08:00
Chris Fallin	0f3e00b25e	Fix some `i128` shift-related bugs in x64 backend. This fixes #2672 and #2679, and also fixes an incorrect instruction emission (`test` with small immediate) that we had missed earlier. The shift-related fixes have to do with (i) shifts by 0 bits, as a special case that must be handled; and (ii) shifts by a 128-bit amount, which we can handle by just dropping the upper half (we only use 3--7 bits of shift amount). This adjusts the lowerings appropriately, and also adds run-tests to ensure that the lowerings actually execute correctly (previously we only had compile-tests with golden lowerings; I'd like to correct this for more ops eventually, adding run-tests beyond what the Wasm spec and frontend covers).	2021-02-23 14:22:04 -08:00
bjorn3	ff22842da5	More atomic ops	2021-02-18 14:16:15 +01:00
Kasey Carrothers	9c3edee9d0	Add methods to construct RexFlags from OperandSizes. This unifies the logic around Rex prefix emission and hopefully makes REX prefix errors less likely. There are still several instructions that use other sources to determine the flags, so set_w and clear_w are left as is. Additional cleanups: * Change always_emit_if_8bit_needed to take a Reg instead of a u8 for type safety. * Deduplicated emission code in MovRM.	2021-02-17 18:48:05 -08:00
Kasey Carrothers	7bd96c8e2f	Refactor x64::Insts that use an is_64 bool to use OperandSize.	2021-02-03 10:40:11 -08:00
Kasey Carrothers	3306408100	Refactor x64::Inst to use OperandSize instead of u8s. TODO: some types take a 'is_64_bit' bool. Those are left unchanged for now.	2021-02-03 10:40:11 -08:00
Kasey Carrothers	b12d41bfe9	Expand x64 OperandSize to support 8 and 16-bit operands. This is in preparation for refactoring all x64::Inst arms to use OperandSize. Current uses of OperandSize fall into two categories: 1. XMM operations which require 32/64 bit operands 2. Immediates which only care about 64-bit or not. Adds assertions to existing Inst constructors to check that they are passed valid sizes. This change also removes the implicit widening of 1 and 2 byte values to 4 bytes. from_bytes() is only used by category 2, so removing this behavior will not change any visible behavior. Overall this change should be a no-op.	2021-02-03 10:40:11 -08:00
Benjamin Bouvier	13027ad670	cranelift x64: add instruction set checks for popcnt/tzcnt/lzcnt;	2021-01-30 13:38:55 +01:00
Benjamin Bouvier	2275519cb1	cranelift x64: use the POPCNT instruction for Popcount when it's available;	2021-01-29 19:41:01 +01:00
Benjamin Bouvier	6bf6612d96	cranelift x64: use the TZCNT instruction for Ctz when it's available;	2021-01-29 19:41:01 +01:00
Benjamin Bouvier	d3acd9a283	cranelift x64: use the LZCNT instruction for Clz when it's available;	2021-01-29 19:41:01 +01:00
Johnnie Birch	cbd7a6a80e	Add sse41 lowering for rounding x64	2021-01-28 17:37:17 -08:00
Chris Fallin	c84d6be6f4	Detailed debug-info (DWARF) support in new backends (initially x64). This PR propagates "value labels" all the way from CLIF to DWARF metadata on the emitted machine code. The key idea is as follows: - Translate value-label metadata on the input into "value_label" pseudo-instructions when lowering into VCode. These pseudo-instructions take a register as input, denote a value label, and semantically are like a "move into value label" -- i.e., they update the current value (as seen by debugging tools) of the given local. These pseudo-instructions emit no machine code. - Perform a dataflow analysis at the machine-code level, tracking value-labels that propagate into registers and into [SP+constant] stack storage. This is a forward dataflow fixpoint analysis where each storage location can contain a set of value labels, and each value label can reside in a set of storage locations. (Meet function is pairwise intersection by storage location.) This analysis traces value labels symbolically through loads and stores and reg-to-reg moves, so it will naturally handle spills and reloads without knowing anything special about them. - When this analysis converges, we have, at each machine-code offset, a mapping from value labels to some number of storage locations; for each offset for each label, we choose the best location (prefer registers). Note that we can choose any location, as the symbolic dataflow analysis is sound and guarantees that the value at the value_label instruction propagates to all of the named locations. - Then we can convert this mapping into a format that the DWARF generation code (wasmtime's debug crate) can use. This PR also adds the new-backend variant to the gdb tests on CI.	2021-01-21 15:59:49 -08:00
bjorn3	81d248c057	Implement Mach-O TLS access for x64 newBE	2021-01-21 18:25:56 +01:00
Chris Fallin	0f563f786a	Add ELF TLS support in new x64 backend. This follows the implementation in the legacy x86 backend, including hardcoded sequence that is compatible with what the linker expects. We could potentially do better here, but it is likely not necessary. Thanks to @bjorn3 for a bugfix to an earlier version of this.	2021-01-17 22:48:51 -08:00
Chris Fallin	71ead6e31d	x64 backend: implement 128-bit ops and misc fixes. This implements all of the ops on I128 that are implemented by the legacy x86 backend, and includes all that are required by at least one major use-case (cg_clif rustc backend). The sequences are open-coded where necessary; for e.g. the bit operations, this can be somewhat complex, but these sequences have been tested carefully. This PR also includes a drive-by fix of clz/ctz for 8- and 16-bit cases where they were incorrect previously. Also includes ridealong fixes developed while bringing up cg_clif support, because they are difficult to completely separate due to other refactors that occurred in this PR: - fix REX prefix logic for some 8-bit instructions. When using an 8-bit register in 64-bit mode on x86-64, the REX prefix semantics are somewhat subtle: without the REX prefix, register numbers 4--7 correspond to the second-to-lowest byte of the first four registers (AH, CH, BH, DH), whereas with the REX prefix, these register numbers correspond to the usual encoding (SPL, BPL, SIL, DIL). We could always emit a REX byte for instructions with 8-bit cases (this is harmless even if unneeded), but this would unnecessarily inflate code size; instead, the usual approach is to emit it only for these registers. This logic was present in some cases but missing for some other instructions: divide, not, negate, shifts. Fixes #2508. - avoid unaligned SSE loads on some f64 ops. The implementations of several FP ops, such as fabs/fneg, used SSE instructions. This is not a problem per-se, except that load-op merging did not take alignment into account. Specifically, if an op on an f64 loaded from memory happened to merge that load, and the instruction into which it was merged was an SSE instruction, then the SSE instruction imposes stricter (128-bit) alignment requirements than the load.f64 did. 
This PR simply forces any instruction lowerings that could use SSE instructions to implement non-SIMD operations to take inputs in registers only, and avoid load-op merging. Fixes #2507. - two bugfixes exposed by cg_clif: urem/srem.i8, select.b1. - urem/srem.i8: the 8-bit form of the DIV instruction on x86-64 places the remainder in AH, not RDX, different from all the other width-forms of this instruction. - select.b1: we were not recognizing selects of boolean values as integer-typed operations, so we were generating XMM moves instead (!).	2021-01-14 13:45:50 -08:00
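The `srem` corner case from commit b429f77ee9 (INT_MIN % -1 must yield 0 rather than trap) can be illustrated with a minimal Rust sketch of an "explicit checks" remainder. This is not Cranelift's lowering code: the `Trap` enum and the function name `srem_explicit_checks` are hypothetical, chosen only to show the order of checks that keeps a hardware divide from faulting.

```rust
/// Reasons a Wasm integer remainder can trap (hypothetical name, for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum Trap {
    IntegerDivisionByZero,
}

/// Signed 32-bit remainder with every trapping condition checked up front,
/// so the divide that finally runs can never fault.
pub fn srem_explicit_checks(lhs: i32, rhs: i32) -> Result<i32, Trap> {
    if rhs == 0 {
        // The only case in which the Wasm remainder is defined to trap.
        return Err(Trap::IntegerDivisionByZero);
    }
    if rhs == -1 {
        // x % -1 is always 0. Skipping the divide avoids the overflow
        // fault that x86 `idiv` raises for INT_MIN / -1.
        return Ok(0);
    }
    // rhs is neither 0 nor -1 here, so this cannot overflow or panic.
    Ok(lhs % rhs)
}
```

Calling `srem_explicit_checks(i32::MIN, -1)` returns `Ok(0)`, whereas a bare divide instruction on the same inputs would fault and be misreported as a Wasm trap.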


156 Commits
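The two `i128` shift cases called out in commit 0f3e00b25e (a shift by 0 bits, and a shift amount wider than needed) can be sketched in Rust by building a 128-bit left shift from two 64-bit halves. This is only an illustration of why those cases need care, not the backend's actual instruction sequence; the function name `shl128` and its `(lo, hi)` argument order are assumptions made for this example. A 128-bit-typed shift amount would first be truncated to its low half and then masked as shown.

```rust
/// Shift a 128-bit value, held as (lo, hi) 64-bit halves, left by `amt` bits.
/// Returns the shifted value as (lo, hi).
pub fn shl128(lo: u64, hi: u64, amt: u32) -> (u64, u64) {
    // Only the low 7 bits of the amount matter for a 128-bit shift.
    let amt = amt & 127;
    if amt == 0 {
        // Special case: the general path would compute `lo >> 64`,
        // which is out of range for a 64-bit shift.
        (lo, hi)
    } else if amt < 64 {
        // Bits shifted out of the top of `lo` carry into the bottom of `hi`.
        (lo << amt, (hi << amt) | (lo >> (64 - amt)))
    } else {
        // The whole low half moves into the high half.
        (0, lo << (amt - 64))
    }
}
```

The result can be cross-checked against Rust's native `u128::wrapping_shl`, which masks the shift amount in the same way.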