Commit Graph

132 Commits

Chris Fallin
e41d882144 Merge pull request #2678 from cfallin/x64-fastcall
x86-64 Windows fastcall ABI support.
2021-03-05 10:46:47 -08:00
Chris Fallin
6c94eb82aa x86-64 Windows fastcall ABI support.
This adds support for the "fastcall" ABI, which is the native C/C++ ABI
on Windows platforms on x86-64. It is similar to but not exactly like
System V; primarily, its argument register assignments are different,
and it requires stack shadow space.

Note that this also adjusts the handling of multi-register values in the
shared ABI implementation, and with this change, adjusts handling of
`i128`s on *both* Fastcall/x64 *and* SysV/x64 platforms. This was done
to align with actual behavior by the "rustc ABI" on both platforms, as
mapped out experimentally (Compiler Explorer link in comments). This
behavior is gated under the `enable_llvm_abi_extensions` flag.

Note also that this does *not* add x64 unwind info on Windows. That will
come in a future PR (but is planned!).
2021-03-03 19:53:18 -08:00
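As a concrete reference for the ABI difference described in the commit above, here is a minimal Rust sketch of how the integer argument registers and shadow space differ between SysV and Windows fastcall. The `CallConv` enum and helper names are illustrative only, not Cranelift's actual ABI code.

```rust
// Illustrative sketch, not the backend's real ABI implementation.
#[derive(Clone, Copy)]
enum CallConv {
    SystemV,
    WindowsFastcall,
}

// Integer argument registers, in assignment order.
fn int_arg_regs(cc: CallConv) -> &'static [&'static str] {
    match cc {
        CallConv::SystemV => &["rdi", "rsi", "rdx", "rcx", "r8", "r9"],
        CallConv::WindowsFastcall => &["rcx", "rdx", "r8", "r9"],
    }
}

// Fastcall requires the caller to reserve 32 bytes of shadow space
// below the return address; SysV has no such requirement.
fn shadow_space_bytes(cc: CallConv) -> u32 {
    match cc {
        CallConv::SystemV => 0,
        CallConv::WindowsFastcall => 32,
    }
}
```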
Andrew Brown
508f8fa5a9 [x64] Add i64x2.abs
This instruction has a single-instruction lowering in AVX512F/VL and a three-instruction lowering in AVX, but neither is currently supported in the x64 backend. To implement this, we instead subtract the vector from 0 and use a blending instruction to pick the lanes containing the absolute value.
2021-03-02 12:30:02 -08:00
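The subtract-and-blend idea in the commit above can be modelled per lane in scalar Rust (an assumed shape for illustration; the real lowering operates on XMM registers):

```rust
// Scalar model of the vector lowering: negate by subtracting from zero,
// then blend per lane, keeping the negated value only where the original
// lane was negative.
fn abs_i64x2(v: [i64; 2]) -> [i64; 2] {
    let neg = [0i64.wrapping_sub(v[0]), 0i64.wrapping_sub(v[1])];
    let mut out = [0i64; 2];
    for i in 0..2 {
        out[i] = if v[i] < 0 { neg[i] } else { v[i] };
    }
    out
}
```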
Kasey Carrothers
7bd96c8e2f Refactor x64::Insts that use an is_64 bool to use OperandSize. 2021-02-03 10:40:11 -08:00
Kasey Carrothers
3306408100 Refactor x64::Inst to use OperandSize instead of u8s.
TODO: some types take an 'is_64_bit' bool. Those are left unchanged for now.
2021-02-03 10:40:11 -08:00
Kasey Carrothers
b12d41bfe9 Expand x64 OperandSize to support 8 and 16-bit operands.
This is in preparation for refactoring all x64::Inst arms to use OperandSize.

Current uses of OperandSize fall into two categories:
  1. XMM operations which require 32/64 bit operands
  2. Immediates which only care about 64-bit or not.

Adds assertions to existing Inst constructors to check that they are passed valid sizes.
This change also removes the implicit widening of 1- and 2-byte values to 4 bytes. from_bytes() is only used by category 2, so removing this widening does not change any visible behavior.

Overall this change should be a no-op.
2021-02-03 10:40:11 -08:00
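A rough sketch of the shape this series moves toward, assuming an `OperandSize` enum and a `from_bytes` constructor along these lines (treat the exact names as approximations of the x64 backend, not its real definitions):

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum OperandSize {
    Size8,
    Size16,
    Size32,
    Size64,
}

impl OperandSize {
    // After this change, 1- and 2-byte values map to Size8/Size16 instead
    // of being silently widened to Size32.
    fn from_bytes(bytes: u32) -> Self {
        match bytes {
            1 => OperandSize::Size8,
            2 => OperandSize::Size16,
            4 => OperandSize::Size32,
            8 => OperandSize::Size64,
            _ => panic!("invalid operand size: {} bytes", bytes),
        }
    }
}
```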
Benjamin Bouvier
13027ad670 cranelift x64: add instruction set checks for popcnt/tzcnt/lzcnt; 2021-01-30 13:38:55 +01:00
Benjamin Bouvier
2275519cb1 cranelift x64: use the POPCNT instruction for Popcount when it's available; 2021-01-29 19:41:01 +01:00
Benjamin Bouvier
6bf6612d96 cranelift x64: use the TZCNT instruction for Ctz when it's available; 2021-01-29 19:41:01 +01:00
Benjamin Bouvier
d3acd9a283 cranelift x64: use the LZCNT instruction for Clz when it's available; 2021-01-29 19:41:01 +01:00
Chris Fallin
71ead6e31d x64 backend: implement 128-bit ops and misc fixes.
This implements all of the ops on I128 that are implemented by the
legacy x86 backend, and includes all that are required by at least one
major use-case (cg_clif rustc backend).

The sequences are open-coded where necessary; for e.g. the bit
operations, this can be somewhat complex, but these sequences have been
tested carefully. This PR also includes a drive-by fix of clz/ctz for 8-
and 16-bit cases where they were incorrect previously.

Also includes ridealong fixes developed while bringing up cg_clif
support, because they are difficult to completely separate due to
other refactors that occurred in this PR:

- fix REX prefix logic for some 8-bit instructions.

  When using an 8-bit register in 64-bit mode on x86-64, the REX prefix
  semantics are somewhat subtle: without the REX prefix, register numbers
  4--7 correspond to the second-to-lowest byte of the first four registers
  (AH, CH, DH, BH), whereas with the REX prefix, these register numbers
  correspond to the usual encoding (SPL, BPL, SIL, DIL). We could always
  emit a REX byte for instructions with 8-bit cases (this is harmless even
  if unneeded), but this would unnecessarily inflate code size; instead,
  the usual approach is to emit it only for these registers.

  This logic was present in some cases but missing for some other
  instructions: divide, not, negate, shifts.

  Fixes #2508.

- avoid unaligned SSE loads on some f64 ops.

  The implementations of several FP ops, such as fabs/fneg, used SSE
  instructions. This is not a problem per se, except that load-op merging
  did not take *alignment* into account. Specifically, if an op on an f64
  loaded from memory happened to merge that load, and the instruction into
  which it was merged was an SSE instruction, then the SSE instruction
  imposes stricter (128-bit) alignment requirements than the load.f64 did.

  This PR simply forces any instruction lowerings that could use SSE
  instructions to implement non-SIMD operations to take inputs in
  registers only, and avoid load-op merging.

  Fixes #2507.

- two bugfixes exposed by cg_clif: urem/srem.i8, select.b1.

  - urem/srem.i8: the 8-bit form of the DIV instruction on x86-64 places
    the remainder in AH, not RDX, different from all the other width-forms
    of this instruction.

  - select.b1: we were not recognizing selects of boolean values as
    integer-typed operations, so we were generating XMM moves instead (!).
2021-01-14 13:45:50 -08:00
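The REX-prefix rule from the first ridealong fix above can be summarized in a small sketch (a hypothetical helper, not the emitter's real interface):

```rust
// For 8-bit operands, hardware register numbers 4-7 mean AH/CH/DH/BH
// when no REX prefix is present, and SPL/BPL/SIL/DIL when one is.
// So an 8-bit instruction touching encodings 4 and up must emit REX
// to select the "usual" low-byte registers.
fn rex_required_for_8bit(enc: u8) -> bool {
    // 0-3  => AL, CL, DL, BL: no REX needed.
    // 4-7  => need REX to select SPL, BPL, SIL, DIL.
    // 8-15 => extended registers always need REX.
    enc >= 4
}
```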
Andrew Brown
09a5b91b9d x64: make several structures debuggable 2021-01-08 16:21:57 -08:00
Chris Fallin
dbd2241b60 x64: handle tests of b1 values correctly (only LSB is defined).
Previously, `select` and `brz`/`brnz` instructions, when given a `b1`
boolean argument, would test whether that boolean argument was nonzero,
rather than whether its LSB was nonzero. Since our invariant for mapping
CLIF state to machine state is that bits beyond the width of a value are
undefined, the proper lowering is to test only the LSB.

(aarch64 does not have the same issue because its `Extend` pseudoinst
already properly handles masking of b1 values when a zero-extend is
requested, as it is for select/brz/brnz.)

Found by Nathan Ringo on Zulip [1] (thanks!).

[1]
https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/bnot.20on.20b1s
2021-01-05 14:45:46 -08:00
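The fix amounts to testing only bit 0 of a `b1` value; a minimal sketch, assuming a textual-assembly helper (hypothetical, for illustration only):

```rust
// Emit the test used before branching or selecting on a b1 value.
fn emit_b1_test(reg: &str) -> String {
    // Wrong: `test reg, reg` inspects every bit, but only the LSB of a b1
    // is defined. Correct: mask to the LSB.
    format!("test {}, 1", reg)
}
```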
Yury Delendik
2964023a77 [SIMD][x86_64] Add encoding for PMADDWD (#2530)
* [SIMD][x86_64] Add encoding for PMADDWD

* also for "experimental_x64"
2020-12-24 07:52:50 -06:00
Johnnie Birch
a548516f97 Enable SIMD spec tests for f32x4_rounding and f64x2_rounding.
Also address some review comments pointing out minor issues.
2020-12-02 13:44:51 -08:00
Johnnie Birch
a33e755cb2 Adds x86 SIMD support for Ceil, Floor, Trunc, and Nearest 2020-12-02 13:44:51 -08:00
Johnnie Birch
2cc501427e Add remaining X86_64 support for pack w/ signed/unsigned saturation
Adds lowering for packssdw, packusdw, packuswb
2020-11-22 23:14:29 -08:00
Johnnie Birch
124096735b Add support for palignr for X86_64 vcode backend 2020-11-22 22:14:02 -08:00
Johnnie Birch
615a575da1 Add support for x86_64 packed move lowering for the vcode backend 2020-11-22 20:23:00 -08:00
Chris Fallin
073c727a74 x64 and aarch64: carry MemFlags on loads/stores; don't emit trap info unless an op can trap.
This end result was previously achieved by carrying a `SourceLoc` on
every load/store, which was somewhat cumbersome, and only indirectly
encoded metadata about a memory reference (can it trap) by its presence
or absence. We have a type for this -- `MemFlags` -- that tells us
everything we might want to know about a load or store, and we should
plumb it through to code emission instead.

This PR attaches a `MemFlags` to an `Amode` on x64, and puts it on load
and store `Inst` variants on aarch64. These two choices seem to factor
things out in the nicest way: there are relatively few load/store insts
on aarch64 but many addressing modes, while the opposite is true on x64.
2020-11-17 11:43:06 -08:00
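A sketch of the factoring described above, with field and type names approximating the x64 backend (not its exact definitions):

```rust
#[derive(Clone, Copy, Default)]
struct MemFlags {
    notrap: bool,  // access is known not to trap
    aligned: bool, // access is known to be naturally aligned
}

// On x64, the addressing mode carries the flags...
struct Amode {
    base: u8,        // base register encoding
    simm32: i32,     // signed 32-bit displacement
    flags: MemFlags,
}

// ...so emission can decide whether to record trap metadata without
// threading a SourceLoc through every load and store.
fn should_emit_trap_info(amode: &Amode) -> bool {
    !amode.flags.notrap
}
```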
Andrew Brown
8ba92853be [machinst x64]: add punpack[hl]bw instructions 2020-11-12 14:21:45 -08:00
Andrew Brown
8131b15921 [machinst x64]: allow addressing of constants 2020-11-12 14:21:45 -08:00
Andrew Brown
6725b6b129 [machinst x64]: implement bitmask 2020-10-28 15:16:36 -07:00
Johnnie Birch
8bbe6a25a9 Add support for packed float to signed int conversion
Implements i32x4.trunc_sat_f32x4_s
2020-10-28 13:02:50 -07:00
Johnnie Birch
f27c0f3434 Adds support for signed packed integer conversion to float
f32x4.convert_i32x4_s
2020-10-16 14:16:53 -07:00
Andrew Brown
3c55523d40 [machinst x64]: implement packed and, and_not, xor, or 2020-10-09 10:04:50 -07:00
Andrew Brown
c8cce5d2d7 [machinst x64]: enable packed saturated arithmetic 2020-10-08 08:46:20 -07:00
Benjamin Bouvier
a470f1e0cd machinst x64: remove dead code and allow(dead_code) annotation;
The BranchTarget is always used as a label, so just use a plain
MachLabel in this case.
2020-10-08 10:05:57 +02:00
Chris Fallin
71768bb6cf Fix AArch64 ABI to respect half-caller-save, half-callee-save vec regs.
This PR updates the AArch64 ABI implementation so that it (i) properly
respects that v8-v15 inclusive have callee-save lower halves, and
caller-save upper halves, by conservatively approximating (to full
registers) in the appropriate directions when generating prologue
caller-saves and when informing the regalloc of clobbered regs across
callsites.

In order to prevent saving all of these vector registers in the prologue
of every non-leaf function due to the above approximation, this also
makes use of a new regalloc.rs feature to exclude call instructions'
writes from the clobber set returned by register allocation. This is
safe whenever the caller and callee have the same ABI (because anything
the callee could clobber, the caller is allowed to clobber as well
without saving it in the prologue).

Fixes #2254.
2020-10-06 14:44:02 -07:00
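The AAPCS64 rule being approximated can be stated in a couple of lines (illustrative helpers, not Cranelift's ABI code):

```rust
// For v8-v15 only the low 64 bits are callee-saved; every upper half is
// caller-saved. A conservative implementation therefore rounds to full
// registers in the safe direction on each side of a call.
fn lower_half_callee_saved(vreg: u8) -> bool {
    (8..=15).contains(&vreg)
}

fn upper_half_callee_saved(_vreg: u8) -> bool {
    false
}
```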
Andrew Brown
50b9399006 [machinst x64]: lower remaining lane operations--any_true, all_true, splat 2020-10-02 08:29:31 -07:00
Andrew Brown
0579e9f9de [machinst x64]: add packed OR 2020-10-02 08:29:31 -07:00
Andrew Brown
74226d6781 [machinst x64]: add integer comparisons 2020-10-02 08:29:31 -07:00
Andrew Brown
4484a00ea5 [machinst x64]: calculate extension modes in one place 2020-09-29 14:48:59 -07:00
Andrew Brown
f50d905152 [machinst x64]: refactor using added RegMem::from(Writable<Reg>) 2020-09-29 08:45:12 -07:00
Andrew Brown
050f078f86 [machinst x64]: add saturating addition implementation 2020-09-29 08:45:12 -07:00
Andrew Brown
a64abf9b76 [machinst x64]: add shuffle implementation 2020-09-29 08:45:12 -07:00
Andrew Brown
f4836f9ca9 [machinst x64]: add extractlane implementation 2020-09-29 08:45:12 -07:00
Andrew Brown
29fa894790 [machinst x64]: add insertlane implementation 2020-09-29 08:45:12 -07:00
Andrew Brown
ac2bf9d246 [machinst x64]: add packed min/max implementations 2020-09-23 15:40:46 -07:00
Andrew Brown
7546d98844 [machinst x64]: add avg_round implementation 2020-09-23 15:40:46 -07:00
Andrew Brown
b202464fa0 [machinst x64]: add iabs implementation 2020-09-23 15:40:46 -07:00
Benjamin Bouvier
3849dc18b1 machinst x64: revamp integer immediate emission;
In particular:

- try to optimize the integer emission into a 32-bit emission, when the
high bits are all zero, and stop relying on the caller of `imm_r` to
ensure this.
- rename `Inst::imm_r`/`Inst::Imm_R` to `Inst::imm`/`Inst::Imm`.
- generate a sign-extending mov 32-bit immediate to 64-bits, whenever
possible.
- fix a few places where the previous commit introduced the generation
of zero constants with xor when calling `put_input_to_reg`,
thus clobbering the flags before they were read.
2020-09-11 18:13:30 +02:00
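The immediate-selection logic reads roughly as follows (an assumed structure for illustration; the real `Inst::imm` constructor differs in detail):

```rust
enum ImmStrategy {
    Mov32,        // 32-bit mov; upper 32 bits are implicitly zeroed
    MovSignExt32, // 32-bit immediate sign-extended to 64 bits
    Mov64,        // full 8-byte immediate (movabs)
}

fn pick_imm_strategy(value: u64) -> ImmStrategy {
    if value <= u32::MAX as u64 {
        // High bits are all zero: a 32-bit emission suffices.
        ImmStrategy::Mov32
    } else if (value as i64) >= i64::from(i32::MIN) && (value as i64) <= i64::from(i32::MAX) {
        // A small negative value: a sign-extending 32-bit mov reproduces it.
        ImmStrategy::MovSignExt32
    } else {
        ImmStrategy::Mov64
    }
}
```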
Johnnie Birch
a64af55cda Adds x64 packed negation for the new backend 2020-09-07 11:56:05 -07:00
Benjamin Bouvier
cca10b87cb machinst x64: optimize select/brz/brnz when the input is a comparison; 2020-08-24 17:00:30 +02:00
Julian Seward
620e4b4e82 This patch fills in the missing pieces needed to support wasm atomics on newBE/x64.
It does this by providing an implementation of the CLIF instructions `AtomicRmw`, `AtomicCas`,
`AtomicLoad`, `AtomicStore` and `Fence`.

The translation is straightforward.  `AtomicCas` is translated into x64 `cmpxchg`, `AtomicLoad`
becomes a normal load because x64-TSO provides adequate sequencing, `AtomicStore` becomes a
normal store followed by `mfence`, and `Fence` becomes `mfence`.  `AtomicRmw` is the only
complex case: it becomes a normal load, followed by a loop which computes an updated value,
tries to `cmpxchg` it back to memory, and repeats if necessary.

This is a minimum-effort initial implementation.  `AtomicRmw` could be implemented more
efficiently using LOCK-prefixed integer read-modify-write instructions in the case where the old
value in memory is not required.  Subsequent work could add that, if required.

The x64 emitter has been updated to emit the new instructions, obviously.  The `LegacyPrefix`
mechanism has been revised to handle multiple prefix bytes, not just one, since it is now
sometimes necessary to emit both 0x66 (Operand Size Override) and 0xF0 (Lock).

In the aarch64 implementation of atomics, there has been some minor renaming for the sake of
clarity, and for consistency with this x64 implementation.
2020-08-24 11:50:06 +02:00
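The `AtomicRmw` loop described above, expressed in Rust rather than machine code (a behavioral sketch of the strategy, not the generated instruction sequence):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Load the old value, compute the update, then compare-and-exchange,
// retrying until no other thread intervened. The machine-code version
// is a plain load plus a `cmpxchg` loop of the same shape.
fn atomic_rmw_add(cell: &AtomicU64, operand: u64) -> u64 {
    loop {
        let old = cell.load(Ordering::SeqCst);
        let new = old.wrapping_add(operand);
        if cell
            .compare_exchange(old, new, Ordering::SeqCst, Ordering::SeqCst)
            .is_ok()
        {
            return old; // AtomicRmw yields the value observed before the update
        }
        // Another thread changed the cell in the meantime; retry.
    }
}
```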
Andrew Brown
2767b2efc6 machinst x64: add Inst::[move|load|store] for choosing the correct x86 instruction
This change primarily adds the ability to lower packed `[move|load|store]` instructions (the vector types were previously unimplemented), but with the addition of the utility `Inst::[move|load|store]` functions it became possible to remove duplicated code (e.g. `stack_load` and `stack_store`) and use these utility functions elsewhere (though not exhaustively).
2020-08-20 12:37:22 -07:00
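The type-driven instruction choice behind `Inst::load`/`Inst::store` can be sketched like this (the opcode names and helper are assumptions for illustration, not the backend's actual API):

```rust
enum LoadOpcode {
    Mov,    // integer loads
    Movss,  // scalar f32
    Movsd,  // scalar f64
    Movups, // 128-bit vectors (unaligned-safe)
}

// Pick the x86 load instruction from the type being moved, so callers
// such as stack_load/stack_store no longer duplicate the choice.
fn load_opcode_for(bits: u32, is_float: bool, is_vector: bool) -> LoadOpcode {
    if is_vector {
        LoadOpcode::Movups
    } else if is_float {
        if bits == 32 { LoadOpcode::Movss } else { LoadOpcode::Movsd }
    } else {
        LoadOpcode::Mov
    }
}
```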
Andrew Brown
cf598dc35b machinst x64: add packed moves for different vector types 2020-08-20 12:37:22 -07:00
Johnnie Birch
a31336996c Add support for some packed multiplication for new x64 backend
Adds support for i32x4 and i16x8, and lowering for pmuludq in
preparation for i64x2.
2020-08-19 10:24:14 -07:00
Johnnie Birch
38ef98700f Adds packed integer subtraction 2020-08-12 09:41:20 -07:00
Johnnie Birch
2eadc6e2a8 Add packed integer add opcodes (v128) to instruction set enum 2020-08-06 22:25:18 -07:00