wasmtime

Author	SHA1	Message	Date
bjorn3	ff22842da5	More atomic ops	2021-02-18 14:16:15 +01:00
Kasey Carrothers	7bd96c8e2f	Refactor x64::Insts that use an is_64 bool to use OperandSize.	2021-02-03 10:40:11 -08:00
Kasey Carrothers	3306408100	Refactor x64::Inst to use OperandSize instead of u8s. TODO: some types take a 'is_64_bit' bool. Those are left unchanged for now.	2021-02-03 10:40:11 -08:00
bjorn3	81d248c057	Implement Mach-O TLS access for x64 newBE	2021-01-21 18:25:56 +01:00
Chris Fallin	0f563f786a	Add ELF TLS support in new x64 backend. This follows the implementation in the legacy x86 backend, including hardcoded sequence that is compatible with what the linker expects. We could potentially do better here, but it is likely not necessary. Thanks to @bjorn3 for a bugfix to an earlier version of this.	2021-01-17 22:48:51 -08:00
Chris Fallin	71ead6e31d	x64 backend: implement 128-bit ops and misc fixes. This implements all of the ops on I128 that are implemented by the legacy x86 backend, and includes all that are required by at least one major use-case (cg_clif rustc backend). The sequences are open-coded where necessary; for e.g. the bit operations, this can be somewhat complex, but these sequences have been tested carefully. This PR also includes a drive-by fix of clz/ctz for 8- and 16-bit cases where they were incorrect previously. Also includes ridealong fixes developed while bringing up cg_clif support, because they are difficult to completely separate due to other refactors that occurred in this PR: - fix REX prefix logic for some 8-bit instructions. When using an 8-bit register in 64-bit mode on x86-64, the REX prefix semantics are somewhat subtle: without the REX prefix, register numbers 4--7 correspond to the second-to-lowest byte of the first four registers (AH, CH, BH, DH), whereas with the REX prefix, these register numbers correspond to the usual encoding (SPL, BPL, SIL, DIL). We could always emit a REX byte for instructions with 8-bit cases (this is harmless even if unneeded), but this would unnecessarily inflate code size; instead, the usual approach is to emit it only for these registers. This logic was present in some cases but missing for some other instructions: divide, not, negate, shifts. Fixes #2508. - avoid unaligned SSE loads on some f64 ops. The implementations of several FP ops, such as fabs/fneg, used SSE instructions. This is not a problem per-se, except that load-op merging did not take alignment into account. Specifically, if an op on an f64 loaded from memory happened to merge that load, and the instruction into which it was merged was an SSE instruction, then the SSE instruction imposes stricter (128-bit) alignment requirements than the load.f64 did. This PR simply forces any instruction lowerings that could use SSE instructions to implement non-SIMD operations to take inputs in registers only, and avoid load-op merging. Fixes #2507. - two bugfixes exposed by cg_clif: urem/srem.i8, select.b1. - urem/srem.i8: the 8-bit form of the DIV instruction on x86-64 places the remainder in AH, not RDX, different from all the other width-forms of this instruction. - select.b1: we were not recognizing selects of boolean values as integer-typed operations, so we were generating XMM moves instead (!).	2021-01-14 13:45:50 -08:00
Chris Fallin	81bc811236	Merge pull request #2558 from cfallin/pic-symbol-refs x64: support PC-rel symbol references using the GOT when in PIC mode.	2021-01-08 10:03:10 -08:00
Chris Fallin	3ee898cb2c	x64: support PC-rel symbol references using the GOT when in PIC mode.	2021-01-07 22:46:56 -08:00
Chris Fallin	dbd2241b60	x64: handle tests of b1 values correctly (only LSB is defined). Previously, `select` and `brz`/`brnz` instructions, when given a `b1` boolean argument, would test whether that boolean argument was nonzero, rather than whether its LSB was nonzero. Since our invariant for mapping CLIF state to machine state is that bits beyond the width of a value are undefined, the proper lowering is to test only the LSB. (aarch64 does not have the same issue because its `Extend` pseudoinst already properly handles masking of b1 values when a zero-extend is requested, as it is for select/brz/brnz.) Found by Nathan Ringo on Zulip [1] (thanks!). [1] https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/bnot.20on.20b1s	2021-01-05 14:45:46 -08:00
Yury Delendik	2964023a77	[SIMD][x86_64] Add encoding for PMADDWD (#2530 ) * [SIMD][x86_64] Add encoding for PMADDWD * also for "experimental_x64"	2020-12-24 07:52:50 -06:00
Johnnie Birch	f705a72aeb	Refactor packed moves to use xmm_mov instead of xmm_rm_r Refactors previous packed move implementation to use xmm_mov instead of xmm_rm_r which looks to simplify register accounting during lowering.	2020-12-16 17:13:27 -08:00
Johnnie Birch	a33e755cb2	Adds x86 SIMD support for Ceil, Floor, Trunc, and Nearest	2020-12-02 13:44:51 -08:00
Johnnie Birch	2cc501427e	Add remaining X86_64 support for pack w/ signed/unsigned saturation Adds lowering for packssdw, packusdw, packuswb	2020-11-22 23:14:29 -08:00
Johnnie Birch	124096735b	Add support for palignr for X86_64 vcode backend	2020-11-22 22:14:02 -08:00
Johnnie Birch	615a575da1	Add support for x86_64 packed move lowering for the vcode backend	2020-11-22 20:23:00 -08:00
Andrew Brown	8ba92853be	[machinst x64]: add punpack[hl]bw instructions	2020-11-12 14:21:45 -08:00
Chris Fallin	4dce51096d	MachInst backends: handle SourceLocs out-of-band, not in Insts. In existing MachInst backends, many instructions -- any that can trap or result in a relocation -- carry `SourceLoc` values in order to propagate the location-in-original-source to use to describe resulting traps or relocation errors. This is quite tedious, and also error-prone: it is likely that the necessary plumbing will be missed in some cases, and in any case, it's unnecessarily verbose. This PR factors out the `SourceLoc` handling so that it is tracked during emission as part of the `EmitState`, and plumbed through automatically by the machine-independent framework. Instruction emission code that directly emits trap or relocation records can query the current location as necessary. Then we only need to ensure that memory references and trap instructions, at their (one) emission point rather than their (many) lowering/generation points, are wired up correctly. This does have the side-effect that some loads and stores that do not correspond directly to user code's heap accesses will have unnecessary but harmless trap metadata. For example, the load that fetches a code offset from a jump table will have a 'heap out of bounds' trap record attached to it; but because it is bounds-checked, and will never actually trap if the lowering is correct, this should be harmless. The simplicity improvement here seemed more worthwhile to me than plumbing through a "corresponds to user-level load/store" bit, because the latter is a bit complex when we allow for op merging. Closes #2290: though it does not implement a full "metadata" scheme as described in that issue, this seems simpler overall.	2020-11-10 15:46:53 -08:00
Andrew Brown	6725b6b129	[machinst x64]: implement bitmask	2020-10-28 15:16:36 -07:00
Andrew Brown	5b9a21e099	Add missing `SourceLoc` to newly-emitted instructions The changes in https://github.com/bytecodealliance/wasmtime/pull/2278 added `SourceLoc`s to several x64 `Inst` variants; between when that PR was last run in CI and when it was merged, new instructions were added that require this new parameter. This change adds the parameter in order to fix CI.	2020-10-28 14:33:09 -07:00
Johnnie Birch	8bbe6a25a9	Add support for packed float to signed int conversion Implements i32x4.trunc_sat_f32x4_s	2020-10-28 13:02:50 -07:00
Chris Fallin	c35904a8bf	Merge pull request #2278 from akirilov-arm/load_splat Introduce the Cranelift IR instruction `LoadSplat`	2020-10-28 12:54:03 -07:00
Johnnie Birch	f27c0f3434	Adds support for signed packed integer conversion to float f32x4.convert_i32x4_s	2020-10-16 14:16:53 -07:00
Andrew Brown	d990dd4c9a	[machinst x64]: add source locations to more instruction formats In order to register traps for `load_splat`, several instruction formats need knowledge of `SourceLoc`s; however, since the x64 backend does not correctly and completely register traps for `RegMem::Mem` variants I opened https://github.com/bytecodealliance/wasmtime/issues/2290 to discuss and resolve this issue. In the meantime, the current behavior (i.e. remaining largely unaware of `SourceLoc`s) is retained.	2020-10-14 09:43:33 -07:00
Andrew Brown	c8cce5d2d7	[machinst x64]: enable packed saturated arithmetic	2020-10-08 08:46:20 -07:00
Benjamin Bouvier	a470f1e0cd	machinst x64: remove dead code and allow(dead_code) annotation; The BranchTarget is always used as a label, so just use a plain MachLabel in this case.	2020-10-08 10:05:57 +02:00
Benjamin Bouvier	c5bbc87498	machinst: allow passing constant information to the instruction emitter; A new associated type Info is added to MachInstEmit, which is the immutable counterpart to State. It can't easily be constructed from an ABICallee, since it would require adding an associated type to the latter, and making so leaks the associated type in a lot of places in the code base and makes the code harder to read. Instead, the EmitInfo state can simply be passed to the `Vcode::emit` function directly.	2020-10-08 09:21:51 +02:00
Andrew Brown	050f078f86	[machinst x64]: add saturating addition implementation	2020-09-29 08:45:12 -07:00
Andrew Brown	a64abf9b76	[machinst x64]: add shuffle implementation	2020-09-29 08:45:12 -07:00
Andrew Brown	29fa894790	[machinst x64]: add insertlane implementation	2020-09-29 08:45:12 -07:00
Andrew Brown	ac2bf9d246	[machinst x64]: add packed min/max implementations	2020-09-23 15:40:46 -07:00
Andrew Brown	7546d98844	[machinst x64]: add avg_round implementation	2020-09-23 15:40:46 -07:00
Andrew Brown	b202464fa0	[machinst x64]: add iabs implementation	2020-09-23 15:40:46 -07:00
Benjamin Bouvier	3849dc18b1	machinst x64: revamp integer immediate emission; In particular: - try to optimize the integer emission into a 32-bit emission, when the high bits are all zero, and stop relying on the caller of `imm_r` to ensure this. - rename `Inst::imm_r`/`Inst::Imm_R` to `Inst::imm`/`Inst::Imm`. - generate a sign-extending mov 32-bit immediate to 64-bits, whenever possible. - fix a few places where the previous commit did introduce the generation of zero-constants with xor, when calling `put_input_to_reg`, thus clobbering the flags before they were read.	2020-09-11 18:13:30 +02:00
bjorn3	9428480230	Merge SignExtendAlAh and SignExtendRaxRdx	2020-09-08 15:00:24 +02:00
bjorn3	067255ef45	x64: Implement rotl and rotr for small integers	2020-09-08 15:00:24 +02:00
bjorn3	ce033f2a0c	x64: Fix udiv and sdiv for 8bit integers	2020-09-08 15:00:24 +02:00
bjorn3	74642b166f	x64: Implement ineg and bnot	2020-09-08 15:00:24 +02:00
Johnnie Birch	a64af55cda	Adds x64 packed negation for the new backend	2020-09-07 11:56:05 -07:00
Julian Seward	620e4b4e82	This patch fills in the missing pieces needed to support wasm atomics on newBE/x64. It does this by providing an implementation of the CLIF instructions `AtomicRmw`, `AtomicCas`, `AtomicLoad`, `AtomicStore` and `Fence`. The translation is straightforward. `AtomicCas` is translated into x64 `cmpxchg`, `AtomicLoad` becomes a normal load because x64-TSO provides adequate sequencing, `AtomicStore` becomes a normal store followed by `mfence`, and `Fence` becomes `mfence`. `AtomicRmw` is the only complex case: it becomes a normal load, followed by a loop which computes an updated value, tries to `cmpxchg` it back to memory, and repeats if necessary. This is a minimum-effort initial implementation. `AtomicRmw` could be implemented more efficiently using LOCK-prefixed integer read-modify-write instructions in the case where the old value in memory is not required. Subsequent work could add that, if required. The x64 emitter has been updated to emit the new instructions, obviously. The `LegacyPrefix` mechanism has been revised to handle multiple prefix bytes, not just one, since it is now sometimes necessary to emit both 0x66 (Operand Size Override) and F0 (Lock). In the aarch64 implementation of atomics, there has been some minor renaming for the sake of clarity, and for consistency with this x64 implementation.	2020-08-24 11:50:06 +02:00
Johnnie Birch	a31336996c	Add support for some packed multiplication for new x64 backend Adds support for i32x4, and i16x8 and lowering for pmuludq in preperation for i64x2.	2020-08-19 10:24:14 -07:00
Johnnie Birch	38ef98700f	Adds packed integer subtraction	2020-08-12 09:41:20 -07:00
Johnnie Birch	f5909b37c3	Add emit tests for packed integer add instructions	2020-08-06 22:25:18 -07:00
Andrew Brown	8cfff26957	machinst x64: implement floating point comparisons Note that this fixes an encoding issue in which the packed single and packed double prefixes were flipped.	2020-08-04 13:24:38 -07:00
Andrew Brown	999e04a2c4	machinst x64: refactor imports to use rustfmt convention This change is a pure refactoring--no change to functionality. It removes newlines between the `use ...` statements in the x64 backend so that rustfmt can format them according to its convention. I noticed some files had followed a manual convention but subsequent additions did not seem to fit; this change fixes that and lightly coalesces some of the occurrences of `use a::b; use a::c;` into `use::{b, c}`.	2020-08-04 09:17:54 -07:00
Andrew Brown	c74a9d1225	machinst x64: add packed shifts	2020-07-30 14:16:12 -07:00
Benjamin Bouvier	2e3ad3227d	machinst x64: fix encoding of movzx/movsx with non-ABCD input registers; Using an input register that doesn't belong to the ABCD family (al, etc.) as the source of movsx/movzx requires a redundant REX prefix, that was not emitted.	2020-07-24 19:29:12 +02:00
Benjamin Bouvier	cd54f05efd	machinst x64: implement float-to-int and int-to-float conversions;	2020-07-24 19:29:12 +02:00
Johnnie Birch	a7cedf3100	Add support for 32 bit and 64 bit fcmp for the new backend Implements commiss and commisd.	2020-07-17 13:46:54 -07:00
Benjamin Bouvier	ead8a835c4	machinst x64: add more FP support	2020-07-17 15:56:44 +02:00
Benjamin Bouvier	bab337fc32	Address review comments;	2020-07-16 18:21:06 +02:00

1 2

68 Commits