wasmtime

Author	SHA1	Message	Date
Chris Fallin	bad99c93b1	Merge pull request #2051 from cfallin/aarch64-add-negative-imm Aarch64 codegen quality: handle add-negative-imm as subtract.	2020-07-24 12:26:54 -07:00
Chris Fallin	1b80860f1f	Aarch64 codegen quality: handle add-negative-imm as subtract. We often see patterns like: ``` mov w2, #0xffff_ffff // uses ORR with logical immediate form add w0, w1, w2 ``` which is just `w0 := w1 - 1`. It would be much better to recognize when the negation of an immediate will fit in a 12-bit immediate field if the immediate itself does not, and flip add to subtract (and vice versa), so we can instead generate: ``` sub w0, w1, #1 ``` We see this pattern in e.g. `bz2`, where this commit makes the following difference (counting instructions with `perf stat`, filling in the wasmtime cache first then running again to get just runtime): pre: ``` 992.762250 task-clock (msec) # 0.998 CPUs utilized 109 context-switches # 0.110 K/sec 0 cpu-migrations # 0.000 K/sec 5,035 page-faults # 0.005 M/sec 3,224,119,134 cycles # 3.248 GHz 4,000,521,171 instructions # 1.24 insn per cycle <not supported> branches 27,573,755 branch-misses 0.995072322 seconds time elapsed ``` post: ``` 993.853850 task-clock (msec) # 0.998 CPUs utilized 123 context-switches # 0.124 K/sec 1 cpu-migrations # 0.001 K/sec 5,072 page-faults # 0.005 M/sec 3,201,278,337 cycles # 3.221 GHz 3,917,061,340 instructions # 1.22 insn per cycle <not supported> branches 28,410,633 branch-misses 0.996008047 seconds time elapsed ``` In other words, a 2.1% reduction in instruction count on `bz2`.	2020-07-24 11:41:33 -07:00
Benjamin Bouvier	35d9ab19b7	Review fixes;	2020-07-24 19:29:12 +02:00
Benjamin Bouvier	ad4a2f919f	machinst x64: implement support for reference types;	2020-07-24 19:29:12 +02:00
Benjamin Bouvier	987c616bf5	machinst x64: implement support for dynamic heaps and explicit bound checks;	2020-07-24 19:29:12 +02:00
Benjamin Bouvier	2e3ad3227d	machinst x64: fix encoding of movzx/movsx with non-ABCD input registers; Using an input register that doesn't belong to the ABCD family (al, etc.) as the source of movsx/movzx requires a redundant REX prefix, which was not emitted.	2020-07-24 19:29:12 +02:00
Benjamin Bouvier	de4923356a	machinst x64: fix fcmp comparison for NotEqual; We have to emit both checks against the parity bit (for unordered) and non-equality bit (for equality), otherwise this returns false when comparing NaN against itself.	2020-07-24 19:29:12 +02:00
Benjamin Bouvier	4b26f5b120	machinst x64: baldrdash: fix multi-value when both gpr and xmm are returned; In baldrdash, only the first return value may live in a register, be it an integer or a floating point value.	2020-07-24 19:29:12 +02:00
Benjamin Bouvier	aa103698d4	machinst x64: extend Copysign to work for f64 inputs too;	2020-07-24 19:29:12 +02:00
Benjamin Bouvier	694af3aec2	machinst x64: implement float Floor/Ceil/Trunc/Nearest as VM calls;	2020-07-24 19:29:12 +02:00
Benjamin Bouvier	48ec806a9d	machinst x64: implement Fabs/Fneg in terms of other instructions;	2020-07-24 19:29:12 +02:00
Benjamin Bouvier	03b9e1e86a	machinst x64: implement float min/max with the right semantics;	2020-07-24 19:29:12 +02:00
Benjamin Bouvier	e43310a088	machinst x64: adapt conversions for saturation behaviors;	2020-07-24 19:29:12 +02:00
Benjamin Bouvier	cd54f05efd	machinst x64: implement float-to-int and int-to-float conversions;	2020-07-24 19:29:12 +02:00
Chris Fallin	87eb4392c4	Merge pull request #2063 from jgouly/vselect arm64: Implement Vselect opcode	2020-07-22 13:35:46 -07:00
Chris Fallin	44ef8247a9	Merge pull request #2062 from akirilov-arm/extract_lane AArch64: Improve code generation for Extractlane + Sextend / Uextend	2020-07-22 13:35:00 -07:00
Chris Fallin	b8f6d53a6b	Aarch64 codegen: represent bool `true` as -1, not 1. It seems that this is actually the correct behavior for bool types wider than `b1`; some of the vector instruction optimizations depend on bool lanes representing false and true as all-zeroes and all-ones respectively. For `b8`..`b64`, this results in an extra negation after a `cset` when a bool is produced by an `icmp`/`fcmp`, but the most common case (`b1`) is unaffected, because an all-ones one-bit value is just `1`. An example of this assumption can be seen here: `399ee0a54c/cranelift/codegen/src/simple_preopt.rs (L956)` Thanks to Joey Gouly of ARM for noting this issue while implementing SIMD support, and digging into the source (finding the above example) to determine the correct behavior.	2020-07-22 12:30:55 -07:00
Joey Gouly	5355c3e3d5	arm64: Implement Vselect opcode This is implemented the same as Bitselect, as the controlling vector is a boolean vector. A boolean vector in cranelift has elements that are either 0 or all 1s, so it can be used to select elements lane wise. Copyright (c) 2020, Arm Limited.	2020-07-22 12:50:29 +01:00
Anton Kirilov	420c4f06b8	AArch64: Improve code generation for Extractlane + Sextend / Uextend Copyright (c) 2020, Arm Limited.	2020-07-22 11:47:51 +01:00
Chris Fallin	96ef2f1a1b	Fix `u8::MAX` -> `std::u8::MAX`. (#2047) As per Carlo Kok on Zulip #cranelift, this breaks builds with stable Rust pre-1.43, as the associated constant `u8::MAX` was only stabilized then. We'd like to support older versions if we can easily do so. This PR also adds `cranelift-tools` to the crates checked on CI with Rust 1.41.0, which pulls in all backends (including `aarch64`).	2020-07-20 14:59:15 -05:00
Chris Fallin	784e2f1480	Merge pull request #2038 from jgouly/arith2 arm64: Enable arith2 tests	2020-07-20 09:00:10 -07:00
Chris Fallin	1b3b2dbfd0	Merge pull request #2043 from cfallin/csel-opt Aarch64: handle csel with icmp/fcmp source without materializing the bool.	2020-07-18 19:33:47 -07:00
Chris Fallin	ea894c0eeb	Merge pull request #2042 from cfallin/aarch64-fix-regshift-mask Aarch64: mask shift-amounts incorporated into reg-reg-shift ALU insts.	2020-07-18 19:33:35 -07:00
Chris Fallin	21dac670f0	Aarch64: handle csel with icmp/fcmp source without materializing the bool. Previously, we simply compared the input bool to 0, which forced the value into a register (usually via a cmp and cset), zero-extended it, etc. This patch performs the same pattern-matching that branches do to directly perform the cmp and use its flag results with the csel. On the `bz2` benchmark, the runtime is affected as follows (measuring with `perf stat`, using wasmtime with its cache enabled, and taking the second run after the first compiles and populates the cache): pre: 1117.232000 task-clock (msec) # 1.000 CPUs utilized 133 context-switches # 0.119 K/sec 1 cpu-migrations # 0.001 K/sec 5,041 page-faults # 0.005 M/sec 3,511,615,100 cycles # 3.143 GHz 4,272,427,772 instructions # 1.22 insn per cycle <not supported> branches 27,980,906 branch-misses 1.117299838 seconds time elapsed post: 1003.738075 task-clock (msec) # 1.000 CPUs utilized 121 context-switches # 0.121 K/sec 0 cpu-migrations # 0.000 K/sec 5,052 page-faults # 0.005 M/sec 3,224,875,393 cycles # 3.213 GHz 4,000,838,686 instructions # 1.24 insn per cycle <not supported> branches 27,928,232 branch-misses 1.003440004 seconds time elapsed In other words, with this change, on `bz2`, we see a 6.3% reduction in executed instructions.	2020-07-17 21:10:21 -07:00
Chris Fallin	9bd9c628aa	Aarch64: mask shift-amounts incorporated into reg-reg-shift ALU insts. We had previously fixed a bug in which constant shift amounts were not masked modulo the number of bits in the operand; however, we did not fix the analogous case for shifts incorporated into the second register argument of ALU instructions that support integrated shifts. This failure to mask resulted in illegal instructions being generated, e.g. in https://bugzilla.mozilla.org/show_bug.cgi?id=1653502. This PR fixes the issue by masking the amount, as the shift semantics require.	2020-07-17 14:55:23 -07:00
Johnnie Birch	a7cedf3100	Add support for 32 bit and 64 bit fcmp for the new backend Implements commiss and commisd.	2020-07-17 13:46:54 -07:00
Joey Gouly	40473dffed	arm64: Enable arith2 tests Copyright (c) 2020, Arm Limited.	2020-07-17 15:58:16 +01:00
Benjamin Bouvier	ead8a835c4	machinst x64: add more FP support	2020-07-17 15:56:44 +02:00
bjorn3	7b7b1f4997	Rename sarg__ to sarg_t	2020-07-17 12:03:17 +02:00
bjorn3	4431ac1108	Implement SystemV struct argument passing	2020-07-17 12:03:17 +02:00
Benjamin Bouvier	bab337fc32	Address review comments;	2020-07-16 18:21:06 +02:00
Benjamin Bouvier	5a55646fc3	machinst x64: support out-of-bounds memory accesses;	2020-07-16 18:21:06 +02:00
Benjamin Bouvier	ea33ce9116	machinst x64: basic support for baldrdash + fix multi-value support	2020-07-16 18:21:06 +02:00
Benjamin Bouvier	00b38c91f6	machinst x64: fix generation of RegMemImm immediate operands;	2020-07-16 18:21:06 +02:00
Benjamin Bouvier	1430c5e436	machinst x64: fix index handling of jump table; The index should be truncated to 32 bits before being used for the jump table entry computation.	2020-07-16 18:21:06 +02:00
Benjamin Bouvier	55b9059954	machinst x64: remove spurious assertion about FP offset requiring to be 16-bytes aligned	2020-07-16 18:21:06 +02:00
Benjamin Bouvier	3905a1b17b	machinst x64: implement SymbolValue and FuncAddr with a movabsq+reloc;	2020-07-16 18:21:06 +02:00
Benjamin Bouvier	cfa0a0c4e8	machinst x64: lower resumable_trap as trap;	2020-07-16 18:21:06 +02:00
Benjamin Bouvier	311027869b	machinst x64: implement popcnt.i64	2020-07-16 18:21:06 +02:00
Benjamin Bouvier	d9310e8d90	machinst x64: fix checked div sequence: it should mark rdx as clobbered (def), not modified; the signed-div check requires a temporary to compare against int64_min	2020-07-16 18:21:06 +02:00
Benjamin Bouvier	f932bccaf8	machinst x64: fix sign-extension at boundary	2020-07-16 18:21:06 +02:00
Benjamin Bouvier	6f5403a94b	machinst x64: lower Ctz using the Bsf x86 instruction	2020-07-16 18:21:06 +02:00
Benjamin Bouvier	33e0d05645	machinst x64: have cmov modify its destination operand; This is tricky: the control flow implied by the operand means that the output register may be undefined if we mark it only as a "def". Make it a "mod" instead, which matches our usage in the codebase, and will make it crash if the output operand isn't unconditionally defined before the instruction.	2020-07-16 18:21:06 +02:00
Benjamin Bouvier	aa7db7fd7b	machinst x64: fix JmpUnknown register mapping;	2020-07-16 18:21:06 +02:00
Benjamin Bouvier	fe7dd41435	machinst x64: fix iconst emission	2020-07-16 18:21:06 +02:00
Benjamin Bouvier	ec2209665a	machinst x64: implement bsr and lower Clz;	2020-07-16 18:21:06 +02:00
Benjamin Bouvier	eda2d143ed	machinst x64: add support for umulhi/smulhi;	2020-07-16 18:21:06 +02:00
Benjamin Bouvier	571061fe4c	machinst x64: add support for rotations;	2020-07-16 18:21:06 +02:00
Benjamin Bouvier	22892466e7	machinst x64: fix implementation of *reduce; They should just generate a plain move, since the high bits are then ignored, and not an extended move.	2020-07-16 18:21:06 +02:00
Andrew Brown	c8ddf8a34c	Encode `[u\|s]widen_low` for x86	2020-07-15 11:32:08 -07:00
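
The add-negative-imm change described in 1b80860f1f can be sketched roughly as below. This is only an illustration of the idea, not the Cranelift code: `AddLowering`, `lower_add_imm`, and `IMM12_MAX` are hypothetical names, the immediate is assumed to arrive already sign-extended to `i64`, and the shifted (`LSL #12`) form of the 12-bit immediate is ignored for brevity.

```rust
/// Hypothetical result of choosing how to lower `add rd, rn, <constant>`.
/// These names are illustrative only and are not Cranelift's.
#[derive(Debug, PartialEq, Eq)]
pub enum AddLowering {
    /// `add rd, rn, #imm12`
    AddImm12(u32),
    /// `sub rd, rn, #imm12` (the constant was negative, so flip the opcode)
    SubImm12(u32),
    /// The constant fits neither way; materialize it into a register first.
    Materialize,
}

/// Largest value representable in an unshifted 12-bit immediate field.
const IMM12_MAX: u64 = 0xfff;

/// Pick a lowering for adding the sign-extended constant `imm`.
pub fn lower_add_imm(imm: i64) -> AddLowering {
    // `unsigned_abs` avoids overflow on `i64::MIN`.
    let magnitude = imm.unsigned_abs();
    if magnitude > IMM12_MAX {
        AddLowering::Materialize
    } else if imm >= 0 {
        AddLowering::AddImm12(magnitude as u32)
    } else {
        // Adding a small negative constant is subtracting its negation.
        AddLowering::SubImm12(magnitude as u32)
    }
}
```

With this sketch, the 32-bit constant `0xffff_ffff` from the commit message (that is, -1) selects `SubImm12(1)`, matching the `sub w0, w1, #1` form. The mirror case (a subtract of a negative constant becoming an add) would follow the same pattern.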


315 Commits
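
The NaN issue fixed in de4923356a can be illustrated with a small model of the flags that an x86 `ucomiss`/`ucomisd` comparison produces. This is a sketch under assumptions rather than the backend's code: `Flags`, `ucomis`, `fcmp_ne_naive`, and `fcmp_ne` are hypothetical names, and the model covers only the ZF/PF/CF outcomes (unordered sets all three).

```rust
/// Hypothetical model of the flags set by an x86 unordered FP compare.
#[allow(dead_code)]
pub struct Flags {
    pub zf: bool, // set on equal, and also on unordered
    pub pf: bool, // set only on unordered (a NaN operand)
    pub cf: bool, // set on less-than, and also on unordered
}

/// Model of `ucomisd a, b` flag results.
pub fn ucomis(a: f64, b: f64) -> Flags {
    if a.is_nan() || b.is_nan() {
        Flags { zf: true, pf: true, cf: true }
    } else if a > b {
        Flags { zf: false, pf: false, cf: false }
    } else if a < b {
        Flags { zf: false, pf: false, cf: true }
    } else {
        Flags { zf: true, pf: false, cf: false }
    }
}

/// Checking only the non-equality condition: wrong for NaN, because an
/// unordered result also sets ZF and so looks like "equal".
pub fn fcmp_ne_naive(a: f64, b: f64) -> bool {
    !ucomis(a, b).zf
}

/// Checking both the parity bit (unordered) and the non-equality bit.
pub fn fcmp_ne(a: f64, b: f64) -> bool {
    let f = ucomis(a, b);
    f.pf || !f.zf
}
```

In this model `fcmp_ne_naive(NaN, NaN)` is false while `fcmp_ne(NaN, NaN)` is true, which is the behavior the commit message describes; for ordered inputs the two agree.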