wasmtime

Author	SHA1	Message	Date
Andrew Brown	459fce3467	x64: lower i8x16.popcnt to VPOPCNTB when possible When AVX512VL and AVX512BITALG are available, Wasm SIMD's `popcnt` instruction can be lowered to a single x64 instruction, `VPOPCNTB`, instead of 8+ instructions.	2021-05-25 12:16:25 -07:00
Chris Fallin	f2fe0c669e	Merge pull request #2929 from cfallin/bb-offsets Provide BB layout info externally in terms of code offsets.	2021-05-24 14:27:53 -07:00
Chris Fallin	800cf25bb5	Make the CFG metadata computation conditional on a flag.	2021-05-24 13:01:15 -07:00
Afonso Bordado	4ddbfe50ba	aarch64: Implement imul for i128 operands	2021-05-24 18:23:30 +01:00
Chris Fallin	11a2ef01e7	Provide BB layout info externally in terms of code offsets. This is sometimes useful when performing analyses on the generated machine code: for example, some kinds of code verifiers will want to do a control-flow analysis, and it is much easier to do this if one does not have to recover the CFG from the machine code (doing so requires heavyweight analysis when indirect branches are involved). If one trusts the control-flow lowering and only needs to verify other properties of the code, this can be very useful.	2021-05-24 09:18:06 -07:00
Afonso Bordado	a2e74b2c45	aarch64: Implement isub for i128 operands	2021-05-22 21:51:41 +01:00
Afonso Bordado	d3b525fa29	aarch64: Implement iadd for i128 operands	2021-05-22 21:21:44 +01:00
Johnnie Birch	9a5c9607e1	Vpopcnt for x64	2021-05-21 19:23:26 -07:00
Chris Fallin	65e0e20210	Merge pull request #2892 from afonso360/aarch64-multireg-args Handle i128 arguments in the aarch64 ABI	2021-05-21 16:57:42 -07:00
Chris Fallin	95559c01aa	Merge pull request from GHSA-hpqh-2wqx-7qp5 Fix spillslot reload of narrow values: zero-extend, don't sign-extend. Release v0.74.0 as security-patch release.	2021-05-21 12:01:55 -07:00
Afonso Bordado	fbcfffdeab	Handle spilling i128 arguments into the stack in aarch64	2021-05-21 17:05:41 +01:00
Andrew Brown	54b45d28a3	x64: lower fcvt_from_uint to VCVTUDQ2PS when possible When AVX512VL and AVX512F are available, use a single instruction (`VCVTUDQ2PS`) instead of a lengthy 9-instruction sequence. This optimization is a port from the legacy x86 backend.	2021-05-19 12:20:11 -07:00
Chris Fallin	a1c9b06cea	Fix spillslot reload of narrow values: zero-extend, don't sign-extend. Previously, the x64 backend's ABI code would generate a sign-extending load when loading a less-than-64-bit integer from a spillslot. This is incorrect: e.g., for i32 values of 0x80000000 or greater (top bit set), this would result in all high bits set. This interacts poorly with another optimization. Normally, the invariant is that the high bits of a register holding a value of a certain type, beyond that type's bits, are undefined. However, as an optimization, we recognize and use the fact that on x86-64, 32-bit instructions zero the upper 32 bits. This allows us to elide a 32-to-64-bit zero-extend op (turning it into just a move, which can then sometimes disappear entirely due to register coalescing). If a spill and reload happen between the production of a 32-bit value from an instruction known to zero the upper bits and its use, then we will rely on zero upper bits that might actually be set by a sign-extend. This will result in incorrect execution. As a fix, we stick to a simple invariant: we always spill and reload a full 64 bits when handling integer registers on x64. This ensures that no bits are mangled.	2021-05-19 12:19:19 -07:00
Andrew Brown	7ef3ae2903	x64: implement vselect with variable blend instructions This change implements `vselect` using SSE4.1's `BLENDVPS`, `BLENDVPD`, and `PBLENDVB`. `vselect` is a lane-selecting instruction that is used by [simple_preopt.rs](`fa1faf5d22/cranelift/codegen/src/simple_preopt.rs (L947-L999)`) to lower `bitselect` to a single x86 instruction when the condition mask is known to be boolean (all 1s or 0s, e.g., from a conversion). This is better than `bitselect` in general, which lowers to 4-5 instructions. The old backend had the `vselect` lowering; this simply introduces it to the new backend.	2021-05-17 11:23:33 -07:00
Andrew Brown	bc0df92137	peepmatic: rebuild peephole optimizers after cranelift/meta change	2021-05-17 06:54:45 -07:00
Andrew Brown	84b6f05971	cranelift: remove unreachable scalar lowerings of saturating arithmetic Since `uadd_sat`, `sadd_sat`, `usub_sat`, and `ssub_sat` are now only available to vector types, this removes the lowering code for the scalar versions of these instructions in the arm32 and aarch64 backends.	2021-05-17 06:54:45 -07:00
Andrew Brown	e676589b0c	x64: lower i64x2.imul to VPMULLQ when possible This adds the machinery to encode the VPMULLQ instruction which is available in AVX512VL and AVX512DQ. When these feature sets are available, we use this instruction instead of a lengthy 12-instruction sequence.	2021-05-13 20:14:05 -07:00
Andrew Brown	c982d2be65	x64: move multiplication lowering Since the lowering of `imul` complicated the other ALU operations it was matched with and since future commits will alter the multiplication lowering further, this change moves the `imul` lowering to its own match block.	2021-05-13 20:14:05 -07:00
Andrew Brown	c89e6b2353	x64: make the x64 module public In order to benchmark portions of the x64 module, this change makes it a public module of cranelift-codegen.	2021-05-13 10:46:08 -07:00
Andrew Brown	02796fc670	x64: move encodings to a separate module In order to benchmark the encoding code with criterion, the functions and structures must be public. Moving this code to its own module (instead of keeping as a submodule to `inst`), allows `inst` to remain private. This avoids having to expose and document (or ignore documenting) the numerous instruction variants in `inst` while allowing access to the encoding code. This commit changes no functionality.	2021-05-13 10:46:08 -07:00
Afonso Bordado	ac624da8d9	Handle i128 arguments in the aarch64 ABI When dealing with params that need to be split, we follow the aarch64 ABI and split the value in two, making sure that the argument starts in an even-numbered xN register. The Apple ABI does not require this, so on those platforms we start params anywhere.	2021-05-12 13:06:13 +01:00
Chris Fallin	5fb2c8c235	Merge pull request #2874 from uweigand/s390x-backend Support IBM z/Architecture	2021-05-10 13:53:23 -07:00
Afonso Bordado	e021995323	Allow i128 amount operands on shift instructions in the x64 backend Fixes #2727.	2021-05-10 18:32:20 +01:00
Ulrich Weigand	89b5fc776d	Support IBM z/Architecture This adds support for the IBM z/Architecture (s390x-ibm-linux). The status of the s390x backend in its current form is: - Wasmtime is fully functional and passes all tests on s390x. - All back-end features supported, with the exception of SIMD. - There is still a lot of potential for performance improvements. - Currently the only supported processor type is z15.	2021-05-10 16:01:16 +02:00
bjorn3	82f3ad4f1a	Add comment why thiserror is not used	2021-05-04 13:51:28 +02:00
bjorn3	03fdbadfb4	Remove thiserror dependency from cranelift_codegen	2021-05-04 13:45:20 +02:00
Ulrich Weigand	e1cc1a67d5	Object file support for s390x (#2872) Add support for s390x binary format object files. In particular, add support for s390x ELF relocation types (currently only S390xPCRel32Dbl).	2021-05-03 11:50:00 -05:00
Anton Kirilov	480670e17f	Enable the simd_boolean test for AArch64 Also, enable the simd_i64x2_arith2 test because it doesn't need any code changes. Copyright (c) 2021, Arm Limited.	2021-04-27 20:19:51 +01:00
Jubilee Young	a8c956ede1	Factor out byteorder in cranelift This removes an existing dependency on the byteorder crate in favor of using std equivalents directly. While not an issue for wasmtime per se, cranelift is now part of the critical path of building and testing Rust, and minimizing dependencies, even small ones, can help reduce the time and bandwidth required.	2021-04-23 12:05:18 -07:00
StackDoubleFlow	9637bc5a09	Fix cranelift `Module` and `ObjectModule` docs links (#2852)	2021-04-21 06:29:02 -07:00
Benjamin Bouvier	8ab3511b3b	Generate unwind information on Win64 with the old backend Following the new ABI introduced for efficient support of multiple return values, the old-backend test for generating unwind information was incomplete, resulting in no unwind information being generated and traps not being correctly caught by the runtime.	2021-04-16 18:05:49 +02:00
Benjamin Bouvier	50aa645769	cranelift: use a deferred display wrapper for logging the vcode's IR	2021-04-16 10:27:19 +02:00
Chris Fallin	03077e0de9	Merge pull request #2843 from uweigand/spillslot-fix cranelift: Fix spillslot regression on big-endian platforms	2021-04-15 13:28:33 -07:00
Ulrich Weigand	10efe8e780	cranelift: Fix spillslot regression on big-endian platforms PR 2840 changed the store_spillslot routine to always store integer registers in full word size to a spill slot. However, the load_spillslot routine was not updated, which may cause the contents to be reloaded with a different type. On big-endian systems this will fetch wrong data. Fixed by using the same type override in load_spillslot.	2021-04-15 21:39:14 +02:00
Andrew Brown	0acc1451ea	x64: lower iabs.i64x2 using a single AVX512 instruction when possible (#2819 ) * x64: add EVEX encoding mechanism Also, includes an empty stub module for the VEX encoding. * x64: lower abs.i64x2 to VPABSQ when available * x64: refactor EVEX encodings to use `EvexInstruction` This change replaces the `encode_evex` function with a builder-style struct, `EvexInstruction`. This approach clarifies the code, adds documentation, and results in slight speedups when benchmarked. * x64: rename encoding CodeSink to ByteSink	2021-04-15 11:53:58 -07:00
Chris Fallin	36c667d58d	Merge pull request #2837 from uweigand/outgoing-args Add back support for accumulating outgoing arguments	2021-04-14 12:54:06 -07:00
Chris Fallin	fd4bfbe5a7	Merge pull request #2836 from uweigand/framesizefix Fix frame size after unwind rework	2021-04-14 12:19:38 -07:00
Chris Fallin	1f21b32e99	Merge pull request #2838 from uweigand/optionalfp Allow unwind support to work without a frame pointer	2021-04-14 10:58:51 -07:00
Chris Fallin	337cc47d2f	Merge pull request #2840 from bnjbvr/fix-2839 cranelift: always spill i32 with i64 stores	2021-04-14 10:11:47 -07:00
Benjamin Bouvier	e7bced9512	cranelift: always spill i32 with i64 stores; Fixes #2839. See also the issue description and the comments in this commit for details of what the fix is about.	2021-04-14 18:08:52 +02:00
Ulrich Weigand	5904c09682	Allow unwind support to work without a frame pointer The patch extends the unwinder to support targets that do not need to use a dedicated frame pointer register. Specifically, the changes include: - Change the "fp" routine in the RegisterMapper to return an optional frame pointer register via Option<Register>. - On targets that choose to not define a FP register via the above routine, the UnwindInst::DefineNewFrame operation no longer switches the CFA to be defined in terms of the FP. (The operation still can be used to define the location of the clobber area.) - In addition, on targets that choose not to define a FP register, the UnwindInst::PushFrameRegs operation is not supported. - There is a new operation UnwindInst::StackAlloc that needs to be called on targets without FP whenever the stack pointer is updated. This causes the CFA offset to be adjusted accordingly. (On targets with FP this operation is a no-op.)	2021-04-14 15:32:31 +02:00
Ulrich Weigand	336c6369b4	Add back support for accumulating outgoing arguments The unwind rework (commit `2d5db92a`) removed support for the feature to allow a target to allocate the space for outgoing function arguments right in the prologue (originally added via commit `80c2d70d`). This patch adds it back.	2021-04-14 13:51:16 +02:00
Ulrich Weigand	e3bb36ba77	Fix frame size after unwind rework After the unwind rework (commit `2d5db92a`) the space used to save clobbered registers now lies between the nominal SP and the FP. Therefore, the size of that space should now be included in the frame size as reported by frame_size(), since this value is used to compute the nominal_sp_to_fp offset.	2021-04-14 13:46:08 +02:00
Chris Fallin	27b3162f87	Merge pull request #2833 from abrown/2826 x64: fix Inst::store to understand all scalar types	2021-04-13 15:36:41 -07:00
Chris Fallin	8caac9ed79	Merge pull request #2823 from akirilov-arm/callee_saves Cranelift AArch64: Improve the handling of callee-saved registers	2021-04-13 15:35:46 -07:00
Andrew Brown	6bdef48473	x64: refactor to use Inst::store during lowering This re-factoring replaces uses of `Inst::mov_r_m` with `Inst::store` to ensure there is only one code location to troubleshoot when generating store instructions for a specific type.	2021-04-13 13:09:07 -07:00
Andrew Brown	9b25b06d86	x64: store to all scalar sizes Previously, `Inst::store` only understood a subset of the scalar types, which resulted in failures seen in #2826. This change allows `Inst::store` to generate instructions for all scalar widths (`8 \| 16 \| 32 \| 64`) since all of these are supported in the emission code of `Inst::MovRM`.	2021-04-13 12:38:35 -07:00
bjorn3	b272d4b7da	Fix srem.{i8,i16}	2021-04-13 21:28:27 +02:00
Anton Kirilov	7248abd591	Cranelift AArch64: Improve the handling of callee-saved registers SIMD & FP registers are now saved and restored in pairs, similarly to general-purpose registers. Also, only the bottom 64 bits of the registers are saved and restored (in case of non-Baldrdash ABIs), which is the requirement from the Procedure Call Standard for the Arm 64-bit Architecture. As for the callee-saved general-purpose registers, if a procedure needs to save and restore an odd number of them, it no longer uses store and load pair instructions for the last register. Copyright (c) 2021, Arm Limited.	2021-04-13 20:23:08 +01:00
Andrew Brown	8e495ac79d	x64: match multiple ISA requirements before emitting Because there are instructions that are present in more than one ISA feature set, we need to see if any of the ISA requirements match before emitting. This change includes the `VPABSQ` instruction as an example, which is present in both `AVX512F` and `AVX512VL`.	2021-04-08 10:30:39 -07:00
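
The spillslot fix in a1c9b06cea can be illustrated with a small model. This is only a sketch under stated assumptions: a `u64` stands in for a 64-bit register, and the helper names (`produce_u32_in_reg`, `reload_sign_extend`, `reload_zero_extend`, `spill_reload_full`) are hypothetical and do not appear in Cranelift. It shows why a sign-extending reload breaks code that relies on the upper 32 bits being zero, and why spilling and reloading the full 64 bits avoids the problem.

```rust
/// Models a register written by a 32-bit x86-64 instruction:
/// the upper 32 bits are zero. (Hypothetical helper.)
fn produce_u32_in_reg(v: u32) -> u64 {
    v as u64
}

/// Models the buggy path: a narrow 32-bit spill followed by a
/// sign-extending reload. (Hypothetical helper.)
fn reload_sign_extend(reg: u64) -> u64 {
    let slot = reg as u32; // only the low 32 bits reach the spill slot
    slot as i32 as i64 as u64 // sign-extend back to 64 bits
}

/// Models a narrow spill followed by a zero-extending reload.
fn reload_zero_extend(reg: u64) -> u64 {
    let slot = reg as u32;
    slot as u64
}

/// Models the invariant the fix adopts: spill and reload all 64 bits.
fn spill_reload_full(reg: u64) -> u64 {
    let slot: u64 = reg;
    slot
}

fn main() {
    // A 32-bit value with its top bit set.
    let reg = produce_u32_in_reg(0x8000_0001);

    // The sign-extending reload sets every upper bit, so a consumer that
    // assumed zeroed upper bits (an elided zero-extend) sees a wrong value.
    assert_eq!(reload_sign_extend(reg), 0xFFFF_FFFF_8000_0001);

    // Both of these preserve the register contents.
    assert_eq!(reload_zero_extend(reg), 0x8000_0001);
    assert_eq!(spill_reload_full(reg), 0x8000_0001);

    // Values without the top bit set are unaffected either way.
    assert_eq!(reload_sign_extend(produce_u32_in_reg(5)), 5);
}
```

The full-width variant has the advantage that it makes no assumption about the value's type, so it cannot disagree with how the store side wrote the slot.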

1271 Commits