wasmtime

Author	SHA1	Message	Date
Afonso Bordado	4d085d8fbf	aarch64: Add sbcs instruction encodings	2021-06-09 22:56:39 +01:00
Afonso Bordado	61f07d79a7	aarch64: Add adcs instruction encodings	2021-06-09 22:56:39 +01:00
Afonso Bordado	b1475f32a6	aarch64: Add ishl,ushr,sshr for i128 values	2021-06-09 22:48:14 +01:00
Afonso Bordado	2c4d1c0003	aarch64: Add ands instruction encoding	2021-06-09 22:38:01 +01:00
Afonso Bordado	c38a5e8b62	aarch64: Add basic i128 bit ops to the AArch64 backend Currently we basically use a two-instruction version of the same i64 ops. IMMLogic doesn't really support multiple register inputs, so it's left as a TODO for future optimizations.	2021-06-09 22:37:55 +01:00
Chris Fallin	ffb92d9109	Merge pull request #2966 from akirilov-arm/simd_int_to_int_extend Enable the simd_int_to_int_extend test for AArch64	2021-06-06 23:34:52 -07:00
Johnnie Birch	1770880e19	x64: add support for packed promote and demote (#2783) * Add support for x64 packed promote low * Add support for x64 packed floating point demote * Update vector promote low and demote by adding constraints Also does some renaming and minor refactoring	2021-06-04 15:59:20 -07:00
Anton Kirilov	5e8a8fe5a0	Enable the simd_int_to_int_extend test for AArch64 Copyright (c) 2021, Arm Limited.	2021-06-04 16:10:02 +01:00
Andrew Brown	8dc4cc9fe3	x64: fix AVX512 flag checks Previously, the multiple flags for certain AVX512 instructions were checked using `OR`: e.g., if the CPU has AVX512VL `OR` AVX512DQ, emit `VPMULLQ`. This is incorrect: the logic should be `AND`. The Intel Software Developer Manual, vol. 1, sec. 15.4, has more information on this (notably, it suggests checking with `XGETBV` that the OS allows the use of the XMM registers, but that is a separate issue). This change switches to `AND` logic in the new backend.	2021-06-01 11:41:16 -07:00
Andrew Brown	2a9f458ea3	x64: lower i8x16.shuffle to VPERMI2B when possible When shuffling values from two different registers, the x64 lowering for `i8x16.shuffle` must first shuffle each register separately and then OR the results with SSE instructions. With `VPERMI2B`, available in AVX512VL + AVX512VBMI, this can be done in a single instruction after the shuffle mask has been moved into the destination register. This change uses `VPERMI2B` for that case when the CPU supports it.	2021-06-01 11:40:53 -07:00
Benjamin Bouvier	51edea9e57	cranelift: introduce a new WasmtimeAppleAarch64 calling convention The previous choice to use the WasmtimeSystemV calling convention for apple-aarch64 devices was incorrect: padding of arguments was incorrectly computed. So we have to use some flavor of the apple-aarch64 ABI there. Since we want to support the wasmtime custom convention for multiple returns on apple-aarch64 too, a new custom Wasmtime calling convention was introduced to support this.	2021-06-01 17:29:12 +02:00
Olivier Lemasle	eadafb1554	s390x: Fix import cranelift-codegen's build failed on s390x, with this error: ``` error[E0432]: unresolved import `crate::isa::unwind::systemv` --> cranelift/codegen/src/isa/s390x/mod.rs:6:25 | 6 | use crate::isa::unwind::systemv::RegisterMappingError; | ^^^^^^^ could not find `systemv` in `unwind` ``` This import should be used only with the `unwind` feature enabled.	2021-06-01 15:16:41 +02:00
Andrew Brown	459fce3467	x64: lower i8x16.popcnt to VPOPCNTB when possible When AVX512VL or AVX512BITALG are available, Wasm SIMD's `popcnt` instruction can be lowered to a single x64 instruction, `VPOPCNTB`, instead of 8+ instructions.	2021-05-25 12:16:25 -07:00
Chris Fallin	f2fe0c669e	Merge pull request #2929 from cfallin/bb-offsets Provide BB layout info externally in terms of code offsets.	2021-05-24 14:27:53 -07:00
Chris Fallin	800cf25bb5	Make the CFG metadata computation conditional on a flag.	2021-05-24 13:01:15 -07:00
Afonso Bordado	4ddbfe50ba	aarch64: Implement imul for i128 operands	2021-05-24 18:23:30 +01:00
Chris Fallin	11a2ef01e7	Provide BB layout info externally in terms of code offsets. This is sometimes useful when performing analyses on the generated machine code: for example, some kinds of code verifiers will want to do a control-flow analysis, and it is much easier to do this if one does not have to recover the CFG from the machine code (doing so requires heavyweight analysis when indirect branches are involved). If one trusts the control-flow lowering and only needs to verify other properties of the code, this can be very useful.	2021-05-24 09:18:06 -07:00
Afonso Bordado	a2e74b2c45	aarch64: Implement isub for i128 operands	2021-05-22 21:51:41 +01:00
Afonso Bordado	d3b525fa29	aarch64: Implement iadd for i128 operands	2021-05-22 21:21:44 +01:00
Johnnie Birch	9a5c9607e1	Vpopcnt for x64	2021-05-21 19:23:26 -07:00
Chris Fallin	65e0e20210	Merge pull request #2892 from afonso360/aarch64-multireg-args Handle i128 arguments in the aarch64 ABI	2021-05-21 16:57:42 -07:00
Chris Fallin	95559c01aa	Merge pull request from GHSA-hpqh-2wqx-7qp5 Fix spillslot reload of narrow values: zero-extend, don't sign-extend. Release v0.74.0 as security-patch release.	2021-05-21 12:01:55 -07:00
Afonso Bordado	fbcfffdeab	Handle spilling i128 arguments into the stack in aarch64	2021-05-21 17:05:41 +01:00
Andrew Brown	54b45d28a3	x64: lower fcvt_from_uint to VCVTUDQ2PS when possible When AVX512VL and AVX512F are available, use a single instruction (`VCVTUDQ2PS`) instead of a 9-instruction sequence. This optimization is a port from the legacy x86 backend.	2021-05-19 12:20:11 -07:00
Chris Fallin	a1c9b06cea	Fix spillslot reload of narrow values: zero-extend, don't sign-extend. Previously, the x64 backend's ABI code would generate a sign-extending load when loading a less-than-64-bit integer from a spillslot. This is incorrect: e.g., for i32 values >= 0x80000000, this would result in all of the high 32 bits being set. This interacts poorly with another optimization. Normally, the invariant is that the high bits of a register holding a value of a certain type, beyond that type's bits, are undefined. However, as an optimization, we recognize and use the fact that on x86-64, 32-bit instructions zero the upper 32 bits. This allows us to elide a 32-to-64-bit zero-extend op (turning it into just a move, which can then sometimes disappear entirely due to register coalescing). If a spill and reload happen between the production of a 32-bit value from an instruction known to zero the upper bits and its use, then we will rely on zero upper bits that might actually be set by a sign-extend. This will result in incorrect execution. As a fix, we stick to a simple invariant: we always spill and reload a full 64 bits when handling integer registers on x64. This ensures that no bits are mangled.	2021-05-19 12:19:19 -07:00
Andrew Brown	7ef3ae2903	x64: implement vselect with variable blend instructions This change implements `vselect` using SSE4.1's `BLENDVPS`, `BLENDVPD`, and `PBLENDVB`. `vselect` is a lane-selecting instruction that is used by [simple_preopt.rs](`fa1faf5d22/cranelift/codegen/src/simple_preopt.rs (L947-L999)`) to lower `bitselect` to a single x86 instruction when the condition mask is known to be boolean (all 1s or 0s, e.g., from a conversion). This is better than `bitselect` in general, which lowers to 4-5 instructions. The old backend had the `vselect` lowering; this simply introduces it to the new backend.	2021-05-17 11:23:33 -07:00
Andrew Brown	bc0df92137	peepmatic: rebuild peephole optimizers after cranelift/meta change	2021-05-17 06:54:45 -07:00
Andrew Brown	84b6f05971	cranelift: remove unreachable scalar lowerings of saturating arithmetic Since `uadd_sat`, `sadd_sat`, `usub_sat`, and `ssub_sat` are now only available to vector types, this removes the lowering code for the scalar versions of these instructions in the arm32 and aarch64 backends.	2021-05-17 06:54:45 -07:00
Andrew Brown	e676589b0c	x64: lower i64x2.imul to VPMULLQ when possible This adds the machinery to encode the VPMULLQ instruction which is available in AVX512VL and AVX512DQ. When these feature sets are available, we use this instruction instead of a lengthy 12-instruction sequence.	2021-05-13 20:14:05 -07:00
Andrew Brown	c982d2be65	x64: move multiplication lowering Since the lowering of `imul` complicated the other ALU operations it was matched with and since future commits will alter the multiplication lowering further, this change moves the `imul` lowering to its own match block.	2021-05-13 20:14:05 -07:00
Andrew Brown	c89e6b2353	x64: make the x64 module public In order to benchmark portions of the x64 module, this change makes it a public module of cranelift-codegen.	2021-05-13 10:46:08 -07:00
Andrew Brown	02796fc670	x64: move encodings to a separate module In order to benchmark the encoding code with criterion, the functions and structures must be public. Moving this code to its own module (instead of keeping it as a submodule of `inst`) allows `inst` to remain private. This avoids having to expose and document (or ignore documenting) the numerous instruction variants in `inst` while allowing access to the encoding code. This commit changes no functionality.	2021-05-13 10:46:08 -07:00
Afonso Bordado	ac624da8d9	Handle i128 arguments in the aarch64 ABI When dealing with params that need to be split, we follow the AArch64 ABI: we split the value in two and make sure that the argument starts in an even-numbered xN register. The Apple ABI does not require this, so on those platforms the argument may start in any xN register.	2021-05-12 13:06:13 +01:00
Chris Fallin	5fb2c8c235	Merge pull request #2874 from uweigand/s390x-backend Support IBM z/Architecture	2021-05-10 13:53:23 -07:00
Afonso Bordado	e021995323	Allow i128 amount operands on shift instructions in the x64 backend Fixes #2727.	2021-05-10 18:32:20 +01:00
Ulrich Weigand	89b5fc776d	Support IBM z/Architecture This adds support for the IBM z/Architecture (s390x-ibm-linux). The status of the s390x backend in its current form is: - Wasmtime is fully functional and passes all tests on s390x. - All back-end features supported, with the exception of SIMD. - There is still a lot of potential for performance improvements. - Currently the only supported processor type is z15.	2021-05-10 16:01:16 +02:00
bjorn3	82f3ad4f1a	Add comment why thiserror is not used	2021-05-04 13:51:28 +02:00
bjorn3	03fdbadfb4	Remove thiserror dependency from cranelift_codegen	2021-05-04 13:45:20 +02:00
Ulrich Weigand	e1cc1a67d5	Object file support for s390x (#2872) Add support for s390x binary format object files. In particular, add support for s390x ELF relocation types (currently only S390xPCRel32Dbl).	2021-05-03 11:50:00 -05:00
Anton Kirilov	480670e17f	Enable the simd_boolean test for AArch64 Also, enable the simd_i64x2_arith2 test because it doesn't need any code changes. Copyright (c) 2021, Arm Limited.	2021-04-27 20:19:51 +01:00
Jubilee Young	a8c956ede1	Factor out byteorder in cranelift This removes an existing dependency on the byteorder crate in favor of using std equivalents directly. While not an issue for wasmtime per se, cranelift is now part of the critical path of building and testing Rust, and minimizing dependencies, even small ones, can help reduce the time and bandwidth required.	2021-04-23 12:05:18 -07:00
StackDoubleFlow	9637bc5a09	Fix cranelift `Module` and `ObjectModule` docs links (#2852)	2021-04-21 06:29:02 -07:00
Benjamin Bouvier	8ab3511b3b	Generate unwind information on Win64 with the old backend Following the new ABI introduced for efficient support of multiple return values, the old-backend test for generating unwind information was incomplete, resulting in no unwind information being generated and traps not being correctly caught by the runtime.	2021-04-16 18:05:49 +02:00
Benjamin Bouvier	50aa645769	cranelift: use a deferred display wrapper for logging the vcode's IR	2021-04-16 10:27:19 +02:00
Chris Fallin	03077e0de9	Merge pull request #2843 from uweigand/spillslot-fix cranelift: Fix spillslot regression on big-endian platforms	2021-04-15 13:28:33 -07:00
Ulrich Weigand	10efe8e780	cranelift: Fix spillslot regression on big-endian platforms PR 2840 changed the store_spillslot routine to always store integer registers in full word size to a spill slot. However, the load_spillslot routine was not updated, which may cause the contents to be reloaded in a different type. On big-endian systems this will fetch wrong data. Fixed by using the same type override in load_spillslot.	2021-04-15 21:39:14 +02:00
Andrew Brown	0acc1451ea	x64: lower iabs.i64x2 using a single AVX512 instruction when possible (#2819) * x64: add EVEX encoding mechanism Also, includes an empty stub module for the VEX encoding. * x64: lower abs.i64x2 to VPABSQ when available * x64: refactor EVEX encodings to use `EvexInstruction` This change replaces the `encode_evex` function with a builder-style struct, `EvexInstruction`. This approach clarifies the code, adds documentation, and results in slight speedups when benchmarked. * x64: rename encoding CodeSink to ByteSink	2021-04-15 11:53:58 -07:00
Chris Fallin	36c667d58d	Merge pull request #2837 from uweigand/outgoing-args Add back support for accumulating outgoing arguments	2021-04-14 12:54:06 -07:00
Chris Fallin	fd4bfbe5a7	Merge pull request #2836 from uweigand/framesizefix Fix frame size after unwind rework	2021-04-14 12:19:38 -07:00
Chris Fallin	1f21b32e99	Merge pull request #2838 from uweigand/optionalfp Allow unwind support to work without a frame pointer	2021-04-14 10:58:51 -07:00


1082 Commits
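The i128 `iadd`/`isub` commits (d3b525fa29, a2e74b2c45), together with the `adcs`/`sbcs` encodings added later, rest on the usual carry-chain technique. The low 64-bit halves are combined first, and the resulting carry (or borrow) is then folded into the high halves. The sketch below is a minimal Rust model of that arithmetic, not the backend's lowering code, and the exact instruction sequence Cranelift emits is not shown in this log. The helper names `split`, `join`, `iadd_i128` and `isub_i128` are hypothetical.

```rust
/// Split a 128-bit value into (low, high) 64-bit halves.
fn split(x: u128) -> (u64, u64) {
    (x as u64, (x >> 64) as u64)
}

/// Join (low, high) 64-bit halves back into a 128-bit value.
fn join(lo: u64, hi: u64) -> u128 {
    ((hi as u128) << 64) | lo as u128
}

/// Add-with-carry chain: add the low halves and record the carry,
/// then add the high halves plus that carry.
fn iadd_i128(a: u128, b: u128) -> u128 {
    let (a_lo, a_hi) = split(a);
    let (b_lo, b_hi) = split(b);
    let (lo, carry) = a_lo.overflowing_add(b_lo);
    let hi = a_hi.wrapping_add(b_hi).wrapping_add(carry as u64);
    join(lo, hi)
}

/// Subtract-with-borrow chain. This model tracks the borrow directly;
/// on AArch64 the C flag after a subtraction means "no borrow" instead.
fn isub_i128(a: u128, b: u128) -> u128 {
    let (a_lo, a_hi) = split(a);
    let (b_lo, b_hi) = split(b);
    let (lo, borrow) = a_lo.overflowing_sub(b_lo);
    let hi = a_hi.wrapping_sub(b_hi).wrapping_sub(borrow as u64);
    join(lo, hi)
}
```

Both functions wrap on overflow, matching Cranelift's modular integer arithmetic.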
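The spillslot fix (a1c9b06cea) is easiest to see with a concrete value. This hypothetical sketch models three pieces:

- a 32-bit x86-64 add, which zeroes the upper 32 bits of the register;
- an elided `uextend` that relies on those zeros;
- one of two reload paths in between: a sign-extending 32-bit reload (the old behaviour) or a full 64-bit reload (the fix).

All function names are illustrative assumptions, not Cranelift APIs.

```rust
/// Model of a 32-bit x86-64 `add`: the result lands in the low 32 bits and
/// the upper 32 bits of the 64-bit register are zeroed.
fn add32(a: u64, b: u64) -> u64 {
    (a as u32).wrapping_add(b as u32) as u64
}

/// Old behaviour: reload a 32-bit spill slot with a sign-extending load.
fn reload_sign_extending(slot: u32) -> u64 {
    slot as i32 as i64 as u64
}

/// Fixed behaviour: all 64 bits were spilled, so all 64 bits are reloaded.
fn reload_full(slot: u64) -> u64 {
    slot
}

/// Produce an i32 with `add32`, spill and reload it, then use it as an i64
/// through an elided `uextend` (the register value is used unchanged).
fn uextend_after_spill(a: u64, b: u64, full_width_spill: bool) -> u64 {
    let reg = add32(a, b);
    if full_width_spill {
        reload_full(reg)
    } else {
        reload_sign_extending(reg as u32)
    }
}
```

Take 0x7fffffff + 1 as an example: the 32-bit result is 0x80000000. The sign-extending reload turns the elided zero-extend into 0xffffffff80000000, while the full-width reload keeps 0x80000000.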
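The even-register rule from the i128 argument commit (ac624da8d9) can be sketched as a small register-assignment loop. The sketch makes these assumptions:

- only integer arguments are involved;
- there are eight argument registers, x0 to x7;
- an i128 is never split between registers and the stack, following AAPCS64.

Per the commit message, the Apple variant simply skips the round-up to an even register. Its stack fallback is assumed to match AAPCS64 here. The types and `assign_int_args` are hypothetical and do not mirror Cranelift's ABI code.

```rust
/// Where one argument ends up.
#[derive(Debug, PartialEq)]
enum ArgLoc {
    Reg(u8),         // a single xN register
    RegPair(u8, u8), // (low half, high half) in consecutive xN registers
    Stack,           // passed on the stack (offsets not modelled)
}

/// Integer argument types covered by this sketch.
#[derive(Clone, Copy)]
enum Ty {
    I64,
    I128,
}

/// Assign integer arguments to x0..x7. When `apple` is false, an i128 must
/// start in an even-numbered register; when true, it may start anywhere.
fn assign_int_args(args: &[Ty], apple: bool) -> Vec<ArgLoc> {
    const NUM_ARG_REGS: u8 = 8;
    let mut next = 0u8; // next free general-purpose argument register
    let mut locs = Vec::with_capacity(args.len());
    for &ty in args {
        match ty {
            Ty::I64 => {
                if next < NUM_ARG_REGS {
                    locs.push(ArgLoc::Reg(next));
                    next += 1;
                } else {
                    locs.push(ArgLoc::Stack);
                }
            }
            Ty::I128 => {
                if !apple && next % 2 == 1 {
                    next += 1; // round up to the next even register
                }
                if next + 2 <= NUM_ARG_REGS {
                    locs.push(ArgLoc::RegPair(next, next + 1));
                    next += 2;
                } else {
                    // Assumption: no split between registers and stack, and
                    // the remaining argument registers are no longer used.
                    next = NUM_ARG_REGS;
                    locs.push(ArgLoc::Stack);
                }
            }
        }
    }
    locs
}
```

For example, with an i64 followed by an i128, the AAPCS64 path leaves x1 unused and places the i128 in x2/x3. The Apple path uses x1/x2 instead.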
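The `vselect` commit (7ef3ae2903) relies on one observation: a bitwise `bitselect` and a lane-wise blend agree whenever every mask lane is all 1s or all 0s. This minimal sketch checks that on four 32-bit lanes. The `vselect` model follows `BLENDVPS`, which consults only the top bit of each mask lane. The function names are illustrative, and this is a semantic model rather than the backend's lowering.

```rust
/// Bitwise select: each result bit is taken from `x` where the mask bit is 1,
/// and from `y` where it is 0.
fn bitselect(c: [u32; 4], x: [u32; 4], y: [u32; 4]) -> [u32; 4] {
    std::array::from_fn(|i| (x[i] & c[i]) | (y[i] & !c[i]))
}

/// Lane select modelled on BLENDVPS: only the top bit of each 32-bit mask
/// lane decides whether the whole lane comes from `x` or from `y`.
fn vselect(c: [u32; 4], x: [u32; 4], y: [u32; 4]) -> [u32; 4] {
    std::array::from_fn(|i| if c[i] >> 31 == 1 { x[i] } else { y[i] })
}
```

With a mask such as `[u32::MAX, 0, u32::MAX, 0]` both functions return the same lanes. With a mask lane like `0x8000_0000` they diverge, which is why the rewrite is applied only when the mask is known to be boolean.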
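For the i128 shifts (b1475f32a6), a left shift on a 64-bit target has to move bits from the low half into the high half. The sketch below models `ishl` on two 64-bit halves, assuming the shift amount is taken modulo 128, as Rust's `wrapping_shl` does. `ishl_i128` is a hypothetical name, and the actual AArch64 lowering uses its own instruction sequence, which this log does not show. The `amt == 0` case is split out because shifting a `u64` right by 64 would overflow.

```rust
/// Shift a 128-bit value left using only 64-bit halves.
/// The amount is taken modulo 128 (an assumption of this sketch).
fn ishl_i128(x: u128, amt: u32) -> u128 {
    let amt = amt % 128;
    let lo = x as u64;
    let hi = (x >> 64) as u64;
    let (new_lo, new_hi) = if amt == 0 {
        (lo, hi)
    } else if amt < 64 {
        // Bits shifted out of the low half enter the bottom of the high half.
        (lo << amt, (hi << amt) | (lo >> (64 - amt)))
    } else {
        // The low half moves entirely into the high half.
        (0, lo << (amt - 64))
    };
    ((new_hi as u128) << 64) | new_lo as u128
}
```

`ushr` and `sshr` follow the same pattern in the opposite direction. For `sshr`, the vacated high bits are filled with the sign bit instead of zeros.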