wasmtime

Author	SHA1	Message	Date
Afonso Bordado	925891245d	cranelift: Fix `fmin`/`fmax` when dealing with zeroes (#4373 ) `fmin`/`fmax` are defined as returning -0.0 as smaller than 0.0. This is not how the IEEE754 views these values and the interpreter was returning the wrong value in these operations since it was just using the standard IEEE754 comparisons. This also tries to preserve NaN information by avoiding passing NaN's through any operation that could canonicalize it.	2022-07-05 12:59:23 -07:00
Sam Parker	d9e0e6a6a9	[AArch64] Port min/max to ISLE (#4374 ) Copyright (c) 2022, Arm Limited.	2022-07-05 09:16:45 -07:00
Afonso Bordado	2003ae99a0	Implement `fma`/`fabs`/`fneg`/`fcopysign` on the interpreter (#4367 ) * cranelift: Implement `fma` on interpreter * cranelift: Implement `fabs` on interpreter * cranelift: Fix `fneg` implementation on interpreter `fneg` was implemented as `0 - x` which is not correct according to the standard since that operation makes no guarantees on what the output is when the input is `NaN`. However for `fneg` the output for `NaN` inputs is fully defined. * cranelift: Implement `fcopysign` on interpreter	2022-07-05 09:03:04 -07:00
wasmtime-publish	7c428bbd62	Bump Wasmtime to 0.40.0 (#4378 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-07-05 09:10:52 -05:00
Afonso Bordado	f2e6ff5e70	cranelift: Implement `sqrt` in interpreter (#4362 ) This ignores SIMD for now.	2022-07-01 09:39:11 -07:00
Afonso Bordado	38ecd3744f	aarch64: Implement `bmask`/`bextend` in ISLE (#4358 ) * aarch64: Implement `bmask`/`bextend` in ISLE * cranelift: Remove vector versions of `bextend` * aarch64: Cleanup `bmask`/`bextend` documentation	2022-07-01 09:37:18 -07:00
Ulrich Weigand	ec83144c88	s390x: use full vector register file for FP operations (#4360 ) This defines the full set of 32 128-bit vector registers on s390x. (Note that the VRs overlap the existing FPRs.) In addition, this adds support to use all 32 vector registers to implement floating- point operations, by using vector floating-point instructions with the 'W' bit set to operate only on the first element. This part of the vector instruction set mostly matches the old FP instruction set, with two exceptions: - There is no vector version of the COPY SIGN instruction. Instead, now use a VECTOR SELECT with an appropriate bit mask to implement the fcopysign operation. - There are no vector version of the float <-> int conversion instructions where source and target differ in bit size. Use appropriate multiple conversion steps instead. This also requires use of explicit checking to implement correct overflow handling. As a side effect, this version now also implements the i8 / i16 variants of all conversions, which had been missing so far. For all operations except those two above, we continue to use the old FP instruction if applicable (i.e. if all operands happen to have been allocated to the original FP register set), and use the vector instruction otherwise.	2022-06-30 16:33:39 -07:00
Sam Parker	a2d49ebf27	Use u32 in Type API (#4280 ) Move from passing and returning u8 and u16 values to u32 in many of the functions. This removes a number of type conversions and gives a small compilation time speedup, around ~0.7% on my aarch64 machine. Copyright (c) 2022, Arm Limited.	2022-06-30 12:43:36 -07:00
Ulrich Weigand	95836ba114	s390x: clean up lower.rs (#4355 ) Now that lowering is fully done in ISLE, clean up some code remnants in lower.rs. In particular, move code to lower/isle.rs where possible, and inline lower_insn_to_regs into its caller and simplify.	2022-06-30 11:16:59 -07:00
Afonso Bordado	919604b8c5	aarch64: Implement `ireduce`/`breduce` in ISLE (#4331 ) * aarch64: Implement `ireduce`/`breduce` in ISLE * cranelift: Remove vector versions of `breduce`/`ireduce`	2022-06-30 11:15:47 -07:00
bjorn3	d1446f767d	Mark return value as define instead of clobber for TLS pseudoinstructions (#4357 )	2022-06-30 10:44:51 -07:00
Ulrich Weigand	7a9479f77c	ISLE: Migrate call and return instructions (#3785 ) This adds infrastructure to allow implementing call and return instructions in ISLE, and migrates the s390x back-end. To implement ABI details, this patch creates public accessors for `ABISig` and makes them accessible in ISLE. All actual code generation is then done in ISLE rules, following the information provided by that signature. [ Note that the s390x back end never requires multiple slots for a single argument - the infrastructure to handle this should already be present, however. ] To implement loops in ISLE rules, this patch uses regular tail recursion, employing a `Range` data structure holding a range of integers to be looped over.	2022-06-29 14:22:50 -07:00
Sam Parker	fb61774df2	[AArch64] Port AtomicLoad and AtomicStore to ISLE (#4301 ) Copyright (c) 2022, Arm Limited.	2022-06-29 12:12:48 -07:00
Chris Fallin	2034c8aa45	Cranelift: add a config option for alias analysis and redundant-load elimination. (#4349 ) This allows for experiments as in here [1] and also generally gives an option to anyone who is concerned that the extra optimization may be counterproductive or take too much time. The optimization remains enabled by default. [1] https://github.com/bytecodealliance/wasmtime/pull/4163#issuecomment-1169303683	2022-06-28 15:25:47 -07:00
Chris Fallin	b2e28b917a	Cranelift: update to latest regalloc2: (#4324 ) - Handle call instructions' clobbers with the clobbers API, using RA2's clobbers bitmask (bytecodealliance/regalloc2#58) rather than clobbers list; - Pull in changes from bytecodealliance/regalloc2#59 for much more sane edge-case behavior w.r.t. liverange splitting.	2022-06-28 09:01:59 -07:00
Afonso Bordado	42d4f97b78	cranelift: Fix `cls` for small types on aarch64 (#4305 ) The previous `cls` code was producing wrong results when fed with a -1 i8. The fix here is to sign extend instead of zero extending since we want to keep the sign bit as one in order for it to be counted correctly in the cls instruction This also merges the interpreter only tests now that aarch64 correctly supports this instruction	2022-06-27 15:55:02 -07:00
Afonso Bordado	aef53784ec	aarch64: Implement `bint` in ISLE (#4319 )	2022-06-27 15:50:46 -07:00
Chris Fallin	0d829a57ee	Upgrade to regalloc2 v0.2.3 to get bugfix from bytecodealliance/regalloc2#60. (#4335 ) * Upgrade to regalloc2 v0.2.3 to get bugfix from bytecodealliance/regalloc2#60. * Update RELEASES.md. * Update two compile tests based on slightly shifting regalloc output.	2022-06-27 15:58:54 -05:00
Chris Fallin	5c2c285dd7	Cranelift/x64: fix register allocator metadata for 8-bit divides. (#4332 ) `idiv` on x86-64 only reads `rdx`/`edx`/`dx`/`dl` for divides with width greater than 8 bits; for an 8-bit divide, it reads the whole 16-bit divisor from `ax`, as our CISC ancestors intended. This PR fixes the metadata to avoid a regalloc panic (due to undefined `rdx`) in this case. Does not affect Wasmtime or other Wasm-frontend embedders.	2022-06-27 12:31:06 -07:00
Alex Crichton	dc2fe0ac67	x64: Fix codegen for the `i8x16.swizzle` instruction (#4318 ) This commit fixes a mistake in the `Swizzle` opcode implementation in the x64 backend of Cranelift. Previously an input register was casted to a writable register and then modified, which I believe instructions are not supposed to do. This was discovered as part of my investigation into #4315.	2022-06-27 13:20:31 -05:00
Alex Crichton	8bb07523e2	x64: Fix codegen for the `select` instruction with v128 (#4317 ) This commit fixes a bug in the previous codegen for the `select` instruction when the operations of the `select` were of the `v128` type. Previously teh `XmmCmove` instruction only stored an `OperandSize` of 32 or 64 for a 64 or 32-bit move, but this was also used for these 128-bit types which meant that when used the wrong move instruction was generated. The fix applied here is to store the whole `Type` being moved so the 128-bit variant can be selected as well.	2022-06-27 11:02:40 -07:00
Afonso Bordado	87007c5839	cranelift: Fix `bint` implementation on interpreter (#4299 ) * cranelift: Fix `bint` implementation on interpreter The interpreter was returning -1 instead of 1 for positive values. This also extends the bint test suite to cover all types. * cranelift: Restrict `bint` to scalar values only	2022-06-23 13:43:35 -07:00
Anton Kirilov	25a588c35f	Cranelift AArch64: Use an allocated encoding for Udf (#4281 ) Preserve the current behaviour when code is generated for SpiderMonkey. Copyright (c) 2022, Arm Limited.	2022-06-22 15:03:28 +01:00
Trevor Elliott	7e0bb465d0	X64: port the rest of icmp to ISLE (#4254 ) Finish migrating icmp to ISLE for x64	2022-06-13 16:34:11 -07:00
Benjamin Bouvier	43d4f0b93b	Serialize `BlockNode`'s cold field too when serializing a `Layout` (#4265 ) This fixes a bug when the `cold` field would not be serialized, since we're using a custom (de)serializer for `Layout`. This is now properly handled by adding a boolean in the serialized stream. This was caught during the work on #4155, as this would result in cache mismatches between a function and itself.	2022-06-13 12:04:37 -07:00
Anton Kirilov	c15c3061ca	CFI improvements to the AArch64 fiber implementation (#4195 ) Now the fiber implementation on AArch64 authenticates function return addresses and includes the relevant BTI instructions, except on macOS. Also, change the locations of the saved FP and LR registers on the fiber stack to make them compliant with the Procedure Call Standard for the Arm 64-bit Architecture. Copyright (c) 2022, Arm Limited.	2022-06-09 09:17:12 -05:00
Chris Fallin	ed9db962de	x64 backend: fix cmpxchg (don't return RealReg as result). (#4243 ) The current lowering helper for `cmpxchg` returns the literal RealReg `rax` as its result. However, this breaks a number of invariants, and eventually causes a regalloc panic if used as a blockparam arg (pinned vregs cannot be used in this way). In general we have to return regular vregs, not a RealReg, as results of instructions during lowering. However #4223 added a helper for `x64_cmpxchg` that returns a literal `rax`. Fortunately we can do the right thing here by just giving a fresh vreg to the instruction; the regalloc constraints mean that this vreg is constrained to `rax` at the instruction (at its def/late point), so the generator of the instruction need not worry about `rax` here.	2022-06-08 06:13:31 -07:00
Trevor Elliott	bc3c4fa206	X64: port fvpromote to ISLE (#4242 )	2022-06-07 17:18:23 -07:00
Chris Fallin	54acd8b3e2	x64 backend: fix `to_amode` with constant address (no registers). (#4239 ) If an address expression is given to `to_amode` that is completely constant (no registers at all), then it will produce an `Amode` that has the resulting constant as an offset, and `(invalid_reg)` as the base. This is a side-effect of the way we build up the amode step-by-step -- we're waiting to see a register and plug it into the base field. If we never get a reg though, we need to generate a constant zero into a register and use that as the base. This PR adds a `finalize_amode` helper to do just that. Fixes #4234.	2022-06-07 11:40:10 -07:00
Johnnie Birch	3f152273d3	X64: Port fpromote to ISLE (#4230 )	2022-06-06 14:47:44 -07:00
Andrew Brown	6df56e6aa6	x64: port `atomic_cas` to ISLE (#4223 )	2022-06-06 13:20:33 -07:00
Sam Parker	acfeda4d80	[AArch64] Port IaddPairwise to ISLE (#4201 ) Copyright (c) 2022, Arm Limited.	2022-06-06 15:37:13 +01:00
wasmtime-publish	55946704cb	Bump Wasmtime to 0.39.0 (#4225 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2022-06-06 09:12:47 -05:00
Chris Fallin	ae2c84205f	Upgrade to regalloc2 v0.2.2. (#4222 ) Pulls in an improvement to spillslot allocation (bytecodealliance/regalloc2#56).	2022-06-03 17:12:32 -07:00
Andrew Brown	816aae6aca	x64: port some atomics to ISLE (#4212 ) * x64: port `fence` to ISLE * x64: port `atomic_load` to ISLE * x64: port `atomic_store` to ISLE	2022-06-02 14:13:10 -07:00
Chris Fallin	8f61eb9341	Upgrade to regalloc2 version 0.2.1. (#4199 ) This resolves an edge-case where mul.i128 with an input that continues to be live after the instruction could cause an invalid regalloc constraint (basically, the regalloc did not previously support an instruction use and def both being constrained to the same physical reg; and the "mul" variant used for mul.i128 on x64 was the only instance of such operands in Cranelift). Causes two extra move instructions in the mul.i128 filetest, but that's the price to pay for the slightly more general (works in all cases) handling of the constraints.	2022-06-01 13:26:20 -07:00
Sam Parker	010e028d67	[AArch64] Port AtomicCAS to isle (#4140 ) Copyright (c) 2022, Arm Limited.	2022-05-25 09:19:24 +01:00
Chris Fallin	b830c3cf93	Pull in regalloc2 v0.2.0, with no more separate scratch registers. (#4182 ) RA2 recently removed the need for a dedicated scratch register for cyclic moves (bytecodealliance/regalloc2#51). This has moderate positive performance impact on function bodies that were register-constrained, as it means that one more register is available. In Sightglass, I measured +5-8% on `blake3-scalar`, at least among current benchmarks.	2022-05-23 12:51:04 -07:00
Benjamin Bouvier	6e828df632	Remove unused `SourceLoc` in many `Mach` data structures (#4180 ) * Remove unused srcloc in MachReloc * Remove unused srcloc in MachTrap * Use `into_iter` on array in bench code to suppress a warning * Remove unused srcloc in MachCallSite	2022-05-23 09:27:28 -07:00
Chris Fallin	32622b3e6f	Cranelift: fix use of pinned reg with SysV calling convention. (#4176 ) Previously, the pinned register (enabled by the `enable_pinned_reg` Cranelift setting and used via the `get_pinned_reg` and `set_pinned_reg` CLIF ops) was only used when Cranelift was embedded in SpiderMonkey, in order to support a pinned heap register. SpiderMonkey has its own calling convention in Cranelift (named after the integration layer, "Baldrdash"). However, the feature is more general, and should be usable with the default system calling convention too, e.g. SysV or Windows Fastcall. This PR fixes the ABI code to properly treat the pinned register as a globally allocated register -- and hence an implicit input and output to every function, not saved/restored in the prologue/epilogue -- for SysV on x86-64 and aarch64, and Fastcall on x86-64. Fixes #4170.	2022-05-23 09:18:51 -07:00
Chris Fallin	0824abbae4	Add a basic alias analysis with redundant-load elim and store-to-load fowarding opts. (#4163 ) This PR adds a basic alias analysis, and optimizations that use it. This is a "mid-end optimization": it operates on CLIF, the machine-independent IR, before lowering occurs. The alias analysis (or maybe more properly, a sort of memory-value analysis) determines when it can prove a particular memory location is equal to a given SSA value, and when it can, it replaces any loads of that location. This subsumes two common optimizations: * Redundant load elimination: when the same memory address is loaded two times, and it can be proven that no intervening operations will write to that memory, then the second load is redundant and its result must be the same as the first. We can use the first load's result and remove the second load. * Store-to-load forwarding: when a load can be proven to access exactly the memory written by a preceding store, we can replace the load's result with the store's data operand, and remove the load. Both of these optimizations rely on a "last store" analysis that is a sort of coloring mechanism, split across disjoint categories of abstract state. The basic idea is that every memory-accessing operation is put into one of N disjoint categories; it is disallowed for memory to ever be accessed by an op in one category and later accessed by an op in another category. (The frontend must ensure this.) Then, given this, we scan the code and determine, for each memory-accessing op, when a single prior instruction is a store to the same category. This "colors" the instruction: it is, in a sense, a static name for that version of memory. This analysis provides an important invariant: if two operations access memory with the same last-store, then no other store can alias in the time between that last store and these operations. This must-not-alias property, together with a check that the accessed address is exactly the same (same SSA value and offset), and other attributes of the access (type, extension mode) are the same, let us prove that the results are the same. Given last-store info, we scan the instructions and build a table from "memory location" key (last store, address, offset, type, extension) to known SSA value stored in that location. A store inserts a new mapping. A load may also insert a new mapping, if we didn't already have one. Then when a load occurs and an entry already exists for its "location", we can reuse the value. This will be either RLE or St-to-Ld depending on where the value came from. Note that this does work across basic blocks: the last-store analysis is a full iterative dataflow pass, and we are careful to check dominance of a previously-defined value before aliasing to it at a potentially redundant load. So we will do the right thing if we only have a "partially redundant" load (loaded already but only in one predecessor block), but we will also correctly reuse a value if there is a store or load above a loop and a redundant load of that value within the loop, as long as no potentially-aliasing stores happen within the loop.	2022-05-20 13:19:32 -07:00
Chris Fallin	02d5edc591	Upgrade to regalloc2 0.1.3. (#4157 ) * Upgrade to regalloc2 0.1.3. This pulls in bytecodealliance/regalloc2#49, which slightly improves codegen in some cases where a safepoint (for reference-typed values) occurs in the same liverange as a register-constrained use. For example, in bytecodealliance/wasmtime#3785, an extra move instruction appeared and a callee-save register was used (necessitating a more expensive prologue) because of suboptimal splitting heuristics, which this PR fixes. The updated RA2 heuristics appear to have no measured downsides in existing benchmarks and improve the manually-observed codegen issue. * Update filetests where regalloc2 improvement altered behavior with reftypes.	2022-05-18 11:48:40 -07:00
Andrew Brown	e898cb750a	x64: remove TODO for i128 load (#4159 ) This work has already been finished in a previous PR.	2022-05-17 17:43:43 -07:00
Anton Kirilov	edf07a8da6	Cranelift AArch64: Migrate Bitselect and Vselect to ISLE (#4139 ) Copyright (c) 2022, Arm Limited.	2022-05-16 09:39:28 -07:00
bjorn3	c4eab2beb6	Avoid spurious build script runs (#4150 ) * Don't attempt to track the generated clif.isle in cargo This causes the build script to rerun every time for me. * Put build script debug messages on stderr instead of stdout This keeps stdout reserved for cargo build script directives	2022-05-12 11:49:20 -07:00
Ulrich Weigand	0243a16679	s390x: Fix bitwise operations (#4146 ) Current codegen had a number of logic errors confusing NAND with AND WITH COMPLEMENT, and NOR with OR WITH COMPLEMENT. Add support for the missing z15 instructions and fix logic.	2022-05-12 10:05:22 -07:00
Chris Fallin	5d671952ee	Cranelift: do not check in generated ISLE code; regenerate on every compile. (#4143 ) This PR fixes #4066: it modifies the Cranelift `build.rs` workflow to invoke the ISLE DSL compiler on every compilation, rather than only when the user specifies a special "rebuild ISLE" feature. The main benefit of this change is that it vastly simplifies the mental model required of developers, and removes a bunch of failure modes we have tried to work around in other ways. There is now just one "source of truth", the ISLE source itself, in the repository, and so there is no need to understand a special "rebuild" step and how to handle merge errors. There is no special process needed to develop the compiler when modifying the DSL. And there is no "noise" in the git history produced by constantly-regenerated files. The two main downsides we discussed in #4066 are: - Compile time could increase, by adding more to the "meta" step before the main build; - It becomes less obvious where the source definitions are (everything becomes more "magic"), which makes exploration and debugging harder. This PR addresses each of these concerns: 1. To maintain reasonable compile time, it includes work to cut down the dependencies of the `cranelift-isle` crate to nothing (only the Rust stdlib), in the default build. It does this by putting the error-reporting bits (`miette` crate) under an optional feature, and the logging (`log` crate) under a feature-controlled macro, and manually writing an `Error` impl rather than using `thiserror`. This completely avoids proc macros and the `syn` build slowness. The user can still get nice errors out of `miette`: this is enabled by specifying a Cargo feature `--features isle-errors`. 2. To allow the user to optionally inspect the generated source, which nominally lives in a hard-to-find path inside `target/` now, this PR adds a feature `isle-in-source-tree` that, as implied by the name, moves the target for ISLE generated source into the source tree, at `cranelift/codegen/isle_generated_source/`. It seems reasonable to do this when an explicit feature (opt-in) is specified because this is how ISLE regeneration currently works as well. To prevent surprises, if the feature is not specified, the build fails if this directory exists.	2022-05-11 22:25:24 -07:00
Chris Fallin	7c5a56b836	Cranelift: division/remainder CLIF ops are scalar-only. (#4141 ) In #4104 we discussed whether it makes sense for the division and remainder ops to support vector types. We concluded that because most hardware doesn't support it directly, it probably is not ideal to force all backends to polyfill it. In the future we can always reverse this decision, perhaps with a platform-independent legalization. This PR restricts the allowed types on the CLIF ops to integer types only.	2022-05-11 11:10:02 -07:00
Chris Fallin	eb435f3057	x64: use constant pool for u64 constants rather than `movabs`. (#4088 ) * Allow emitting u64 constants into constant pool. * Use constant pool for constants on x64 that do not fit in a simm32 and are needed as a RegMem or RegMemImm. * Fix rip-relative addressing bug in pinsrd emission.	2022-05-10 09:21:05 -07:00
Benjamin Bouvier	71fc16bbeb	Narrow `allow(dead_code)` declarations (#4116 ) * Narrow `allow(dead_code)` declarations Having module wide `allow(dead_code)` may hide some code that's really dead. In this commit I just narrowed the declarations to the specific enum variants that were not used (as it seems reasonable to keep them and their handling in all the matches, for future use). And the compiler found more dead code that I think we can remove safely in the short term. With this, the only files annotated with a module-wide `allow(dead_code)` are isle-generated files. * resurrect some functions as test helpers	2022-05-10 12:02:52 +02:00

... 6 7 8 9 10 ...

2109 Commits