wasmtime

Author	SHA1	Message	Date
Anton Kirilov	ead6edb0c5	Cranelift AArch64: Migrate Splat to ISLE (#4521 ) Copyright (c) 2022, Arm Limited.	2022-07-26 17:57:15 +00:00
Afonso Bordado	80976b6fc7	cranelift: Add `fadd`/`fsub`/`fmul`/`fdiv` to interpreter (#4446 ) Fuzzgen found these as soon as I added float support	2022-07-14 21:53:03 +00:00
Afonso Bordado	16cb287c53	cranelift: Use `round_ties_even` for `nearest` in interpreter (#4413 ) As @MaxGraey pointed out (thanks!) in #4397, `round` has different behavior from `nearest`. And it looks like the native rust implementation is still pending stabilization. Right now we duplicate the wasmtime implementation, merged in #2171. However, we definitely should switch to the rust native version when it is available.	2022-07-07 16:36:43 -07:00
Sam Parker	9c43749dfe	[RFC] Dynamic Vector Support (#4200 ) Introduce a new concept in the IR that allows a producer to create dynamic vector types. An IR function can now contain global value(s) that represent a dynamic scaling factor, for a given fixed-width vector type. A dynamic type is then created by 'multiplying' the corresponding global value with a fixed-width type. These new types can be used just like the existing types and the type system has a set of hard-coded dynamic types, such as I32X4XN, which the user defined types map onto. The dynamic types are also used explicitly to create dynamic stack slots, which have no set size like their existing counterparts. New IR instructions are added to access these new stack entities. Currently, during codegen, the dynamic scaling factor has to be lowered to a constant so the dynamic slots do eventually have a compile-time known size, as do spill slots. The current lowering for aarch64 just targets Neon, using a dynamic scale of 1. Copyright (c) 2022, Arm Limited.	2022-07-07 12:54:39 -07:00
Afonso Bordado	f98076ae88	cranelift: Implement float rounding operations (#4397 ) Implements the following operations on the interpreter: * `ceil` * `floor` * `nearest` * `trunc`	2022-07-06 16:43:54 -07:00
Sam Parker	e86e628061	ValueDataPacked getter and setter for Type (#4390 ) Speeds compilation of the spidermonkey benchmark by ~0.5%. Copyright (c) 2022, Arm Limited.	2022-07-06 09:40:03 -07:00
Chris Fallin	00f357c028	Cranelift: support 14-bit Type index with some bitpacking. (#4269 ) * Cranelift: make `ir::Type` a `u16`. * Cranelift: pack ValueData back into 64 bits. After extending `Type` to a `u16`, `ValueData` became 12 bytes rather than 8. This packs it back down to 8 bytes (64 bits) by stealing two bits from the `Type` for the enum discriminant (leaving 14 bits for the type itself). Performance comparison (3-way between original (`ty-u8`), 16-bit `Type` (`ty-u16`), and this PR (`ty-packed`)): ``` ~/work/sightglass% target/release/sightglass-cli benchmark \ -e ~/ty-u8.so -e ~/ty-u16.so -e ~/ty-packed.so \ --iterations-per-process 10 --processes 2 \ benchmarks-next/spidermonkey/benchmark.wasm compilation benchmarks-next/spidermonkey/benchmark.wasm cycles [20654406874 21749213920.50 22958520306] /home/cfallin/ty-packed.so [22227738316 22584704883.90 22916433748] /home/cfallin/ty-u16.so [20659150490 21598675968.60 22588108428] /home/cfallin/ty-u8.so nanoseconds [5435333269 5723139427.25 6041072883] /home/cfallin/ty-packed.so [5848788229 5942729637.85 6030030341] /home/cfallin/ty-u16.so [5436002390 5683248226.10 5943626225] /home/cfallin/ty-u8.so ``` So, when compiling SpiderMonkey.wasm, making `Type` 16 bits regresses performance by 4.5% (5.683s -> 5.723s), while this PR gets 14 bits for a 1.0% cost (5.683s -> 5.723s). That's still not great, and we can likely do better, but it's a start. * Fix test failure: entities to/from u32 via `{from,to}_bits`, not `{from,to}_u32`.	2022-07-05 14:51:02 -07:00
Afonso Bordado	925891245d	cranelift: Fix `fmin`/`fmax` when dealing with zeroes (#4373 ) `fmin`/`fmax` are defined as returning -0.0 as smaller than 0.0. This is not how the IEEE754 views these values and the interpreter was returning the wrong value in these operations since it was just using the standard IEEE754 comparisons. This also tries to preserve NaN information by avoiding passing NaN's through any operation that could canonicalize it.	2022-07-05 12:59:23 -07:00
Afonso Bordado	2003ae99a0	Implement `fma`/`fabs`/`fneg`/`fcopysign` on the interpreter (#4367 ) * cranelift: Implement `fma` on interpreter * cranelift: Implement `fabs` on interpreter * cranelift: Fix `fneg` implementation on interpreter `fneg` was implemented as `0 - x` which is not correct according to the standard since that operation makes no guarantees on what the output is when the input is `NaN`. However for `fneg` the output for `NaN` inputs is fully defined. * cranelift: Implement `fcopysign` on interpreter	2022-07-05 09:03:04 -07:00
Afonso Bordado	f2e6ff5e70	cranelift: Implement `sqrt` in interpreter (#4362 ) This ignores SIMD for now.	2022-07-01 09:39:11 -07:00
Sam Parker	a2d49ebf27	Use u32 in Type API (#4280 ) Move from passing and returning u8 and u16 values to u32 in many of the functions. This removes a number of type conversions and gives a small compilation time speedup, around ~0.7% on my aarch64 machine. Copyright (c) 2022, Arm Limited.	2022-06-30 12:43:36 -07:00
Benjamin Bouvier	43d4f0b93b	Serialize `BlockNode`'s cold field too when serializing a `Layout` (#4265 ) This fixes a bug when the `cold` field would not be serialized, since we're using a custom (de)serializer for `Layout`. This is now properly handled by adding a boolean in the serialized stream. This was caught during the work on #4155, as this would result in cache mismatches between a function and itself.	2022-06-13 12:04:37 -07:00
Chris Fallin	0824abbae4	Add a basic alias analysis with redundant-load elim and store-to-load fowarding opts. (#4163 ) This PR adds a basic alias analysis, and optimizations that use it. This is a "mid-end optimization": it operates on CLIF, the machine-independent IR, before lowering occurs. The alias analysis (or maybe more properly, a sort of memory-value analysis) determines when it can prove a particular memory location is equal to a given SSA value, and when it can, it replaces any loads of that location. This subsumes two common optimizations: * Redundant load elimination: when the same memory address is loaded two times, and it can be proven that no intervening operations will write to that memory, then the second load is redundant and its result must be the same as the first. We can use the first load's result and remove the second load. * Store-to-load forwarding: when a load can be proven to access exactly the memory written by a preceding store, we can replace the load's result with the store's data operand, and remove the load. Both of these optimizations rely on a "last store" analysis that is a sort of coloring mechanism, split across disjoint categories of abstract state. The basic idea is that every memory-accessing operation is put into one of N disjoint categories; it is disallowed for memory to ever be accessed by an op in one category and later accessed by an op in another category. (The frontend must ensure this.) Then, given this, we scan the code and determine, for each memory-accessing op, when a single prior instruction is a store to the same category. This "colors" the instruction: it is, in a sense, a static name for that version of memory. This analysis provides an important invariant: if two operations access memory with the same last-store, then no other store can alias in the time between that last store and these operations. This must-not-alias property, together with a check that the accessed address is exactly the same (same SSA value and offset), and other attributes of the access (type, extension mode) are the same, let us prove that the results are the same. Given last-store info, we scan the instructions and build a table from "memory location" key (last store, address, offset, type, extension) to known SSA value stored in that location. A store inserts a new mapping. A load may also insert a new mapping, if we didn't already have one. Then when a load occurs and an entry already exists for its "location", we can reuse the value. This will be either RLE or St-to-Ld depending on where the value came from. Note that this does work across basic blocks: the last-store analysis is a full iterative dataflow pass, and we are careful to check dominance of a previously-defined value before aliasing to it at a potentially redundant load. So we will do the right thing if we only have a "partially redundant" load (loaded already but only in one predecessor block), but we will also correctly reuse a value if there is a store or load above a loop and a redundant load of that value within the loop, as long as no potentially-aliasing stores happen within the loop.	2022-05-20 13:19:32 -07:00
Andrew Brown	bd6fe11ca9	cranelift: remove `load_complex` and `store_complex` (#3976 ) This change removes all variants of `load_complex` and `store_complex` from Cranelift; this is a breaking change to the instructions exposed by CLIF. The complete list of instructions removed is: `load_complex`, `store_complex`, `uload8_complex`, `sload8_complex`, `istore8_complex`, `sload8_complex`, `uload16_complex`, `sload16_complex`, `istore16_complex`, `uload32_complex`, `sload32_complex`, `istore32_complex`, `uload8x8_complex`, `sload8x8_complex`, `sload16x4_complex`, `uload16x4_complex`, `uload32x2_complex`, `sload32x2_complex`. The rationale for this removal is that the Cranelift backend now has the ability to pattern-match multiple upstream additions in order to calculate the address to access. Previously, this was not possible so the `*_complex` instructions were needed. Over time, these instructions have fallen out of use in this repository, making the additional overhead of maintaining them a chore.	2022-03-31 10:05:10 -07:00
Ulrich Weigand	b064e60087	ISLE: Re-implement ValueSlice (#3784 ) The current definition of `ValueSlice` is not usable, since any call to a constructor returning a `ValueSlice` will extend the mutable borrow on the context taken by the constructor call, with the result that it cannot be passed to any other constructor ever. Re-implement `ValueSlice` as a pair of a `ValueList` identifer plus an offset into the list. This type can simply be copied without requiring a borrow on the context.	2022-02-24 15:24:40 -08:00
Chris Fallin	f489b83835	Cranelift: add support for cold blocks. This PR adds a flag to each block that can be set via the frontend/builder interface that indicates that the block will not be frequently executed. As such, the compiler backend should place the block "out of line" in the final machine code, so that the ordinary, more frequent execution path that excludes the block does not have to jump around it. This is useful for adding handlers for exceptional conditions (slow-paths, guard violations) in a way that minimizes performance cost. Fixes #2747.	2022-01-19 12:17:41 -08:00
Teymour Aldridge	40072f844e	Clarify some documentation. (#3641 )	2022-01-04 11:15:19 -08:00
Teymour Aldridge	28ede8356a	Add a doclink.	2022-01-03 19:22:21 +00:00
Scott McMurray	ca7c54b5f8	Add `Type::int_with_byte_size` constructor	2021-11-29 16:53:54 -08:00
Scott McMurray	c266f7f4c3	Cranelift: Add `LibCall::Memcmp` The comment says the enum is "likely to grow" and the function's been in libc since C89, so hopefully this is ok. I'd like to use it for emitting things like array equality.	2021-11-29 01:42:59 -08:00
bjorn3	2fbd57e9e2	Remove imm_with_name It is only used once to rename an imm field to mask	2021-10-31 19:57:04 +01:00
bjorn3	1fd491dadd	Remove fallthrough instruction	2021-10-12 14:22:07 +02:00
bjorn3	5b24e117ee	Remove instructions used by old br_table legalization	2021-10-12 14:18:52 +02:00
bjorn3	20463d60f3	Replace StackSlots struct with a type alias	2021-10-11 16:41:45 +02:00
bjorn3	fd59a3e045	Remove all unused stackslot handling code	2021-10-11 16:41:45 +02:00
Pat Hickey	bca6946a9d	Merge pull request #3432 from bjorn3/remove_reloc_constant ConstantData related cleanups for the removal of the old backend	2021-10-10 09:59:13 -07:00
Pat Hickey	b7375817b1	Merge pull request #3431 from bjorn3/remove_sarg_t Remove the sarg_t type and dummy_sarg_t instruction	2021-10-10 09:58:14 -07:00
bjorn3	80709ab624	Rustfmt	2021-10-10 15:26:43 +02:00
bjorn3	355dd996a2	Fix tests	2021-10-10 15:00:25 +02:00
bjorn3	aa0486eb15	Remove offset fields from ConstantPool	2021-10-10 14:47:53 +02:00
bjorn3	8a8797b911	Remove the sarg_t type and dummy_sarg_t instruction They are no longer necessary with the new style backends	2021-10-10 14:38:35 +02:00
bjorn3	2b89b13c57	Move condcodes from cranelift-codegen-shared to cranelift-codegen	2021-10-10 14:23:35 +02:00
bjorn3	2db3b5b9df	Remove code offsets from Function (#3412 ) * Remove code offsets from Function * Remove reloc_jt and fix wasmtime-cranelift	2021-10-07 15:54:00 +02:00
bjorn3	c5c7508289	Remove StackLayoutInfo	2021-10-04 19:39:33 +02:00
bjorn3	b3702f5821	Remove old_signature	2021-10-04 19:39:33 +02:00
Benjamin Bouvier	43a86f14d5	Remove more old backend ISA concepts (#3402 ) This also paves the way for unifying TargetIsa and MachBackend, since now they map one to one. In theory the two traits could be merged, which would be nice to limit the number of total concepts. Also they have quite different responsibilities, so it might be fine to keep them separate. Interestingly, this PR started as removing RegInfo from the TargetIsa trait since the adapter returned a dummy value there. From the fallout, noticed that all Display implementations didn't needed an ISA anymore (since these were only used to render ISA specific registers). Also the whole family of RegInfo / ValueLoc / RegUnit was exclusively used for the old backend, and these could be removed. Notably, some IR instructions needed to be removed, because they were using RegUnit too: this was the oddball of regfill / regmove / regspill / copy_special, which were IR instructions inserted by the old regalloc. Fare thee well!	2021-10-04 10:36:12 +02:00
Benjamin Bouvier	bae4ec6427	Remove ancient register allocation (#3401 )	2021-09-30 21:27:23 +02:00
Chris Fallin	38728c5746	Merge pull request #3362 from dheaton-arm/implement-unarrow Implement `Unarrow`, `Uunarrow`, and `Snarrow` for the interpreter	2021-09-21 10:06:46 -07:00
dheaton-arm	3fc29f5f6c	Return `u128` from `bounds`; form `new_vec` from iter chain Copyright (c) 2021, Arm Limited	2021-09-20 09:57:19 +01:00
dheaton-arm	83c3bc5b9d	Implement `Unarrow`, `Uunarrow`, and `Snarrow` for the interpreter Implemented the following Opcodes for the Cranelift interpreter: - `Unarrow` to combine two SIMD vectors into a new vector with twice the lanes but half the width, with signed inputs which are clamped to `0x00`. - `Uunarrow` to perform the same operation as `Unarrow` but treating inputs as unsigned. - `Snarrow` to perform the same operation as `Unarrow` but treating both inputs and outputs as signed, and saturating accordingly. Note that all 3 instructions saturate at the type boundaries. Copyright (c) 2021, Arm Limited	2021-09-17 13:26:10 +01:00
Afonso Bordado	92690b84a0	cranelift: Add SIMD `icmp` comparisons to interpreter	2021-09-11 17:15:44 +01:00
Afonso Bordado	3c1133379c	cranelift: Add `is_bool_vector` helper	2021-09-10 15:46:14 +01:00
Afonso Bordado	85d468dc5a	cranelift: Add `coerce_bools_to_ints` helper	2021-09-10 15:38:30 +01:00
Afonso Bordado	9460a4fb16	cranelift: Support bool vectors in trampoline	2021-09-10 15:10:51 +01:00
dheaton-arm	8f057e0482	Implement `SaddSat` and `SsubSat` for the interpreter Implemented `SaddSat` and `SsubSat` to add and subtract signed vector values, saturating at the type boundaries rather than overflowing. Changed the parser to allow signed `i8` immediates in vectors as part of this work; fixes #3276. Copyright (c) 2021, Arm Limited.	2021-09-03 11:35:39 +01:00
dheaton-arm	d956d349d8	Implement `Insertlane` for the Cranelift interpreter Implemented `Insertlane` to insert a value in the lane specified by the immediate value, overwriting the existing value in that lane. Added `TernaryImm8` support for the `imm_value` function. Copyright (c) 2021, Arm Limited.	2021-09-01 16:21:27 +01:00
Afonso Bordado	2776074dfc	cranelift: Add stack support to the interpreter with virtual addresses (#3187 ) * cranelift: Add stack support to the interpreter We also change the approach for heap loads and stores. Previously we would use the offset as the address to the heap. However, this approach does not allow using the load/store instructions to read/write from both the heap and the stack. This commit changes the addressing mechanism of the interpreter. We now return the real addresses from the addressing instructions (stack_addr/heap_addr), and instead check if the address passed into the load/store instructions points to an area in the heap or the stack. * cranelift: Add virtual addresses to cranelift interpreter Adds a Virtual Addressing scheme that was discussed as a better alternative to returning the real addresses. The virtual addresses are split into 4 regions (stack, heap, tables and global values), and the address itself is composed of an `entry` field and an `offset` field. In general the `entry` field corresponds to the instance of the resource (e.g. table5 is entry 5) and the `offset` field is a byte offset inside that entry. There is one exception to this which is the stack, where due to only having one stack, the whole address is an offset field. The number of bits in entry vs offset fields is variable with respect to the `region` and the address size (32bits vs 64bits). This is done because with 32 bit addresses we would have to compromise on heap size, or have a small number of global values / tables. With 64 bit addresses we do not have to compromise on this, but we need to support 32 bit addresses. * cranelift: Remove interpreter trap codes * cranelift: Calculate frame_offset when entering or exiting a frame * cranelift: Add safe read/write interface to DataValue * cranelift: DataValue write full 128bit slot for booleans * cranelift: Use DataValue accessors for trampoline.	2021-08-24 09:29:11 -07:00
Alex Crichton	e68aa99588	Implement the memory64 proposal in Wasmtime (#3153 ) * Implement the memory64 proposal in Wasmtime This commit implements the WebAssembly [memory64 proposal][proposal] in both Wasmtime and Cranelift. In terms of work done Cranelift ended up needing very little work here since most of it was already prepared for 64-bit memories at one point or another. Most of the work in Wasmtime is largely refactoring, changing a bunch of `u32` values to something else. A number of internal and public interfaces are changing as a result of this commit, for example: * Acessors on `wasmtime::Memory` that work with pages now all return `u64` unconditionally rather than `u32`. This makes it possible to accommodate 64-bit memories with this API, but we may also want to consider `usize` here at some point since the host can't grow past `usize`-limited pages anyway. * The `wasmtime::Limits` structure is removed in favor of minimum/maximum methods on table/memory types. * Many libcall intrinsics called by jit code now unconditionally take `u64` arguments instead of `u32`. Return values are `usize`, however, since the return value, if successful, is always bounded by host memory while arguments can come from any guest. * The `heap_addr` clif instruction now takes a 64-bit offset argument instead of a 32-bit one. It turns out that the legalization of `heap_addr` already worked with 64-bit offsets, so this change was fairly trivial to make. * The runtime implementation of mmap-based linear memories has changed to largely work in `usize` quantities in its API and in bytes instead of pages. This simplifies various aspects and reflects that mmap-memories are always bound by `usize` since that's what the host is using to address things, and additionally most calculations care about bytes rather than pages except for the very edge where we're going to/from wasm. Overall I've tried to minimize the amount of `as` casts as possible, using checked `try_from` and checked arithemtic with either error handling or explicit `unwrap()` calls to tell us about bugs in the future. Most locations have relatively obvious things to do with various implications on various hosts, and I think they should all be roughly of the right shape but time will tell. I mostly relied on the compiler complaining that various types weren't aligned to figure out type-casting, and I manually audited some of the more obvious locations. I suspect we have a number of hidden locations that will panic on 32-bit hosts if 64-bit modules try to run there, but otherwise I think we should be generally ok (famous last words). In any case I wouldn't want to enable this by default naturally until we've fuzzed it for some time. In terms of the actual underlying implementation, no one should expect memory64 to be all that fast. Right now it's implemented with "dynamic" heaps which have a few consequences: * All memory accesses are bounds-checked. I'm not sure how aggressively Cranelift tries to optimize out bounds checks, but I suspect not a ton since we haven't stressed this much historically. * Heaps are always precisely sized. This means that every call to `memory.grow` will incur a `memcpy` of memory from the old heap to the new. We probably want to at least look into `mremap` on Linux and otherwise try to implement schemes where dynamic heaps have some reserved pages to grow into to help amortize the cost of `memory.grow`. The memory64 spec test suite is scheduled to now run on CI, but as with all the other spec test suites it's really not all that comprehensive. I've tried adding more tests for basic things as I've had to implement guards for them, but I wouldn't really consider the testing adequate from just this PR itself. I did try to take care in one test to actually allocate a 4gb+ heap and then avoid running that in the pooling allocator or in emulation because otherwise that may fail or take excessively long. [proposal]: https://github.com/WebAssembly/memory64/blob/master/proposals/memory64/Overview.md * Fix some tests * More test fixes * Fix wasmtime tests * Fix doctests * Revert to 32-bit immediate offsets in `heap_addr` This commit updates the generation of addresses in wasm code to always use 32-bit offsets for `heap_addr`, and if the calculated offset is bigger than 32-bits we emit a manual add with an overflow check. * Disable memory64 for spectest fuzzing * Fix wrong offset being added to heap addr * More comments! * Clarify bytes/pages	2021-08-12 09:40:20 -05:00
Alex Crichton	ee3ff52661	Refactor cranelift immediates slightly I've run up against the `Into`-vs-`From` impls a few times and figured I'd go ahead and put up a refactoring. This switches `Into` impls into `From` impls which allows using both traits instead of just the `Into` version. Additionally this removes a few small `as` casts in favor of infallible `from`/`into` or `try_from` with error handling.	2021-08-06 09:14:25 -07:00
Alex Crichton	4cfa031c5f	Implement API support for v128-globals (#3147 ) Found via fuzzing, and looks like these were accidentally left out along the way SIMD was taking shape.	2021-08-05 13:02:34 -05:00

1 2 3 4

187 Commits