Commit Graph

9391 Commits

Benjamin Bouvier
2c1d370465 CI: use fixed version of Rust nightly following build failures in staticvec 2020-07-28 12:29:49 +02:00
Benjamin Bouvier
7f109a5198 machinst x64: use a sign-extension when loading jump table offsets;
The offset that's loaded out of the jump table may be negative (if it
refers to a point before the jump table itself), so we should
sign-extend it rather than zero-extend it.
2020-07-28 12:29:49 +02:00
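
To illustrate the difference, a minimal Rust sketch of the two extensions (illustrative values only, not the actual backend code):

```rust
fn main() {
    // A 32-bit jump-table entry holding a backward (negative) offset.
    let entry: u32 = -8i32 as u32;
    let table_base: u64 = 0x1000;

    // Zero extension (the buggy behavior) turns -8 into 0xffff_fff8.
    let zext_target = table_base.wrapping_add(entry as u64);
    // Sign extension preserves the negative offset.
    let sext_target = table_base.wrapping_add(entry as i32 as i64 as u64);

    assert_eq!(sext_target, 0x1000 - 8);
    assert_ne!(zext_target, sext_target);
}
```
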
Chris Fallin
8fd92093a4 Merge pull request #2061 from cfallin/aarch64-amode
Aarch64 codegen quality: support more general add+extend address computations.
2020-07-27 13:48:55 -07:00
Chris Fallin
f9b98f0ddc Aarch64 codegen quality: support more general add+extend computations.
Previously, our pattern-matching for generating load/store addresses was
somewhat limited. For example, it could not use a register-extend
address mode to handle the following CLIF:

```
   v2760 = uextend.i64 v985
   v2761 = load.i64 notrap aligned readonly v1
   v1018 = iadd v2761, v2760
   store v1017, v1018
```

This PR adds more general support for address expressions made up of
additions and extensions. In particular, it pattern-matches a tree of
64-bit `iadd`s, optionally with `uextend`/`sextend` from 32-bit values
at the leaves, to collect the list of all addends that form the address.
It also collects all offsets at leaves, combining them.
It then applies a series of heuristics to make the best use of the
available addressing modes, folding as much as it can into the
load/store itself (64-bit registers, zero/sign-extended 32-bit
registers, and/or an immediate offset) and computing the rest with add
instructions as necessary. It makes use of immediate forms
(add-immediate or subtract-immediate) whenever possible, and also uses
the built-in extend operators on add instructions where it can. There
are certainly cases where this is not optimal (i.e., it does not
generate the strictly shortest possible instruction sequence), but it
should be good enough for most code.
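
A simplified Rust sketch of the addend-collection step described above, using a toy expression type in place of the real CLIF data structures (all names here are illustrative):

```rust
// Toy stand-in for CLIF values feeding an address computation.
enum Expr {
    Reg(u8),                    // a 64-bit value already in a register
    Uext32(u8),                 // uextend of a 32-bit register value
    Sext32(u8),                 // sextend of a 32-bit register value
    Iadd(Box<Expr>, Box<Expr>), // 64-bit add
    Imm(i64),                   // constant addend
}

#[derive(Debug, PartialEq)]
enum Addend {
    Reg64(u8),
    ZextReg32(u8),
    SextReg32(u8),
}

// Walk the tree of 64-bit `iadd`s, gathering register addends together
// with their extension kind and folding all constant offsets into one.
fn collect(e: &Expr, addends: &mut Vec<Addend>, offset: &mut i64) {
    match e {
        Expr::Iadd(a, b) => {
            collect(a, addends, offset);
            collect(b, addends, offset);
        }
        Expr::Imm(k) => *offset = offset.wrapping_add(*k),
        Expr::Reg(r) => addends.push(Addend::Reg64(*r)),
        Expr::Uext32(r) => addends.push(Addend::ZextReg32(*r)),
        Expr::Sext32(r) => addends.push(Addend::SextReg32(*r)),
    }
}

fn main() {
    // Mirrors the CLIF above: iadd(v2761, uextend(v985)).
    let addr = Expr::Iadd(Box::new(Expr::Reg(1)), Box::new(Expr::Uext32(85)));
    let (mut addends, mut offset) = (Vec::new(), 0i64);
    collect(&addr, &mut addends, &mut offset);
    assert_eq!(addends, vec![Addend::Reg64(1), Addend::ZextReg32(85)]);
    assert_eq!(offset, 0);
}
```
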

Using `perf stat` to measure instruction count (runtime only, on
wasmtime, after populating the cache to avoid measuring compilation),
this impacts `bz2` as follows:

```
pre:

       1006.410425      task-clock (msec)         #    1.000 CPUs utilized
               113      context-switches          #    0.112 K/sec
                 1      cpu-migrations            #    0.001 K/sec
             5,036      page-faults               #    0.005 M/sec
     3,221,547,476      cycles                    #    3.201 GHz
     4,000,670,104      instructions              #    1.24  insn per cycle
   <not supported>      branches
        27,958,613      branch-misses

       1.006071348 seconds time elapsed

post:

        963.499525      task-clock (msec)         #    0.997 CPUs utilized
               117      context-switches          #    0.121 K/sec
                 0      cpu-migrations            #    0.000 K/sec
             5,081      page-faults               #    0.005 M/sec
     3,039,687,673      cycles                    #    3.155 GHz
     3,837,761,690      instructions              #    1.26  insn per cycle
   <not supported>      branches
        28,254,585      branch-misses

       0.966072682 seconds time elapsed
```

In other words, this reduces instruction count by 4.1% on `bz2`.
2020-07-27 13:10:50 -07:00
Peter Huene
9b340f27f7 Update the .NET docs for the current .NET API. (#2069) 2020-07-24 23:49:05 -05:00
Chris Fallin
bad99c93b1 Merge pull request #2051 from cfallin/aarch64-add-negative-imm
Aarch64 codegen quality: handle add-negative-imm as subtract.
2020-07-24 12:26:54 -07:00
Chris Fallin
1b80860f1f Aarch64 codegen quality: handle add-negative-imm as subtract.
We often see patterns like:

```
    mov w2, #0xffff_ffff   // uses ORR with logical immediate form
    add w0, w1, w2
```

which is just `w0 := w1 - 1`. It is much better to recognize when the
negation of an immediate fits in a 12-bit immediate field even though
the immediate itself does not, and to flip the add to a subtract (and
vice versa), so that we can instead generate:

```
    sub w0, w1, #1
```
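
A Rust sketch of the negation heuristic (helper names are illustrative, and the shifted-by-12 immediate form is ignored for brevity):

```rust
// Does a value fit AArch64's 12-bit unsigned immediate field?
fn fits_imm12(v: u64) -> bool {
    v < (1 << 12)
}

// Illustrative lowering for `add w0, w1, #imm`: flip to subtract when
// the negated immediate fits but the original does not.
fn lower_add_imm(imm: i64) -> String {
    if fits_imm12(imm as u64) {
        format!("add w0, w1, #{}", imm)
    } else if fits_imm12(imm.wrapping_neg() as u64) {
        format!("sub w0, w1, #{}", imm.wrapping_neg())
    } else {
        "materialize imm into a register, then add".to_string()
    }
}

fn main() {
    // 0xffff_ffff as a 32-bit immediate is -1, so the add becomes a sub.
    assert_eq!(lower_add_imm(-1), "sub w0, w1, #1");
}
```
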

We see this pattern in e.g. `bz2`, where this commit makes the following
difference (counting instructions with `perf stat`, populating the
wasmtime cache first and then running again so that only runtime is
measured):

pre:

```
        992.762250      task-clock (msec)         #    0.998 CPUs utilized
               109      context-switches          #    0.110 K/sec
                 0      cpu-migrations            #    0.000 K/sec
             5,035      page-faults               #    0.005 M/sec
     3,224,119,134      cycles                    #    3.248 GHz
     4,000,521,171      instructions              #    1.24  insn per cycle
   <not supported>      branches
        27,573,755      branch-misses

       0.995072322 seconds time elapsed
```

post:

```
        993.853850      task-clock (msec)         #    0.998 CPUs utilized
               123      context-switches          #    0.124 K/sec
                 1      cpu-migrations            #    0.001 K/sec
             5,072      page-faults               #    0.005 M/sec
     3,201,278,337      cycles                    #    3.221 GHz
     3,917,061,340      instructions              #    1.22  insn per cycle
   <not supported>      branches
        28,410,633      branch-misses

       0.996008047 seconds time elapsed
```

In other words, a 2.1% reduction in instruction count on `bz2`.
2020-07-24 11:41:33 -07:00
Benjamin Bouvier
35d9ab19b7 Review fixes; 2020-07-24 19:29:12 +02:00
Benjamin Bouvier
ad4a2f919f machinst x64: implement support for reference types; 2020-07-24 19:29:12 +02:00
Benjamin Bouvier
987c616bf5 machinst x64: implement support for dynamic heaps and explicit bound checks; 2020-07-24 19:29:12 +02:00
Benjamin Bouvier
2e3ad3227d machinst x64: fix encoding of movzx/movsx with non-ABCD input registers;
Using a source register for movsx/movzx that doesn't belong to the ABCD
family (al, bl, cl, dl) requires a REX prefix, even an otherwise-empty
("redundant") one, and that prefix was not being emitted.
2020-07-24 19:29:12 +02:00
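
A sketch of the x86-64 encoding rule behind this fix, modeled in Rust (the helper is illustrative, not the actual emitter):

```rust
// Hardware register encodings 0..=15; returns whether an 8-bit use of
// this register forces a REX prefix (possibly an otherwise-empty 0x40).
fn byte_reg_needs_rex(enc: u8) -> bool {
    // 0..=3 (al/cl/dl/bl): no REX needed.
    // 4..=7 (spl/bpl/sil/dil): need an empty REX; without it these
    //   encodings select ah/ch/dh/bh instead.
    // 8..=15 (r8b..r15b): already need REX for the high encoding bit.
    enc >= 4
}

fn main() {
    assert!(!byte_reg_needs_rex(0)); // al: fine without REX
    assert!(byte_reg_needs_rex(6));  // sil: movzx/movsx from it needs REX
}
```
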
Benjamin Bouvier
de4923356a machinst x64: fix fcmp comparison for NotEqual;
We have to emit checks against both the parity flag (set when the
operands are unordered) and the zero flag (which indicates equality);
otherwise the comparison returns false when comparing NaN against
itself.
2020-07-24 19:29:12 +02:00
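
The flag semantics can be modeled in Rust as a quick sanity check (a sketch of the x86 `ucomisd` behavior, not backend code):

```rust
// After `ucomisd a, b`: ZF is set when the operands are equal *or*
// unordered, and PF is set when they are unordered. Checking only
// "not ZF" therefore reports NaN != NaN as false.
fn ucomisd_flags(a: f64, b: f64) -> (bool, bool) {
    let unordered = a.is_nan() || b.is_nan();
    let zf = unordered || a == b;
    let pf = unordered;
    (zf, pf)
}

// NotEqual must OR the parity check into the non-zero check.
fn not_equal(a: f64, b: f64) -> bool {
    let (zf, pf) = ucomisd_flags(a, b);
    !zf || pf
}

fn main() {
    assert!(not_equal(f64::NAN, f64::NAN)); // true, as required
    assert!(!not_equal(1.0, 1.0));
    assert!(not_equal(1.0, 2.0));
}
```
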
Benjamin Bouvier
4b26f5b120 machinst x64: baldrdash: fix multi-value when both gpr and xmm are returned;
In baldrdash, only the first return value may live in a register, be it
an integer or a floating point value.
2020-07-24 19:29:12 +02:00
Benjamin Bouvier
aa103698d4 machinst x64: extend Copysign to work for f64 inputs too; 2020-07-24 19:29:12 +02:00
Benjamin Bouvier
694af3aec2 machinst x64: implement float Floor/Ceil/Trunc/Nearest as VM calls; 2020-07-24 19:29:12 +02:00
Benjamin Bouvier
48ec806a9d machinst x64: implement Fabs/Fneg in terms of other instructions; 2020-07-24 19:29:12 +02:00
Benjamin Bouvier
03b9e1e86a machinst x64: implement float min/max with the right semantics; 2020-07-24 19:29:12 +02:00
Benjamin Bouvier
e43310a088 machinst x64: adapt conversions for saturation behaviors; 2020-07-24 19:29:12 +02:00
Benjamin Bouvier
cd54f05efd machinst x64: implement float-to-int and int-to-float conversions; 2020-07-24 19:29:12 +02:00
Chris Fallin
37a09c4ef6 Merge pull request #2066 from cfallin/machinst-timing
Add timing for several new-backend stages.
2020-07-23 10:54:17 -07:00
Yury Delendik
42127aac4e Refactor Cache logic to include debug information (#2065)
* move caching to the CompilationArtifacts

* move cache_config from Compiler to CompiledModule

* hash ISA flags

* no cache for wasm2obj

* move caching to the wasmtime crate

* account for each Compiler field when hashing
2020-07-23 12:10:13 -05:00
Chris Fallin
2b9fefe89a Add timing for several new-backend stages.
This PR adds a bit more granularity to the output of e.g. `clif-util
compile -T`, indicating how much time is spent in VCode lowering and
various other new-backend-specific tasks.
2020-07-23 09:54:39 -07:00
Chris Fallin
87eb4392c4 Merge pull request #2063 from jgouly/vselect
arm64: Implement Vselect opcode
2020-07-22 13:35:46 -07:00
Chris Fallin
44ef8247a9 Merge pull request #2062 from akirilov-arm/extract_lane
AArch64: Improve code generation for Extractlane + Sextend / Uextend
2020-07-22 13:35:00 -07:00
Chris Fallin
d22cefd220 Merge pull request #2058 from cfallin/aarch64-fix-bool
Aarch64 codegen: represent bool `true` as -1, not 1.
2020-07-22 13:16:12 -07:00
Chris Fallin
b8f6d53a6b Aarch64 codegen: represent bool true as -1, not 1.
It seems that this is actually the correct behavior for bool types wider
than `b1`; some of the vector instruction optimizations depend on bool
lanes representing false and true as all-zeroes and all-ones
respectively. For `b8`..`b64`, this results in an extra negation after a
`cset` when a bool is produced by an `icmp`/`fcmp`, but the most common
case (`b1`) is unaffected, because an all-ones one-bit value is just
`1`.

An example of this assumption can be seen here:

399ee0a54c/cranelift/codegen/src/simple_preopt.rs (L956)

Thanks to Joey Gouly of ARM for noting this issue while implementing
SIMD support, and digging into the source (finding the above example) to
determine the correct behavior.
2020-07-22 12:30:55 -07:00
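
A small Rust model of the representation change (illustrative only):

```rust
// `cset` materializes a condition flag as 0 or 1.
fn cset(cond: bool) -> u64 {
    cond as u64
}

// Wide bool types (b8..b64) want 0 or all-ones, so the lowering adds a
// negation after the cset; this is the extra instruction mentioned above.
fn widen_bool(cond: bool) -> u64 {
    cset(cond).wrapping_neg()
}

fn main() {
    assert_eq!(widen_bool(true), u64::MAX); // all ones, i.e. -1
    assert_eq!(widen_bool(false), 0);
    // For b1 no extra step is needed: a one-bit all-ones value is just 1.
}
```
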
Alex Crichton
8a04fc3cdc Refactor wasmtime's internal cache slightly (#2057)
Be more generic over the type being serialized, and remove an
intermediate struct which was only used for serialization but isn't
necessary.
2020-07-22 10:32:53 -05:00
Joey Gouly
5355c3e3d5 arm64: Implement Vselect opcode
This is implemented the same way as Bitselect, since the controlling
vector is a boolean vector. A boolean vector in Cranelift has elements
that are either all zeroes or all ones, so it can be used to select
elements lane-wise.

Copyright (c) 2020, Arm Limited.
2020-07-22 12:50:29 +01:00
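
A Rust sketch of why this works, with 32-bit integers standing in for vector lanes (illustrative only):

```rust
// Bitwise select: take bits of `a` where `c` is 1, bits of `b` where 0.
fn bitselect(c: u32, a: u32, b: u32) -> u32 {
    (c & a) | (!c & b)
}

fn main() {
    // Two "lanes" of a boolean vector: true (all ones), false (all zeroes).
    // Because each control lane is uniform, the bitwise select picks
    // whole lanes, which is exactly what Vselect requires.
    assert_eq!(bitselect(u32::MAX, 1, 2), 1); // true lane selects from `a`
    assert_eq!(bitselect(0, 1, 2), 2);        // false lane selects from `b`
}
```
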
Anton Kirilov
420c4f06b8 AArch64: Improve code generation for Extractlane + Sextend / Uextend
Copyright (c) 2020, Arm Limited.
2020-07-22 11:47:51 +01:00
Yury Delendik
399ee0a54c Serialize and deserialize compilation artifacts. (#2020)
* Serialize and deserialize Module
* Use bincode to serialize
* Add wasm_module_serialize; docs
* Simple tests
2020-07-21 15:05:50 -05:00
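
A minimal sketch of a bincode round-trip with a stand-in struct (not the actual `CompilationArtifacts` type; assumes the `bincode` crate and serde's derive feature):

```rust
use serde::{Deserialize, Serialize};

// Stand-in for the real compilation-artifact data; illustrative only.
#[derive(Serialize, Deserialize, PartialEq, Debug)]
struct Artifacts {
    code: Vec<u8>,
    debug_info: bool,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let original = Artifacts { code: vec![0x90, 0xc3], debug_info: true };
    let bytes = bincode::serialize(&original)?;
    let roundtrip: Artifacts = bincode::deserialize(&bytes)?;
    assert_eq!(original, roundtrip);
    Ok(())
}
```
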
Nick Fitzgerald
c420f65214 Merge pull request #2052 from fitzgen/gc-function-in-c-api
Add a GC function to the C API
2020-07-21 10:37:20 -07:00
Nick Fitzgerald
17b99cc9c8 examples: Add a GC call to the externref Rust example 2020-07-21 09:43:01 -07:00
Nick Fitzgerald
56c517d265 examples: Add a GC call to the externref C example 2020-07-21 09:42:54 -07:00
Nick Fitzgerald
2efb46afd5 wasmtime-c-api: Add wasmtime_store_gc for GCing externrefs 2020-07-21 09:33:34 -07:00
Chris Fallin
96ef2f1a1b Fix u8::MAX -> std::u8::MAX. (#2047)
As per Carlo Kok on Zulip #cranelift, this breaks builds with stable
Rust pre-1.43, as the associated constant `u8::MAX` was only stabilized
then; the module constant `std::u8::MAX` works on older toolchains. We'd
like to support older versions if we can easily do so.

This PR also adds `cranelift-tools` to the crates checked on CI with
Rust 1.41.0, which pulls in all backends (including `aarch64`).
2020-07-20 14:59:15 -05:00
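
For reference, the two spellings side by side (the associated constant requires Rust 1.43 or later):

```rust
fn main() {
    let old_style = std::u8::MAX; // module constant: works on old stable Rust
    let new_style = u8::MAX;      // associated constant: stabilized in 1.43
    assert_eq!(old_style, new_style); // both are 255
}
```
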
Dan Gohman
4c15a4daf2 Use AsRef<Path> instead of AsRef<OsStr> in yanix functions (#1950)
* Use AsRef<Path> instead of AsRef<OsStr> in yanix functions.

`AsRef<Path>` makes these more consistent with `std` interfaces, making
them easier to use outside of wasi-common.

Also, refactor the conversion to `CString` into a helper function.

* Reduce clutter from fully-qualifying names.

* rustfmt
2020-07-20 10:02:45 -07:00
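
A hypothetical sketch of the resulting shape (the function and helper names are illustrative, not yanix's actual API); it takes `AsRef<Path>` like `std` does and centralizes the `CString` conversion:

```rust
use std::ffi::CString;
use std::io;
use std::os::unix::ffi::OsStrExt; // yanix is Unix-only, so this is fine
use std::path::Path;

// One helper owns the Path -> CString conversion (and its NUL check).
fn cstr(path: &Path) -> io::Result<CString> {
    CString::new(path.as_os_str().as_bytes())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "path contains a NUL byte"))
}

// A yanix-style wrapper, now generic over AsRef<Path> like std::fs APIs.
fn unlink<P: AsRef<Path>>(path: P) -> io::Result<()> {
    let c_path = cstr(path.as_ref())?;
    let _ = c_path.as_ptr(); // the raw syscall would receive this pointer
    Ok(())
}

fn main() -> io::Result<()> {
    // Callers can now pass &str, String, &Path, or PathBuf directly.
    unlink("/tmp/example")?;
    Ok(())
}
```
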
Chris Fallin
784e2f1480 Merge pull request #2038 from jgouly/arith2
arm64: Enable arith2 tests
2020-07-20 09:00:10 -07:00
Ömer Sinan Ağacan
3fcf9fcf3e Cranelift README: fix markdown link syntax (#2044) 2020-07-19 19:41:53 -05:00
Chris Fallin
1b3b2dbfd0 Merge pull request #2043 from cfallin/csel-opt
Aarch64: handle csel with icmp/fcmp source without materializing the bool.
2020-07-18 19:33:47 -07:00
Chris Fallin
ea894c0eeb Merge pull request #2042 from cfallin/aarch64-fix-regshift-mask
Aarch64: mask shift-amounts incorporated into reg-reg-shift ALU insts.
2020-07-18 19:33:35 -07:00
Chris Fallin
21dac670f0 Aarch64: handle csel with icmp/fcmp source without materializing the bool.
Previously, we simply compared the input bool to 0, which forced the
value into a register (usually via a cmp and a cset), zero-extended it,
etc. This patch performs the same pattern-matching that branch lowering
does, performing the cmp directly and using its flag results with the
csel.
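
A schematic Rust sketch of the lowering decision, with toy types and illustrative instruction strings (not the real backend code):

```rust
// Where did the select's condition come from?
enum CondSource {
    Icmp,   // produced directly by an integer compare
    Opaque, // produced by some other computation
}

// Illustrative instruction sequences for `select cond, a, b`.
fn lower_select(cond: CondSource) -> Vec<&'static str> {
    match cond {
        // New path: perform the compare and let csel consume its flags,
        // avoiding the cset/extend/re-compare of the old lowering.
        CondSource::Icmp => vec!["cmp x0, x1", "csel x2, x3, x4, eq"],
        // Fallback: materialize the bool and compare it against zero.
        CondSource::Opaque => vec!["cmp w5, #0", "csel x2, x3, x4, ne"],
    }
}

fn main() {
    assert_eq!(lower_select(CondSource::Icmp).len(), 2);
}
```
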

On the `bz2` benchmark, the runtime is affected as follows (measuring
with `perf stat`, using wasmtime with its cache enabled, and taking the
second run after the first compiles and populates the cache):

pre:

```
       1117.232000      task-clock (msec)         #    1.000 CPUs utilized
               133      context-switches          #    0.119 K/sec
                 1      cpu-migrations            #    0.001 K/sec
             5,041      page-faults               #    0.005 M/sec
     3,511,615,100      cycles                    #    3.143 GHz
     4,272,427,772      instructions              #    1.22  insn per cycle
   <not supported>      branches
        27,980,906      branch-misses

       1.117299838 seconds time elapsed
```

post:

```
       1003.738075      task-clock (msec)         #    1.000 CPUs utilized
               121      context-switches          #    0.121 K/sec
                 0      cpu-migrations            #    0.000 K/sec
             5,052      page-faults               #    0.005 M/sec
     3,224,875,393      cycles                    #    3.213 GHz
     4,000,838,686      instructions              #    1.24  insn per cycle
   <not supported>      branches
        27,928,232      branch-misses

       1.003440004 seconds time elapsed
```

In other words, with this change, on `bz2`, we see a 6.3% reduction in
executed instructions.
2020-07-17 21:10:21 -07:00
Nick Fitzgerald
b35cf7cf8e Merge pull request #1960 from fitzgen/peepmatic-generic-over-ir
peepmatic: Be generic over the IR we are optimizing
2020-07-17 17:05:55 -07:00
Nick Fitzgerald
ee5982fd16 peepmatic: Be generic over the operator type
This lets us avoid the overhead of converting
`cranelift_codegen::ir::Opcode` to `peepmatic_runtime::Operator`, and
paves the way for Peepmatic to support non-CLIF optimizations (e.g.
vcode optimizations).

Rather than defining our own `peepmatic::Operator` type like we used to, now the
whole `peepmatic` crate is effectively generic over a `TOperator` type
parameter. For the Cranelift integration, we use `cranelift_codegen::ir::Opcode`
as the concrete type for our `TOperator` type parameter. For testing, we also
define a `TestOperator` type, so that we can test Peepmatic code without
building all of Cranelift, and we can keep them somewhat isolated from each
other.

The methods that `peepmatic::Operator` had are now translated into trait bounds
on the `TOperator` type. These traits need to be shared between all of
`peepmatic`, `peepmatic-runtime`, and `cranelift-codegen`'s Peepmatic
integration. Therefore, these new traits live in a new crate:
`peepmatic-traits`. This crate acts as a header file of sorts for shared
trait/type/macro definitions.

Additionally, the `peepmatic-runtime` crate no longer depends on the
`peepmatic-macro` procedural macro crate, which should lead to faster build
times for Cranelift when it is using pre-built peephole optimizers.
2020-07-17 16:16:49 -07:00
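
A hypothetical sketch of the resulting shape; the trait and type names here (`OperatorLike`, `PeepholeOptimizer`) are illustrative, not the actual `peepmatic-traits` definitions:

```rust
use std::fmt::Debug;
use std::hash::Hash;

// Illustrative stand-in for the shared operator traits described above.
trait OperatorLike: Copy + Eq + Hash + Debug {
    fn arity(&self) -> u8;
}

// The optimizer becomes generic over the operator type...
struct PeepholeOptimizer<TOperator: OperatorLike> {
    roots: Vec<TOperator>,
}

// ...so tests can use a lightweight stand-in instead of all of Cranelift.
#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
enum TestOperator {
    Iadd,
    Imul,
}

impl OperatorLike for TestOperator {
    fn arity(&self) -> u8 {
        2
    }
}

fn main() {
    let opt = PeepholeOptimizer { roots: vec![TestOperator::Iadd, TestOperator::Imul] };
    assert_eq!(opt.roots.len(), 2);
}
```
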
Chris Fallin
9bd9c628aa Aarch64: mask shift-amounts incorporated into reg-reg-shift ALU insts.
We had previously fixed a bug in which constant shift amounts were not
masked (i.e., taken modulo the number of bits in the operand); however,
we did not fix the analogous case for shift amounts incorporated into
the second register argument of ALU instructions that support
integrated shifts. This failure to mask resulted in illegal
instructions being generated, e.g. in
https://bugzilla.mozilla.org/show_bug.cgi?id=1653502. This PR fixes the
issue by masking the amount, as the shift semantics require.
2020-07-17 14:55:23 -07:00
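
The masking rule itself is simple; a minimal Rust sketch:

```rust
// A shift of a `bits`-wide value only has `bits` meaningful amounts, so
// reduce the constant modulo the operand width before encoding it.
fn mask_shift_amount(amt: u64, bits: u64) -> u64 {
    debug_assert!(bits.is_power_of_two());
    amt & (bits - 1)
}

fn main() {
    assert_eq!(mask_shift_amount(65, 64), 1); // shifting 64-bit by 65 == by 1
    assert_eq!(mask_shift_amount(33, 32), 1);
}
```
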
Nick Fitzgerald
ae95ad8733 cranelift: Don't build peepmatic-based optimizations in build.rs
Instead, when the `rebuild-peephole-optimizers` feature is enabled, rebuild them
the first time they are used. This allows peepmatic to run when Cranelift's
`Opcode` is defined and available, which paves the way forward for:

* merging `peepmatic_runtime::operator::Operator` and Cranelift's `Opcode` (we
  are wasting a bunch of cycles converting between the two of them), and

* supporting vcode optimizations in `peepmatic`.
2020-07-17 14:35:16 -07:00
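
A sketch of the rebuild-on-first-use idea using std's `OnceLock` (the actual crate may use a different lazy-initialization primitive):

```rust
use std::sync::OnceLock;

struct PeepholeOptimizer; // stand-in for the real optimizer type

// Expensive step: compile the peephole optimizations against the real
// `Opcode`, which is only possible once Cranelift itself is available.
fn build_optimizer() -> PeepholeOptimizer {
    PeepholeOptimizer
}

// Rebuild on first use instead of in build.rs; later calls are cheap.
fn optimizer() -> &'static PeepholeOptimizer {
    static CELL: OnceLock<PeepholeOptimizer> = OnceLock::new();
    CELL.get_or_init(build_optimizer)
}

fn main() {
    let _first = optimizer(); // builds here
    let _again = optimizer(); // reuses the cached instance
}
```
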
Alex Crichton
978070c020 Verify crates are publish-able on CI (#2036)
This commit updates our CI to verify that all crates are publishable at
all times, on every commit. During the 0.19.0 release we found another
case where the crates as they live in this repository weren't
publishable, so the hope is that this never comes up again!

The script added in this commit also takes the liberty of removing the
existing bump/publish scripts, replacing them with one Rust script
originally sourced from wasm-bindgen. The script has three modes:

* `./publish bump` - bumps version numbers which are sent as a PR to get
  reviewed (probably with a changelog as well)

* `./publish verify` - run on CI on every commit, builds every crate we
  publish as if it's being published to crates.io, notably without raw
  access to other crates in the repository.

* `./publish publish` - publishes all crates to crates.io, passing the
  `--no-verify` flag to make this a much speedier process than it is
  today.
2020-07-17 16:19:35 -05:00
Johnnie Birch
a7cedf3100 Add support for 32-bit and 64-bit fcmp for the new backend
Implements comiss and comisd.
2020-07-17 13:46:54 -07:00
Alex Crichton
fbc05faa49 Fix wasm_val_copy for null funcref/externref (#2041)
This commit fixes `Clone for wasm_val_t` to avoid attempting to chase a
null pointer. It also fixes the implementation for `FuncRef` values by
cloning their internal `wasm_ref_t` as well.
2020-07-17 14:46:02 -05:00
Alex Crichton
3aeab23bf1 Fix leaking funcrefs in the C API (#2040)
This commit adds a case to the destructor of `wasm_val_t` to be sure to
deallocate the `Box<wasm_ref_t>`.
2020-07-17 14:45:55 -05:00
Alex Crichton
c3ff0754d4 Fix a panic with Func::new and reference types (#2039)
Currently `Func::new` will panic if one of the arguments of the function
is a reference type and the `Store` doesn't have reference types
enabled. This happens because cranelift isn't configured to enable stack
maps, but the register allocator expects them to exist when reference
types are seen.

The fix here is to always enable reference types in cranelift for our
trampoline generation and `Func::new`. This should ensure that
trampolines are generated correctly; they just won't be able to get
hooked up to an `Instance`, because validation will prevent reference
types from being used elsewhere.
2020-07-17 12:05:42 -05:00