11023 Commits

Author SHA1 Message Date
Bobby Holley
5ff2824ebb Bump cargo-vet to 0.5. (#6029)
Aside from a few new features (notably automatic registry suggestions), this
release removes the need to import description for criteria that are not
directly used, and adds an explicit version to the cargo-vet instance.
2023-03-15 22:14:38 +00:00
Alex Crichton
d76f7ee52e x64: Improve codegen for splats (#6025)
This commit goes through the lowerings for the CLIF `splat` instruction
and improves the support for each operator. Many of these lowerings are
mirrored from v8/SpiderMonkey and there are a number of improvements:

* AVX2 `v{p,}broadcast*` instructions are added and used when available.
* Float-based splats are much simpler and always a single instruction.
* Integer-based splats don't insert into an uninit xmm value and instead
  start out with a `movd` to move into an `xmm` register. This
  theoretically breaks dependencies with prior instructions since `movd`
  creates a fresh new value in the destination register.
* Loads are now sunk into all of the instructions. A new extractor,
  `sinkable_load_exact`, was added to sink the i8/i16 loads.
2023-03-15 21:33:56 +00:00
Afonso Bordado
a10c50afe9 cranelift: Translate stack_* accesses as unaligned (#6016)
We can't currently ensure that these will be aligned, so we shouldn't mark them as such.
2023-03-15 18:05:55 +00:00
Alex Crichton
6ed90f86c8 x64: Add support for the pblendw instruction (#6023)
This commit adds another case for `shuffle` lowering to the x64 backend
for the `{,v}pblendw` instruction. This instruction selects 16-bit
values from either of the inputs corresponding to an immediate 8-bit-mask where
each bit selects the corresponding lane from the inputs.
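
As a rough scalar model of the semantics described above (a sketch, not Cranelift code; the name `pblendw_model` is made up for illustration), the blend can be written as follows, assuming that a set bit `i` in the immediate takes lane `i` from the second input and a clear bit keeps the lane from the first:

```rust
/// Scalar model of a 16-bit-lane blend. Hypothetical helper, not part of
/// Cranelift.
/// Bit `i` of `imm` set   => take lane `i` from `b`.
/// Bit `i` of `imm` clear => keep lane `i` from `a`.
fn pblendw_model(a: [u16; 8], b: [u16; 8], imm: u8) -> [u16; 8] {
    let mut out = a;
    for lane in 0..8 {
        if imm & (1 << lane) != 0 {
            out[lane] = b[lane];
        }
    }
    out
}
```

A `shuffle` whose byte mask moves whole 16-bit lanes without changing their position matches this pattern, which is presumably what the new lowering case looks for.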
2023-03-15 17:20:43 +00:00
Alex Crichton
fcddb9ca81 x64: Add lea-based lowering for iadd (#5986)
* x64: Refactor `Amode` computation in ISLE

This commit replaces the previous computation of `Amode` with a
different set of rules that are intended to achieve the same purpose but
are structured differently. The motivation for this commit is going to
become more relevant in the next commit where `lea` will be used for the
`iadd` instruction, possibly, on x64. When doing so it caused a stack
overflow in the test suite during the compilation phase of a wasm
module, namely as part of the `amode_add` function. This function is
recursively defined in terms of itself and recurses as deep as the
deepest `iadd`-chain in a program. A particular test in our test suite
has a 10k-long chain of `iadd` which ended up causing a stack overflow
in debug mode.

This stack overflow is caused because the `amode_add` helper in ISLE
unconditionally peels all the `iadd` nodes away and looks at all of
them, even if most end up in intermediate registers along the way. Given
that structure I couldn't find a way to easily abort the recursion. The
new `to_amode` helper is structured in a similar fashion but attempts to
instead only recurse far enough to fold items into the final `Amode`
instead of recursing through items which themselves don't end up in the
`Amode`. Put another way previously the `amode_add` helper might emit
`x64_add` instructions, but it no longer does that.

The goal of this commit is to preserve all the original `Amode`
optimizations, however. For some parts, though, it relies more on egraph
optimizations to run since if an `iadd` is 10k deep it doesn't try to
find a constant buried 9k levels inside there to fold into the `Amode`.
The hope, though, is that with egraphs having run already it's shuffled
constants to the right most of the time and already folded any possible
together.

* x64: Add `lea`-based lowering for `iadd`

This commit adds a rule for the lowering of `iadd` to use `lea` for 32
and 64-bit addition. The theoretical benefit of `lea` over the `add`
instruction is that the `lea` variant can emulate a 3-operand
instruction which doesn't destructively modify one of its operands.
Additionally the `lea` operation can fold in other components such as
constant additions and shifts.

In practice, however, if `lea` is unconditionally used instead of `iadd`
it ends up losing 10% performance on a local `meshoptimizer` benchmark.
My best guess as to what's going on here is that my CPU's dedicated
units for address computation are all overloaded while the ALUs are
basically idle in a memory-intensive loop. Previously when the ALU was
used for `add` and the address units for stores/loads it in theory
pipelined things better (most of this is me shooting in the dark). To
prevent the performance loss here I've updated the lowering of `iadd` to
conditionally sometimes use `lea` and sometimes use `add` depending on
how "complicated" the `Amode` is. Simple ones like `a + b` or `a + $imm`
continue to use `add` (and its subsequent hypothetical extra `mov`
necessary into the result). More complicated ones like `a + b + $imm` or
`a + b << c + $imm` use `lea` as it can remove the need for extra
instructions. Locally at least this fixes the performance loss relative
to unconditionally using `lea`.

One note is that this adds an `OperandSize` argument to the
`MInst::LoadEffectiveAddress` variant to add an encoding for 32-bit
`lea` in addition to the preexisting 64-bit encoding.

* Conditionally use `lea` based on regalloc
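
The "simple versus complicated" heuristic described in the `lea` paragraph above could be sketched roughly like this. Everything here is an assumption for illustration: `SimpleAmode`, `AddLowering`, and `choose_lowering` are hypothetical names, not Cranelift's real `Amode` or ISLE rules, and the final regalloc-based condition from the last bullet is not modeled.

```rust
/// Hypothetical, simplified stand-in for an x64 addressing mode:
/// `base + (index << shift) + offset`. Not Cranelift's real `Amode` type.
struct SimpleAmode {
    has_index: bool,
    shift: u8,
    offset: i32,
}

#[derive(Debug, PartialEq)]
enum AddLowering {
    Add,
    Lea,
}

/// Pick `add` for two-component sums and `lea` for anything richer.
fn choose_lowering(amode: &SimpleAmode) -> AddLowering {
    // Count the components that are added to the base register.
    let mut extra = 0;
    if amode.has_index {
        extra += 1;
    }
    if amode.offset != 0 {
        extra += 1;
    }
    // Assumption: a shifted index also counts as "complicated", since a
    // plain `add` cannot fold the shift.
    if extra >= 2 || (amode.has_index && amode.shift != 0) {
        AddLowering::Lea
    } else {
        AddLowering::Add
    }
}
```

Under this sketch `a + b` and `a + $imm` stay on `add`, while `a + b + $imm` and `a + b << c + $imm` go to `lea`, matching the examples given in the message.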
2023-03-15 17:14:25 +00:00
Benjamin Bouvier
2e6c7bf994 perf: Create a per-process JIT dump file (#6024) 2023-03-15 14:04:15 +00:00
Trevor Elliott
68b937d965 cranelift: Fix shift overflow when constructing BitSet (#6020)
* Fix shift overflow when constructing the Wider constraint for integers

* Clarify comment
2023-03-14 22:25:51 +00:00
Trevor Elliott
48ecb6f119 Compact valid_for_target using or patterns (#6019) 2023-03-14 20:45:36 +00:00
Saúl Cabrera
80bfb35072 winch: Introduce winch-environ (#6017)
This commit introduces the `winch-environ` crate. This crate's responsibility is
to provide a shared implementation of the `winch_codegen::FuncEnv` trait,
which is Winch's function compilation environment, used to resolve module and
runtime specific information needed by the code generation, such as resolving
all the details about a callee in a WebAssembly module, or resolving specific
information from the `VMContext`.

As of this change, the implementation only includes the necessary pieces to
resolve a function callee in a WebAssembly module. The idea is to evolve the
`winch_codegen::FuncEnv` trait as we evolve Winch's code generation.
2023-03-14 19:59:15 +00:00
Trevor Elliott
e4d9bb7c5a cranelift: Exclude the control type in narrower and wider (#6018)
* Don't include the control type in `narrower` or `wider` constraints

* Add verifier tests for instructions that use narrower and wider
2023-03-14 18:09:15 +00:00
Trevor Elliott
f5ad74e546 cranelift: Add narrower and wider constraints to the instruction DSL (#6013)
* Add narrower and wider constraints to the instruction DSL

* Add docs to narrower/wider operands

* Update cranelift/codegen/meta/src/cdsl/instructions.rs

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* Fix assertion message

* Simplify upper bounds for the wider constraint

* Remove additional unnecessary cases in the verifier

* Remove unused variables

* Remove changes to is_ctrl_typevar_candidate

These changes were only necessary when the type returned by an
instruction was a variable constrained by narrow or widen. As we have
switched to requiring that constraints must appear on argument types and
not return types, these changes were no longer necessary.

---------

Co-authored-by: Jamey Sharp <jamey@minilop.net>
2023-03-14 16:34:17 +00:00
Alex Crichton
5c1b468648 x64: Migrate {s,u}{div,rem} to ISLE (#6008)
* x64: Add precise-output tests for div traps

This adds a suite of `*.clif` files which are intended to test the
`avoid_div_traps=true` compilation of the `{s,u}{div,rem}` instructions.

* x64: Remove conditional regalloc in `Div` instruction

Move the 8-bit `Div` logic into a dedicated `Div8` instruction to avoid
having conditionally-used registers with respect to regalloc.

* x64: Migrate non-trapping, `udiv`/`urem` to ISLE

* x64: Port checked `udiv` to ISLE

* x64: Migrate urem entirely to ISLE

* x64: Use `test` instead of `cmp` to compare-to-zero

* x64: Port `sdiv` lowering to ISLE

* x64: Port `srem` lowering to ISLE

* Tidy up regalloc behavior and fix tests

* Update docs and winch

* Review comments

* Reword again

* More refactoring test fixes

* More test fixes
2023-03-14 01:44:06 +00:00
Trevor Elliott
188f712025 Mark fpromote and fdemote as operating on float scalars (#6014) 2023-03-14 00:47:40 +00:00
Ingvar Stepanyan
873d3b50a0 Allow to disable clocks in WasiCtx (#6007)
Takes the approach described in #6004, but also creates a wrapper for the monotonic time that encapsulates the `creation_time` field as well, since they logically belong and are always used together.

This makes it easier to configure `WasiCtx` with custom clocks as well as disable them for security or determinism reasons.

Closes #6004.
2023-03-13 23:47:04 +00:00
Alex Crichton
d6ce632b5b aarch64: Specialize constant vector shifts (#5976)
* aarch64: Specialize constant vector shifts

This commit adds special lowering rules for
vector-shifts-by-constant-amounts to use dedicated instructions which
cuts down on the codegen here quite a bit for constant values.

* Fix codegen for 0-shift-rights

* Special-case zero left-shifts as well

* Remove left-shift special case
2023-03-13 22:37:59 +00:00
Nick Fitzgerald
90c9bec225 wasmtime: Option to return default values for unknown imports (#6010)
Similar to the `--trap-unknown-imports` option, which defines unknown function
imports with functions that trap when called, this new
`--default-values-unknown-imports` option defines unknown function imports with
a function that returns the default values for the result types (either zero or
null depending on the value type).
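
A minimal sketch of the "zero or null depending on the value type" rule, using made-up `ValType` and `Val` enums rather than Wasmtime's real types (reference values are modeled as `Option<u32>`, with `None` standing for null; vector types are left out):

```rust
/// Hypothetical value types; illustrative only, not Wasmtime's API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ValType {
    I32,
    I64,
    F32,
    F64,
    FuncRef,
    ExternRef,
}

/// Hypothetical runtime values. `None` models a null reference.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Val {
    I32(i32),
    I64(i64),
    F32(f32),
    F64(f64),
    FuncRef(Option<u32>),
    ExternRef(Option<u32>),
}

/// Zero for numeric types, null for reference types.
fn default_value(ty: ValType) -> Val {
    match ty {
        ValType::I32 => Val::I32(0),
        ValType::I64 => Val::I64(0),
        ValType::F32 => Val::F32(0.0),
        ValType::F64 => Val::F64(0.0),
        ValType::FuncRef => Val::FuncRef(None),
        ValType::ExternRef => Val::ExternRef(None),
    }
}

/// Results a stub for an unknown import would return for a given signature.
fn default_results(result_types: &[ValType]) -> Vec<Val> {
    result_types.iter().map(|&ty| default_value(ty)).collect()
}
```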
2023-03-13 21:39:30 +00:00
Alex Crichton
e2a6fe99c2 x64: Add shuffle specialization for palignr (#5999)
* x64: Add `shuffle` specialization for `palignr`

This commit adds specializations for the `palignr` instruction to the
x64 backend to specialize some more patterns of byte shuffles.

* Fix tests
2023-03-13 21:01:24 +00:00
Alex Crichton
bba49646c3 Reduce VM overhead of pooling spec tests (#6006)
This commit forces bounds checks to be used when pooling and running the
spec tests to ensure that they can be run at a reasonable degree of
parallelism. Otherwise currently the VM reservation required for the
multi-memory tests is so large that it fails to get reserved at runtime,
failing the test.

Closes #6003
2023-03-13 19:56:47 +00:00
Alex Crichton
03b5dbb3e0 aarch64: Use VCodeConstant for f64/v128 constants (#5997)
* aarch64: Translate float and splat lowering to ISLE

I was looking into `constant_f128` and its fallback lowering into memory
and to get familiar with the code I figured it'd be good to port some
Rust logic to ISLE. This commit ports the `constant_{f128,f64,f32}`
helpers into ISLE from Rust as well as the `splat_const` helper which
ended up being closely related.

Tests reflect a number of regalloc changes that happened but also namely
one major difference is that in the lowering of `f32` a 32-bit immediate
is created now instead of a 64-bit immediate (in a GP register before
it's moved into a FP register). This semantically has no change but the
generated code is slightly different in a few minor cases.

* aarch64: Load f64/v128 constants from a pool

This commit removes the `LoadFpuConst64` and `LoadFpuConst128`
pseudo-instructions from the AArch64 backend which internally loaded a
nearby constant and then jumped over it. Constants now go through the
`VCodeConstant` infrastructure which gets placed at the end of the
function similar to how x64 works. Some minor support was added in as
well to add a new addressing mode for a `MachLabel`-relative load.
2023-03-13 19:33:52 +00:00
Alex Crichton
6ecdc2482e x64: Improve memory support in {insert,extract}lane (#5982)
* x64: Improve memory support in `{insert,extract}lane`

This commit adds support to Cranelift to emit `pextr{b,w,d,q}`
with a memory destination, merging a store-of-extract operation into one
instruction. Additionally AVX support is added for the `pextr*`
instructions.

I've additionally tried to ensure that codegen tests and runtests exist
for all forms of these instructions too.

* Add missing commas

* Fix tests
2023-03-13 19:30:44 +00:00
Afonso Bordado
5c95e6fbaf riscv64: Codemotion cleanups to ISLE files (#5984)
* riscv64: Fix typo in extensions

* riscv64: Move converters to top of file

* riscv64: Group up all imm12 rules

* riscv64: Move zero_reg helpers to Physical Regs section

* riscv64: Move helpers away from `clz` lowerings

These were in the middle of the `clz` rules and are kind of distracting

* riscv64: Move `cls` rules next to `ctz`/`clz`

* cranelift: Move `u8_and` / `u32_add` to Primitive Arithmetic section

* riscv64: Mark some imm12 constructors as pure

* cranelift: Move `s32_add_fallible` next to `u32_add`

* riscv64: Fix Typo
2023-03-13 19:20:15 +00:00
uint256_t
b50cf9bb57 cranelift-entity: more efficient EntitySet implementation (#5978)
* Use usize instead of u8

* Rename 'byte's to appropriate words
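
For readers unfamiliar with the idea, a bit set backed by `usize` words might look roughly like the sketch below. This is not the actual `EntitySet` code; `WordBitSet` and its methods are hypothetical names, and the only point is that each storage element covers `usize::BITS` entries instead of 8.

```rust
/// Minimal bit set backed by `usize` words. A sketch only, not the real
/// `EntitySet` implementation.
struct WordBitSet {
    words: Vec<usize>,
}

/// Number of bits stored per word (64 on typical 64-bit targets).
const BITS: usize = usize::BITS as usize;

impl WordBitSet {
    fn new() -> Self {
        Self { words: Vec::new() }
    }

    /// Set the bit for `index`, growing the storage if needed.
    fn insert(&mut self, index: usize) {
        let word = index / BITS;
        if word >= self.words.len() {
            self.words.resize(word + 1, 0);
        }
        self.words[word] |= 1 << (index % BITS);
    }

    /// Check the bit for `index`; anything past the storage is absent.
    fn contains(&self, index: usize) -> bool {
        match self.words.get(index / BITS) {
            Some(word) => (*word & (1 << (index % BITS))) != 0,
            None => false,
        }
    }
}
```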
2023-03-13 18:43:34 +00:00
Afonso Bordado
ad0bce3a36 riscv64: Fix regalloc panic with bor+bnot on floats (#5857) 2023-03-13 18:29:36 +00:00
Saúl Cabrera
d03612c2d9 cranelift-codegen(x64): Expose CallInfo (#6005)
This commit exposes the `CallInfo` struct, needed by Winch to emit function
calls.
2023-03-13 17:50:53 +00:00
Alex Crichton
7956dc6ba2 Change CLIF shuffle to validate lane indices (#5995)
* Change CLIF `shuffle` to validate lane indices

Previously the CLIF `shuffle` instruction did not perform any validation
on the lane shuffle mask and specified that out-of-bounds lanes always
returned 0 as the value. This behavior though is not required by
WebAssembly which validates that lane indices are always in-bounds.
Additionally since these are static immediates even other code
generators should be able to verify that the immediates are in-bounds.

As a result this commit updates the definition of the `shuffle`
instruction to specify that all byte immediates must be in-bounds in the
range of [0, 32). The verifier has been updated and some test cases have
been removed that were testing this functionality.

Closes #5989

* Only generate valid shuffle immediates in fuzzer
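
The in-bounds rule for `shuffle` immediates stated above can be expressed as a small check. This is a sketch; `validate_shuffle_mask` is a hypothetical name and not the verifier's actual code.

```rust
/// Check that every byte immediate of a 16-lane shuffle mask is in the
/// range [0, 32). Returns the position of the first offending byte.
/// Hypothetical helper, not the actual CLIF verifier.
fn validate_shuffle_mask(mask: &[u8; 16]) -> Result<(), usize> {
    match mask.iter().position(|&lane| lane >= 32) {
        Some(pos) => Err(pos),
        None => Ok(()),
    }
}
```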
2023-03-13 14:24:11 +00:00
Afonso Bordado
2386eee56b fuzzgen: Add SIMD instructions supported by the interpreter (#5971)
* fuzzgen: Add some SIMD instructions

* fuzzgen: Remove `scalar_to_vector`

Broken in the interpreter #5911

* fuzzgen: Remove SIMD bitcasts

Broken in the interpreter #5915

* fuzzgen: Fix insert lane

* fuzzgen: Remove debug code

* fuzzgen: Remove vall_true

This is broken in the interpreter #5916

* fuzzgen: Disable a few more ops

* fuzzgen: Remove `iadd_pairwise.i64x2`

Turns out it doesn't exist

* fuzzgen: Remove scalar `sqmul_round_sat`

#5923

* fuzzgen: Disable aligned loads to SIMD values

* fuzzgen: Address Review Feedback

Co-Authored-By: Jamey Sharp <jsharp@fastly.com>

* fuzzgen: Rework `cmp` exclusion rules

Co-Authored-By: Jamey Sharp <jsharp@fastly.com>

---------

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
2023-03-11 12:20:21 +00:00
Alex Crichton
af7ef8df9a Fix some minor issues with the explorer command (#5988)
This commit fixes a few minor issues that Nick and I ran into walking
through some code with the `wasmtime explore` command:

* When a new function is reached the address map iterator is advanced
  past the prior function to avoid accidentally attributing instructions
  across functions.

* A `<` comparison was changed to `<=` to fix some off-by-one
  attributions from instructions to wasm instructions.

* The `skipdata` option is enabled in Capstone to avoid truncating
  AArch64 disassemblies too early.
2023-03-11 02:31:31 +00:00
Nick Fitzgerald
9ed441e657 Introduce the wasmtime-explorer crate (#5975)
This implements Godbolt Compiler Explorer-like functionality for Wasmtime and
Cranelift. Given a Wasm module, it compiles the module to native code and then
writes a standalone HTML file that gives a split pane view between the WAT and
ASM disassemblies.
2023-03-11 00:33:06 +00:00
Chris Fallin
264089e29d Cranelift: aarch64: fix undefined dest reg in f32x4.splat case. (#5987)
One of the cases for a splat operation, as updated in #5370, wrote to
a temp reg but then only conditionally transformed the temp into the
final destination register. In another codepath, `rd` was left
undefined. This causes a panic later when regalloc2 verifies SSA
properties of its input (here, value not def'd before use).

Fixes #5985.
2023-03-11 00:22:29 +00:00
Alex Crichton
52896e020d aarch64: Add specialized shuffle lowerings (#5977)
* aarch64: Add `shuffle` lowerings for the `uzp{1,2}` instructions

This commit uses the same style of patterns in the x64 backend to start
adding specific lowerings of the Cranelift `shuffle` instruction to
particular AArch64 instructions.

* aarch64: Add `shuffle` lowerings to the `zip{1,2}` instructions

These instructions match the `punpck*` family of instructions on x64 and
should help provide more efficient lowerings than the current `shuffle`
fallback.

* aarch64: Add `shuffle` lowerings for `trn{1,2}`

Along the lines of prior commits adds specific patterns to lowering for
individual AArch64 instructions available.

* aarch64: Add a `shuffle` lowering for the `ext` instruction

This instruction will more-or-less concatenate two 128-bit vector
registers to create a 256-bit value, shift it right, and then take the
lower 128-bits into the destination. This can be modeled with a
`shuffle` of consecutive bytes so this adds a lowering rule to generate
this instruction.

* aarch64: Add `shuffle` special case for `dup`

This commit adds special cases for Cranelift's `shuffle` on AArch64 when
the lowering can be represented with a `dup` instruction which
broadcasts one vector's lane into all lanes of the destination.

* aarch64: Add `shuffle` specializations for `rev` instructions

This commit adds shuffle mask specializations for the `rev{16,32,64}`
family of instructions on AArch64 which can be used to reverse bytes,
16-bit values, or 32-bit values within larger values.

* Fix tests

* Add doc-comments in ISLE
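
To illustrate the `ext` case above, where a `shuffle` of consecutive bytes models "concatenate, shift right, take the low 128 bits", a mask matcher might look like the sketch below. `ext_byte_offset` is a hypothetical name; the real rules live in ISLE and may treat edge cases differently.

```rust
/// If `mask` selects 16 consecutive bytes from the 32-byte concatenation of
/// the two inputs, return the starting byte offset. Hypothetical helper,
/// not Cranelift's actual ISLE extractor.
fn ext_byte_offset(mask: &[u8; 16]) -> Option<u8> {
    let first = mask[0];
    // The last selected byte is `first + 15` and must stay below 32.
    if first > 16 {
        return None;
    }
    for (i, &lane) in mask.iter().enumerate() {
        if lane != first + i as u8 {
            return None;
        }
    }
    Some(first)
}
```

Offsets 0 and 16 are plain copies of the first or second input, so a real lowering would likely handle them separately.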
2023-03-10 21:37:13 +00:00
Nick Fitzgerald
5623f7280c Update wasmprinter and wasm-mutate deps (#5983)
* Bump wasm-mutate and wasmprinter deps

* Add wildcard audits for wasmprinter and wasm-mutate

* Add wildcard audit for bumpalo
2023-03-10 20:20:57 +00:00
Ulrich Weigand
411781d2fe s390x: Fix mistake in available_in_isa (#5981)
The 32-bit float<->int conversion instructions are part of
the VXRS_EXT2 facility, not MIE2.

Fixes https://github.com/bytecodealliance/wasmtime/issues/5979.
2023-03-10 19:41:41 +00:00
bjorn3
108f7917c8 Support plugging external profilers into the Cranelift timing infrastructure (#5749)
* Remove no-std code for cranelift_codegen::timings

no-std mode isn't supported by Cranelift anymore

* Simplify define_passes macro

* Add egraph opt timings

* Replace the add_to_current api with PassTimes::add

* Omit a couple of unused time measurements

* Reduce divergence between run and run_passes a bit

* Introduce a Profiler trait

This allows plugging in external profilers into the Cranelift profiling
framework.

* Add Pass::description method

* Remove duplicate usage of the compile pass timing

* Rustfmt
2023-03-10 19:33:56 +00:00
Alex Crichton
0751cba6e2 Add a --dynamic-memory-reserved-for-growth CLI flag (#5980)
Maps to the corresponding `wasmtime::Config` option. The motivation here
is largely completeness and was something I was looking into with the
failures in #5970
2023-03-10 17:58:02 +00:00
Alex Crichton
7650d857fa Update the spec test suite submodule (#5970)
* Update the spec test suite submodule

Delete the local copies of the relaxed-simd test suite as well as
they're now incorporated.

Closes #5914

* Remove page guards in QEMU emulation

Otherwise `(memory 0 0)` was being compiled as a static memory with huge
guards which we're trying to avoid in QEMU.
2023-03-10 16:50:20 +00:00
Afonso Bordado
e64fb6ab39 riscv64: Fix underflow in call relocation handling (#5951)
Under some test case layouts the call relocation handling
was panicking with an underflow. Use `wrapping_sub` to
signal that this is expected.

The fuzzer took a while to generate such a test case.
And I can't introduce it as a regression test because
when running via the regular clif-util run tests the
layout is different and the test case passes!

I think this is because in the fuzzer we only add
one trampoline, while in clif-util we build trampolines
for each function in the file.
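
A small sketch of why `wrapping_sub` helps here, using a hypothetical `call_offset` helper (the real relocation code is not shown in this message and likely differs):

```rust
/// Signed distance from a call site to its target. Hypothetical helper for
/// illustration only.
fn call_offset(target: u64, call_site: u64) -> i64 {
    // A plain `target - call_site` panics in debug builds when the target
    // lies below the call site. `wrapping_sub` makes the wrap-around
    // intentional, and the cast reinterprets the result as a signed offset.
    target.wrapping_sub(call_site) as i64
}
```

With a target at `0x1000` and a call site at `0x2000`, the result is `-0x1000` rather than a panic.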

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
2023-03-10 11:43:27 +00:00
yuyang
4e875f33a7 Codegen fix fcvt_from_sint.f32 with small types on riscv64. (#5964)
* fix issue5952

* We should only extend i8 and i16

* remove extra space

* move some code
2023-03-10 10:29:55 +00:00
Alex Crichton
0ec7b872fa x64: Optimize store-of-extract-lane-0 (#5924)
* x64: Optimize store-of-extract-lane-0

The `movss` and `movsd` instructions can be used to store the 0th lane
of a `t32x4` or a `t64x2` vector into memory, enabling fusing a `store`
and an `extractlane` instruction.

* Fix merge conflict with `main`
2023-03-10 01:06:38 +00:00
Alex Crichton
83f21e784a x64: Add more support for more AVX instructions (#5931)
* x64: Add a smattering of lowerings for `shuffle` specializations (#5930)

* x64: Add lowerings for `punpck{h,l}wd`

Add some special cases for `shuffle` for more specialized x86
instructions.

* x64: Add `shuffle` lowerings for `pshufd`

This commit adds special-cased lowerings for the x64 `shuffle`
instruction when the `pshufd` instruction alone is necessary. This is
possible when the shuffle immediate permutes 32-bit values within one of
the vector inputs of the `shuffle` instruction, but not both.

* x64: Add shuffle lowerings for `punpck{h,l}{q,}dq`

This adds specific permutations for some x86 instructions which
specifically interleave high/low bytes for 32 and 64-bit values. This
corresponds to the preexisting specific lowerings for interleaving 8 and
16-bit values.

* x64: Add `shuffle` lowerings for `shufps`

This commit adds targeted lowerings for the `shuffle` instruction that
match the pattern that `shufps` supports. The `shufps` instruction
selects two elements from the first vector and two elements from the
second vector which means while it's not generally applicable it should
still be more useful than the catch-all lowering of `shuffle`.

* x64: Add shuffle support for `pshuf{l,h}w`

This commit adds special lowering cases for these instructions which
permute 16-bit values within a 128-bit value either within the upper or
lower half of the 128-bit value.

* x64: Specialize `shuffle` with an all-zeros immediate

Instead of loading the all-zeros immediate from a rip-relative address
at the end of the function instead generate a zero with a `pxor`
instruction and then use `pshufb` to do the broadcast.

* Review comments

* x64: Add an AVX encoding for the `pshufd` instruction

This will benefit from lack of need for alignment vs the `pshufd`
instruction if working with a memory operand and additionally, as I've
just learned, this reduces dependencies between instructions because the
`v*` instructions zero the upper bits as opposed to preserving them
which could accidentally create false dependencies in the CPU between
instructions.

* x64: Add more support for AVX loads/stores

This commit adds VEX-encoded versions of instructions such as
`mov{ss,sd,upd,ups,dqu}` for load and store operations. This also
changes some signatures so the `load` helpers specifically take a
`SyntheticAmode` argument which ended up doing a small refactoring of
the `*_regmove` variant used for `insertlane 0` into f64x2 vectors.

* x64: Enable using AVX instructions for zero regs

This commit refactors the internal ISLE helpers for creating zero'd
xmm registers to leverage the AVX support for all other instructions.
This moves away from picking opcodes to instead picking instructions
with a bit of reorganization.

* x64: Remove `XmmConstOp` as an instruction

All existing users can be replaced with usage of the `xmm_uninit_value`
helper instruction so there's no longer any need for these otherwise
constant operations. This additionally reduces manual usage of opcodes
in favor of instruction helpers.

* Review comments

* Update test expectations
2023-03-09 23:57:42 +00:00
Alex Crichton
1c3a1bda6c x64: Add a smattering of lowerings for shuffle specializations (#5930)
* x64: Add lowerings for `punpck{h,l}wd`

Add some special cases for `shuffle` for more specialized x86
instructions.

* x64: Add `shuffle` lowerings for `pshufd`

This commit adds special-cased lowerings for the x64 `shuffle`
instruction when the `pshufd` instruction alone is necessary. This is
possible when the shuffle immediate permutes 32-bit values within one of
the vector inputs of the `shuffle` instruction, but not both.

* x64: Add shuffle lowerings for `punpck{h,l}{q,}dq`

This adds specific permutations for some x86 instructions which
specifically interleave high/low bytes for 32 and 64-bit values. This
corresponds to the preexisting specific lowerings for interleaving 8 and
16-bit values.

* x64: Add `shuffle` lowerings for `shufps`

This commit adds targeted lowerings for the `shuffle` instruction that
match the pattern that `shufps` supports. The `shufps` instruction
selects two elements from the first vector and two elements from the
second vector which means while it's not generally applicable it should
still be more useful than the catch-all lowering of `shuffle`.

* x64: Add shuffle support for `pshuf{l,h}w`

This commit adds special lowering cases for these instructions which
permute 16-bit values within a 128-bit value either within the upper or
lower half of the 128-bit value.

* x64: Specialize `shuffle` with an all-zeros immediate

Instead of loading the all-zeros immediate from a rip-relative address
at the end of the function instead generate a zero with a `pxor`
instruction and then use `pshufb` to do the broadcast.

* Review comments
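
As an illustration of the `pshufd` case above, the sketch below checks whether a 16-byte shuffle mask only permutes whole 32-bit lanes of the first input and, if so, builds the 8-bit immediate (two bits per destination lane). `pshufd_imm` is a hypothetical name, and this version deliberately ignores the symmetric case where all lanes come from the second input.

```rust
/// If `mask` permutes whole 32-bit lanes of the first input only, return
/// the `pshufd` immediate: bits `[2*i+1 : 2*i]` select the source lane for
/// destination lane `i`. Hypothetical helper, not Cranelift's ISLE rule.
fn pshufd_imm(mask: &[u8; 16]) -> Option<u8> {
    let mut imm = 0u8;
    for lane in 0..4 {
        let first = mask[lane * 4];
        // The lane must start on a 4-byte boundary inside the first input.
        if first % 4 != 0 || first >= 16 {
            return None;
        }
        // The remaining three bytes must follow consecutively.
        for k in 1..4 {
            if mask[lane * 4 + k] != first + k as u8 {
                return None;
            }
        }
        imm |= (first / 4) << (lane * 2);
    }
    Some(imm)
}
```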
2023-03-09 22:58:19 +00:00
Nick Fitzgerald
8a2bf29444 wasmtime: Privately expose a module's address map and its function's bytes (#5973)
This will allow us to build developer tools for Wasmtime and Cranelift like WAT
and asm side-by-side viewers (a la Godbolt).

These are not proper public APIs, so they are marked `doc(hidden)` and have
comments saying they are only for use within this repo's workspace.
2023-03-09 20:04:51 +00:00
Chris Fallin
7f3500a172 Cranelift: x64, aarch64, s390x, riscv64: ensure addresses are I64s. (#5972)
* Cranelift: x64, aarch64, s390x, riscv64: ensure addresses are I64s.

@avanhatt has been looking at our address-mode lowering and found an
example where when feeding an `I32`-typed address into a load or store,
we can violate assumptions and get incorrect codegen.

This should never be reachable in practice, because all producers on
64-bit architectures use 64-bit types for addresses. However, our IR is
insufficiently constrained, and allows loads/stores to `I32` addresses
as well. This is nonsensical on a 64-bit architecture.

Initially I had thought we should tighten either the instruction
definition's accepted types, or the CLIF verifier, to reject this.
However both are target-independent, and we don't want to bake
an assumption of 64-bit-ness into the compiler core. Instead this PR
tightens specific backends' lowerings to reject loads/stores of
`I32`-typed addresses.

tl;dr: no security implications as all producers use I64-typed
addresses (and must, for correct operation); but we currently accept
I32-typed addresses too, and this breaks other assumptions.

* Allow R64 as well as I64 types.

* Add an explicit extractor to match 64-bit address types.
2023-03-09 19:08:16 +00:00
Jamey Sharp
f877141668 cranelift-meta: Don't let-bind operand definitions (#5969)
We've adopted this pattern in Cranelift's instruction definitions where
we let-bind some calls to `Operand::new` and then later use them in one
or more calls to `Inst::new`.

That pattern has two problems:
- It puts the type of each operand somewhere potentially far removed
  from the instruction in which it's used.
- We let-bind the same name for many different operands, compounding the
  first problem by making it harder to find _which_ definition is used.

So instead this commit removes all let-bindings for operand definitions
and constructs a new `Operand` every time.

Constructing an `Operand` at every use means we duplicate some
documentation strings, but not all that many of them as it turns out.

I've left the let-bound type-sets alone, so those are currently still
shared across many instructions. They have some of the same problems and
should be reviewed as well.
2023-03-09 17:24:19 +00:00
Alex Crichton
9141fcf8cf components: Fix bindgen! with renamed interfaces (#5968)
This follows the same strategy pioneered by the `wit-bindgen` guest Rust
bindgen which keeps track of the latest name of an interface for how to
refer to an interface.

Closes #5961
2023-03-08 23:15:58 +00:00
Alex Crichton
8a08fedc69 components: Assert sizes match wit-parser sizes (#5943)
I was debugging [an issue] recently where it appears that the underlying
cause was a discrepancy in the size/align of a WIT type between Wasmtime
and `wit-parser`. This commit adds compile-time assertions that the size
of a WIT type is the same with `wit-parser` as it is in Wasmtime since
the two have different systems to calculate the size of a type. The hope
is that this will head off any future issues if they crop up.
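
The general technique of a compile-time layout assertion can be sketched in plain Rust as below. The `Record` type and the `EXPECTED_*` constants are made up for illustration; in the commit the expected values come from `wit-parser`, which is not modeled here.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical record type standing in for a lowered WIT type.
#[repr(C)]
struct Record {
    tag: u8,
    value: u32,
}

/// Layout as computed by a second, independent system (hard-coded here).
const EXPECTED_SIZE: usize = 8;
const EXPECTED_ALIGN: usize = 4;

// Compilation fails if the two views of the layout ever disagree.
const _: () = assert!(size_of::<Record>() == EXPECTED_SIZE);
const _: () = assert!(align_of::<Record>() == EXPECTED_ALIGN);
```

Because the checks run during compilation, a mismatch shows up as a build error rather than as a runtime miscalculation.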

[an issue]: https://github.com/bytecodealliance/wit-bindgen/issues/526
2023-03-08 21:42:03 +00:00
Alex Crichton
46386cca0e Add release notes for 7.0.0 (#5963) 2023-03-08 21:22:33 +00:00
Alex Crichton
63fb30e4b4 Merge pull request from GHSA-ff4p-7xrq-q5r8
* x64: Remove incorrect `amode_add` lowering rules

This commit removes two incorrect rules as part of the x64 backend's
computation of addressing modes. These two rules folded a zero-extended
32-bit computation into the address mode operand, but this isn't correct
as the 32-bit computation should be truncated to 32-bits but when folded
into the address mode computation it happens with 64-bit operands,
meaning truncation doesn't happen.

* Add release notes
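
The truncation problem described in the first bullet can be reproduced with ordinary integer arithmetic. In this sketch (function names are hypothetical), the first function performs the 32-bit addition with wrap-around before zero-extending, while the second mimics folding both 32-bit operands directly into a 64-bit address computation:

```rust
/// Intended semantics: the 32-bit add wraps, then the result is
/// zero-extended and added to the base.
fn address_truncated(base: u64, a: u32, b: u32) -> u64 {
    base.wrapping_add(u64::from(a.wrapping_add(b)))
}

/// Folding into a 64-bit address mode: each operand is zero-extended first,
/// so the carry out of bit 31 is kept instead of being discarded.
fn address_folded(base: u64, a: u32, b: u32) -> u64 {
    base.wrapping_add(u64::from(a)).wrapping_add(u64::from(b))
}
```

For small operands the two agree, but with `a = u32::MAX` and `b = 1` the folded form lands 4 GiB above the intended address.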
2023-03-08 13:00:40 -06:00
Alex Crichton
5dc2bbccbb Merge pull request from GHSA-xm67-587q-r2vw
This commit fixes an off-by-one error in the subtraction of indices when
shuffling a vector with itself. Lanes 16-and-above are mapped to select
from the first vector since the first and second element are the same,
but the subtraction was with 15 rather than 16 by accident.
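
A sketch of the corrected index mapping, with a hypothetical helper name (the actual fix is in the backend's lowering code, which is not shown here):

```rust
/// When both shuffle inputs are the same vector, lane indices 16..32 refer
/// to the same bytes as 0..16. Hypothetical helper for illustration.
fn fold_same_input_lane(lane: u8) -> u8 {
    debug_assert!(lane < 32);
    // Subtracting 15 here instead of 16 would map lane 16 to byte 1
    // rather than byte 0, which is the off-by-one described above.
    if lane >= 16 {
        lane - 16
    } else {
        lane
    }
}
```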
2023-03-08 13:00:00 -06:00
Alex Crichton
e8331661cb Add Clone for component::InstancePre (#5966)
This is present on `wasmtime::InstancePre` and should be available for
components as well.

Closes #5965
2023-03-08 17:13:28 +00:00
Kevin Rizzo
013b35ff32 winch: Refactoring wasmtime compiler integration pieces to share more between Cranelift and Winch (#5944)
* Enable the native target by default in winch

Match cranelift-codegen's build script where if no architecture is
explicitly enabled then the host architecture is implicitly enabled.

* Refactor Cranelift's ISA builder to share more with Winch

This commit refactors the `Builder` type to have a type parameter
representing the finished ISA with Cranelift and Winch having their own
typedefs for `Builder` to represent their own builders. The intention is
to use this shared functionality to produce more shared code between the
two codegen backends.

* Moving compiler shared components to a separate crate

* Restore native flag inference in compiler building

This fixes an oversight from the previous commits to use
`cranelift-native` to infer flags for the native host when using default
settings with Wasmtime.

* Move `Compiler::page_size_align` into wasmtime-environ

The `cranelift-codegen` crate doesn't need this and winch wants the same
implementation, so shuffle it around so everyone has access to it.

* Fill out `Compiler::{flags, isa_flags}` for Winch

These are easy enough to plumb through with some shared code for
Wasmtime.

* Plumb the `is_branch_protection_enabled` flag for Winch

Just forwarding an isa-specific setting accessor.

* Moving executable creation to shared compiler crate

* Adding builder back in and removing from shared crate

* Refactoring the shared pieces for the `CompilerBuilder`

I decided to move a couple things around from Alex's initial changes.
Instead of having the shared builder do everything, I went back to
having each compiler have a distinct builder implementation. I
refactored most of the flag setting logic into a single shared location,
so we can still reduce the amount of code duplication.

With them being separate, we don't need to maintain things like
`LinkOpts` which Winch doesn't currently use. We also have an avenue to
error when certain flags are sent to Winch if we don't support them. I'm
hoping this will make things more maintainable as we build out Winch.

I'm still unsure about keeping everything shared in a single crate
(`cranelift_shared`). It's starting to feel like this crate is doing too
much, which makes it difficult to name. There does seem to be a need for
two distinct abstraction: creating the final executable and the handling
of shared/ISA flags when building the compiler. I could make them into
two separate crates, but there doesn't seem to be enough there yet to
justify it.

* Documentation updates, and renaming the finish method

* Adding back in a default temporarily to pass tests, and removing some unused imports

* Fixing winch tests with wrong method name

* Removing unused imports from codegen shared crate

* Apply documentation formatting updates

Co-authored-by: Saúl Cabrera <saulecabrera@gmail.com>

* Adding back in cranelift_native flag inferring

* Adding new shared crate to publish list

* Adding write feature to pass cargo check

---------

Co-authored-by: Alex Crichton <alex@alexcrichton.com>
Co-authored-by: Saúl Cabrera <saulecabrera@gmail.com>
2023-03-08 15:07:13 +00:00