* ISLE: move `icmp` rewrites to separate file.
Move `icmp`-related rewrite rules from `algebraic.isle` to `icmp.isle`.
Also move `icmp`-related tests from `algebraic.clif` to `icmp.clif`.
* Put parameterized and unparameterized `icmp` tests in separate files
* Undo refactoring of (ir)reflexivity rewrites
* Fix `icmp-parameterised.clif`
* Undo formatting/comment changes
* add cargo-deny exception for duplicate versions of windows-sys
* cargo vetting for all new deps introduced by https://github.com/bytecodealliance/wasmtime/pull/5929
The audits are straightforward. The exemptions, as always, need to be justified:
* core-foundation, core-foundation-sys, security-framework, security-framework-sys: these are large crates which are FFI bindings to Mac OS frameworks. As such they contain tons of unsafe code to make these FFI calls and manage memory. These crates are too big to audit.
* schannel: same as the above, except this is a windows component, which I'm also unfamiliar with.
* openssl, openssl-sys: also large FFI bindings which are impractical to audit.
* futures-macro, futures-task: while not as complex as futures-util, these are beyond my personal understanding of futures to vet practically. I've asked Alex to look at auditing these, and he will after he returns from vacation next week.
* futures-util: 25kloc of code, over 149 instances of the substring "unsafe" (case insensitive), this is impractical to audit in the extreme.
* h2, http, httparse, hyper, mio, tokio: this so-called tokio/hyper family are very large and challenging to audit. Bobby Holley has indicated that he is working to get the AWS engineers who maintain these crates to publish their own audits, which we can then import. We expect to exempt these until those imports are available.
* x64: Add AVX encodings of `vcvt{ss2sd,sd2ss}`
Additionally update the instruction helpers to take an `XmmMem` argument
to allow load sinking into the instruction.
* x64: Add AVX encoding of `sqrts{s,d}`
* x64: Add AVX support for `rounds{s,d}`
* x64: Deduplicate fcmp emission logic
The `select`-of-`fcmp` lowering duplicated a good deal of `FloatCC`
lowering logic that was already done by `emit_fcmp`, so this commit
refactors these lowering rules to instead delegate to `emit_fcmp` and
then handle that result.
* Swap order of condition codes
Shouldn't affect the correctness of this operation and it's a bit more
natural to write the lowering rule this way.
* Swap the order of comparison operands
No need to swap `a b`, only the `x y` needs swapping.
* Fix x64 printing of `XmmCmove`
The teams named in this revised version of the `CODEOWNERS` file are
currently configured with the same members as are listed in the file
now.
We wanted a single member of the selected team to be assigned as a
reviewer, but listing people explicitly in CODEOWNERS causes all the
named people to be assigned. Using teams instead allows us to configure
the load-balancing policy.
This also will allow us to add and remove reviewers without needing a PR
every time.
* Add `precise_output` argument to `test optimise`.
Also allow optimise tests to be updated by `CRANELIFT_TEST_BLESS=1`
* Move `check_precise_output` and `update_test` to `subtest`
* winch(x64): Initial implementation for function calls
This change adds the main building blocks for calling locally defined
functions. Support for function imports will be added iteratively after this
change lands and once trampolines are supported.
To support function calls, this change introduces the following functionality to
the MacroAssembler:
* `pop` to pop the machine stack into a given register, which in the case of
this change, translates to the x64 pop instruction.
* `call` to a emit a call to locally defined functions.
* `address_from_sp` to construct memory addresses with the SP as a base.
* `free_stack` to emit the necessary instrunctions to claim stack space.
The heavy lifting of setting up and emitting the function call is done through
the implementation of `FnCall`.
* Fix spill behaviour in function calls and add more documentation
This commits adds a more detailed documentation to the `call.rs` module.
It also fixes a couple of bugs, mainly:
* The previous commit didn't account for memory addresses used as arguments for
the function call, any memory entry in the value stack used as a function
argument should be tracked and then used to claim that memory when the function
call ends. We could `pop` and do this implicitly, but we can also track this
down and emit a single instruction to decrement the stack pointer, which will
result in better code.
* Introduce a differentiator between addresses relative or absolute to the stack
pointer. When passing arguments in the stack -- assuming that SP at that point
is aligned for the function call -- we should store the arguments relative to
the absolute position of the stack pointer and when addressing a memory entry in
the Wasm value stack, we should use an address relative to the offset and the
position of the stack pointer.
* Simplify tracking of the stack space needed for emitting a function call
We want to make sure every contributor gets some kind of meaningful
response in a timely fashion. To that end, this PR configures GitHub to
auto-assign somebody to every newly-opened PR.
People must only be added to this file if they've agreed to this
obligation. The details of what's expected are listed in the file. I'll
only merge this if it's signed off by everyone listed in this initial
version.
* Fix export translation for components.
Exports in the component model cause a new index to be added to the index space
of the item being exported.
This commit updates component translation so that translation of component
export sections properly updates internal lists representing those index
spaces.
* Code review feedback.
* Implement TLS on Aarch64 Mach-O
* Add aarch64 macho TLS filetest
* Address review comments
- `Aarch64` instead of `AArch64` in comments
- Remove unnecessary guard in tls_value lowering
- Remove unnecessary regalloc metadata in emission
* Use x1 as temporary register in emission
- Instead of passing in a temporary register to use when emitting
the TLS code, just use `x1`, as it's already in the clobber set.
This also keeps the size of `aarch64::inst::Inst` at 32 bytes.
- Update filetest accordingly
* Update aarch64 mach-o TLS filetest
There was a missing rest field access. In addition createTag doesn't actually create a tag, it creates a tag object. A tag object is an object which references a commit or other kind of object and has various kinds of metadata. You need to store it in a reference stored in refs/tags/ to actually show as tag in the git ui. The code to update the tag however creates a lightweight tag (which is a file in refs/tags/ which directly references a commit rather than a tag object) as such do the same when creating the initial dev tag by using createRef with a commit id as object sha.
See also https://git-scm.com/book/en/v2/Git-Internals-Git-References for the difference between a lightweight tag and an annotated tag.
Following up on the discussion in
https://github.com/bytecodealliance/wasmtime/pull/6011
this adds an improved implementation of TrapIf for s390x
using a single conditional branch instruction.
If the trap conditions is true, we branch into the middle of
the branch instruction - those middle two bytes are zero,
which matches the encoding of the trap instruction.
In addition, show the trap code for Trap and TrapIf
instructions in assembler output.
This commit changes the signature of the `Store::epoch_deadline_callback` to
take in `StoreContextMut` instead of a mutable reference to the store's data.
This is useful in cases in which the callback definition needs access to the
Store to be able to use other methods that take in `AsContext`/`AsContextMut`,
like for example `WasmtimeBacktrace::capture`
* x64: Add instruction helpers for `mov{d,q}`
These will soon grow AVX-equivalents so move them to instruction helpers
to have clauses for AVX in the future.
* x64: Don't auto-convert between RegMemImm and XmmMemImm
The previous conversion, `mov_rmi_to_xmm`, would move from GPR registers
to XMM registers which isn't what many of the other `convert` statements
between these newtypes do. This seemed like a possible footgun so I've
removed the auto-conversion and added an explicit helper to go from a
`u32` to an `XmmMemImm`.
* x64: Add AVX encodings of some more GPR-related insns
This commit adds some more support for AVX instructions where GPRs are
in use mixed in with XMM registers. This required a few more variants of
`Inst` to handle the new instructions.
* Fix vpmovmskb encoding
* Fix xmm-to-gpr encoding of vmovd/vmovq
* Fix typo
* Fix rebase conflict
* Fix rebase conflict with tests
* cranelift: Add extra runtests for `clz`/`ctz`
* riscv64: Restrict lowering rules for `ctz`/`clz`
* cranelift: Add `u64` isle helpers
* riscv64: Improve `ctz` codegen
* riscv64: Improve `clz` codegen
* riscv64: Improve `cls` codegen
* riscv64: Improve `clz.i128` codegen
Instead of checking if we have 64 zeros in the top half. Check
if it *is* 0, that way we avoid loading the `64` constant.
* riscv64: Improve `ctz.i128` codegen
Instead of checking if we have 64 zeros in the bottom half. Check
if it *is* 0, that way we avoid loading the `64` constant.
* riscv64: Use extended value in `lower_cls`
* riscv64: Use pattern matches on `bseti`
* Pick argument and return types based on opcode constraints
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
* Lazily build the OPCODE_SIGNATURES list
* Skip unsupported isplit/iconcat cases
* Add an issue reference for the isplit/iconcat exemption
* Refactor the deny lists to use exceptions!, and remove redundant entries
---------
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
* Restrict the types for isplit and iconcat to match backends
* Admit unimplemented bitwidths to isplit/iconcat
* Modify the NarrowInt type instead of shadowing it
* Fix filetest failures
* Add a `MachBuffer::defer_trap` method
This commit adds a new method to `MachBuffer` to defer trap opcodes to
the end of a function in a similar manner to how constants are deferred
to the end of the function. This is useful for backends which frequently
use `TrapIf`-style opcodes. Currently a jump is emitted which skips the
next instruction, a trap, and then execution continues normally. While
there isn't any pressing problem with this construction the trap opcode
is in the middle of the instruction stream as opposed to "off on the
side" despite rarely being taken.
With this method in place all the backends (except riscv64 since I
couldn't figure it out easily enough) have a new lowering of their
`TrapIf` opcode. Now a trap is deferred, which returns a label, and then
that label is jumped to when executing the trap. A fixup is then
recorded in `MachBuffer` to get patched later on during emission, or at
the end of the function. Subsequently all `TrapIf` instructions
translate to a single branch plus a single trap at the end of the
function.
I've additionally further updated some more lowerings in the x64 backend
which were explicitly using traps to instead use `TrapIf` where
applicable to avoid jumping over traps mid-function. Other backends
didn't appear to have many jump-over-the-next-trap patterns.
Lots of tests have had their expectations updated here which should
reflect all the traps being sunk to the end of functions.
* Print trap code on all platforms
* Emit traps before constants
* Preserve source location information for traps
* Fix test expectations
* Attempt to fix s390x
The MachBuffer was registering trap codes with the first byte of the
trap, but the SIGILL handler was expecting it to be registered with the
last byte of the trap. Exploit that SIGILL is always represented with a
2-byte instruction and always march 2-backwards for SIGILL, continuing
to march backwards 1 byte for SIGFPE-generating instructions.
* Back out s390x changes
* Back out more s390x bits
* Review comments
* Speed up index fetches on CI
Use the `sparse` protocol from Rust 1.68.0 which should shave a minute
or two off most steps on CI.
* Update nightly toolchains in CI
prtest:full
* Fix date
Previously it could affect the PartialEq and Hash impls. Ignoring the
sequence number in PartialEq and Hash allows us to not renumber all
blocks in the incremental cache.
This commit goes through the `runtests` folder of the `filetests`
test suite and ensure that everything which uses simd or float-related
instructions on x64 is executed with the baseline support for x86_64 in
addition to adding in AVX support. Most of the instructions used have
AVX equivalents so this should help test all of the equivalents in
addition to the codegen filetests in the x64 folder.
This refactoring from #5766 accidentally broke the tag name calculation
for the `build-src-tarball.sh` script so this fixes it by copying over
the same logic from `build-tarballs.sh`.
* Add support for generating perf maps for simple perf profiling
* add missing enum entry in C code
* bugfix: use hexa when printing the code region's length too (thanks bjorn3!)
* sanitize file name + use bufwriter
* introduce --profile CLI flag for wasmtime
* Update doc and doc comments for new --profile option
* remove redundant FromStr import
* Apply review feedback: make_line receives a Write impl, report errors
* fix tests?
* better docs
* x64: Fix vbroadcastss with AVX2 and without AVX
This commit fixes a corner case in the emission of the
`vbroadcasts{s,d}` instructions. The memory-to-xmm form of these
instructions was available with the AVX instruction set, but the
xmm-to-xmm form of these instructions wasn't available until AVX2.
The instruction requirement for these are listed as AVX but the lowering
rules are appropriately annotated to use either AVX2 or AVX when
appropriate.
While this should work in practice this didn't work for the assertion
about enabled features for each instruction. The `vbroadcastss`
instruction was listed as requiring AVX but could get emitted when AVX2
was enabled (due to the reg-to-reg form being available). This caused an
issue for the fuzzer where AVX2 was enabled but AVX was disabled.
One possible fix would be to add more opcodes, one for reg-to-reg and
one for mem-to-reg. That seemed like somewhat overkill for a pretty
niche situation that shouldn't actually come up in practice anywhere.
Instead this commit changes all the `has_avx` accessors to the
`use_avx_simd` predicate already available in the target flags. The
`use_avx2_simd` predicate was then updated to additionally require
`has_avx`, so if AVX2 is enabled and AVX is disabled then the
`vbroadcastss` instruction won't get emitted any more.
Closes#6059
* Pass `enable_simd` on a few more files
* Add a program for checking the function_generator opcode signatures
* Rework as a test in function_generator instead
* Fix some invalid opcode signatures in the function generator
* Fix bnot exclusions
* Only allow pp_cmp within a single block
Block order shouldn't matter for codegen and restricting pp_cmp to a
single block will allow making instruction sequence numbers local to a
block.
* Make sequence numbers local to instructions
This allows renumbering to be localized to a single block where previously it
could affect the entire function. Also saves 32bit of overhead per block.