Fix shadowing identified in #5322 for imul and swiden_high/swiden_low/uwiden_high/uwiden_low combinations in the x64 backend, and remove some redundant rules from the aarch64 dynamic NEON ruleset. Additionally, add tests to the x64 backend showing that the imul specializations are firing.
Add assertions to the OperandCollector checking that we're not using pinned vregs, and use `reg_fixed_nonallocatable` constraints when a real register is passed to other constraint-generation functions such as `reg_use`.
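As a self-contained illustration of that invariant (the types below are stand-ins, not Cranelift's real `OperandCollector` or `Reg`): the general `reg_use`-style methods now assert they only ever see virtual registers, while real registers are reported via `reg_fixed_nonallocatable`.

    #[derive(Clone, Copy, PartialEq)]
    enum Reg {
        Virtual(u32),
        Real(u8), // a pinned physical register
    }

    struct OperandCollector {
        operands: Vec<Reg>,
    }

    impl OperandCollector {
        fn reg_use(&mut self, reg: Reg) {
            // The new assertion: pinned (real) registers must not take
            // this path.
            assert!(
                matches!(reg, Reg::Virtual(_)),
                "real registers must use reg_fixed_nonallocatable"
            );
            self.operands.push(reg);
        }

        fn reg_fixed_nonallocatable(&mut self, preg: u8) {
            // Record the physical register without giving the register
            // allocator any freedom over it.
            self.operands.push(Reg::Real(preg));
        }
    }

    fn main() {
        let mut coll = OperandCollector { operands: Vec::new() };
        coll.reg_use(Reg::Virtual(0)); // ok
        coll.reg_fixed_nonallocatable(1); // e.g. a pinned %rcx
        // coll.reg_use(Reg::Real(1)); // would trip the new assertion
    }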
There were several issues with ISLE's existing error-reporting implementation:
- When using Miette for more readable error reports, it would panic if
errors were reported from multiple files in the same run.
- Miette is pretty heavy-weight for what we're doing, with a lot of
dependencies.
- The `Error::Errors` enum variant led to normalization steps in many
places, to avoid using that variant to represent a single error.
This commit:
- replaces Miette with codespan-reporting
- gets rid of a bunch of cargo-vet exemptions
- replaces the `Error::Errors` variant with a new `Errors` type
- removes source info from `Error` variants so they're easy to construct
- adds source info only when formatting `Errors`
- formats `Errors` with a custom `Debug` impl
- shares common code between ISLE's callers, islec and cranelift-codegen
- includes a source snippet even with fancy-errors disabled
I tried to make this a series of smaller commits but I couldn't find any
good split points; everything was too entangled with everything else.
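For reference, here is a minimal sketch of rendering diagnostics from multiple files with codespan-reporting (the library calls are real; the file contents and errors are invented for illustration):

    use codespan_reporting::diagnostic::{Diagnostic, Label};
    use codespan_reporting::files::SimpleFiles;
    use codespan_reporting::term;
    use codespan_reporting::term::termcolor::{ColorChoice, StandardStream};

    fn main() {
        // Two source files in one run; reporting from both must not panic.
        let mut files = SimpleFiles::new();
        let a = files.add("a.isle", "(decl foo (i32) i32)".to_string());
        let b = files.add("b.isle", "(rule (foo x) x)".to_string());

        let diagnostics = vec![
            Diagnostic::error()
                .with_message("example error in the first file")
                .with_labels(vec![Label::primary(a, 6..9)]),
            Diagnostic::error()
                .with_message("example error in the second file")
                .with_labels(vec![Label::primary(b, 6..9)]),
        ];

        let writer = StandardStream::stderr(ColorChoice::Auto);
        let config = term::Config::default();
        for diag in &diagnostics {
            term::emit(&mut writer.lock(), &config, &files, diag).unwrap();
        }
    }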
* Cranelift: Define `heap_load` and `heap_store` instructions
* Cranelift: Implement interpreter support for `heap_load` and `heap_store`
* Cranelift: Add a suite of runtests for `heap_{load,store}`
There are so many knobs we can twist for heaps, and I wanted to exhaustively test
all of them, so I wrote a script to generate the tests. I've checked in the
script in case we want to make changes in the future, but I don't think it is
worth adding a CI check that the generated tests stay in sync with the script,
or anything like that.
* Review feedback
In #5174 we decided it doesn't make sense for a rule to have a
bind-pattern at the root of its left-hand side. There's no Rust value
corresponding to the root value of such a term, because it actually
represents a function declaration with one or more arguments.
This commit takes that to its logical conclusion.
`sema::Rule` previously had an `lhs` field whose value must always be a
`Pattern::Term` variant, and anyone using that structure had to deal
with the possibility of finding the wrong variant there.
Now the relevant fields from that variant are stored directly in `Rule`
instead. Also, the (tiny!) portion of `translate_pattern` which applied
when the pattern was the root term is now inlined in `collect_rules`.
Because `translate_pattern` no longer has to special-case the root term,
we can delete its `rule_term` and `is_root` arguments. That brings it
down to a more manageable four arguments, which means many calls fit on
one line now.
As it turns out, that distinction was not necessary for this
representation. Removing it eliminates some complexity around wrapping
expressions as bindings and vice versa. It also clears up some confusion
about which category certain constructs (arguments and extractors) belong
in, by doing away with the separate categories altogether.
While I was writing this patch I also realized that `add_match_variant`
and `normalize_equivalence_classes` both need to do fundamentally the
same things with enum variants, so I refactored them to share code and
make their relationship clearer.
Finally, I reviewed all the comments in this file and fixed some places
where they could be clearer.
* Shrink the size of SigData in Cranelift
* Update cranelift/codegen/src/machinst/abi.rs
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Change ret arg length to u16
* Add test
Co-authored-by: Jamey Sharp <jamey@minilop.net>
Avoid naming %rcx as written by the CoffTlsGetAddr pseudo-instruction in the x64 backend; instead, emit a fixed-def constraint tying a fresh VReg to %rcx.
Remove some unnecessary moves in the x64 gen_memcpy implementation -- the call instruction that's generated will already constrain the args to those registers.
* cranelift: Cleanup `fdemote`/`fpromote` tests
* cranelift: Fix `fdemote`/`fpromote` instruction docs
The verifier fails if the input and output types are the same
for these instructions.
* cranelift: Fix `fdemote`/`fpromote` in the interpreter
* fuzzgen: Add `fdemote`/`fpromote`
We can encode more constants into 12-bit immediates if we do the following
rewrite for comparisons with odd constants:
    A >= B + 1
    ==> A - 1 >= B
    ==> A > B
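A self-contained model of the transform (hypothetical helper names, and assuming a plain signed 12-bit immediate range of -2048..=2047 purely for illustration; the backend expresses this as a lowering rule):

    #[derive(Debug, PartialEq)]
    enum Cmp {
        Sge, // signed >=
        Sgt, // signed >
    }

    fn fits_in_imm12(v: i64) -> bool {
        (-2048..=2047).contains(&v)
    }

    /// Rewrite `A >= imm` into `A > imm - 1` when that makes the
    /// constant encodable as a 12-bit immediate.
    fn rewrite(cc: Cmp, imm: i64) -> Option<(Cmp, i64)> {
        if cc == Cmp::Sge && !fits_in_imm12(imm) && fits_in_imm12(imm - 1) {
            return Some((Cmp::Sgt, imm - 1));
        }
        None
    }

    fn main() {
        // 2048 is out of range, but 2047 fits: A >= 2048 ==> A > 2047.
        assert_eq!(rewrite(Cmp::Sge, 2048), Some((Cmp::Sgt, 2047)));
    }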
* cranelift-isle: New IR and revised overlap checks
* Improve error reporting
* Avoid "unused argument" warnings a nicer way
* Remove unused fields
* Minimize diff and "fix" error handling
I had tried to use Miette "right" and made things worse somehow. Among
other changes, revert all my changes to unrelated parts of `error.rs`
and `error_miette.rs`.
* Review comments: Rename "unmatchable" to "unreachable"
* Review comments: newtype wrappers, not type aliases
* Review comments: more comments on overlap checks
* Review comments: Clarify `normalize_equivalence_classes`
* Review comments: use union-find instead of linked list
This saves about 50 lines of code in the trie_again module. The
union-find implementation is about twice as long as that, counting
comments and doc-tests, but that's a worthwhile trade-off.
However, this makes `normalize_equivalence_classes` slower, because now
finding all elements of an equivalence class takes time linear in the
total size of all equivalence classes. If that ever turns out to be a
problem in practice we can find some way to optimize `remove_set_of`.
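For reference, a minimal union-find with path compression, in the spirit of the one described here (a sketch, not the trie_again implementation):

    struct UnionFind {
        parent: Vec<usize>,
    }

    impl UnionFind {
        fn new(len: usize) -> Self {
            UnionFind { parent: (0..len).collect() }
        }

        /// Find the representative of `x`'s equivalence class,
        /// compressing paths along the way.
        fn find(&mut self, x: usize) -> usize {
            if self.parent[x] != x {
                let root = self.find(self.parent[x]);
                self.parent[x] = root;
            }
            self.parent[x]
        }

        /// Merge the classes containing `a` and `b`.
        fn union(&mut self, a: usize, b: usize) {
            let (ra, rb) = (self.find(a), self.find(b));
            if ra != rb {
                self.parent[ra] = rb;
            }
        }
    }

    fn main() {
        let mut uf = UnionFind::new(4);
        uf.union(0, 1);
        uf.union(1, 2);
        assert_eq!(uf.find(0), uf.find(2));
        assert_ne!(uf.find(0), uf.find(3));
    }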
* Review comments: Hide constraints HashMap
We want to enforce that consumers of this representation can't observe
non-deterministic ordering in any of its public types.
* Review comments: Normalize equivalence classes incrementally
I'm not sure whether this is a good idea. It doesn't make the logic
particularly simpler, and I think it will do more work if three or more
binding sites with enum-variant constraints get set equal to each other.
* More comments and other clarifications
* Revert "Review comments: Normalize equivalence classes incrementally"
* Even more comments
This is a common pattern in sema, so factor it out.
Since this version uses `intern` instead of `intern_mut`, it might be a
tiny bit faster when errors occur, because it doesn't write names into
the maps in that case. When no error occurs, ISLE should do exactly the
same work with or without this commit.
Before, we would do a `heap_addr` to translate the given Wasm memory address
into a native memory address and pass it into the libcall that implemented the
atomic operation, which would then treat the address as a Wasm memory address
and pass it to `validate_atomic_addr` to be bounds checked a second time. This
is a bit nonsensical, as we are validating a native memory address as if it were
a Wasm memory address.
Now, we no longer do a `heap_addr` to translate the Wasm memory address to a
native memory address. Instead, we pass the Wasm memory address to the libcall,
and the libcall is responsible for doing the bounds check (by calling
`validate_atomic_addr` with the correct type of memory address now).
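A self-contained model of the new flow (all types below are stand-ins; Wasmtime's real signatures differ):

    struct LinearMemory {
        base: *mut u8,
        len: u64, // current byte length of the Wasm memory
    }

    #[derive(Debug)]
    struct Trap;

    /// Bounds-check a *Wasm* address and translate it to a native
    /// pointer. This is now the only place the check happens.
    fn validate_atomic_addr(
        mem: &LinearMemory,
        wasm_addr: u64,
        size: u64,
    ) -> Result<*mut u8, Trap> {
        match wasm_addr.checked_add(size) {
            Some(end) if end <= mem.len => {
                Ok(unsafe { mem.base.add(wasm_addr as usize) })
            }
            _ => Err(Trap),
        }
    }

    /// The libcall receives the untranslated Wasm address, rather than
    /// a native address that `heap_addr` already checked.
    fn atomic_libcall(mem: &LinearMemory, wasm_addr: u64) -> Result<(), Trap> {
        let _native = validate_atomic_addr(mem, wasm_addr, 8)?;
        // ... perform the atomic operation through `_native` ...
        Ok(())
    }

    fn main() {
        let mut buf = vec![0u8; 16];
        let mem = LinearMemory { base: buf.as_mut_ptr(), len: 16 };
        assert!(atomic_libcall(&mem, 0).is_ok());
        assert!(atomic_libcall(&mem, 9).is_err()); // 9 + 8 > 16
    }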
The `is_root` flag to `translate_pattern` just determines whether the
`rule_term` argument is used, which invites a larger cleanup. But that
cleanup is less clear if `is_root` is set anywhere aside from the call
in `collect_rules`. So I wanted to get confirmation first that this
particular use of the flag is incorrect.
These two arguments (`is_root` and `rule_term`) are used to prevent
expansion of a term as an internal extractor ("macro") if:
- that term is also an internal constructor
- and it's the root term on the left-hand side of the current rule
- and the pattern we're currently translating has no parents.
I'm not sure what it should mean to use the term you're currently
defining as the root pattern on the left-hand side of an if-let in the
same rule, but I don't think it should have this particular special
treatment.
It's nice to be able to report these names after sema analysis completes
so rule authors can recognize which names they used.
This isn't used anywhere yet, but I'm planning to use it during codegen,
and the rule-verification folks wanted something like this for debugging
output.
* Cranelift: Make `heap_addr` return calculated `base + index + offset`
Rather than returning just `base + index`.
(Note: I've chosen to use the nomenclature "index" for the dynamic operand and
"offset" for the static immediate.)
This moves the addition of the `offset` into `heap_addr`, instead of leaving it
for the subsequent memory operation, so that we can Spectre-guard the full
address, and not allow speculative execution to read the first 4GiB of memory.
Before this commit, we were effectively doing

    load(spectre_guard(base + index) + offset)

Now we are effectively doing

    load(spectre_guard(base + index + offset))
Finally, this also corrects `heap_addr`'s documented semantics to say that it
returns an address that will trap on access if `index + offset + access_size` is
out of bounds for the given heap, rather than saying that the `heap_addr` itself
will trap. This matches the implemented behavior for static memories, and after
https://github.com/bytecodealliance/wasmtime/pull/5190 lands (which is blocked
on this commit) will also match the implemented behavior for dynamic memories.
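A self-contained model of these semantics (not Cranelift's lowering; returning `None` stands in for the trapping access):

    /// Compute the address `heap_addr` now returns, namely
    /// `base + index + offset`, where an access of `access_size` bytes
    /// must be in bounds; the Spectre guard masks the full address.
    fn heap_addr(base: u64, bound: u64, index: u64, offset: u64, access_size: u64) -> Option<u64> {
        let end = index.checked_add(offset)?.checked_add(access_size)?;
        if end > bound {
            return None; // the access would trap
        }
        Some(base + index + offset)
    }

    fn main() {
        // A 4-byte access at index 8, offset 4 of a 16-byte heap is in
        // bounds...
        assert_eq!(heap_addr(0x1000, 16, 8, 4, 4), Some(0x100c));
        // ...but the same access at offset 5 is not.
        assert_eq!(heap_addr(0x1000, 16, 8, 5, 4), None);
    }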
* Update heap_addr docs
* Factor out `offset + size` to a helper
While checking the call graph of extractors during semantic validation,
save `TermId` instead of `Sym`. The types are both just integer indexes,
but the `TermId` is more useful here. Saving it avoids needing to check
for failed map lookups twice, which simplifies the implementation.
Modify return pseudo-instructions to have pairs of registers: virtual and real. This allows us to constrain the virtual registers to the real ones specified by the ABI, instead of directly emitting moves to those real registers.
* Wasmtime+Cranelift: strip out some dead x86-32 code.
I was recently pointed to fastly/Viceroy#200 where it seems some folks
are trying to compile Wasmtime (via Viceroy) for Windows x86-32 and the
failures may not be loud enough. I've tried to reproduce this
cross-compiling to i686-pc-windows-gnu from Linux and hit build failures
(as expected) in several places. Nevertheless, while trying to discern
what others may be attempting, I noticed some dead x86-32-specific code
in our repo, and figured it would be a good idea to clean this up.
Otherwise, it (i) sends some mixed messages -- "hey look, this codebase
does support x86-32" -- and (ii) keeps untested code around, which is
generally not great.
This PR removes x86-32-specific cases in traphandlers and unwind code,
and Cranelift's native feature detection. It adds helpful compile-error
messages in a few cases. If we ever support x86-32 (contributors
welcome! The big missing piece is Cranelift support; see #1980), these
compile errors and git history should be enough to recover any knowledge
we are now encoding in the source.
I left the x86-32 support in `wasmtime-fiber` alone because that seems
like a bit of a special case -- foundation library, separate from the
rest of Wasmtime, with specific care to provide a (presumably working)
full 32-bit version.
* Remove some extraneous compile_error!s, already covered by others.
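Illustrative shape of one of these compile-time guards (the exact cfg condition and message here are hypothetical):

    // Fail loudly at compile time on an unsupported 32-bit x86 target
    // instead of silently keeping dead code paths around.
    #[cfg(target_arch = "x86")]
    compile_error!(
        "Wasmtime does not support x86-32; \
         see https://github.com/bytecodealliance/wasmtime/issues/1980"
    );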
Remove the dependency on VCode for VReg allocation. This will simplify the changes in #5172, as that PR introduces the need to allocate temporary registers from the ABI context.
This change also allows us to remove some fields from VCode: reftyped_vregs_set and have_ref_values.
Add a MemFlags operand to the bitcast instruction, where only the
`big` and `little` flags are accepted. These define the lane order
to be used when casting between types of different lane counts.
Update all users to pass an appropriate MemFlags argument.
Implement lane swaps where necessary in the s390x back-end.
This is the final part necessary to fix
https://github.com/bytecodealliance/wasmtime/issues/4566.
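A self-contained model of what the flag means for lane order (plain Rust, not Cranelift code): bitcasting two 64-bit lanes to four 32-bit lanes gives different lane orders under the two conventions.

    /// Little-endian lane order: input lane 0's low half becomes output
    /// lane 0.
    fn bitcast_le(v: [u64; 2]) -> [u32; 4] {
        [v[0] as u32, (v[0] >> 32) as u32, v[1] as u32, (v[1] >> 32) as u32]
    }

    /// Big-endian lane order: input lane 0's high half becomes output
    /// lane 0.
    fn bitcast_be(v: [u64; 2]) -> [u32; 4] {
        [(v[0] >> 32) as u32, v[0] as u32, (v[1] >> 32) as u32, v[1] as u32]
    }

    fn main() {
        let v = [0x1111_1111_2222_2222, 0x3333_3333_4444_4444];
        assert_eq!(bitcast_le(v), [0x2222_2222, 0x1111_1111, 0x4444_4444, 0x3333_3333]);
        assert_eq!(bitcast_be(v), [0x1111_1111, 0x2222_2222, 0x3333_3333, 0x4444_4444]);
    }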
The sample program in cranelift/filetests/src/function_runner.rs
would abort with an mprotect failure under certain circumstances; see
https://github.com/bytecodealliance/wasmtime/pull/4453#issuecomment-1303803222
for details. The root cause was that enabling PROT_EXEC on the main
process heap may be prohibited, depending on the Linux distro and version.
This only shows up in the doc test sample program because the main
clif-util is multi-threaded and therefore allocations will happen
on glibc's per-thread heap, which is allocated via mmap, and not
the main process heap.
Work around the problem by enabling the "selinux-fix" feature of
the cranelift-jit crate dependency in the filetests. Note that
this didn't compile out of the box, so a separate fix is also
required and provided as part of this PR.
Going forward, it would be preferable to always use mmap to allocate
the backing memory for JITted code.
* cranelift: improve syscall error/oom handling in JIT module
The JIT module has several places where it `expect`s or `panic`s
on syscall or allocator errors. For example, `mmap` and `mprotect`
can fail if Linux `vm.max_map_count` is not high enough, and some
users may wish to handle this error rather than immediately
crashing.
This commit plumbs these errors upward as new `ModuleError`
types, so that callers of jit module functions like
`finalize_definitions` and `define_function` can handle them
(or just `unwrap()`, as desired).
* cranelift: Remove ModuleError::Syscall variant
Syscall errors can just be folded into the generic Backend error,
which is an anyhow::Error.
* cranelift-jit: return io::ErrorKind::OutOfMemory for alloc failure
Just using `io::Error::last_os_error()` is not correct, as global
allocator impls are not required to set errno.
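A sketch of how a caller might now handle these errors (assuming the cranelift-jit and cranelift-module APIs of this change; exact signatures may differ):

    use cranelift_jit::JITModule;
    use cranelift_module::{ModuleError, ModuleResult};

    fn finalize(module: &mut JITModule) -> ModuleResult<()> {
        match module.finalize_definitions() {
            Err(ModuleError::Backend(err)) => {
                // e.g. an mmap/mprotect failure, or an allocation
                // failure surfaced as io::ErrorKind::OutOfMemory
                eprintln!("JIT backend error: {err:#}");
                Err(ModuleError::Backend(err))
            }
            other => other,
        }
    }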
Change add_nan_canon_seq to use vselect instead of bitselect.
This is more straightforward and removes bitcast operations.
Codegen should be unchanged.
The simplifier was performing an optimization to replace bitselect
with vselect if all bytes of the condition mask could be shown
to be all ones or all zeros.
This optimization only ever made any difference in codegen on the
x64 target. Therefore, move this optimization to the x64 back-end
and perform it in ISLE instead. Resulting codegen should be
unchanged, with slightly improved compile time.
This also eliminates a few endian-dependent bitcast operations.