Includes a temporary bugfix for popcnt with 32-bit operand. The popcnt
issue was initially identified by Benjamin Bouvier <public@benj.me>, and
the root cause was debugged by Joey Gouly <joey.gouly@arm.com>. This
patch is simply a quick fix that zero-extends the operand to 64 bits;
Joey plans to contribute a more permanent fix shortly (tracked in
#1537).
This involves some large mask tables that may hurt code size but reduce the number of instructions. See https://github.com/WebAssembly/simd/issues/117 for a more in-depth discussion on this.
Since we now allow constants of any size, we have to verify that `vconst` (currently the only user of the constant pool) is accessing constants that match its controlling type.
This allows us to give names to constants in the constant pool and then use these names in the function body. The original behavior, specifiying the constant value as an instruction immediate, is still supported as a shortcut but some filetests had to change since the canonical way of printing the CLIF constants is now in the preamble.
- Added a filetest for the vcode output of lowering every handled FP opcode.
- Fixed two bugs that were discovered while going through the lowerings:
- Saturating FP->int operators would return `u{32,64}::MIN` rather than
`0` for a NaN input.
- `fcopysign` did not mask off the sign bit of the value whose sign is
overwritten.
These probably would have been caught by Wasm conformance tests soon
(and the validity of these lowerings will ultimately be tested this way)
but let's get them right by inspection, too!
This commit makes the following changes to unwind information generation in
Cranelift:
* Remove frame layout change implementation in favor of processing the prologue
and epilogue instructions when unwind information is requested. This also
means this work is no longer performed for Windows, which didn't utilize it.
It also helps simplify the prologue and epilogue generation code.
* Remove the unwind sink implementation that required each unwind information
to be represented in final form. For FDEs, this meant writing a
complete frame table per function, which wastes 20 bytes or so for each
function with duplicate CIEs. This also enables Cranelift users to collect the
unwind information and write it as a single frame table.
* For System V calling convention, the unwind information is no longer stored
in code memory (it's only a requirement for Windows ABI to do so). This allows
for more compact code memory for modules with a lot of functions.
* Deletes some duplicate code relating to frame table generation. Users can
now simply use gimli to create a frame table from each function's unwind
information.
Fixes#1181.
- Undo temporary changes to default features (`all-arch`) and a
signal-handler test.
- Remove `SIGTRAP` handler: no longer needed now that we've found an
"undefined opcode" option on ARM64.
- Rename pp.rs to pretty_print.rs in machinst/.
- Only use empty stack-probe on non-x86. As per a comment in
rust-lang/compiler-builtins [1], LLVM only supports stack probes on
x86 and x86-64. Thus, on any other CPU architecture, we cannot refer
to `__rust_probestack`, because it does not exist.
- Rename arm64 to aarch64.
- Use `target` directive in vcode filetests.
- Run the flags verifier, but without encinfo, when using new backends.
- Clean up warning overrides.
- Fix up use of casts: use u32::from(x) and siblings when possible,
u32::try_from(x).unwrap() when not, to avoid silent truncation.
- Take immutable `Function` borrows as input; we don't actually
mutate the input IR.
- Lots of other miscellaneous cleanups.
[1] cae3e6ea23/src/probestack.rs (L39)
This patch, the last in the series, adds the filetests for the new ARM64
backend. The filetests cover most of the opcodes, except for the
recently-added floating point support.
This patch contains code written by Julian Seward <jseward@acm.org> and
Benjamin Bouvier <public@benj.me>, originally developed on a side-branch
before rebasing and condensing into this patch series. See the `arm64`
branch at `https://github.com/cfallin/wasmtime` for original development
history.
This patch also contains code written by Joey Gouly
<joey.gouly@arm.com> and contributed to the above branch. These
contributions are "Copyright (c) 2020, Arm Limited."
Co-authored-by: Julian Seward <jseward@acm.org>
Co-authored-by: Benjamin Bouvier <public@benj.me>
Co-authored-by: Joey Gouly <joey.gouly@arm.com>
Preserve FPRs as required by the Windows fastcall calling convention.
This exposes an implementation limit due to Cranelift's approach to stack layout, which conflicts with expectations Windows makes in SEH layout - functions where the Cranelift user desires fastcall unwind information, that require preservation of an ABI-reserved FPR, that have a stack frame 240 bytes or larger, now produce an error when compiled. Several wasm spectests were disabled because they would trip this limit. This is a temporary constraint that should be fixed promptly.
Co-authored-by: bjorn3 <bjorn3@users.noreply.github.com>
In cranelift x86 encodings, it seemed unintuitive to specialize Templates with both `infer_rex()`` and `w()`: if `w()` is specified, the REX.W bit must be set so a REX prefix is alway required--no need to infer it. This change forces us to write `rex().w()``--it's more explicit and shows more clearly what cranelift will emit. This change also modifies the tests that expected DynRex recipes.
- Convert recipes to have necessary size calculator
- Add a missing binemit function, `put_dynrexmp3`
- Modify the meta-encodings of x86 SIMD instructions to use `infer_rex()`, mostly through the `enc_both_inferred()` helper
- Fix up tests that previously always emitted a REX prefix
Until #1306 is resolved (some spilling/regalloc issue with larger FPR register banks), this removes FPR32 support. Only Wasm's `i64x2.mul` was using this register class and that instruction is predicated on AVX512 support; for the time being, that instruction will have to make do with the 16 FPR registers.
* Manually rename BasicBlock to BlockPredecessor
BasicBlock is a pair of (Ebb, Inst) that is used to represent the
basic block subcomponent of an Ebb that is a predecessor to an Ebb.
Eventually we will be able to remove this struct, but for now it
makes sense to give it a non-conflicting name so that we can start
to transition Ebb to represent a basic block.
I have not updated any comments that refer to BasicBlock, as
eventually we will remove BlockPredecessor and replace with Block,
which is a basic block, so the comments will become correct.
* Manually rename SSABuilder block types to avoid conflict
SSABuilder has its own Block and BlockData types. These along with
associated identifier will cause conflicts in a later commit, so
they are renamed to be more verbose here.
* Automatically rename 'Ebb' to 'Block' in *.rs
* Automatically rename 'EBB' to 'block' in *.rs
* Automatically rename 'ebb' to 'block' in *.rs
* Automatically rename 'extended basic block' to 'basic block' in *.rs
* Automatically rename 'an basic block' to 'a basic block' in *.rs
* Manually update comment for `Block`
`Block`'s wikipedia article required an update.
* Automatically rename 'an `Block`' to 'a `Block`' in *.rs
* Automatically rename 'extended_basic_block' to 'basic_block' in *.rs
* Automatically rename 'ebb' to 'block' in *.clif
* Manually rename clif constant that contains 'ebb' as substring to avoid conflict
* Automatically rename filecheck uses of 'EBB' to 'BB'
'regex: EBB' -> 'regex: BB'
'$EBB' -> '$BB'
* Automatically rename 'EBB' 'Ebb' to 'block' in *.clif
* Automatically rename 'an block' to 'a block' in *.clif
* Fix broken testcase when function name length increases
Test function names are limited to 16 characters. This causes
the new longer name to be truncated and fail a filecheck test. An
outdated comment was also fixed.
* All: Drop 'basic-blocks' feature
This makes it so that 'basic-blocks' cannot be disabled and we can
start assuming it everywhere.
* Tests: Replace non-bb filetests with bb version
* Tests: Adapt solver-fixedconflict filetests to use basic blocks
This commit adds a basic test for reference types on 32/64bit systems.
* Storing a ref type in a table
* Loading a ref type from a table
* Passing ref types to a function
* Returning ref types from a function
* `is_null` instruction
* `is_invalid` instruction
This is a breaking API change: the following settings have been renamed:
- jump_tables_enabled -> enable_jump_tables
- colocated_libcalls -> use_colocated_libcalls
- probestack_enabled -> enable_probestack
- allones_funcaddrs -> emit_all_ones_funcaddrs
The Intel manual uses `CMPNLT` and `CMPNLE` to denote not-less-than and not-less-than-or-equals. These were translated previously to `FloatCC::GreaterThan` and `FloatCC::GreaterThanOrEqual` but should be correctly translated to `FloatCC::UnorderedOrGreaterThanOrEqual` and `FloatCC::UnorderedOrGreaterThan`. This change adds the necessary legalizations to make use of these new encodings.
This patch adds a third mode for templates: REX inference is requestable
at template instantiation time. This reduces the number of recipes
by removing rex()/nonrex() redundancy for many instructions.
Previously `fsub` was used but this fails when negating -0.0 and +0.0 in the SIMD spec tests; using more instructions, this change uses shifts to create a constant for flipping the most significant bit of each lane with `bxor`.