On some ISAs like Intel's, all arithmetic instructions set all or some
of the CPU flags, so flag values can't be live across these
instructions. On ISAs like ARM's Aarch32, flags are clobbered by compact
16-bit encodings but not necessarily by 32-bit encodings of the same
instruction.
The "clobbers_flags" bit on the encoding recipe is used to indicate if
CPU flag values can be live across an instruction, or conversely whether
the encoding can be used where flag values are live.
The arm32 ISA technically has separate floating point and integer flags,
but the only useful thing you can do with the floating point flags is to
copy them ti the integer flags, so there is not need to model them.
The arm64 ISA fixes this and the fcmp instruction writes the integer
nzcv flags directly.
RISC-V does not have CPU flags.
The register allocator can't handle branches with constrained register
operands, and the brz.b1/brnz.b1 instructions only have the t8jccd_abcd
in 32-bit mode where no REX prefixes are possible.
This adds a worst case encoding for those cases where a b1 value lives
in a non-ABCD register.
These spills and fills use 32-bit writes, knowing that the spill slot is
minimum 4 bytes which makes it safe.
Also simplify the definition of load/store encodings a bit by
introducing loops.
Most recipes with an ABCD constraint can handle the full GPR register
class when a REX prefix is applied, but not all. The "icscc" macro
recipe always generates a setCC instruction with no REX prefix, so it
can only write the ABCD registers, even in its REX form.
Don't automatically rewrite ABCD constraints to GPR constraints when
applying a REX prefix to a tail recipe. Instead, allow individual ABCD
recipes to specify a "when_prefixed" alternative recipe to use. This
also eliminates the spurious Rex*abcd recipe names which didn't have an
ABCD constraint.
Also allow recipes to specify that a REX prefix is required by setting
the prefix_required flag. This is used by recipes like t8jccb which
explicitly accesses an 8-bit register with a GPR constraint which is
only valid with a prefix.
Some REX-less encodings require an ABCD input because they are looking
at 8-bit registers. This constraint doesn't apply with a REX prefix
where the low 8 bits of all registers are addressable.
It can happen that the currently live registers are blocking a smaller
register class completely, so the only way of solving the allocation
problem is to turn some of the live-through registers into solver
variables.
When the quick_solve attempt fails, try to free up registers in the
critical register class by turning live-through values into solver
variables.
The brz and brnz instructions get support for 32-bit jump displacements
for long range branches.
Also change the way branch ranges are specified on tail recipes for the
Intel instructions. All branch displacements are relative to the end of
the instruction, so just compute the branch range origin as the
instruction size instead of trying to specify it in the tail recipe
definitions.
The x86_divmodx traps on integer overflow, but the srem instruction is
not supposed to trap with a -1 divisor.
Generate a legalization expansion for srem that special-cases the -1
divisor to simply return 0.
Also make sure we generate type checks for the controlling type variable
in legalization patterns. This is not needed for encodings since the
encoding tables are already keyed on the controlling type variable.
The native x86_fmin and x86_fmax instructions don't behave correctly for
NaN inputs and when comparing +0.0 to -0.0, so we need separate branches
for those cases.
These Intel-specific instructions represent the semantics of the minss /
maxss Intel instructions which behave more like a C ternary operator
than the WebAssembly fmin and fmax instructions.
They will be used as building blocks for implementing the WebAssembly
semantics.
This is used to represent the non-trapping semantics of the cvttss2si and
cvttsd2si instructions (and their vectorized counterparts).
The overflow behavior of this instruction is specific to the Intel ISAs.
There is no float-to-i64 instruction on the 32-bit Intel ISA.
Not all floating point condition codes are directly supported by the
ucimiss/ucomisd instructions. Some inequalities need to be reversed and
eq+ne require two separate tests.
To begin with, these are catch-all encodings with a SIB byte and a
32-bit displacement, so they can access any stack slot via both the
stack pointer and the frame pointer.
In the future, we will add encodings for 8-bit displacements as well as
EBP-relative references without a SIB byte.
Use the simplest expansion which materializes the bits of the floating
point constant as an integer and then bit-casts to the floating point
type. In the future, we may want to use constant pools instead. Either
way, we need custom legalization.
Also add a legalize_monomorphic() function to the Python targetISA class
which permits the configuration of a default legalization action for
monomorphic instructions, just like legalize_type() does for polymorphic
instructions.
Use these encodings to test trapz.b1 and trapnz.b1.
When a b1 value is stored in a register, only the low 8 bits are valid.
This is so we can use the various setCC instructions to generate the b1
registers.
* Added Intel x86-64 encodings for 64bit loads and store instructions
* Using GPR registers instead of ABCD for istore8 with REX prefix
Fixed testing of 64bit intel encoding
* Emit REX and REX-less encodings for optional REX prefix
Value renumbering in binary64.cton
The following instructions have simple encodings:
- bitcast.f32.i32
- bitcast.i32.f32
- bitcast.f64.i64
- bitcast.i64.f64
- fpromote.f64.f32
- fdemote.f32.f64
Also add helper functions enc_flt() and enc_i32_i64 to
intel.encodings.py for generating the common set of encodings for an
instruction: I32, I64 w/REX, I64 w/o REX.
Instructions will multiple type variables can now use `any` to indicate
encodings that don't care about the value of a secondary type variable:
ishl.i32.any instead of ishl.i32.i32
This is only allowed for secondary type variables (which are converted
to instruction predicates). The controlling type variable must still be
fully specified because it is used to key the encoding tables.
The encoding tables are keyed by the controlling type variable only. We
need to distinguish different encodings for instructions with multiple
type variables.
Add a TypePredicate instruction predicate which can check the type of an
instruction value operand. Combine type checks into the instruction
predicate for instructions with more than one type variable.
Add Intel encodings for fcvt_from_sint.f32.i64 which can now be
distinguished from fcvt_from_sint.f32.i32.
The encoding list compression algorithm is not the sharpest knife in the
drawer. It can reuse subsets of I64 encoding lists for I32 instructions,
but only when the I64 lists are defined first.
With this change and the previous change to the encoding list format, we
get the following table sizes for the Intel ISA:
ENCLISTS: 1478 B -> 662 B
LEVEL2: 1072 B (unchanged)
LEVEL1: 32 B -> 48 B
Total: 2582 B -> 1782 B (-31%)
When an instruction doesn't have a valid encoding for the target ISA, it
needs to be legalized. Different legalization strategies can be
expressed as separate XFormGroup objects.
Make the choice of XFormGroup configurable per CPU mode, rather than
depending on a hard-coded default.
Add a CPUMode.legalize_type() method which assigns an XFormGroup to
controlling type variables and lets you set a default.
Add a `legalize` field to Level1Entry so the first-level hash table
lookup gives us the configured default legalization action for the
instruction's controlling type variable.