This avoids a lot of dereferences, and InstructionFormat are immutable
once they're created. It removes a lot of code that was keeping the
FormatRegistry around, just in case we needed the format. This is more
in line with the way we create Instructions, and make it easy to
reference InstructionFormats in general.
Only the shifts with applicable SSE2 instructions are implemented here: PSRL* (for ushr) only has 16-64 bit instructions and PSRA* (for sshr) only has 16-32 bit instructions.
Previously, ConstantData was a type alias for `Vec<u8>` which prevented it from having an implementation; this meant that `V128Imm` and `&[u8; 16]` were used in places that otherwise could have accepted types of different byte lengths.
This avoids doing multiple unpacking of the InstructionData for a single
legalization, improving readability and reducing size of the generated
code. For instance, icmp had to unpack the format once per IntCC
condition code.
This adds a `DummyConstant` structure that is converted to something like `let const0 = pos.func.dfg.constants.insert(...)` in `gen_legalizer.rs`. This allows us to create constants during legalization with something like `let ones = constant(vec![0xff; 16])` and then use `ones` within a `def!` block, e.g.: `def!(a = vconst(ones))`. One unfortunate side-effect of this change is that, because the names of the constants in `ConstPool` are dynamic, the `VarPool` and `SymbolTable` structures that previously operated on `&'static str` types now must operate on `String` types; however, since this is a change to the meta code-generation, it should result in no runtime performance impact.
There are two reasons for this change:
1. it reduces confusion; using the `POR` encoding will match the future encodings of `band` and `bxor` and the `ORPS` encoding may be confusing as it is intended for floating-point operations
2. `POR` has slightly more throughput: it only has to wait 0.33 cycles to execute again on all Intel architectures above Core whereas `ORPS` must wait 1 cycle on architectures older than Skylake (Intel Optimization Reference Manual, C.3)
`POR` does add one additional byte to the encoding and requires SSE2 so the `ORPS` opcode is left in for future use.
This change should make the code more clear (and less code) when adding encodings for instructions with specific immediates; e.g., a constant with a 0 immediate could be encoded as an XOR with something like `const.bind(...)` without explicitly creating the necessary predicates. It has several parts:
* Introduce Bindable trait to instructions
* Convert all instruction bindings to use Bindable::bind()
* Add ability to bind immediates to BoundInstruction
This is an attempt to reduce some of the issues in #955.
Add legalizations for icmp and icmp_imm for i64 and i128 operands for
the narrow legalization set, allowing 32-bit ISAs (like x86-32) to
compare 64-bit integers and all ISAs to compare 128-bit integers.
Fixes: https://github.com/bnjbvr/cranelift-x86/issues/2
Only i16x8 and i32x4 are encoded in this commit mainly because i8x16 and i64x2 do not have simple encodings in x86. i64x2 is not required by the SIMD spec and there is discussion (https://github.com/WebAssembly/simd/pull/98#issuecomment-530092217) about removing i8x16.
The x86 ISA has (at least) two encodings for PEXTRW:
1. in the SSE2 opcode (66 0f c5) the XMM operand uses r/m and the GPR operand uses reg
2. in the SSE4.1 opcode (66 0f 3a 15) the XMM operand uses reg and the GPR operand uses r/m
This changes the 16-bit x86_pextr encoding from 1 to 2 to match the other PEXTR* implementations (all #2 style).
Add documentation to the icmp instruction text for both signed and
unsigned overflow, making it very clear why unsigned overflow is
complicated and where to find it.
This move allows the `IntCC`/`FloatCC` enums to be used in both meta (for predicate matching) and in codegen. To avoid breaking any code dependent on the previous location of condcodes.rs (`cranelift-codegen/src/condcodes.rs`), the module is re-exported under `cranelift_codegen::ir`.
Instead of using MOVUPS to expensively load bits from memory, this change uses a predicate to optimize vconst without a memory access:
- when the 128-bit immediate is all zeroes in all bits, use PXOR to zero out an XMM register
- when the 128-bit immediate is all ones in all bits, use PCMPEQB to set an XMM register to all ones
This leaves the constant data in the constant pool, which may increase code size (TODO)