This move allows the `IntCC`/`FloatCC` enums to be used in both meta (for predicate matching) and in codegen. To avoid breaking any code dependent on the previous location of condcodes.rs (`cranelift-codegen/src/condcodes.rs`), the module is re-exported under `cranelift_codegen::ir`.
Instead of using MOVUPS to expensively load bits from memory, this change uses a predicate to optimize vconst without a memory access:
- when the 128-bit immediate is all zeroes, use PXOR to zero out an XMM register
- when the 128-bit immediate is all ones, use PCMPEQB to set an XMM register to all ones
This still leaves the constant data in the constant pool, which may increase code size (TODO).
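The selection described above can be sketched as follows. This is a minimal illustration, not the actual encoding predicate: `VconstStrategy` and `choose_vconst_strategy` are hypothetical names, and the immediate is assumed to be available as 16 raw bytes.

```rust
/// How a 128-bit constant could be materialized in an XMM register.
/// These variant names are illustrative, not Cranelift's recipe names.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum VconstStrategy {
    /// PXOR xmm, xmm: zeroes the register without a memory access.
    Pxor,
    /// PCMPEQB xmm, xmm: every byte equals itself, so all bits become one.
    Pcmpeqb,
    /// MOVUPS from the constant pool: the general fallback.
    LoadFromPool,
}

/// Pick a strategy by inspecting the bytes of the immediate.
pub fn choose_vconst_strategy(imm: &[u8; 16]) -> VconstStrategy {
    if imm.iter().all(|&b| b == 0x00) {
        VconstStrategy::Pxor
    } else if imm.iter().all(|&b| b == 0xff) {
        VconstStrategy::Pcmpeqb
    } else {
        VconstStrategy::LoadFromPool
    }
}
```

Any immediate that is neither all zeroes nor all ones falls through to the memory load.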
This patch:
* removes the "default" opt level, on the basis that it has no definition and
is referred to nowhere in the compiler.
* renames the "fastest" level to "none". The resulting set of transformations
is unchanged.
* renames the "best" level to "speed_and_size". The resulting set of
transformations is unchanged.
* adds a new level, "speed". This is the same as "speed_and_size" except that
it omits transformations aimed only at reducing code size. Currently it
omits only the instruction shrinking pass.
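As a rough sketch of the resulting set of levels: the `OptLevel` enum, `from_name`, and `runs_shrink_pass` below are hypothetical stand-ins, not the generated settings code. The sketch also assumes that "none" does not run the shrinking pass, which this message does not state explicitly.

```rust
/// The three optimization levels left after this patch.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum OptLevel {
    None,
    Speed,
    SpeedAndSize,
}

impl OptLevel {
    /// Parse a level name; the old names are no longer accepted.
    pub fn from_name(name: &str) -> Option<OptLevel> {
        match name {
            "none" => Some(OptLevel::None),
            "speed" => Some(OptLevel::Speed),
            "speed_and_size" => Some(OptLevel::SpeedAndSize),
            // "default", "fastest" and "best" were removed or renamed.
            _ => None,
        }
    }

    /// Size-only transformations (currently just instruction shrinking)
    /// are assumed to run only at the "speed_and_size" level.
    pub fn runs_shrink_pass(self) -> bool {
        self == OptLevel::SpeedAndSize
    }
}
```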
This function is responsible for 2.2% of all heap allocation (calls) in CL.
This change avoids all of them in the (presumably) common case where none of
the parameters require splitting. It also slightly reduces the compiler's
instruction count.
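The general shape of this kind of fast path, sketched under assumptions: parameters are modeled as plain bit widths, and `split_wide_params` and `push_split` are hypothetical helpers rather than the function this change touches. The point is only that the common case returns before any `Vec` is created.

```rust
/// Recursively halve a value until each piece fits in `max_bits`.
fn push_split(out: &mut Vec<u16>, bits: u16, max_bits: u16) {
    if bits > max_bits {
        push_split(out, bits / 2, max_bits);
        push_split(out, bits / 2, max_bits);
    } else {
        out.push(bits);
    }
}

/// Returns `None`, without allocating, when no parameter needs splitting.
/// Otherwise returns the new parameter list with wide values split.
pub fn split_wide_params(param_bits: &[u16], max_bits: u16) -> Option<Vec<u16>> {
    // Cheap scan first: in the common case nothing is too wide.
    if param_bits.iter().all(|&bits| bits <= max_bits) {
        return None;
    }
    let mut out = Vec::with_capacity(param_bits.len() + 1);
    for &bits in param_bits {
        push_split(&mut out, bits, max_bits);
    }
    Some(out)
}
```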
Converting something like iadd.i64 on a 32-bit architecture into an
iadd_imm.i64 will result in the instruction being legalized back to an
iadd.i64 later on, creating unnecessary churn.
This commit avoids doing so. It also changes the target ISA to a
64-bit platform for the tests that ran into this, and makes sure
this won't happen on 32-bit platforms.
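A minimal sketch of such a guard, assuming the decision depends only on the value's bit width versus the target's native word width. `can_fold_to_imm` is a hypothetical name, and the real condition in the pre-optimization pass may differ.

```rust
/// Hypothetical guard: fold `iadd` into `iadd_imm` only when the value
/// fits in a native word, so legalization will not expand it back.
pub fn can_fold_to_imm(value_bits: u16, native_word_bits: u16) -> bool {
    value_bits <= native_word_bits
}
```

With this check, an iadd.i64 is left alone on a 32-bit target but still folded on a 64-bit one.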
This commit is based on the assumption that floats are already stored in XMM registers on x86. When extracting a lane, Cranelift was moving the float to a general-purpose register and back to an XMM register; this change avoids that by shuffling the float value to the lowest bits of the XMM register. It also assumes that the upper bits can be left as is (instead of zeroing them out).
This function is responsible for 8.5% of all heap allocation (calls) in CL.
This change avoids almost all of them by using a SmallVec::<[Value; 32]>
instead. Dynamic instruction count falls by 0.25%. The fixed size of 32 was
arrived at after profiling with fixed sizes of 1, 2, 4, 8, 16, 32, 64 and 128.
32 is as high as I can push it without the instruction count starting to creep
up again, and gets almost all the block-reduction win of 64 and 128.
Pushing on the `val_stack` vector is CL's biggest source of calls to
malloc/realloc/free, by some margin. It accounts for about 27.7% of all heap
blocks allocated when compiling wasm_lua_binarytrees. This change removes
pretty much all dynamic allocation by changing to a SmallVec<[Value; 8]>
instead. A fixed size of 4 gets all the gains to be had, in testing, so 8
gives some safety margin and is harmless from a stack-use perspective: 8
Values will occupy 32 bytes.
As a bonus, this change also reduces the compiler's dynamic instruction count
by about 0.5%.
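To illustrate why a small inline buffer removes the allocations, here is a simplified stand-in written with the standard library only. It is not the smallvec crate: `InlineStack` is a hypothetical type that keeps overflow in a separate `Vec` instead of moving everything to the heap, and `Value` is modeled as a 32-bit index, which matches the 32-byte figure above.

```rust
use std::mem::size_of;

/// Stand-in for an SSA value reference, assumed to be a 32-bit index.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Value(pub u32);

/// Number of values stored inline before spilling to the heap.
pub const INLINE: usize = 8;

/// A stack whose first `INLINE` entries need no heap allocation.
pub struct InlineStack {
    inline: [Value; INLINE],
    len: usize,
    spill: Vec<Value>, // `Vec::new()` does not allocate until first push
}

impl InlineStack {
    pub fn new() -> Self {
        InlineStack { inline: [Value::default(); INLINE], len: 0, spill: Vec::new() }
    }

    pub fn push(&mut self, v: Value) {
        if self.len < INLINE {
            self.inline[self.len] = v;
        } else {
            self.spill.push(v); // only deep stacks reach the allocator
        }
        self.len += 1;
    }

    pub fn pop(&mut self) -> Option<Value> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        if self.len < INLINE {
            Some(self.inline[self.len])
        } else {
            self.spill.pop()
        }
    }

    /// True while any element currently lives on the heap.
    pub fn spilled(&self) -> bool {
        !self.spill.is_empty()
    }

    /// Stack space taken by the inline buffer.
    pub fn inline_bytes() -> usize {
        INLINE * size_of::<Value>()
    }
}
```

As long as the stack depth stays at or below 8, `push` and `pop` never call into the allocator.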
This removes the explicit dependency on target-lexicon for the embedder,
which can instead use the ISA's name directly. This can simplify
dependency management, in particular by avoiding the need to synchronize
target-lexicon dependency versions.
It also tweaks the error returned when an ISA isn't built as part of
Cranelift to be a SupportDisabled error; this was dead code before.
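The distinction between the two failure modes can be sketched like this. Everything here is illustrative: `lookup_isa`, the `KNOWN_ISAS` list, and the `built` parameter are assumptions made for the example, and the real lookup function returns an ISA builder rather than an index.

```rust
/// Why a lookup by ISA name can fail.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum LookupError {
    /// The ISA is known, but support for it was not compiled in.
    SupportDisabled,
    /// The name does not correspond to any known ISA.
    Unsupported,
}

/// Hypothetical list of ISA names the library knows about.
const KNOWN_ISAS: &[&str] = &["x86", "arm32", "arm64", "riscv"];

/// Look up an ISA by name; `built` lists the ISAs compiled into this build.
/// Returns the index into `KNOWN_ISAS` as a placeholder for a real builder.
pub fn lookup_isa(name: &str, built: &[&str]) -> Result<usize, LookupError> {
    let idx = KNOWN_ISAS
        .iter()
        .position(|&known| known == name)
        .ok_or(LookupError::Unsupported)?;
    if built.contains(&name) {
        Ok(idx)
    } else {
        Err(LookupError::SupportDisabled)
    }
}
```

An embedder can then tell "rebuild with this ISA enabled" apart from "this name is not an ISA at all".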