Instead of using MOVUPS to expensively load bits from memory, this change uses a predicate to optimize vconst without a memory access:
- when the 128-bit immediate is all zeroes in all bits, use PXOR to zero out an XMM register
- when the 128-bit immediate is all ones in all bits, use PCMPEQB to set an XMM register to all ones
This leaves the constant data in the constant pool, which may increase code size (TODO)
- `fill` attempted to use a GPR recipe, `fillSib32`, instead of its FPR equivalent, `ffillSib32` (code compiled without error but incorrect instructions were allowed, e.g. `v1 = regmove v0, %rdi -> %xmm0`
- `regmove` could be encoded with a GPR recipe, `rmov`, which hid the above incorrectness; now only FPR-to-FPR regmoves are allowed using the `frmov` recipe
This patch:
* removes the "default" opt level, on the basis that it has no definition and
is referred to nowhere in the compiler.
* renames the "fastest" level to "none". The resulting set of transformations
is unchanged.
* renames the "best" level to "speed_and_size". The resulting set of
transformations is unchanged.
* adds a new level, "speed". This is the same as "speed_and_size" except that
it omits transformations aimed only at reducing code size. Currently it
omits only the insn shrinking pass.
Enables automated fuzzing on Fuzzit. Runs fuzz regression tests
every push and PR. Runs full fuzzing every push. Fuzzit emails
if it finds crashes.
Uses the existing fuzz targets:
* translate-module - Fuzz valid WebAssembly modules.
* reader-parse - Fuzz IR text format parsing.
Enables automated fuzzing on Fuzzit. Runs fuzz regression tests
every push and PR. Runs full fuzzing every push. Fuzzit emails
if it finds crashes.
Uses the existing fuzz targets:
* translate-module - Fuzz valid WebAssembly modules.
* reader-parse - Fuzz IR text format parsing.
This function is responsible for 2.2% of all heap allocation (calls) in CL.
This change avoids all of them in the (presumably) common case where none of
the parameters require splitting. It also slightly reduces the compiler's
instruction count.
After we add any instruction to an EBB, we can't add EBB parameters
anymore. There is an assertion in `append_ebb_param` to detect this
error, but the backtrace generated doesn't give any hint what happened.
We add an error message to the assertion so the user can understand what
is wrong.
fixes#1003
The `SecondaryMap` abstraction -- basically, resize-on-demand arrays with a
default value -- is very hot in Cranelift. This small patch is the result of
many profiling runs. It makes two changes:
* `fn index_mut` is changed to be `#[inline(always)]`, based on profile data.
* `fn index` and `fn index_mut` call `self.elems.resize()` directly, rather
than via `self.resize()`. The point of this is not to improve performance.
Rather, it ensures that the public functions for `SecondaryMap` do not call
each other. When public interface functions call each other, it becomes
difficult to interpret profiling results, because it's harder to see what
fraction of costs for `SecondaryMap` as a whole come from outside the
module, and what fraction is the result of "internal" calls to the external
interface.
The overall result, for wasm_lua_binarytrees, is a 1.4% reduction in
instruction count for the compiler, and a 2.2% reduction in loads/stores.