This function is responsible for 2.2% of all heap allocation (calls) in CL.
This change avoids all of them in the (presumably) common case where none of
the parameters require splitting. It also slightly reduces the compiler's
instruction count.
Converting something like iadd.i64 on a 32-bits architecture into a
iadd_imm.i64 will result in the instruction being legalized back to an
iadd.i64 later on, creating unnecessary churn.
This commit implements avoid doing so, and changes the target ISA to a
64-bits platform for tests than ran into this, as well as making sure
this won't happen on 32-bits platforms.
This commit is based on the assumption that floats are already stored in XMM registers in x86. When extracting a lane, cranelift was moving the float to a regular register and back to an XMM register; this change avoids this by shuffling the float value to the lowest bits of the XMM register. It also assumes that the upper bits can be left as is (instead of zeroing them out).
This function is responsible for 8.5% of all heap allocation (calls) in CL.
This change avoids almost all of them by using a SmallVec::<[Value; 32]>
instead. Dynamic instruction count falls by 0.25%. The fixed size of 32 was
arrived at after profiling with fixed sizes of 1, 2, 4, 8, 16, 32, 64 and 128.
32 is as high as I can push it without the instruction count starting to creep
up again, and gets almost all the block-reduction win of 64 and 128.
Pushing on the `val_stack` vector is CL's biggest source of calls to
malloc/realloc/free, by some margin. It accounts for about 27.7% of all heap
blocks allocated when compiling wasm_lua_binarytrees. This change removes
pretty much all dynamic allocation by changing to a SmallVec<[Value; 8]>
instead. A fixed size of 4 gets all the gains to be had, in testing, so 8
gives some safety margin and is harmless from a stack-use perspective: 8
Values will occupy 32 bytes.
As a bonus, this change also reduces the compiler's dynamic instruction count
by about 0.5%.
This removes the explicit dependency on target-lexicon for the embedder,
which can instead use the ISA's name directly. It can simplify
dependency management, in particular avoid the need for synchronizing
the target-lexicon dependencies versions.
It also tweak the error when an ISA isn't built as part of Cranelift to
be a SupportDisabled error; this was dead code before this.
* [codegen] add encodings for iadd carry variants
Add encodings for iadd carry variants (iadd_cout, iadd_cin, iadd_carry)
for x86_32, enabling the legalization for iadd.i64 to work.
* [codegen] remove support for iadd carry variants on riscv
Previously, the carry variants of iadd (iadd_cin, iadd_cout and
iadd_carry) were being legalized for isa/riscv since RISC architectures
lack a flags register.
This forced us to return and accept booleans for these operations, which
proved to be problematic and inconvenient, especially for x86.
This commit removes support for said statements and all dependent
statements for isa/riscv so that we can work on a better legalization
strategy in the future.
* [codegen] change operand type from bool to iflag for iadd carry variants
The type of the carry operands for the carry variants of the iadd
instruction (iadd_cin, iadd_cout, iadd_carry) was bool for compatibility
reasons for isa/riscv. Since support for these instructions on RISC
architectures has been temporarily suspended, we can safely change the
type to iflags.
* the target-lexicon crate no longer has or needs the std feature
in cargo, so we can delete all default-features=false, any mentions
of its std feature, and the nostd configs in many lib.rs files
* the representation of arm architectures has changed, so some case
statements needed refactoring
This commit changes 128-bit constant parsing in two ways:
- it adds the ability to use underscores to separate digits when writing a 128-bit constant in hexadecimal; e.g. `0x00010203...` can now be written as `0x0001_0203_...`
- it adds a new mechanism for parsing 128-bit constants using integer/float/boolean literals; e.g. `vconst.i32x4 [1 2 3 4]`. Note that currently the controlling type of the instruction dictates how many literals to parse inside the brackets.
It was implemented this way before to avoid borrow-checking issues,
where self would be both mutably borrowed (because of the solver) and
immutably borrowed (because of the ABI parameters list). This is worked
around by adding a local AbiParams struct which contains a summary of
the information that's needed by program_input_abi, allowing to retrieve
the ABI params within the method's body itself.
When using basic block instructions cannot be added in-between jump instructions which are ending basic blocks. These changes create extra basic blocks such that extra space is available for the spilling and moving registers where they are expected.