This patch updates or removes all references to the Cranelift repository. It affects links in README documents, issues that were transferred to the Wasmtime repository, CI badges, and a small bunch of sundry items.
Accessing Wasm reference globals that are reference types will
want to use the plain load/store instructions. This commit adds
encodings for these instructions to match loading a i32/i64.
Producers of IR are required to insert the appropriate barriers
around the loads/stores.
Spidermonkey returns a sentinel ref value of '-1' from some VM functions
to indicate failure. This commit adds an instruction analagous to ref.is_null
that checks for this value.
This patch adds a third mode for templates: REX inference is requestable
at template instantiation time. This reduces the number of recipes
by removing rex()/nonrex() redundancy for many instructions.
Previously, the use of `enc_x86_64` emitted two 64-bit mode encodings for `scalar_to_vector.i64`, neither of which contained the REX.W bit telling `MOVD/MOVQ` to move 64 bits of data instead of 32 bits. Now, `scalar_to_vector.i64` will always use a sole 64-bit mode REX.W encoding and `scalar_to_vector` with other widths will have three encodings: a 32-bit mode move, a 64-bit mode move with no REX, and a 64-bit mode move with REX (but not REX.W).
* Add x86 encodings for `bint` converting to `i8` and `i16`
* Introduce tests for many multi-value returns
* Support arbitrary numbers of return values
This commit implements support for returning an arbitrary number of return
values from a function. During legalization we transform multi-value signatures
to take a struct return ("sret") return pointer, instead of returning its values
in registers. Callers allocate the sret space in their stack frame and pass a
pointer to it into the caller, and once the caller returns to them, they load
the return values back out of the sret stack slot. The callee's return
operations are legalized to store the return values through the given sret
pointer.
* Keep track of old, pre-legalized signatures
When legalizing a call or return for its new legalized signature, we may need to
look at the old signature in order to figure out how to legalize the call or
return.
* Add test for multi-value returns and `call_indirect`
* Encode bool -> int x86 instructions in a loop
* Rename `Signature::uses_sret` to `Signature::uses_struct_return_param`
* Rename `p` to `param`
* Add a clarifiying comment in `num_registers_required`
* Rename `num_registers_required` to `num_return_registers_required`
* Re-add newline
* Handle already-assigned parameters in `num_return_registers_required`
* Document what some debug assertions are checking for
* Make "illegalizing" closure's control flow simpler
* Add unit tests and comments for our rounding-up-to-the-next-multiple-of-a-power-of-2 function
* Use `append_isnt_arg` instead of doing the same thing manually
* Fix grammar in comment
* Add `Signature::uses_special_{param,return}` helper functions
* Inline the definition of `legalize_type_for_sret_load` for readability
* Move sret legalization debug assertions out into their own function
* Add `round_up_to_multiple_of_type_align` helper for readability
* Add a debug assertion that we aren't removing the wrong return value
* Rename `RetPtr` stack slots to `StructReturnSlot`
* Make `legalize_type_for_sret_store` more symmetrical to `legalized_type_for_sret`
* rustfmt
* Remove unnecessary loop labels
* Do not pre-assign offsets to struct return stack slots
Instead, let the existing frame layout algorithm decide where they should go.
* Expand "sret" into explicit "struct return" in doc comment
* typo: "than" -> "then" in comment
* Fold test's debug message into the assertion itself
This does a lot at once, since there was no clear way to split the three
commits:
- Instruction need to be passed an explicit InstructionFormat,
- InstructionFormat deduplication is checked once all entities have been
defined;
This avoids a lot of dereferences, and InstructionFormat are immutable
once they're created. It removes a lot of code that was keeping the
FormatRegistry around, just in case we needed the format. This is more
in line with the way we create Instructions, and make it easy to
reference InstructionFormats in general.
Only the shifts with applicable SSE2 instructions are implemented here: PSRL* (for ushr) only has 16-64 bit instructions and PSRA* (for sshr) only has 16-32 bit instructions.
Previously, ConstantData was a type alias for `Vec<u8>` which prevented it from having an implementation; this meant that `V128Imm` and `&[u8; 16]` were used in places that otherwise could have accepted types of different byte lengths.
There are two reasons for this change:
1. it reduces confusion; using the `POR` encoding will match the future encodings of `band` and `bxor` and the `ORPS` encoding may be confusing as it is intended for floating-point operations
2. `POR` has slightly more throughput: it only has to wait 0.33 cycles to execute again on all Intel architectures above Core whereas `ORPS` must wait 1 cycle on architectures older than Skylake (Intel Optimization Reference Manual, C.3)
`POR` does add one additional byte to the encoding and requires SSE2 so the `ORPS` opcode is left in for future use.
This change should make the code more clear (and less code) when adding encodings for instructions with specific immediates; e.g., a constant with a 0 immediate could be encoded as an XOR with something like `const.bind(...)` without explicitly creating the necessary predicates. It has several parts:
* Introduce Bindable trait to instructions
* Convert all instruction bindings to use Bindable::bind()
* Add ability to bind immediates to BoundInstruction
This is an attempt to reduce some of the issues in #955.
Only i16x8 and i32x4 are encoded in this commit mainly because i8x16 and i64x2 do not have simple encodings in x86. i64x2 is not required by the SIMD spec and there is discussion (https://github.com/WebAssembly/simd/pull/98#issuecomment-530092217) about removing i8x16.
The x86 ISA has (at least) two encodings for PEXTRW:
1. in the SSE2 opcode (66 0f c5) the XMM operand uses r/m and the GPR operand uses reg
2. in the SSE4.1 opcode (66 0f 3a 15) the XMM operand uses reg and the GPR operand uses r/m
This changes the 16-bit x86_pextr encoding from 1 to 2 to match the other PEXTR* implementations (all #2 style).
Instead of using MOVUPS to expensively load bits from memory, this change uses a predicate to optimize vconst without a memory access:
- when the 128-bit immediate is all zeroes in all bits, use PXOR to zero out an XMM register
- when the 128-bit immediate is all ones in all bits, use PCMPEQB to set an XMM register to all ones
This leaves the constant data in the constant pool, which may increase code size (TODO)
- `fill` attempted to use a GPR recipe, `fillSib32`, instead of its FPR equivalent, `ffillSib32` (code compiled without error but incorrect instructions were allowed, e.g. `v1 = regmove v0, %rdi -> %xmm0`
- `regmove` could be encoded with a GPR recipe, `rmov`, which hid the above incorrectness; now only FPR-to-FPR regmoves are allowed using the `frmov` recipe