This commit addresses two issues:
* A panic when shifting any non i128 type by i128 amounts (#3064)
* Wrong results when lowering shifts with small types (i8, i16)
In these types when shifting for amounts larger than the size of the
type, we would not get the wrapping behaviour that we see on i32 and i64.
This is because in these larger types, the wrapping behaviour is automatically
implemented by using the appropriate instruction, however we do not
have i8 and i16 specific instructions, so we have to manually wrap
the shift amount with an AND instruction.
This issue is also found on x86_64 and s390x, and a separate issue will
be filed for those.
Closes#3064
Also, reorganize the AArch64-specific VCode instructions for unary
narrowing and widening vector operations, so that they are more
straightforward to use.
Copyright (c) 2021, Arm Limited.
Implement the `TlsValue` opcode in the aarch64 backend for ELF_GD.
This is a little bit unusual as the default TLS mechanism for aarch64 is TLS Descriptors in other compilers.
However currently we only recognize elf_gd so lets start with that as a TLS implementation.
Lowering icmp was duplicated across callers that only cared about
flags, and callers that only cared about the bool result.
Merge both callers into `lower_icmp` which does the correct thing
depending on a new IcmpOutput parameter.
We have 3 different aproaches depending on the type of comparision requested:
* For eq/ne we compare the high bits and low bits and check
if they are equal
* For overflow checks, we perform a i128 add and check the
resulting overflow flag
* For the remaining comparisions (gt/lt/sgt/etc...)
We compare both the low bits and high bits, and if the high bits are
equal we return the result of the unsigned comparision on the low bits
As with other i128 ops, we are still missing immlogic support.
Currently we just basically use a two instruction version of the same i64 ops.
IMMLogic doesn't really support multiple register inputs, so its left as a TODO for future optimizations.
* Add support for x64 packed promote low
* Add support for x64 packed floating point demote
* Update vector promote low and demote by adding constraints
Also does some renaming and minor refactoring
Since `uadd_sat`, `sadd_sat`, `usub_sat`, and `ssub_sat` are now only
available to vector types, this removes the lowering code for the
scalar versions of these instructions in the arm32 and aarch64 backends.
When dealing with params that need to be split, we follow the
arch64 ABI and split the value in two, and make sure that start that
argument in an even numbered xN register.
The apple ABI does not require this, so on those platforms, we start
params anywhere.
This commit enables Cranelift's AArch64 backend to generate code
for instruction set extensions (previously only the base Armv8-A
architecture was supported); also, it makes it possible to detect
the extensions supported by the host when JIT compiling. The new
functionality is applied to the IR instruction `AtomicCas`.
Copyright (c) 2021, Arm Limited.
The StructReturn ABI is fairly simple at the codegen/isel level: we only
need to take care to return the sret pointer as one of the return values
if that wasn't specified in the initial function signature.
Struct arguments are a little more complex. A struct argument is stored
as a chunk of memory in the stack-args space. However, the CLIF
semantics are slightly special: on the caller side, the parameter passed
in is a pointer to an arbitrary memory block, and we must memcpy this
data to the on-stack struct-argument; and on the callee side, we provide
a pointer to the passed-in struct-argument as the CLIF block param
value.
This is necessary to support various ABIs other than Wasm, such as that
of Rust (with the cg_clif codegen backend).
This will allow for support for `I128` values everywhere, and `I64`
values on 32-bit targets (e.g., ARM32 and x86-32). It does not alter the
machine backends to build such support; it just adds the framework for
the MachInst backends to *reason* about a `Value` residing in more than
one register.
On AArch64, the zero register (xzr) and the stack pointer (xsp) are
alternately named by the same index `31` in machine code depending on
context. In particular, in the reg-reg-immediate ALU instruction form,
add/subtract will use the stack pointer, not the zero register, if index
31 is given for the first (register) source arg.
In a few places, we were emitting subtract instructions with the zero
register as an argument and a reg/immediate as the second argument. When
an immediate could be incorporated directly (we have the `iconst`
definition visible), this would result in incorrect code being
generated.
This issue was found in `ineg` and in the sequence for vector
right-shifts.
Reported by Ian Cullinan; thanks!