This commit addresses two issues:
* A panic when shifting any non i128 type by i128 amounts (#3064)
* Wrong results when lowering shifts with small types (i8, i16)
In these types when shifting for amounts larger than the size of the
type, we would not get the wrapping behaviour that we see on i32 and i64.
This is because in these larger types, the wrapping behaviour is automatically
implemented by using the appropriate instruction, however we do not
have i8 and i16 specific instructions, so we have to manually wrap
the shift amount with an AND instruction.
This issue is also found on x86_64 and s390x, and a separate issue will
be filed for those.
Closes#3064
When encoding constants as immediates into an RSE Imm12 instruction we need to take special care to check if the value that we are trying to input does not overflow its type when viewed as a signed value. (i.e. iconst.i8 200)
We cannot both put an immediate and sign extend it, so we need to lower it into a separate reg, and emit the sign extend into the instruction.
For more details see the [cg_clif bug report](https://github.com/bjorn3/rustc_codegen_cranelift/issues/1184#issuecomment-873214796).
Also, reorganize the AArch64-specific VCode instructions for unary
narrowing and widening vector operations, so that they are more
straightforward to use.
Copyright (c) 2021, Arm Limited.
Implement the `TlsValue` opcode in the aarch64 backend for ELF_GD.
This is a little bit unusual as the default TLS mechanism for aarch64 is TLS Descriptors in other compilers.
However currently we only recognize elf_gd so lets start with that as a TLS implementation.
Lowering icmp was duplicated across callers that only cared about
flags, and callers that only cared about the bool result.
Merge both callers into `lower_icmp` which does the correct thing
depending on a new IcmpOutput parameter.
The previous address calculation code had a bug where we tried to
add offsets into a temporary register before defining it, causing
the regalloc to complain.
We have 3 different aproaches depending on the type of comparision requested:
* For eq/ne we compare the high bits and low bits and check
if they are equal
* For overflow checks, we perform a i128 add and check the
resulting overflow flag
* For the remaining comparisions (gt/lt/sgt/etc...)
We compare both the low bits and high bits, and if the high bits are
equal we return the result of the unsigned comparision on the low bits
As with other i128 ops, we are still missing immlogic support.
Currently we just basically use a two instruction version of the same i64 ops.
IMMLogic doesn't really support multiple register inputs, so its left as a TODO for future optimizations.
* Add support for x64 packed promote low
* Add support for x64 packed floating point demote
* Update vector promote low and demote by adding constraints
Also does some renaming and minor refactoring
The previous choice to use the WasmtimeSystemV calling convention for
apple-aarch64 devices was incorrect: padding of arguments was
incorrectly computed. So we have to use some flavor of the apple-aarch64
ABI there.
Since we want to support the wasmtime custom convention for multiple
returns on apple-aarch64 too, a new custom Wasmtime calling convention
was introduced to support this.
This is sometimes useful when performing analyses on the generated
machine code: for example, some kinds of code verifiers will want to do
a control-flow analysis, and it is much easier to do this if one does
not have to recover the CFG from the machine code (doing so requires
heavyweight analysis when indirect branches are involved). If one trusts
the control-flow lowering and only needs to verify other properties of
the code, this can be very useful.
Since `uadd_sat`, `sadd_sat`, `usub_sat`, and `ssub_sat` are now only
available to vector types, this removes the lowering code for the
scalar versions of these instructions in the arm32 and aarch64 backends.
When dealing with params that need to be split, we follow the
arch64 ABI and split the value in two, and make sure that start that
argument in an even numbered xN register.
The apple ABI does not require this, so on those platforms, we start
params anywhere.