Legalize b{and,or,xor}_not into component instructions (#5709)

* Remove trailing whitespace in `lower.isle` files

* Legalize the `band_not` instruction into a simpler form

This commit legalizes the `band_not` instruction into `band`-of-`bnot`,
i.e. two simpler instructions. This is intended to assist egraph-based
optimizations, which no longer need to special-case `band_not` in other
bit-operation patterns.

Lowerings of the `band_not` instruction have been moved to a
specialization of the `band` instruction.
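
The equivalence itself is small. As a sketch in ISLE-style rewrite
notation (illustrative only; the exact legalization mechanism and helpers
used by the PR may differ):

    ;; Illustrative sketch: `band_not x y` is by definition
    ;; `band x (bnot y)`, so the fused instruction is replaced by its two
    ;; component instructions. `bor_not` and `bxor_not` expand analogously.
    (rule (simplify (band_not ty x y))
          (band ty x (bnot ty y)))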

* Legalize `bor_not` into components

Same as the prior commit, but for the `bor_not` instruction.

* Legalize bxor_not into bxor-of-bnot

Same as the prior commits. Judging by the test updates, this also appears
to have fixed a bug in the s390x backend where `bxor_not x y` was
accidentally translated as `bnot (bxor x y)`.

* Simplify not-fused operands for riscv64

Some delegated-to rules had special cases of the form "if this feature is
enabled, use the fused instruction". This moves the feature test up into
the lowering rule itself, so that other rules can still fire when the
feature isn't enabled. This should make the riscv64 backend more
consistent with how other backends are implemented.
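
A sketch of the shape this takes (hedged: `has_zbb` and `rv_andn` are
assumed names for the riscv64 backend's feature test and fused and-not
constructor, not copied from the diff):

    ;; Sketch: test the ISA feature in the lowering rule itself. When the
    ;; feature is disabled this rule simply doesn't match, and the generic
    ;; `band`-plus-`bnot` lowering applies instead.
    (rule 1 (lower (has_type (ty_int ty) (band x (bnot y))))
          (if-let $true (has_zbb))
          (rv_andn x y))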

* Remove B{and,or,xor}Not from cost of egraph metrics

These shouldn't ever reach egraphs now that they're legalized away.

* Add an egraph optimization for `x^-1 => ~x`

This adds an egraph simplification that rewrites xor-against-minus-1 into
a `bnot` instruction. This helps trigger various other optimizations in
the egraph implementation as well as various backend lowering rules. It is
chiefly useful because wasm has no `bnot` equivalent, so a bitwise not is
encoded as `x^-1`.
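
A sketch of the rewrite in the egraph's ISLE style (assuming a
sign-extension helper along the lines of `i64_sextend_imm64`):

    ;; Sketch: rewrite `x ^ -1` into `bnot x`. The constant only needs to
    ;; be matched on the right-hand side; see the commutativity note below.
    (rule (simplify (bxor ty x (iconst ty k)))
          (if-let -1 (i64_sextend_imm64 ty k))
          (bnot ty x))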

* Add a wasm test for end-to-end bitwise lowerings

Test end-to-end that the various optimizations above are applied to input
wasm modules.

* Specifically don't self-update rustup on CI

I forget why this was here originally, but it's failing on Windows CI. In
general there's no need to update rustup on CI, so leave the installed
version as-is.

* Cleanup some aarch64 lowering rules

Previously a 32/64-bit split was necessary because the `ALUOp` differed
between the two widths, but that has since been refactored away, so
there's no longer any need for duplicate rules.
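
A sketch of the merged shape (hedged: helper names such as `fits_in_64`
and `alu_rs_imm_logic_commutative` are assumptions about the aarch64
backend's ISLE helpers):

    ;; Sketch: with a single width-agnostic `ALUOp.And`, one rule covers
    ;; both the 32- and 64-bit cases instead of two duplicate rules.
    (rule (lower (has_type (fits_in_64 ty) (band x y)))
          (alu_rs_imm_logic_commutative (ALUOp.And) ty x y))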

* Narrow a x64 lowering rule

This previously made more sense when it matched `band_not` and was rarely
used; now the type filter on the rule is more specific, restricting it to
SIMD types with lanes.

* Simplify xor-against-minus-1 rule

No need for the commutative version since egraph canonicalization already
moves constants to the right.

* Optimize band-of-bnot when bnot is on the left

Add rules to the egraph algebraic optimizations that canonicalize
`band`/`bor`/`bxor` with a `bnot` operand so the `bnot` ends up on the
right. That way the lowerings in the backends only have to list each
pattern once, with the operand on the right, to optimize both orders of
input.
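
A sketch of what such canonicalization rules can look like (illustrative;
the actual rule set in the PR may differ):

    ;; Sketch: move a `bnot` operand to the right-hand side so backends
    ;; only need one lowering pattern per fused shape.
    (rule (simplify (band ty (bnot ty x) y)) (band ty y (bnot ty x)))
    (rule (simplify (bor ty (bnot ty x) y)) (bor ty y (bnot ty x)))
    (rule (simplify (bxor ty (bnot ty x) y)) (bxor ty y (bnot ty x)))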

* Add commutative lowering rules

* Update cranelift/codegen/src/isa/x64/lower.isle

Co-authored-by: Jamey Sharp <jamey@minilop.net>

---------

Co-authored-by: Jamey Sharp <jamey@minilop.net>
Author: Alex Crichton
Date: 2023-02-06 13:53:40 -06:00
Committed by: GitHub
Parent: 99c3936616
Commit: de0e0bea3f

17 changed files with 506 additions and 277 deletions

cranelift/codegen/src/isa/x64/lower.isle

@@ -269,6 +269,36 @@
(value_gprs (x64_and $I64 x_lo y_lo)
(x64_and $I64 x_hi y_hi))))
;; Specialized lowerings for `(band x (bnot y))` which is additionally produced
;; by Cranelift's `band_not` instruction that is legalized into the simpler
;; forms early on.
(decl sse_and_not (Type Xmm XmmMem) Xmm)
(rule (sse_and_not $F32X4 x y) (x64_andnps x y))
(rule (sse_and_not $F64X2 x y) (x64_andnpd x y))
(rule -1 (sse_and_not (multi_lane _bits _lanes) x y) (x64_pandn x y))
;; Note the flipping of operands below as we're matching
;;
;; (band x (bnot y))
;;
;; while x86 does
;;
;; pandn(x, y) = and(not(x), y)
(rule 8 (lower (has_type ty @ (multi_lane _bits _lane) (band x (bnot y))))
(sse_and_not ty y x))
(rule 9 (lower (has_type ty @ (multi_lane _bits _lane) (band (bnot y) x)))
(sse_and_not ty y x))
(rule 10 (lower (has_type ty @ (use_bmi1 $true) (band x (bnot y))))
(if (ty_int_ref_scalar_64 ty))
;; the first argument is the one that gets inverted with andn
(x64_andn ty y x))
(rule 11 (lower (has_type ty @ (use_bmi1 $true) (band (bnot y) x)))
(if (ty_int_ref_scalar_64 ty))
(x64_andn ty y x))
;;;; Rules for `bor` ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; `{i,b}64` and smaller.
@@ -1085,52 +1115,6 @@
(OperandSize.Size32))))
(x64_pmuludq x2 y2)))
;;;; Rules for `band_not` ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(decl sse_and_not (Type Xmm XmmMem) Xmm)
(rule (sse_and_not $F32X4 x y) (x64_andnps x y))
(rule (sse_and_not $F64X2 x y) (x64_andnpd x y))
(rule -1 (sse_and_not (multi_lane _bits _lanes) x y) (x64_pandn x y))
;; Note the flipping of operands below. CLIF specifies
;;
;; band_not(x, y) = and(x, not(y))
;;
;; while x86 does
;;
;; pandn(x, y) = and(not(x), y)
(rule 0 (lower (has_type ty (band_not x y)))
(sse_and_not ty y x))
(rule 1 (lower (has_type ty @ (use_bmi1 $false) (band_not x y)))
(if (ty_int_ref_scalar_64 ty))
(x64_and ty
x
(x64_not ty y)))
(rule 1 (lower (has_type ty @ (use_bmi1 $true) (band_not x y)))
(if (ty_int_ref_scalar_64 ty))
;; the first argument is the one that gets inverted with andn
(x64_andn ty y x))
;;;; Rules for `bxor_not` ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(rule 0 (lower (has_type ty (bxor_not x y)))
(if (ty_int_ref_scalar_64 ty))
(x64_xor ty
x
(x64_not ty y)))
;;;; Rules for `bor_not` ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(rule 0 (lower (has_type ty (bor_not x y)))
(if (ty_int_ref_scalar_64 ty))
(x64_or ty
x
(x64_not ty y)))
;;;; Rules for `iabs` ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(rule (lower (has_type $I8X16 (iabs x)))
@@ -1167,7 +1151,7 @@
;; it below, since we need to pass it into the cmove
;; before we pass the cmove to with_flags_reg.
(neg_result Gpr (produces_flags_get_reg neg))
;; When the neg instruction sets the sign flag,
;; takes the original (non-negative) value.
(cmove ConsumesFlags (cmove ty (CC.S) src neg_result)))
(with_flags_reg (produces_flags_ignore neg) cmove)))
@@ -3586,7 +3570,7 @@
(u8_from_uimm8 lane))))
(x64_pshufd val lane (OperandSize.Size32)))
;; This is the only remaining case for F64X2
(rule 1 (lower (has_type $F64 (extractlane val @ (value_type (ty_vec128 ty))
(u8_from_uimm8 1))))
;; 0xee == 0b11_10_11_10