Add support for binary/octal literals to ISLE (#6234)

* Add support for binary/octal literals to ISLE

In a number of x64-changes recently some u8 immediates are interpreted
as four bit-packed 2-bit numbers and I have a tough time going between
hex and these bit-packed numbers. I've been writing `0xAA == 0b...` in
comments to indicate the intent but I figured it'd be a bit clearer if
the binary literal was accepted directly!

This is a minor update to the ISLE lexer to allow for binary `0b00...`
and octal `0o00...` literals in the same manner as hex literals. Some
comments in the x64 backend are then removed to use the binary literal
syntax directly.

* Update ISLE reference for octal/binary

* Update ISLE tests for octal/binary
This commit is contained in:
Alex Crichton
2023-04-18 18:04:04 -05:00
committed by GitHub
parent c17a3d89f7
commit b6bb6a196a
5 changed files with 144 additions and 85 deletions

View File

@@ -1346,21 +1346,21 @@
;; result = [ vec[3] vec[2] tmp[0] tmp[2] ]
(rule (vec_insert_lane $F32X4 vec (RegMem.Reg val) 1)
(let ((tmp Xmm (x64_movlhps val vec)))
(x64_shufps tmp vec 0xe2))) ;; 0xe2 == 0b11_10_00_10
(x64_shufps tmp vec 0b11_10_00_10)))
;; f32x4.replace_lane 2 - without insertps
;; tmp = [ vec[0] vec[3] val[0] val[0] ]
;; result = [ tmp[2] tmp[0] vec[1] vec[0] ]
(rule (vec_insert_lane $F32X4 vec (RegMem.Reg val) 2)
(let ((tmp Xmm (x64_shufps val vec 0x30))) ;; 0x30 == 0b00_11_00_00
(x64_shufps vec tmp 0x84))) ;; 0x84 == 0b10_00_01_00
(let ((tmp Xmm (x64_shufps val vec 0b00_11_00_00)))
(x64_shufps vec tmp 0b10_00_01_00)))
;; f32x4.replace_lane 3 - without insertps
;; tmp = [ vec[3] vec[2] val[1] val[0] ]
;; result = [ tmp[0] tmp[2] vec[1] vec[0] ]
(rule (vec_insert_lane $F32X4 vec (RegMem.Reg val) 3)
(let ((tmp Xmm (x64_shufps val vec 0xe4))) ;; 0xe4 == 0b11_10_01_00
(x64_shufps vec tmp 0x24))) ;; 0x24 == 0b00_10_01_00
(let ((tmp Xmm (x64_shufps val vec 0b11_10_01_00)))
(x64_shufps vec tmp 0b00_10_01_00)))
;; Recursively delegate to the above rules by loading from memory first.
(rule (vec_insert_lane $F32X4 vec (RegMem.Mem addr) idx)
@@ -3422,7 +3422,7 @@
(a Xmm a)
(libcall LibCall (round_libcall $F64 imm))
(result Xmm (libcall_1 libcall a))
(a1 Xmm (libcall_1 libcall (x64_pshufd a 0x0e))) ;; 0x0e == 0b00_00_11_10
(a1 Xmm (libcall_1 libcall (x64_pshufd a 0b00_00_11_10)))
(result Xmm (vec_insert_lane $F64X2 result a1 1))
)
result))
@@ -3818,8 +3818,7 @@
;; This is the only remaining case for F64X2
(rule 1 (lower (has_type $F64 (extractlane val @ (value_type (ty_vec128 ty))
(u8_from_uimm8 1))))
;; 0xee == 0b11_10_11_10
(x64_pshufd val 0xee))
(x64_pshufd val 0b11_10_11_10))
;; Note that the `pextrb` lowering here is relied upon by the `extend_to_gpr`
;; helper because it will elide a `uextend` operation when `extractlane` is the

View File

@@ -21,11 +21,11 @@ instructions. For example:
- An `iadd` (integer add) operator can always be lowered to an x86
`ADD` instruction with two register sources.
- An `iadd` operator with one `iconst` (integer-constant) argument can
be lowered to an x86 `ADD` instruction with a register and an
immediate.
One could write something like the following in ISLE (simplified from
the real code [here](../codegen/src/isa/x64/lower.isle)):
@@ -43,7 +43,7 @@ the real code [here](../codegen/src/isa/x64/lower.isle)):
;; `y` is a `RegMemImm.Imm`.
y)))
```
ISLE lets the compiler backend developer express this information in a
declarative way -- i.e., just write down a list of patterns, without
worrying how the compilation process tries them out -- and the ISLE
@@ -68,16 +68,16 @@ This document is organized into the following sections:
systems work, how to think about nested terms, patterns and rewrite
rules, how they provide a general mechanism for computation, and how
term-rewriting is often used in a compiler-implementation context.
* Core ISLE: the foundational concepts of the ISLE DSL, building upon
a general-purpose term-rewriting base. Covers the type system (typed
terms) and how rules are written.
* ISLE with Rust: covers how ISLE provides an "FFI" (foreign function
interface) of sorts to allow interaction with Rust code, and
describes the scheme by which ISLE execution is mapped onto Rust
(data structures and control flow).[^1]
* ISLE Internals: describes how the ISLE compiler works. Provides
insight into how an unordered collection of rewrite rules are
combined into executable Rust code that efficiently traverses the
@@ -161,7 +161,7 @@ form, or at least can be interpreted that way.[^3]
structure that is present in those symbols in any well-formed
sequence. For example, we can define a TRS that only operates on
terms with balanced parentheses; then we have our tree.
In ISLE and hence in this document, we operate on terms that are
written in an
[S-expression](https://en.wikipedia.org/wiki/S-expression) syntax,
@@ -239,9 +239,9 @@ right-hand nomenclature comes from a common way of writing rules as:
```plain
A -> B ;; any term "A" is rewritten to "B"
(A x) -> (B (C x)) ;; any term (A x), for some x, is rewritten to (B (C x)).
(A _) -> (D) ;; any term (A _), where `_` is a wildcard (any subterm),
;; is rewritten to (D).
```
@@ -254,16 +254,16 @@ subterms:
* `(A pat1 pat2 ...)` matches a constructor `A` with patterms for each
of its arguments.
* `x` matches any subterm and captures its value in a variable
binding, which can be used later when we specify the right-hand side
(so that the rewrite contains parts of the original term).
* `_` is a wildcard and matches anything, without capturing it.
* Primitive constant values, such as `42` or `$Symbol`, match only if
the term is exactly equal to this constant.
These pattern-matching operators can be combined, so we could write,
for example, `(A (B x _) z)`. This pattern would match the term `(A (B
1 2) 3)` but not `(A (C 4 5) 6)`.
@@ -388,7 +388,7 @@ clarity.
separate Rust functions, so factoring rules to use intermediate
terms can provide code-size and compile-time benefits for the
ISLE-generated Rust code as well.
[^6]: The [lambda calculus' reduction
rules](https://en.wikipedia.org/wiki/Lambda_calculus#Reduction)
are a good example of this.
@@ -415,9 +415,9 @@ rewrites it into a machine-*dependent* instruction term. For example:
```plain
(iadd a b) -> (isa.add_reg_reg a b)
(iadd a (iconst 0)) -> a
(iadd a (iconst n)) (isa.add_reg_imm a n)
```
@@ -494,8 +494,11 @@ The pattern (left-hand side) is made up of the following match
operators:
* Wildcards (`_`).
* Integer constants (decimal/hex, positive/negative: `1`, `-1`,
`0x80`, `-0x80`) and boolean constants (`#t`, `#f`).
* Integer constants (decimal/hex/binary/octal, positive/negative: `1`, `-1`,
`0x80`, `-0x80`) and boolean constants (`#t`, `#f`). Hex constants can
start with either `0x` or `0X`. Binary constants start with `0b`. Octal
constants start with `0o`. Integers can also be interspersed with `_` as a
separator, for example `1_000` or `0x1234_5678`, for readability.
* constants imported from the embedding, of arbitrary type
(`$MyConst`).
* Variable captures and matches (bare identifiers like `x`; an
@@ -515,7 +518,7 @@ operators:
defined term (type variant or constructor) and the subpatterns are
applied to each argument value in turn. Note that `term` cannot be a
wildcard; it must be a specific, concrete term.
The expression (right-hand side) is made up of the following
expression operators:
@@ -530,7 +533,7 @@ expression operators:
to the immediately previous variable bindings (i.e., this is like a
`let*` in Scheme). `let`s are lexically-scoped, meaning that bound
variables are available only within the body of the `let`.
When multiple rules are applicable to rewrite a particular term, ISLE
will choose the "more specific" rule according to a particular
heuristic: in the lowered sequence of matching steps, when one
@@ -573,14 +576,14 @@ of type definitions are:
(type u32 (primitive u32)) ;; u32 is a primitive, and is
;; spelled `u32` in the generated Rust code.
(type MyType (enum
(A (x u32) (y u32))
(B (z u32))
(C))) ;; MyType is an enum, with variants
;; `MyType::A { x, y }`, `MyType::B { z }`,
;; and `MyType::C`.
(type MyType2 extern (enum (A)))
;; MyType2 is an enum with variant `MyType2::A`.
;; Its type definition is not included in the
@@ -611,11 +614,11 @@ automatically have the following constructors:
;; These definitions are implicit and do not need to be written (doing
;; so is a compile-time error, actually). We write them here just to
;; show what they would look like.
(decl MyType.A (u32 u32) MyType)
(decl MyType.B (u32) MyType)
(decl MyType.C () MyType)
(decl MyType2.A () MyType2)
```
@@ -671,39 +674,39 @@ The typing rules for patterns in ISLE are:
constant, etc.). This is because compilation and dispatch into rules
is organized by the top-level constructor of the term being
rewritten.
* At each part of the pattern except the root, there is an "expected
type" that is inferred from the surrounding context. We check that
this matches the actual type of the pattern.
* A constructor pattern `(C x y z)`, given a constructor `(decl C (T1
T2 T2) R)`, has type `R` and provides expected types `T1`, `T2`, and
`T3` to its subpatterns.
* A variable capture pattern `x` is compatible with any expected type
the first time it appears, and captures this expected type under the
variable identifier `x` in the type environment. Subsequent
appearances of `x` check that the expected type matches the
already-captured type.
* A conjunction `(and PAT1 PAT2 ...)` checks that each subpattern is
compatible with the expected type.
* Integer constants are compatible with any primitive expected
type. (This may change in the future if we add non-numeric
primitives, such as strings.)
If we are able to typecheck the pattern, we have a type environment
that is a map from variable bindings to types: e.g., `{ x: MyType, y:
MyType2, z: u32 }`. We then typecheck the rewrite expression.
* Every expression also has an expected type, from the surrounding
context. We check that the provided expression matches this type.
* The top-level rewrite expression must have the same type as the
top-level constructor in the pattern. (In other words, a term can
only be rewritten to another term of the same type.)
* Constructors check their return values against the expected type,
and typecheck their argument expressions against their parameter
types.
@@ -712,7 +715,7 @@ MyType2, z: u32 }`. We then typecheck the rewrite expression.
these are added to the type environment while typechecking the
body. The expected type for the body is the same as the expected
type for the `let` itself.
### A Note on Heterogeneous Types
We should illuminate one particular aspect of the ISLE type system
@@ -726,9 +729,9 @@ a `T2`. Concretely:
```lisp
(type T1 ...)
(type T2 ...)
(decl Translate (T1) T2)
(rule (Translate (T1.A ...))
(T2.X ...))
(rule (Translate (T1.B ...))
@@ -760,10 +763,10 @@ one could have:
(type IR ...)
(type Machine1 ...)
(type Machine2 ...)
(decl TranslateToMachine1 (IR) Machine1)
(decl TranslateToMachine2 (IR) Machine2)
(rule (TranslateToMachine1 (IR.add a b)) (Machine1.add a b))
(rule (TranslateToMachine2 (IR.add a b)) (Machine2.weird_inst a b))
```
@@ -872,7 +875,7 @@ For example, if one is writing a rule such as
```lisp
(decl u_to_v (U) V)
(rule ...)
(decl MyTerm (T) V)
(rule (MyTerm t)
(u_to_v t))
@@ -885,7 +888,7 @@ its argument but `t` has type `T`. However, if we define
```lisp
(convert T U t_to_u)
;; For the above to be valid, `t_to_u` should be declared with the
;; signature:
(decl t_to_u (T) U)
@@ -904,12 +907,12 @@ This also works in the extractor position: for example, if one writes
```lisp
(decl defining_instruction (Inst) Value)
(extern extractor definining_instruction ...)
(decl iadd (Value Value) Inst)
(rule (lower (iadd (iadd a b) c))
...)
(convert Inst Value defining_instruction)
```
@@ -933,12 +936,12 @@ A term can have:
1. A single internal extractor body, via a toplevel `(extractor ...)`
form, OR
2. A single external extractor binding (see next section); AND
3. One or more `(rule (Term ...) ...)` toplevel forms, which together
make up an internal constructor definition, OR
4. A single external constructor binding (see next section).
### If-Let Clauses
@@ -1030,8 +1033,8 @@ for purity.
#### `partial` Expressions
ISLE's `partial` keyword on a term indicates that the term's
constructors may fail to match, otherwise, the ISLE compiler assumes
ISLE's `partial` keyword on a term indicates that the term's
constructors may fail to match, otherwise, the ISLE compiler assumes
the term's constructors are infallible.
For example, the following term's constructor only matches if the value
@@ -1043,7 +1046,7 @@ is zero:
(extern constructor is_zero_value is_zero_value)
```
Internal constructors without the `partial` keyword can
Internal constructors without the `partial` keyword can
only use other constructors that also do not have the `partial` keyword.
#### `if` Shorthand
@@ -1104,7 +1107,7 @@ Rust. The basic principles are:
side expression; this can invoke further constructors for its
subparts, kicking off more rewrites, until eventually a value is
returned.
4. This design means that "intermediate terms" -- constructed terms
that are then further rewritten -- are never actually built as
in-memory data-structures. Rather, they exist only as ephemeral
@@ -1112,7 +1115,7 @@ Rust. The basic principles are:
means that there is very little or no performance penalty to
factoring code into many sub-rules (subject only to function-call
overhead and/or the effectiveness of the Rust inliner).
5. Backtracking -- attempting to match rules, and backing up to follow
a different path when a match fails -- exists, but is entirely
internal to the generated Rust function for rewriting one
@@ -1123,16 +1126,16 @@ Rust. The basic principles are:
sides, trying to find a matching rule, and once we find one, we
commit and start to invoke constructors to build the right-hand
side.
Said another way, the principle is that left-hand sides can be
fallible, and have no side-effects as they execute; right-hand
sides, in contrast, are infallible. This simplifies the control
flow and makes reasoning about side-effects (especially with
respect to external Rust actions) easier.
This will become more clear as we look at how Rust interfaces are
defined, and how the generated code appears, below.
### Extern Constructors and Extractors
ISLE programs interact with the surrounding Rust code in which they
@@ -1181,11 +1184,11 @@ and returns a `U`.
External constructors are infallible: that is, they must succeed, and
always return their return type. In contrast, internal constructors
can be fallible because they are implemented by a list of rules whose
patterns may not cover the entire domain (in which case, the term
should be marked `partial`). If fallible behavior is needed when
invoking external Rust code, that behavior should occur in an extractor
(see below) instead: only pattern left-hand sides are meant to be
fallible.
patterns may not cover the entire domain (in which case, the term
should be marked `partial`). If fallible behavior is needed when
invoking external Rust code, that behavior should occur in an extractor
(see below) instead: only pattern left-hand sides are meant to be
fallible.
#### Extractors
@@ -1265,7 +1268,7 @@ This allows code to refer to `$I32` whenever a value of type `Type` is
needed, in either a pattern (LHS) or an expression (RHS). These
constants are pulled in via the same `use super::*` that imports all
external types.
### Exported Interface: Functions and Context Trait
The generated ISLE code provides an interface that is designed to be
@@ -1310,7 +1313,7 @@ we have the following terms and declarations:
```lisp
(decl A (u32 u32) T)
(extern constructor A build_a)
(decl B (T) U)
(external extractor B disassemble_b)
```
@@ -1440,7 +1443,7 @@ newline). The grammar accepted by the parser is as follows:
| "(" "rule" <rule> ")"
| "(" "extractor" <etor> ")"
| "(" "extern" <extern> ")"
<typedecl> ::= <ident> [ "extern" ] <typevalue>
<ident> ::= ( "A".."Z" | "a".."z" | "_" | "$" )
@@ -1449,6 +1452,8 @@ newline). The grammar accepted by the parser is as follows:
<int> ::= [ "-" ] ( "0".."9" )+
| [ "-" ] "0x" ( "0".."9" "A".."F" "a".."f" )+
| [ "-" ] "0o" ( "0".."7" )+
| [ "-" ] "0b" ( "0".."1" )+
<typevalue> ::= "(" "primitive" <ident> ")"
| "(" "enum" <enumvariant>* ")"
@@ -1464,7 +1469,7 @@ newline). The grammar accepted by the parser is as follows:
<rule> ::= <pattern> <expr>
| <prio> <pattern> <expr>
<prio> ::= <int>
<etor> ::= "(" <ident> <ident>* ")" <pattern>
@@ -1487,7 +1492,7 @@ newline). The grammar accepted by the parser is as follows:
| <ident>
| "(" "let" "(" <let-binding>* ")" <expr> ")"
| "(" <ident> <expr>* ")"
<let-binding> ::= "(" <ident> <ty> <expr> ")"
<extern> ::= "constructor" <ident> <ident>

View File

@@ -15,3 +15,12 @@
(rule (Y -0x1000_0000_0000_0000_1234_5678_9abc_def0) 1)
(rule (Y -0xffff_ffff_ffff_ffff_ffff_ffff_ffff_ffff) -3)
;; Test some various syntaxes for numbers
(type i32 (primitive i32))
(decl partial Z (i32) i32)
(rule (Z 0) 0x01)
(rule (Z 0x01) 0x0_2)
(rule (Z 0b10) 3)
(rule (Z 0b1_1) 0o4)
(rule (Z 0o7654321) 0b11_00_11_00)

View File

@@ -8,11 +8,44 @@ fn main() {
assert_eq!(iconst::constructor_X(&mut ctx, -1), Some(-2));
assert_eq!(iconst::constructor_X(&mut ctx, -2), Some(-3));
assert_eq!(iconst::constructor_X(&mut ctx, 0x7fff_ffff_ffff_ffff), Some(0x8000_0000_0000_0000u64 as i64));
assert_eq!(iconst::constructor_X(&mut ctx, 0xffff_ffff_ffff_fff0_u64 as i64), Some(1));
assert_eq!(
iconst::constructor_X(&mut ctx, 0x7fff_ffff_ffff_ffff),
Some(0x8000_0000_0000_0000u64 as i64)
);
assert_eq!(
iconst::constructor_X(&mut ctx, 0xffff_ffff_ffff_fff0_u64 as i64),
Some(1)
);
assert_eq!(iconst::constructor_Y(&mut ctx, 0x1000_0000_0000_0000_1234_5678_9abc_def0), Some(-1));
assert_eq!(iconst::constructor_Y(&mut ctx, 0xffff_ffff_ffff_ffff_ffff_ffff_ffff_ffffu128 as i128), Some(3));
assert_eq!(iconst::constructor_Y(&mut ctx, -0x1000_0000_0000_0000_1234_5678_9abc_def0), Some(1));
assert_eq!(iconst::constructor_Y(&mut ctx, -(0xffff_ffff_ffff_ffff_ffff_ffff_ffff_ffffu128 as i128)), Some(-3));
assert_eq!(
iconst::constructor_Y(&mut ctx, 0x1000_0000_0000_0000_1234_5678_9abc_def0),
Some(-1)
);
assert_eq!(
iconst::constructor_Y(
&mut ctx,
0xffff_ffff_ffff_ffff_ffff_ffff_ffff_ffffu128 as i128
),
Some(3)
);
assert_eq!(
iconst::constructor_Y(&mut ctx, -0x1000_0000_0000_0000_1234_5678_9abc_def0),
Some(1)
);
assert_eq!(
iconst::constructor_Y(
&mut ctx,
-(0xffff_ffff_ffff_ffff_ffff_ffff_ffff_ffffu128 as i128)
),
Some(-3)
);
assert_eq!(iconst::constructor_Z(&mut ctx, 0), Some(1));
assert_eq!(iconst::constructor_Z(&mut ctx, 1), Some(2));
assert_eq!(iconst::constructor_Z(&mut ctx, 2), Some(3));
assert_eq!(iconst::constructor_Z(&mut ctx, 3), Some(4));
assert_eq!(
iconst::constructor_Z(&mut ctx, 0o7654321),
Some(0b11_00_11_00)
);
}

View File

@@ -243,14 +243,27 @@ impl<'a> Lexer<'a> {
let mut radix = 10;
// Check for hex literals.
if self.buf.get(self.pos.offset).copied() == Some(b'0')
&& (self.buf.get(self.pos.offset + 1).copied() == Some(b'x')
|| self.buf.get(self.pos.offset + 1).copied() == Some(b'X'))
{
self.advance_pos();
self.advance_pos();
radix = 16;
// Check for prefixed literals.
match (
self.buf.get(self.pos.offset),
self.buf.get(self.pos.offset + 1),
) {
(Some(b'0'), Some(b'x')) | (Some(b'0'), Some(b'X')) => {
self.advance_pos();
self.advance_pos();
radix = 16;
}
(Some(b'0'), Some(b'o')) => {
self.advance_pos();
self.advance_pos();
radix = 8;
}
(Some(b'0'), Some(b'b')) => {
self.advance_pos();
self.advance_pos();
radix = 2;
}
_ => {}
}
// Find the range in the buffer for this integer literal. We'll
@@ -258,7 +271,7 @@ impl<'a> Lexer<'a> {
// string-to-integer conversion.
let mut s = vec![];
while self.pos.offset < self.buf.len()
&& ((radix == 10 && self.buf[self.pos.offset].is_ascii_digit())
&& ((radix <= 10 && self.buf[self.pos.offset].is_ascii_digit())
|| (radix == 16 && self.buf[self.pos.offset].is_ascii_hexdigit())
|| self.buf[self.pos.offset] == b'_')
{