Add meeting notes for Cranelift meeting 2021-08-23

2021-08-23 19:03:48 +02:00
parent b2bcdd13ec
commit 6dba669e54
1 changed files with 124 additions and 0 deletions
--- a/meetings/cranelift/2021/cranelift-08-23.md
+++ b/meetings/cranelift/2021/cranelift-08-23.md
@@ -18,4 +18,128 @@
 ### Attendees
 in no particular order:
 - CF: Chris Fallin
 - AB: Andrew Brown
 - B3: bjorn3
 - UW: Ulrich Weigand
 - SP: Sam Parker
 - AK: Anton Kirilov
 - Afonso Bordado
 - Johnnie Birch
 - BB: Benjamin Bouvier
 ### Notes
 - Semantics of booleans: https://github.com/bytecodealliance/wasmtime/issues/3205
    - Inconsistencies between different backends
    - Tribal knowledge about this, mostly
    - New uses of boolean types, e.g. cg_clif
    - Q: should there be repr for a boolean type?
    - Q: what does it mean to have a bool that’s wider than 1 bit?
    - Historically, did have wider-than-1-bit bools. They have to be all zeros or all ones. Use
    case: bitcast from boolean to other types to get vector masks.
    - SIMD vector compare instructions are better handled for this use case
    - Q: what are the semantics of storing/loading bool from memory + casting to/from
    ints?
    - Historically, validator error to load/store bool from memory
    - Two main options:
        - A. false = 0, true = 1, wider-than-1-bit is 1 (zero-extended)
        - B. wider-than-1-bit is all ones
    - UW: b1’s documentation says it can’t be loaded/stored from/to memory
    - CF: not true as of last week (fuzz bug), need to update doc
    - AB: SIMD bool types must have a known bit repr
    - Q: do we want boolean types at the clif level to behave as the others (can be
    stored/loaded), or do we want to forbid memory accesses to those?
    - SP/UW: Do we know any arch that has sub-byte load/store? Sounds like
    no.
    - AB: fine to not mandate a repr on b1, but useful to have a repr for SIMD
    vectors, since bool vectors are likely to be stored
    - UW: doc is outdated for bool vectors (still mentions forbidden
    loads/stores)
    - Q: why do we want a bool type?
    - CF: we could just remove all the bool types overall
    - AB: what about return values of SIMD compare?
    - CF: only remove all the scalar bool types
    - UW: weird to have bool types only for vector
    - CF: could have b1 for scalar, and b128 for vectors, only
    - UW: what’s the benefit of e.g. b8 over i8 at the IR level?
    - CF: bitmasking stuff will depend on the actual IR type
    - AB: could remove a few `raw_bitcast` if we didn’t have so many bool
    types
    - CF: still want b1, do not allow load/store of bools, do not allow bitcast
    (they don’t have a repr)
    - B3: how would vselect work without bools?
    - AB: bool vectors give guarantees about the actual repr, so that’s nice
    - CF: during lowering, can’t rely on the result of loading a b128 from memory
    actually being all ones or all zeroes, so would have to canonicalize anyway
    - AK: could have shorter aarch64 sequences if we knew about the repr of
    bool vectors
    - AK: instead of canonicalization, could pattern-match up the operand tree
    to check that the value was produced by an inst that generates all zeros or
    all ones
    - CF: Proposal: we have wider bool types, and they are guaranteed to be
    canonicalized (insert checks for load/stores/bitcast). Impl could be
    compare-to-0?
    - UW: or shifts, depending on the situation. Would be slower by some factor
    in any case.
    - AB: what about the use case where, when lowering wasm to clif, we load a
    v128 and use it as a mask in another wasm simd op?
    - CF: would need to cast to a bool type
    - Semantics of `raw_bitcast`?
    - Useful to convert from a CLIF type to another, without any change
    at the machine level
    - CF: think about it for some more time, and get back to it?
    - No one disagrees, so everyone agrees
    - Please make suggestions in the issue
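To make the two candidate encodings and the compare-to-0 canonicalization from the discussion above concrete, here is a minimal sketch on a single byte. All function names are hypothetical and chosen for illustration only. None of them are Cranelift APIs, and `u8` merely stands in for a wider-than-1-bit bool such as b8.

```rust
/// Option A (assumed encoding): a wide bool is the zero-extended bit,
/// so `true` is the integer 1.
fn encode_zero_extended(b: bool) -> u8 {
    b as u8
}

/// Option B (assumed encoding): a wide bool is a mask, so `true` sets every bit.
fn encode_all_ones(b: bool) -> u8 {
    // 0 - 1 wraps around to 0xFF; 0 - 0 stays 0x00.
    0u8.wrapping_sub(b as u8)
}

/// Compare-to-zero canonicalization: any nonzero byte (for example one
/// loaded from memory) becomes the all-ones mask, zero stays zero.
fn canonicalize_mask(raw: u8) -> u8 {
    encode_all_ones(raw != 0)
}

/// Bitwise select. It picks `if_true` or `if_false` as a whole only when
/// `mask` is canonical (all zeros or all ones).
fn bitselect(mask: u8, if_true: u8, if_false: u8) -> u8 {
    (mask & if_true) | (!mask & if_false)
}
```

With a canonical mask, `bitselect(canonicalize_mask(0x01), 0xAB, 0xCD)` yields `0xAB`. Passing the raw `0x01` instead mixes bits from both inputs and yields `0xCD`, which is why a mask that comes from memory would need a canonicalization step under option B.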
 - ISLE: https://github.com/bytecodealliance/rfcs/pull/15
    - AK: want to be able to spend less/more time to do pattern-matching according to
    opt level. Would need runtime flags for this. Could this be implemented via the
    extractors?
    - CF: possible to have a switch at meta-compile time to exclude certain rules.
    Should it be a compile-time flag, or a runtime flag (more complicated)?
    - AK: really want a runtime flag to get really fast compile times
    - AK: also need a way to pattern match on CPU extensions
    - CF: would be a runtime flag as well
    - Afonso Bordado: commented about having this kind of predicate on instructions;
    proposal to use the `when` syntax
    - CF: implicit conditioning: no special marking, but if a rule uses an e.g.
    avx512-only inst, automatically detect it and add a predicate on the whole rule
    that it requires the CPU ext.
    - UW: how does it compare with LLVM?
    - CF: Studied related work in pre-RFC (#13). Pattern-matching DSL similar to what
    LLVM does. ISLE is less broad in scope than TableGen and would only be used
    for codegen. ISLE is simpler.
    - UW: in the LLVM community there’s been a push away from SelectionDAG
    - CF: bigger compile times. It’s a tradeoff with dev productivity + we did have very
    subtle bugs in the past. LLVM moving to FastISel? because it’s faster. We’re
    building the foundational level of rules, “simple” pattern matching, nice to have a
    DSL at this point. How to make it fast in long run is an open research question.
    - UW: In LLVM, FastISel handles more common use cases and then redirects to
    SelectionDAG if complicated cases show up. GlobalISel is supposed to be more
    global (can match across basic blocks).
    - CF: Could have a system with foundational rules + simple optimization rules that
    don’t try to match very deep.
    - AB: would like to try out some code when it’s ready so as to give more targeted
    feedback.
    - BB: risk of scattering code between Rust extern functions + high-level DSL.
    Some old problems are becoming new again. Reinventing many concepts
    present in legalization, concept overload for newcomers. Risk of seeing bugs in
    the “system”, much harder to debug vs just looking at handwritten code. Tradeoff
    between developer experience and complexity, as said before.
    - CF: re: FFI, mostly isolated. re: complexity, “test and fuzz the crap out of it” :). Re:
    cognitive load, tribal knowledge is starting to appear in the current system (how
    to properly do pattern match without causing subtle errors?). Should be better in
    a lot of ways.
    - AK: emphasis on getting better documentation (blog posts / internal docs).
    - CF: if the system is complicated and requires lots of docs, it’s not ideal. Want to
    make the system easy to understand and have good docs.
    - AB: generated code should be in-tree, for better discoverability.
    - CF: agreed, would help compile times + we could maybe include comments in
    generated code.
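The runtime-flag and implicit-conditioning ideas from the ISLE discussion above can be sketched as a plain filter over rules. This is not ISLE and not a Cranelift API: every type, field, rule name and mnemonic below is hypothetical, and the sketch only assumes that each instruction records the CPU extension it needs, so that a rule's predicate can be derived from the instructions it emits.

```rust
/// CPU extension an instruction needs (hypothetical, two values only).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum CpuExt {
    Baseline,
    Avx512,
}

/// Optimization level; derived ordering gives NoOpt < Speed.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum OptLevel {
    NoOpt,
    Speed,
}

#[allow(dead_code)]
struct Inst {
    mnemonic: &'static str,
    requires: CpuExt,
}

struct Rule {
    name: &'static str,
    emits: Vec<Inst>,
    /// Lowest opt level at which the matcher should try this rule.
    min_opt: OptLevel,
}

/// Flags known only at run time.
struct Flags {
    opt: OptLevel,
    has_avx512: bool,
}

impl Rule {
    /// Implicit conditioning: the rule carries no explicit marking; it is
    /// enabled only if every instruction it emits is available.
    fn enabled(&self, flags: &Flags) -> bool {
        flags.opt >= self.min_opt
            && self.emits.iter().all(|inst| match inst.requires {
                CpuExt::Baseline => true,
                CpuExt::Avx512 => flags.has_avx512,
            })
    }
}

/// Names of the rules the matcher would consider under `flags`.
fn enabled_rules(rules: &[Rule], flags: &Flags) -> Vec<&'static str> {
    rules.iter().filter(|r| r.enabled(flags)).map(|r| r.name).collect()
}

/// A tiny made-up rule set: one foundational rule, two optimization rules.
fn sample_rules() -> Vec<Rule> {
    vec![
        Rule {
            name: "iadd_basic",
            emits: vec![Inst { mnemonic: "add", requires: CpuExt::Baseline }],
            min_opt: OptLevel::NoOpt,
        },
        Rule {
            name: "fold_load_into_add",
            emits: vec![Inst { mnemonic: "add_mem", requires: CpuExt::Baseline }],
            min_opt: OptLevel::Speed,
        },
        Rule {
            name: "wide_vector_add",
            emits: vec![Inst { mnemonic: "vadd_wide", requires: CpuExt::Avx512 }],
            min_opt: OptLevel::Speed,
        },
    ]
}
```

Under these assumptions, a fast-compile configuration (`OptLevel::NoOpt`) only considers the foundational rule, while `OptLevel::Speed` with the extension present considers all three. A real implementation would more likely bake such checks into the generated matcher than filter a list at run time.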
 - Status updates:
    - UW: CI for s390x, qemu patches now in main, some qemu version should work out of the
    box. Yet it (either qemu or wasmtime) doesn’t build anymore on s390x. Looking into it
    before being able to run s390x in CI.
    - AK: more aarch64 tests run on qemu. Also have native runners.