From 6dba669e54f08a2588b683c695fea3fa40668847 Mon Sep 17 00:00:00 2001 From: Benjamin Bouvier Date: Mon, 23 Aug 2021 19:03:48 +0200 Subject: [PATCH] Add meeting notes for Cranelift meeting 2021-08-23 --- meetings/cranelift/2021/cranelift-08-23.md | 124 +++++++++++++++++++++ 1 file changed, 124 insertions(+) diff --git a/meetings/cranelift/2021/cranelift-08-23.md b/meetings/cranelift/2021/cranelift-08-23.md index 4a5fbf2bae..68f1e7344e 100644 --- a/meetings/cranelift/2021/cranelift-08-23.md +++ b/meetings/cranelift/2021/cranelift-08-23.md @@ -18,4 +18,128 @@ ### Attendees +in no particular order: + +- CF: Chris Fallin +- AB: Andrew Brown +- B3: bjorn3 +- UW: Ulrich Weigand +- SP: Sam Parker +- AK: Anton Kirilov +- Afonso Bordado +- Johnnie Birch +- BB: Benjamin Bouvier + ### Notes + +- Semantics of booleans: https://github.com/bytecodealliance/wasmtime/issues/3205 + - Inconsistencies between different backends + - Tribal knowledge about this, mostly + - New uses of boolean types, e.g. cg_clif + - Q: should there be repr for a boolean type? + - Q: what does it mean to have a bool that’s wider than 1 bit? + - Historically, did have those wider-than-1 bit. Have to be all 0 or 1. Use + case: bitcast from boolean to other types to get vector masks. + - SIMD vector compare instructions are better handled for this use case + - Q: what are the semantics of storing/loading bool from memory + casting to/from + ints? + - Historically, validator error to load/store bool from memory + - Two main options: + - A. false = 0, true = 1, wider-than-1-bit is 1 (zero-extended) + - B. wider-than-1-bit is all ones + - UW: b1’s documentation says it can’t be loaded/stored from/to memory + - CF: not true as of last week (fuzz bug), need to update doc + - AB: SIMD bool types must have a known bit repr + - Q: do we want boolean types at the clif level to behave as the others (can be + stored/loaded), or do we want to forbid memory accesses to those? + - SP/UW: Do we know any arch that has sub-byte load/store? Sounds like + no. + - AB: fine to not mandate a repr on b1, but useful to have a repr for SIMD + vectors, since bool vectors are likely to be stored + - UW: doc is outdated for bool vectors (still mentions forbidden + loads/stores) + - Q: why do we want a bool type? + - CF: we could just remove all the bool types overall + - AB: what about return values of SIMD compare? + - CF: only remove all the scalar bool types + - UW: weird to have bool types only for vector + - CF: could have b1 for scalar, and b128 for vectors, only + - UW: what’s the benefit of e.g. b8 over i8 at the IR level? + - CF: bitmasking stuff will depend on the actual IR type + - AB: could remove a few `raw_bitcast` if we didn’t have so many bool + types + - CF: still want b1, do not allow load/store of bools, do not allow bitcast + (they don’t have a repr) + - B3: how would vselect work without bools? + - AB: bool vectors give guarantees about the actual repr, so that’s nice + - CF: can’t rely on lowering that the result of loading a b128 from memory is + actually all ones or zeroes, so would have to canonicalize anyways + - AK: could have shorter aarch64 sequences if we knew about the repr of + bool vectors + - AK: instead of canonicalization, could use pattern-matching up the + operand tree that the value got produced by an inst that generated all0 or + all1 + - CF: Proposal: we have wider bool types, and they are guaranteed to be + canonicalized (insert checks for load/stores/bitcast). Impl could be + compare-to-0? + - UW: or shifts, depend on the situation. Would be a factor slower in any + case. + - AB: what about the use case where lowering wasm to clif, we load an + v128 and use it as a mask in another wasm simd op? + - CF: would need to cast to a bool type + - Semantics of `raw_bitcast`? + - Useful to convert from a CLIF type to another, without any change + at the machine level + - CF: think about it for some more time, and get back to it? + - No one disagrees, so everyone agrees + - Please make suggestions in the issue +- ISLE: https://github.com/bytecodealliance/rfcs/pull/15 + - AK: want to be able to spend less/more time to do pattern-matching according to + opt level. Would need runtime flags for this. Could this be implemented via the + extractors? + - CF: possible to have a switch at meta-compile time to exclude certain rules. + Should it be a compile-time flag, or a runtime flag (more complicated)? + - AK: really want a runtime flag to get really fast compile times + - AK: also need a way to pattern match on CPU extensions + - CF: would be a runtime flag as well + - Afonso Bordado: commented about having this kind of predicates on instructions; + proposal to use the `when` syntax + - CF: implicit conditioning: no special marking, but if a rule uses an e.g. + avx512-only inst, automatically detect it and add a predicate on the whole rule + that it requires the CPU ext. + - UW: how does it compare with LLVM? + - CF: Studied related work in pre-RFC (#13). Pattern-matching DSL similar to what + LLVM does. ISLE is less broad in scope than TableGen and would only be used + for codegen. ISLE is simpler. + - UW: in the LLVM community there’s been a push away from SelectionDag + - CF: bigger compile times. It’s a tradeoff with dev productivity + we did have very + subtle bugs in the past. LLVM moving to FastISel? because it’s faster. We’re + building the foundational level of rules, “simple” pattern matching, nice to have a + DSL at this point. How to make it fast in long run is an open research question. + - UW: In LLVM, FastISel handles more common use cases and then redirects to + SelectionDag if complicated cases show up. GlobalISel is supposed to be more + global (can match across basic blocks). + - CF: Could have a system with foundational rules + simple optimization rules that + don’t try to match very deep. + - AB: would like to try out some code when it’s ready so as to give more targeted + feedback. + - BB: risk of scattering code between Rust extern functions + high-level DSL. + Some old problems are becoming new again. Reinventing many concepts + present in legalization, concepts overload for newcomers. Risk of seeing bugs in + the “system”, much harder to debug vs just looking at handwritten code. Tradeoff + between developer experience and complexity, as said before. + - CF: re: FFI, mostly isolated. re: complexity, “test and fuzz the crap out of it” :). Re: + cognitive load, tribal knowledge is starting to appear in the current system (how + to properly do pattern match without causing subtle errors?). Should be better in + a lot of ways. + - AK: emphasis on getting better documentation (blog posts / internal docs). + - CF: if the system is complicated and requires lots of docs, it’s not ideal. Want to + make the system easy to understand and have good docs. + - AB: generated code should be in-tree, for better discoverability. + - CF: agreed, would help compile times + we could maybe include comments in + generated code. +- Status updates: + - UW: CI for s390, qemu patches now in main, some qemu version should work out of the + box. Yet it (either qemu or wasmtime) doesn’t build anymore on s390. Looking into it + before being able to run s390 in CI. + - AK: more aarch64 tests run on qemu. Also have native runners.