This eagerly evaluates the `format!` and produces a `String` with a heap
allocation, regardless whether `foo` is `Some`/`Ok` or `None`/`Err`. Using
`foo.unwrap_or_else(|| panic!(...))` makes it so that the error message
formatting is only evaluated if `foo` is `None`/`Err`.
Fixes#3468.
If a program has many instances of the pattern "goto next; next:" in a
row (i.e., no-op branches to the fallthrough address), the branch
simplification in `MachBuffer` would remove them all, as expected.
However, in order to work correctly, the algorithm needs to track all
labels that alias the current buffer tail, so that they can be adjusted
later if another branch chomp occurs.
When many thousands of this branch-to-next pattern occur, many thousands
of labels will reference the current buffer tail, and this list of
thousands of labels will be shuffled between the branch metadata struct
and the "labels at tail" struct as branches are appended and then
chomped immediately.
It's possible that with smarter data structure design, we could somehow
share the list of labels -- e.g., a single array of all labels, in order
they are bound, with ranges of indices in this array used to represent
lists of labels (actually, that seems like a better design in general);
but let's leave that to future optimization work.
For now, we can avoid the quadratic behavior by just "giving up" if the
list is too long; it's always valid to not optimize a branch. It is very
unlikely that the "normal" case will have more than 100 "goto next"
branches in a row, so this should not have any perf impact; if it does,
we will leave 1 out of every 100 such branches un-optimized in a long
sequence of thousands.
This takes total compilation time down on my machine from ~300ms to
~72ms for the `foo.wasm` case in #3441. For reference, the old backend
(now removed), built from arbitrarily-chosen-1-year-old commit
`c7fcc344`, takes 158ms, so we're ~twice as fast, which is what I would
expect.
It appears that some allocation heuristics have changed slightly since
0.0.31, so some of the golden-output filetests are updated as well.
Ideally we would rely more on runtests rather than golden-compilation
tests; but for now this is sufficient. (I'm not sure exactly what in
regalloc.rs changed to alter these heuristics; it's actually been almost
a year since the 0.0.31 release with several refactorings and tweaks
merged since then.)
Fixes#3441.
This commit fixes an issue in Cranelift where legalization of
`heap_addr` instructions (used by wasm to represent heap accesses) could
be off-by-two where loads that should be valid were actually treated as
invalid. The bug here happened in an optimization where tests against
odd constants were being altered to tests against even constants by
subtracting one from the limit instead of adding one to the limit. The
comment around this area has been updated in accordance with a little
more math-stuff as well to help future readers.
* Add some debug logging for timing in module compiles
This is sometimes helpful when debugging slow compiles from fuzz bugs or
similar.
* Fix total duration calculation to not double-count
Some platforms such as AArch64 Linux support different memory page
sizes, so we need to be conservative when choosing the code section
alignment (which is equal to the page size) by using the maximum.
Copyright (c) 2021, Arm Limited.
This also paves the way for unifying TargetIsa and MachBackend, since now they map one to one. In theory the two traits could be merged, which would be nice to limit the number of total concepts. Also they have quite different responsibilities, so it might be fine to keep them separate.
Interestingly, this PR started as removing RegInfo from the TargetIsa trait since the adapter returned a dummy value there. From the fallout, noticed that all Display implementations didn't needed an ISA anymore (since these were only used to render ISA specific registers). Also the whole family of RegInfo / ValueLoc / RegUnit was exclusively used for the old backend, and these could be removed. Notably, some IR instructions needed to be removed, because they were using RegUnit too: this was the oddball of regfill / regmove / regspill / copy_special, which were IR instructions inserted by the old regalloc. Fare thee well!
* Fix some nightly dead code warnings
Looks like the "struct field not used" lint has improved on nightly and
caught a few more instances of fields that were never actually read.
* Fix windows
* Support full set of ADD LOGICAL / SUBTRACT LOGICAL instructions
* Full implementation of IaddIfcout lowering
* Enable most memory64 tests (except simd and threads)