Commit Graph

3 Commits

Author SHA1 Message Date
Alex Crichton
b4d7ab36f9 Add a dataflow-based representation of components (#4597)
* Add a dataflow-based representation of components

This commit updates the inlining phase of compiling a component to
creating a dataflow-based representation of a component instead of
creating a final `Component` with a linear list of initializers. This
dataflow graph is then linearized in a final step to create the actual
final `Component`.

The motivation for this commit stems primarily from my work implementing
strings in fused adapters. In doing this my plan is to defer most
low-level transcoding to the host itself rather than implementing that
in the core wasm adapter modules. This means that small
cranelift-generated trampolines will be used for adapter modules to call
which then call "transcoding libcalls". The cranelift-generated
trampolines will get raw pointers into linear memory and pass those to
the libcall which core wasm doesn't have access to when passing
arguments to an import.

Implementing this with the previous representation of a `Component` was
becoming too tricky to bear. The initialization of a transcoder needed
to happen at just the right time: before the adapter module which needed
it was instantiated but after the linear memories referenced had been
extracted into the `VMComponentContext`. The difficulty here is further
compounded by the current adapter module injection pass already being
quite complicated. Adapter modules are already renumbering the index
space of runtime instances and shuffling items around in the
`GlobalInitializer` list. Perhaps the worst part of this was that
memories could already be referenced by host function imports or exports
to the host, and if adapters referenced the same memory it shouldn't be
referenced twice in the component. This meant that `ExtractMemory`
initializers ideally needed to be shuffled around in the initializer
list to happen as early as possible instead of wherever they happened to
show up during translation.

Overall I did my best to implement the transcoders but everything always
came up short. I have decided to throw my hands up in the air and try a
completely different approach to this, namely the dataflow-based
representation in this commit. This makes it much easier to edit the
component after initial translation for injection of adapters, injection
of transcoders, adding dependencies on possibly-already-existing items,
etc. The adapter module partitioning pass in this commit was greatly
simplified to something which I believe is functionally equivalent but
is probably an order of magnitude easier to understand.

The biggest downside of this representation I believe is having a
duplicate representation of a component. The `component::info` was
largely duplicated into the `component::dfg` module in this commit.
Personally though I think this is a more appropriate tradeoff than
before because it's very easy to reason about "convert representation A
to B" code whereas it was very difficult to reason about shuffling
around `GlobalInitializer` items in optimal fashions. This may also have
a cost at compile-time in terms of shuffling data around, but my hope is
that we have lots of other low-hanging fruit to optimize if it ever
comes to that which allows keeping this easier-to-understand
representation.

Finally, to reiterate, the final representation of components is not
changed by this PR. To the runtime internals everything is still the
same.

* Fix compile of factc
2022-08-04 15:42:06 -05:00
Alex Crichton
0a6baeddf4 Improve some support in factc: (#4580)
* Support CLI parameters for string encoding
* Fix `--skip-validate`
* Fix printing binary to stdout
2022-08-03 11:21:30 -05:00
Alex Crichton
fb59de15af Implement fused adapters for (list T) types (#4558)
* Implement fused adapters for `(list T)` types

This commit implements one of the two remaining types for adapter
fusion, lists. This implementation is particularly tricky for a number
of reasons:

* Lists have a number of validity checks which need to be carefully
  implemented. For example the byte length of the list passed to
  allocation in the destination module could overflow the 32-bit index
  space. Additionally lists in 32-bit memories need a check that their
  final address is in-bounds in the address space.

* In the effort to go ahead and support memory64 at the lowest layers
  this is where much of the magic happens. Lists are naturally always
  stored in memory and shifting between 64/32-bit address spaces
  is done here. This notably required plumbing an `Options` around
  during flattening/size/alignment calculations due to the size/types of
  lists changing depending on the memory configuration.

I've also added a small `factc` program in this commit which should
hopefully assist in exploring and debugging adapter modules. This takes
as input a component (text or binary format) and then generates an
adapter module for all component function signatures found internally.

This commit notably does not include tests for lists. I tried to figure
out a good way to add these but I felt like there were too many cases to
test and the tests would otherwise be extremely verbose. Instead I think
the best testing strategy for this commit will be through #4537 which
should be relatively extensible to testing adapters between modules in
addition to host-based lifting/lowering.

* Improve handling of lists of 0-size types

* Skip overflow checks on byte sizes for 0-size types
* Skip the copy loop entirely when src/dst are both 0
* Skip the increments of src/dst pointers if either is 0-size

* Update semantics for zero-sized lists/strings

When a list/string has a 0-byte-size the base pointer is no longer
verified to be in-bounds to match the supposedly desired adapter
semantics where no trap happens because no turn of the loop happens.
2022-08-01 17:02:08 -05:00