Update long block comment describing priority trie in codegen.rs.
@@ -8,35 +8,34 @@ use std::fmt::Write;
|
|||||||
|
|
||||||
/// One "input symbol" for the decision tree that handles matching on
|
/// One "input symbol" for the decision tree that handles matching on
|
||||||
/// a term. Each symbol represents one step: we either run a match op,
|
/// a term. Each symbol represents one step: we either run a match op,
|
||||||
/// or we get a result from it.
|
/// or we finish the match.
|
||||||
///
|
///
|
||||||
/// Note that in the original Peepmatic scheme, the problem that this
|
/// Note that in the original Peepmatic scheme, the input-symbol to
|
||||||
/// solves was handled slightly differently. The automaton responded
|
/// the FSM was specified slightly differently. The automaton
|
||||||
/// to alphabet symbols that corresponded only to match results, and
|
/// responded to alphabet symbols that corresponded only to match
|
||||||
/// the "extra state" was used at each automaton node to represent the
|
/// results, and the "extra state" was used at each automaton node to
|
||||||
/// op to run next. This extra state differentiated nodes that would
|
/// represent the op to run next. This extra state differentiated
|
||||||
/// otherwise be merged together by deduplication. That scheme works
|
/// nodes that would otherwise be merged together by
|
||||||
/// well enough, but the "extra state" is slightly confusing and
|
/// deduplication. That scheme works well enough, but the "extra
|
||||||
/// diverges slightly from a pure automaton.
|
/// state" is slightly confusing and diverges slightly from a pure
|
||||||
|
/// automaton.
|
||||||
///
|
///
|
||||||
/// Instead, here, we imagine that the user of the automaton can query
|
/// Instead, here, we imagine that the user of the automaton/trie can
|
||||||
/// the possible transition edges out of the current state. Each of
|
/// query the possible transition edges out of the current state. Each
|
||||||
/// these edges corresponds to one possible match op to run. After
|
/// of these edges corresponds to one possible match op to run. After
|
||||||
/// running a match op, we reach a new state corresponding to
|
/// running a match op, we reach a new state corresponding to
|
||||||
/// successful matches up to that point.
|
/// successful matches up to that point.
|
||||||
///
|
///
|
||||||
/// However, it's a bit more subtle than this; we add one additional
|
/// However, it's a bit more subtle than this. Consider the
|
||||||
/// dimension to each match op, and an additional alphabet symbol.
|
/// prioritization problem. We want to give the DSL user the ability
|
||||||
///
|
/// to change the order in which rules apply, for example to have a
|
||||||
/// First, consider the prioritization problem. We want to give the
|
/// tier of "fallback rules" that apply only if more custom rules do
|
||||||
/// DSL user the ability to change the order in which rules apply, for
|
/// not match.
|
||||||
/// example to have a tier of "fallback rules" that apply only if more
|
|
||||||
/// custom rules do not match.
|
|
||||||
///
|
///
|
||||||
/// A somewhat simplistic answer to this problem is "more specific
|
/// A somewhat simplistic answer to this problem is "more specific
|
||||||
/// rule wins". However, this implies the existence of a total
|
/// rule wins". However, this implies the existence of a total
|
||||||
/// ordering of linearized match sequences that may not fully capture
|
/// ordering of linearized match sequences that may not fully capture
|
||||||
/// the intuitive meaning of "more specific". Consider four left-hand
|
/// the intuitive meaning of "more specific". Consider three left-hand
|
||||||
/// sides:
|
/// sides:
|
||||||
///
|
///
|
||||||
/// - (A _ _)
|
/// - (A _ _)
|
||||||
@@ -44,7 +43,7 @@ use std::fmt::Write;
|
|||||||
/// - (A _ (B _))
|
/// - (A _ (B _))
|
||||||
///
|
///
|
||||||
/// Intuitively, the first is the least specific. Given the input `(A
|
/// Intuitively, the first is the least specific. Given the input `(A
|
||||||
/// (B 1) (B 2)`, we can say for sure that the first should not be
|
/// (B 1) (B 2))`, we can say for sure that the first should not be
|
||||||
/// chosen, because either the second or third would match "more" of
|
/// chosen, because either the second or third would match "more" of
|
||||||
/// the input tree. But which of the second and third should be
|
/// the input tree. But which of the second and third should be
|
||||||
/// chosen? A "lexicographic ordering" rule would say that we sort
|
/// chosen? A "lexicographic ordering" rule would say that we sort
|
||||||
@@ -53,29 +52,35 @@ use std::fmt::Write;
|
|||||||
/// privileging one over the other based on the order of the
|
/// privileging one over the other based on the order of the
|
||||||
/// arguments.
|
/// arguments.
|
||||||
///
|
///
|
||||||
/// Instead, we need a data structure that can associate matching
|
/// Instead, we can accept explicit priorities from the user to allow
|
||||||
/// inputs *with priorities* to outputs, and provide us with a
|
/// either choice. So we need a data structure that can associate
|
||||||
/// decision tree as output.
|
/// matching inputs *with priorities* to outputs.
|
||||||
///
|
///
|
||||||
/// Why a tree and not a fully general FSM? Because we're compiling
|
/// Next, we build a decision tree rather than an FSM. Why? Because
|
||||||
/// to a structured language, Rust, and states become *program points*
|
/// we're compiling to a structured language, Rust, and states become
|
||||||
/// rather than *data*, we cannot easily support a DAG structure. In
|
/// *program points* rather than *data*, we cannot easily support a
|
||||||
/// other words, we are not producing a FSM that we can interpret at
|
/// DAG structure. In other words, we are not producing a FSM that we
|
||||||
/// runtime; rather we are compiling code in which each state
|
/// can interpret at runtime; rather we are compiling code in which
|
||||||
/// corresponds to a sequence of statements and control-flow that
|
/// each state corresponds to a sequence of statements and
|
||||||
/// branches to a next state, we naturally need nesting; we cannot
|
/// control-flow that branches to a next state, we naturally need
|
||||||
/// codegen arbitrary state transitions in an efficient manner. We
|
/// nesting; we cannot codegen arbitrary state transitions in an
|
||||||
/// could support a limited form of DAG that reifies "diamonds" (two
|
/// efficient manner. We could support a limited form of DAG that
|
||||||
/// alternate paths that reconverge), but supporting this in a way
|
/// reifies "diamonds" (two alternate paths that reconverge), but
|
||||||
/// that lets the output refer to values from either side is very
|
/// supporting this in a way that lets the output refer to values from
|
||||||
/// complex (we need to invent phi-nodes), and the cases where we want
|
/// either side is very complex (we need to invent phi-nodes), and the
|
||||||
/// to do this rather than invoke a sub-term (that is compiled to a
|
/// cases where we want to do this rather than invoke a sub-term (that
|
||||||
/// separate function) are rare. Finally, note that one reason to
|
/// is compiled to a separate function) are rare. Finally, note that
|
||||||
/// deduplicate nodes and turn a tree back into a DAG --
|
/// one reason to deduplicate nodes and turn a tree back into a DAG --
|
||||||
/// "output-suffix sharing" as some other instruction-rewriter
|
/// "output-suffix sharing" as some other instruction-rewriter
|
||||||
/// engines, such as Peepmatic, do -- is not done. However,
|
/// engines, such as Peepmatic, do -- is not done, because all
|
||||||
/// "output-prefix sharing" is more important to deduplicate code and
|
/// "output" occurs at leaf nodes; this is necessary because we do not
|
||||||
/// we do do this.)
|
/// want to start invoking external constructors until we are sure of
|
||||||
|
/// the match. Some of the code-sharing advantages of the "suffix
|
||||||
|
/// sharing" scheme can be obtained in a more flexible and
|
||||||
|
/// user-controllable way (with less understanding of internal
|
||||||
|
/// compiler logic needed) by factoring logic into different internal
|
||||||
|
/// terms, which become different compiled functions. This is likely
|
||||||
|
/// to happen anyway as part of good software engineering practice.
|
||||||
///
|
///
|
||||||
/// We prepare for codegen by building a "prioritized trie", where the
|
/// We prepare for codegen by building a "prioritized trie", where the
|
||||||
/// trie associates input strings with priorities to output values.
|
/// trie associates input strings with priorities to output values.
|
||||||
@@ -107,11 +112,12 @@ use std::fmt::Write;
|
|||||||
/// final match could lie along *either* path, so we have to traverse
|
/// final match could lie along *either* path, so we have to traverse
|
||||||
/// both.
|
/// both.
|
||||||
///
|
///
|
||||||
/// So, to avoid this, we perform a sort of NFA-to-DFA conversion "on
|
/// So, to avoid this, we perform a sort of moral equivalent to the
|
||||||
/// the fly" as we insert nodes by duplicating subtrees. At any node,
|
/// NFA-to-DFA conversion "on the fly" as we insert nodes by
|
||||||
/// when inserting with a priority P and when outgoing edges lie in a
|
/// duplicating subtrees. At any node, when inserting with a priority
|
||||||
/// range [P_lo, P_hi] such that P >= P_lo and P <= P_hi, we
|
/// P and when outgoing edges lie in a range [P_lo, P_hi] such that P
|
||||||
/// "priority-split the edges" at priority P.
|
/// >= P_lo and P <= P_hi, we "priority-split the edges" at priority
|
||||||
|
/// P.
|
||||||
///
|
///
|
||||||
/// To priority-split the edges in a node at priority P:
|
/// To priority-split the edges in a node at priority P:
|
||||||
///
|
///
|
||||||
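The comment's point about explicit priorities can be illustrated with a small standalone sketch. Everything named here (`Term`, `Rule`, `pick`, the two predicate functions) is hypothetical and is not taken from codegen.rs. It models only the idea that, among the left-hand sides that match, the one with the highest user-assigned priority is chosen, so a low-priority `(A _ _)` acts as a fallback for `(A _ (B _))`. It does not model the trie or the priority-splitting step.

```rust
/// Hypothetical term type, for illustration only.
#[allow(dead_code)]
#[derive(Debug)]
enum Term {
    A(Box<Term>, Box<Term>),
    B(i64),
}

/// Hypothetical rule: a user-assigned priority, a name, and a predicate
/// standing in for the rule's left-hand side.
struct Rule {
    prio: i64,
    name: &'static str,
    matches: fn(&Term) -> bool,
}

/// Stand-in for the left-hand side `(A _ _)`.
fn lhs_a_any(t: &Term) -> bool {
    matches!(t, Term::A(_, _))
}

/// Stand-in for the left-hand side `(A _ (B _))`.
fn lhs_a_b_second(t: &Term) -> bool {
    matches!(t, Term::A(_, y) if matches!(**y, Term::B(_)))
}

/// Among all rules whose left-hand side matches, pick the one with the
/// highest explicit priority (an assumed convention: larger wins).
fn pick<'a>(rules: &'a [Rule], input: &Term) -> Option<&'a Rule> {
    rules
        .iter()
        .filter(|r| (r.matches)(input))
        .max_by_key(|r| r.prio)
}

/// Two rules: a priority-0 fallback and a priority-1 "custom" rule.
fn example_rules() -> [Rule; 2] {
    [
        Rule { prio: 0, name: "fallback", matches: lhs_a_any },
        Rule { prio: 1, name: "b_second", matches: lhs_a_b_second },
    ]
}
```

Swapping the two `prio` values would make the general rule win instead, which is the kind of user control the comment argues a fixed "more specific rule wins" ordering cannot provide.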