What is the difference between an FSA using a ternary search tree and a trie whose transition tables are implemented as search trees (e.g. `std::map`)? It seems like both have O(log k) time complexity for reading one symbol and O(S) memory complexity, where k is the alphabet size and S is the sum of the lengths of all accepted input strings.
And wouldn't the best choice be a sorted vector of (symbol, state) transition pairs together with binary search, if we don't need the automaton to change at runtime?
There is no real difference between a Ternary Search Tree (TST) and a Trie implemented with a binary search tree at each node. Indeed, you could regard the latter as an (inefficient) implementation of the former; the advantage of TSTs is that they can easily be optimized, and the space overhead is reasonable.
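To make the equivalence concrete, here is a minimal TST sketch: each node is simultaneously a binary-search-tree node over symbols (`left`/`right`) and a trie node (`eq` advances to the next character). The names and the lack of deallocation are my own simplifications, not anything from the question.

```cpp
#include <cassert>

// Minimal ternary search tree over C strings. Each node plays two
// roles: left/right form a BST over symbols at the current depth,
// and eq descends to the next character (the trie edge).
// Sketch only: assumes non-empty strings, never frees memory.
struct TSTNode {
    char symbol;
    bool is_final = false;
    TSTNode *left = nullptr, *eq = nullptr, *right = nullptr;
    explicit TSTNode(char c) : symbol(c) {}
};

TSTNode* tst_insert(TSTNode* node, const char* s) {
    if (!node) node = new TSTNode(*s);
    if (*s < node->symbol)      node->left  = tst_insert(node->left, s);
    else if (*s > node->symbol) node->right = tst_insert(node->right, s);
    else if (s[1] != '\0')      node->eq    = tst_insert(node->eq, s + 1);
    else                        node->is_final = true;
    return node;
}

bool tst_contains(const TSTNode* node, const char* s) {
    while (node) {
        if (*s < node->symbol)      node = node->left;   // BST step
        else if (*s > node->symbol) node = node->right;  // BST step
        else if (s[1] == '\0')      return node->is_final;
        else { node = node->eq; ++s; }                   // trie step
    }
    return false;
}
```

A trie with `std::map<char, Node*>` at each node does exactly the same comparisons; the TST merely flattens the per-node search tree into the node structure itself, which is what makes it easy to optimize.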
The classic trie uses direct lookup at decision nodes, with a vector of transitions indexed by symbol. This is `O(1)` time per symbol, but the space requirement is substantial. Nonetheless, there are ways of optimizing the storage. Also, hybrid solutions exist, where the trie structure is only used for the wide decision nodes at the top of the tree; once the number of candidates is reduced to something small, a fast scan or hash table can be used to find the appropriate candidate.

Using a sorted vector of (symbol, state) transitions in a naive fashion requires `O(log T)` time for each transition, where `T` is the total number of transitions; essentially the total size of all the input strings. The total time for a given target will be `|target| * log(T)`.
By contrast, TSTs require no more than `O(log k)` time for each transition, where `k` (to use the question's notation) is the size of the alphabet; that's a much smaller number than `T`. Moreover, the total number of lookups over the entire target string is limited by the number of input strings, so the sum over the entire lookup is rather less than `|target| * log(k)`.
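For completeness: the questioner's sorted-vector idea reaches the same `O(log k)` per-transition bound if it is applied per state rather than globally, i.e. each state keeps its own small sorted edge list. This layout is my own assumption (a frozen automaton, state 0 as the start state), sketched for comparison:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Each state stores its own (symbol, next-state) pairs, sorted by
// symbol. A step binary-searches at most k entries (k = alphabet
// size), i.e. O(log k) per character -- the same per-transition
// bound as a TST. Assumes the automaton is built once and frozen.
struct State {
    bool is_final = false;
    std::vector<std::pair<char, int>> edges;  // sorted by symbol
};

int step(const std::vector<State>& states, int state, char c) {
    const auto& edges = states[state].edges;
    auto it = std::lower_bound(
        edges.begin(), edges.end(), c,
        [](const std::pair<char, int>& e, char ch) { return e.first < ch; });
    if (it != edges.end() && it->first == c) return it->second;
    return -1;  // reject
}

bool accepts(const std::vector<State>& states, const std::string& word) {
    int s = 0;  // state 0 is the start state
    for (char c : word) {
        s = step(states, s, c);
        if (s < 0) return false;
    }
    return states[s].is_final;
}
```

Since the edge lists never change after construction, the sorted vectors are also more cache-friendly than per-node `std::map`s, which is the main practical argument for this layout.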