Ternary tree vs trie with map as transition table for Aho-Corasick FSA

1k Views Asked by At

What is the difference between FSA using ternary tree and a trie with transition tables implemented as search trees (e.g. std::map)? It seems like both have O(log k) complexity for reading one symbol and O(S) memory complexity, where k is alphabet size and S is sum of lengths of all accepted input strings.

And wouldn't the best choice be to use sorted vector of (symbol, state) transition pairs along with binary search, if we don't need the automata to change in runtime?

2

There are 2 best solutions below

7
On

There is no real difference between a Ternary Search Tree (TST) and a Trie implemented with a binary search tree at each node. Indeed, you could regard the latter as an (inefficient) implementation of the former; the advantage of TSTs is that they can easily be optimized, and the space overhead is reasonable.

The classic Trie uses direct lookup at decision nodes with a vector of transitions indexed by symbol. This is O(1) time, but the space requirement is substantial. Nonetheless, there are ways of optimizing the storage. Also, hybrid solutions exist, where the Trie structure is only used for the wide decision nodes at the top of the tree; once the number of candidates is reduced to something small, a fast scan or hash table can be used to find the appropriate candidate.

Using a sorted vector of (symbol, state) transitions in a naive fashion requires O(log T) time for each transition, where T is the total number of transitions; essentially the total size of all the input strings. The total time for a given target will be |target|*log(T).

By contrast, TSTs require no more than O(log S) time for each transition, where S is the size of the alphabet; that's a much smaller number than T. Moreover, the total number of lookups over the entire target string is limited by the number of input strings, so the sum over the entire lookup is rather less than |target|*log(S).

0
On

Given how Aho-Corasick is illustrated,

Aho-Corasick

Here is my Node:

public class AhoCorasickNode
{

    // This part works as a Trie

    public char literal; // c

    public String stack; // abc

    public AhoCorasickNode previous; // { ab }

    public AhoCorasickNode[] next; // { abca }, { abcb }, { abcc }, ..

    //-----------------------------

    // This part is used when solving

    boolean inDictionary;

    public AhoCorasickNode suffix;

    public AhoCorasickNode dictionarySuffix;

}

Source: