How to find the longest string in TRIE

Question

How to find the longest string in TRIE

I've already implemented DS Trie into 2 classes: Trie and TrieNode. I need to write a function that returns the longest string in Trie in O (h). My TrieNode has a LinkedList field that stores the children of each Node. We still haven't learned about BFS or DFS, so I'm trying to think of some creative way to solve it.

I already have a function (SEPARATE function) that inserts / creates a new Node to the given char: When creating a Trie: create a Node with a field "maxDepth = 0" indicating what is my current depth. For each new Node I created, I will iterate through it up to its parent (each Node already has a pointer to the parent) and so on, until I reach the root and increase its parent's depth by 1. Now, I will create a function. which returns the longest string this way: For each Node: I iterate through my children, find the maximum "maxDepth" number than go down. do this until you reach 'maxDepth == 0'. For example, my algorithm will work fine for this line: "aacgace"

       root      
       / \
   (2)a   g(0)     
     / 
 (1)c        
   / 
(0)e

=> 'ace' is actually the longest. But this line won't work fine: "aacgae"

      root      
      /  \
   (2)a   g(0)     
    /  \
 (0)c  (0)e

=> it seems that Node 'a' has a child that also has a child, but it doesn't.

In general, I'm trying to use the first function, which creates Trie (Running Time: O (h * c)), so the execution time of the second function (which returns the longest string) will be less than I can. O (h)

+3

java algorithm data-structures

user3075653 May 30 '15 at 9:26

source to share

2 answers

fge · Answer 1 · 2015-05-30T10:46:21+0000

Not sure what you really want to do, but you can find a trie example here .

Basically I go through the builder to create a trie; give a short description of how the word is added to the trie:

// In TrieBuilder
final TrieNodeBuilder nodeBuilder = new TrieNodeBuilder();

// ...

/**
 * Add one word to the trie
 *
 * @param word the word to add
 * @return this
 * @throws IllegalArgumentException word is empty
 */
public TrieBuilder addWord(@Nonnull final String word)
{
    Objects.requireNonNull(word);

    final int length = word.length();

    if (length == 0)
        throw new IllegalArgumentException("a trie cannot have empty "
            + "strings (use EMPTY instead)");
    nrWords++;
    maxLength = Math.max(maxLength, length);
    nodeBuilder.addWord(word);
    return this;
}

This defers adding a word to the TrieNodeBuilder, which does this:

private boolean fullWord = false;

private final Map<Character, TrieNodeBuilder> subnodes
    = new TreeMap<>();

TrieNodeBuilder addWord(final String word)
{
    doAddWord(CharBuffer.wrap(word));
    return this;
}

/**
 * Add a word
 *
 * <p>Here also, a {@link CharBuffer} is used, which changes position as we
 * progress into building the tree, character by character, node by node.
 * </p>
 *
 * <p>If the buffer is "empty" when entering this method, it means a match
 * must be recorded (see {@link #fullWord}).</p>
 *
 * @param buffer the buffer (never null)
 */
private void doAddWord(final CharBuffer buffer)
{
    if (!buffer.hasRemaining()) {
        fullWord = true;
        return;
    }

    final char c = buffer.get();
    TrieNodeBuilder builder = subnodes.get(c);
    if (builder == null) {
        builder = new TrieNodeBuilder();
        subnodes.put(c, builder);
    }
    builder.doAddWord(buffer);
}

Let's say we add "nuisance" and "alarm" to the trie; what's happening:

the first time, nodes are created for each individual problem symbol;
the second time, all nodes up to "l" exist; then all nodes are created for "ing".

Now, if we add "problems", another node will be created for "s" after "e".

The variable fullWord

indicates whether we have a potential complete match; here is the search function:

public final class Trie
{
    private final int nrWords;
    private final int maxLength;
    private final TrieNode node;

    // ...

    /**
     * Search for a string into this trie
     *
     * @param needle the string to search
     * @return the length of the match (ie, the string) or -1 if not found
     */
    public int search(final String needle)
    {
        return node.search(needle);
    }
    // ...
}

And in TrieNode

we have:

public final class TrieNode
{
    private final boolean fullWord;

    private final char[] nextChars;
    private final TrieNode[] nextNodes;

    // ...

    public int search(final String needle)
    {
        return doSearch(CharBuffer.wrap(needle), fullWord ? 0 : -1, 0);
    }

    /**
     * Core search method
     *
     * <p>This method uses a {@link CharBuffer} to perform searches, and changes
     * this buffer position as the match progresses. The two other arguments
     * are the depth of the current search (ie the number of nodes visited
     * since root) and the index of the last node where a match was found (ie
     * the last node where {@link #fullWord} was true.</p>
     *
     * @param buffer the charbuffer
     * @param matchedLength the last matched length (-1 if no match yet)
     * @param currentLength the current length walked by the trie
     * @return the length of the match found, -1 otherwise
     */
    private int doSearch(final CharBuffer buffer, final int matchedLength,
        final int currentLength)
    {
        /*
         * Try and see if there is a possible match here; there is if "fullword"
         * is true, in this case the next "matchedLength" argument to a possible
         * child call will be the current length.
         */
        final int nextLength = fullWord ? currentLength : matchedLength;


        /*
         * If there is nothing left in the buffer, we have a match.
         */
        if (!buffer.hasRemaining())
            return nextLength;

        /*
         * OK, there is at least one character remaining, so pick it up and see
         * whether it is in the list of our children...
         */
        final int index = Arrays.binarySearch(nextChars, buffer.get());

        /*
         * If not, we return the last good match; if yes, we call this same
         * method on the matching child node with the (possibly new) matched
         * length as an argument and a depth increased by 1.
         */
        return index < 0
            ? nextLength
            : nextNodes[index].doSearch(buffer, nextLength, currentLength + 1);
    }
}

Notice how -1 is passed as the nextLength argument in the first call doSearch()

.

Suppose we have three with three words above, here is the call sequence to search for "tr" that fails:

doSearch ("tr", -1, 0) (node - root);
doSearch ("tr", -1, 1) (node is 't');
doSearch ("tr", -1, 2) (node is 'r');
no next character: return nextLength; nextLength is -1, no match.

Now, if we have "problems":

doSearch ("troubles", -1, 0) (node - root);
doSearch ("troubles", -1, 1) (node is 't');
doSearch ("problems", -1, 2) (node is 'r');
doSearch ("problems", -1, 3) (node is 'o');
doSearch ("troubles", -1, 4) (node is 'u');
doSearch ("problems", -1, 5) (node is 'b');
doSearch ("problems", -1, 6) (node is 'l');
doSearch ("problems", -1, 7) (node is 'e');
doSearch ("problems", 7, 8) (full text was true! node is 's');
no next character: return nextLength which is 8; we have a match.

FalconUA · Answer 2 · 2015-05-30T11:43:57+0000

Well, you're thinking right: if you want to find the longest string without iterating over the entire tree, you need to store some information while building the tree.
Suppose for node i

we store the maximum length in max_depth[i]

and we remember its child which has the maximum length in max_child[i]

. So, for every new word you inserted into the trie, remember the last added node (which is also a new sheet representing the last char of your string), do the following:

current = last_inserted_leaf
while (current != root):
    if max_depth[parent[current]] < max_depth[current] + 1:
        max_depth[parent[current]] = max_depth[current] + 1
        max_child[parent[current]] = current
    current = parent[current]

And now, to output the longest line, just follow these steps:

current = root
while is_not_leaf(current):
    answer += char_of_child[max_child[current]]
    current = max_child[current]
return answer

So insert takes operations 2*n = O(n)

, and finding the longest string takes O(h)

, where h

is the length of the longest string.

However, the above algorithm takes O(n)

extra memory and this is too much. The easiest way is to store somewhere max_string

, and every time you add a string to the trie, just compare the length of yours new_string

and the lengths max_string

, and if the new length is larger, then assign max_string = new_string

. It will take less memory and the longest string will only be found in O(1)

.

How to find the longest string in TRIE

More articles: