How to find the longest string in TRIE
I've already implemented DS Trie into 2 classes: Trie and TrieNode. I need to write a function that returns the longest string in Trie in O (h). My TrieNode has a LinkedList field that stores the children of each Node. We still haven't learned about BFS or DFS, so I'm trying to think of some creative way to solve it.
I already have a function (SEPARATE function) that inserts / creates a new Node to the given char: When creating a Trie: create a Node with a field "maxDepth = 0" indicating what is my current depth. For each new Node I created, I will iterate through it up to its parent (each Node already has a pointer to the parent) and so on, until I reach the root and increase its parent's depth by 1. Now, I will create a function. which returns the longest string this way: For each Node: I iterate through my children, find the maximum "maxDepth" number than go down. do this until you reach 'maxDepth == 0'. For example, my algorithm will work fine for this line: "aacgace"
root
/ \
(2)a g(0)
/
(1)c
/
(0)e
=> 'ace' is actually the longest. But this line won't work fine: "aacgae"
root
/ \
(2)a g(0)
/ \
(0)c (0)e
=> it seems that Node 'a' has a child that also has a child, but it doesn't.
In general, I'm trying to use the first function, which creates Trie (Running Time: O (h * c)), so the execution time of the second function (which returns the longest string) will be less than I can. O (h)
source to share
Not sure what you really want to do, but you can find a trie example here .
Basically I go through the builder to create a trie; give a short description of how the word is added to the trie:
// In TrieBuilder
final TrieNodeBuilder nodeBuilder = new TrieNodeBuilder();
// ...
/**
* Add one word to the trie
*
* @param word the word to add
* @return this
* @throws IllegalArgumentException word is empty
*/
public TrieBuilder addWord(@Nonnull final String word)
{
Objects.requireNonNull(word);
final int length = word.length();
if (length == 0)
throw new IllegalArgumentException("a trie cannot have empty "
+ "strings (use EMPTY instead)");
nrWords++;
maxLength = Math.max(maxLength, length);
nodeBuilder.addWord(word);
return this;
}
This defers adding a word to the TrieNodeBuilder, which does this:
private boolean fullWord = false;
private final Map<Character, TrieNodeBuilder> subnodes
= new TreeMap<>();
TrieNodeBuilder addWord(final String word)
{
doAddWord(CharBuffer.wrap(word));
return this;
}
/**
* Add a word
*
* <p>Here also, a {@link CharBuffer} is used, which changes position as we
* progress into building the tree, character by character, node by node.
* </p>
*
* <p>If the buffer is "empty" when entering this method, it means a match
* must be recorded (see {@link #fullWord}).</p>
*
* @param buffer the buffer (never null)
*/
private void doAddWord(final CharBuffer buffer)
{
if (!buffer.hasRemaining()) {
fullWord = true;
return;
}
final char c = buffer.get();
TrieNodeBuilder builder = subnodes.get(c);
if (builder == null) {
builder = new TrieNodeBuilder();
subnodes.put(c, builder);
}
builder.doAddWord(buffer);
}
Let's say we add "nuisance" and "alarm" to the trie; what's happening:
- the first time, nodes are created for each individual problem symbol;
- the second time, all nodes up to "l" exist; then all nodes are created for "ing".
Now, if we add "problems", another node will be created for "s" after "e".
The variable fullWord
indicates whether we have a potential complete match; here is the search function:
public final class Trie
{
private final int nrWords;
private final int maxLength;
private final TrieNode node;
// ...
/**
* Search for a string into this trie
*
* @param needle the string to search
* @return the length of the match (ie, the string) or -1 if not found
*/
public int search(final String needle)
{
return node.search(needle);
}
// ...
}
And in TrieNode
we have:
public final class TrieNode
{
private final boolean fullWord;
private final char[] nextChars;
private final TrieNode[] nextNodes;
// ...
public int search(final String needle)
{
return doSearch(CharBuffer.wrap(needle), fullWord ? 0 : -1, 0);
}
/**
* Core search method
*
* <p>This method uses a {@link CharBuffer} to perform searches, and changes
* this buffer position as the match progresses. The two other arguments
* are the depth of the current search (ie the number of nodes visited
* since root) and the index of the last node where a match was found (ie
* the last node where {@link #fullWord} was true.</p>
*
* @param buffer the charbuffer
* @param matchedLength the last matched length (-1 if no match yet)
* @param currentLength the current length walked by the trie
* @return the length of the match found, -1 otherwise
*/
private int doSearch(final CharBuffer buffer, final int matchedLength,
final int currentLength)
{
/*
* Try and see if there is a possible match here; there is if "fullword"
* is true, in this case the next "matchedLength" argument to a possible
* child call will be the current length.
*/
final int nextLength = fullWord ? currentLength : matchedLength;
/*
* If there is nothing left in the buffer, we have a match.
*/
if (!buffer.hasRemaining())
return nextLength;
/*
* OK, there is at least one character remaining, so pick it up and see
* whether it is in the list of our children...
*/
final int index = Arrays.binarySearch(nextChars, buffer.get());
/*
* If not, we return the last good match; if yes, we call this same
* method on the matching child node with the (possibly new) matched
* length as an argument and a depth increased by 1.
*/
return index < 0
? nextLength
: nextNodes[index].doSearch(buffer, nextLength, currentLength + 1);
}
}
Notice how -1 is passed as the nextLength argument in the first call doSearch()
.
Suppose we have three with three words above, here is the call sequence to search for "tr" that fails:
- doSearch ("tr", -1, 0) (node - root);
- doSearch ("tr", -1, 1) (node is 't');
- doSearch ("tr", -1, 2) (node is 'r');
- no next character: return nextLength; nextLength is -1, no match.
Now, if we have "problems":
- doSearch ("troubles", -1, 0) (node - root);
- doSearch ("troubles", -1, 1) (node is 't');
- doSearch ("problems", -1, 2) (node is 'r');
- doSearch ("problems", -1, 3) (node is 'o');
- doSearch ("troubles", -1, 4) (node is 'u');
- doSearch ("problems", -1, 5) (node is 'b');
- doSearch ("problems", -1, 6) (node is 'l');
- doSearch ("problems", -1, 7) (node is 'e');
- doSearch ("problems", 7, 8) (full text was true! node is 's');
- no next character: return nextLength which is 8; we have a match.
source to share
Well, you're thinking right: if you want to find the longest string without iterating over the entire tree, you need to store some information while building the tree.
Suppose for node i
we store the maximum length in max_depth[i]
and we remember its child which has the maximum length in max_child[i]
. So, for every new word you inserted into the trie, remember the last added node (which is also a new sheet representing the last char of your string), do the following:
current = last_inserted_leaf
while (current != root):
if max_depth[parent[current]] < max_depth[current] + 1:
max_depth[parent[current]] = max_depth[current] + 1
max_child[parent[current]] = current
current = parent[current]
And now, to output the longest line, just follow these steps:
current = root
while is_not_leaf(current):
answer += char_of_child[max_child[current]]
current = max_child[current]
return answer
So insert takes operations 2*n = O(n)
, and finding the longest string takes O(h)
, where h
is the length of the longest string.
However, the above algorithm takes O(n)
extra memory and this is too much. The easiest way is to store somewhere max_string
, and every time you add a string to the trie, just compare the length of yours new_string
and the lengths max_string
, and if the new length is larger, then assign max_string = new_string
. It will take less memory and the longest string will only be found in O(1)
.
source to share