Trie data structure

Let’s start with the practical task: imagine that you’re given a large dictionary and you need to implement an auto-complete.

There are a couple of operations that we need to implement:

Here’s an example set of words in a dictionary:

gone
good
goal
god
golf
gold
gum
gun

What data structure would you choose?

Arrays

Surely, the title gives it away - trie. But let’s build up an intuition for why that is from the ground up. The simplest thing we can do is to use just a plain array.

Let’s estimate a time complexity of every operation:

gonegoodgoalgodgungulfgoldgumgolf01234567insert: "gulf"search: "god"startsWith: "gol"
Dictionary example based on array

How can we improve a runtime of these operations?

Hash Map

Another approach that comes to mind, is to pick a hash map as our data structure. Let’s see how the time complexities of operations compare:

gonegoodgoalgodgungoldgumgolf
Dictionary example based on hash map

But wait, what if we store not only words, but also all the possible prefixes in the hash map as well? This approach will indeed make the search and startsWith operations O(1), but now the insert will take O(M) - where M is a length of the word that we’re inserting. There is a drawback, though - we’ve significantly inflated our memory usage.

Is there an approach that we could take that will have balance in runtime performance and memory usage?

Trie

Indeed, there is. And the data structure that we can use is called trie (or prefix tree). This kind of tree is special in the fact, that instead of storing values in nodes, it stores only one character (and marker for the end of the word).

G#UNMOONLFEDDDAL
Dictionary example based on trie

In the example above, # symbol marks a root node, and nodes marked yellow represent the end of the word. Let’s analyze time complexities of our required operations.

As you can see, we’ve achieved balance in all of our operations.

When it’s time to pick a data structure, you always need to think about a trade-offs, here’s a quick summary of how each approach compares in terms of time complexity and memory:

Data structureinsertsearchstartsWithMemory overhead
ArrayO(1) ✅O(N•M) ❌O(N•M) ❌Minimal
Hash map (words)O(1) ✅O(1) ✅O(N•M) ❌Medium
Hash map (words + prefixes)O(M) 👍O(1) ✅O(1) ✅Big
TrieO(M) 👍O(M) 👍O(M) 👍From Big to Medium

Where N = number of words, M = average word length

Implementation

To implement trie, we first need a class that represents a node:

class TrieNode {
  children: Array<TrieNode | null>;
  end: boolean;

  constructor() {
    this.children = new Array(26).fill(null);
    this.end = false;
  }
}

This is where we decide what memory overhead we’ll have - big or medium. I’ve chosen an array where each index will we either null or node, and it will point to next character in a word. Another option could be to have hash map where each key will be a character and value will be a pointer to the next node.

With that done, we can now assemble our trie:

class Trie {
  #root: TrieNode;

  constructor() {
    this.#root = new TrieNode();
  }

  insert(word: string): void {
    let node = this.#root;

    for (let i = 0; i < word.length; ++i) {
      const idx = word.charCodeAt(i) - 97;
      if (node.children[idx] === null) {
        node.children[idx] = new TrieNode();
      }
      node = node.children[idx];
    }

    node.end = true;
  }

  search(word: string): boolean {
    const node = this.#prefix(word);
    return node !== null && node.end;
  }

  startsWith(prefix: string): boolean {
    const node = this.#prefix(prefix);
    return node !== null;
  }

  #prefix(word: string): TrieNode | null {
    let node = this.#root;

    for (let i = 0; i < word.length; ++i) {
      const idx = word.charCodeAt(i) - 97;
      if (node.children[idx] === null) {
        return null;
      }
      node = node.children[idx];
    }

    return node;
  }
}

The insert method walks through the tree and inserts new nodes along the way. The #prefix method also walks through the tree but short-circuits when the next node is null and returns the node at the end. search and startsWith are neatly implemented with just using #prefix method, because, in the end the only difference in their result is to check whether we reached the end of the word or not.

Try to implement it yourself on LeetCode: 208. Implement Trie (Prefix Tree)

Disadvantages

Imagine, that we have only two words in our dictionary: database and document. While these words share the first letter, all other letters are different. In this case, while still maintaining the runtime performance characteristics, trie becomes very wasteful in memory (especially if it’s using full arrays in nodes). To combat this, there are a couple of compression techniques existing:

Conclusion

Wrapping up, while tries have a very specific use cases, I found them particularly interesting to reason about. And I hope that this blog post built up some intuition about them. So I hope that the next time when you encounter an algorithmic task of auto-complete, spell checking or even IP routing 😮 , you’ll have one more powerful tool in your toolkit to solve it!

Comments