Python | Trie Data Structure


Trie Data Structure


What is a Trie?


Trie Data Structure is an efficient information reTrieval data structure. When using Trie Data Structure, the search complexities of code can be brought to optimal limits (key length). It is a tree of nodes that supports Search, Insertion, and Deletion operations. The Search operation returns the value of the key string, the Insert operation inserts a new string (the key) and value into the trie, and the Delete operation deletes the key-value pair from the trie. The major setback working with Trie Data Structure is on the storage requirements.

Today's Agenda

In this post we will learn how to write a python code for Prefix - Searching using Trie Data Structure. The input for the program will be a string paragraph and a prefix. The expected output is the suggestion of all words from that paragraph that matches the given prefix. The whole code for the Trie Data Structure is run using O(1) time and O(n) space, where "n" is the number of words in the input paragraph.


Algorithm
Trie Tree with all the Keys

In the Trie Data Structure, each node consists of multiple sub-nodes(branches). Each branch is used to represent a character of keys(words). We mark the last node of every key as the end of the word node, depicted by is_leaf. For example, the structure to represent all words of the English alphabets is : 

# class to create a node

class Trie:

 

  def __init__(self):

      # each node has a dictionary which holds the

       # information about all of its children

      # is_leaf is used to indicate if it's the last letter for

      # that particular word

      self.children = defaultdict()

      self.is_leaf = False


When inserting a key in the Trie Data Structure, each character of the input key is inserted as an individual Trie node.  The children are a defaultdict() which stores all the next level trie nodes. When the input key is a new one or an extension of the existing key, we add non-existing nodes of the key and mark the end of the word as the last node; whereas in case the input key is a prefix then, we mark that last node as the end of the word.

For searching a key we compare the characters and move along the key nodes until we encounter the end of the string or no key in the Trie Data Structure. 

Lexicographic sorting of a set of keys can be accomplished by building a trie from them, with the children of each node sorted lexicographically, and traversing it in pre-order, printing only the leaves' values. This algorithm is a form of radix sort.

The above image shows all the possible words formed by traversing through the Trie Tree. The traversal starts from the root and ends when we reach the last node of the trie for each branch. NULL is the root node and 'a', 'c', 'd', 'd', 'g' are the leaf nodes(in left to right format).


Applications of Trie Data Structure
  • Trie Data Structure carries out the insert, search, and delete operations in O(L) time complexity, where "L" is the length of the key.
  • It is faster than a Binary Search Tree (BST).
  • Trie works faster than the Hashing function also as in Trie Data Structure there needs to be no hash function and no collision handling.
  • The Prefix-Searching is a major advantage of the Trie Data Structure implementation.
  • In order to print all the words in a lexicographically sorted manner, Trie Data Structure comes to the rescue as it can be done easily without any hashing required.

Let's Get Started

Input: 
Enter Paragraph: In computer science, a data structure is a data organization, management, and storage for information that enables efficient access and modification. more precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data.

Enter Prefix: s

Output: 

['science', 'storage', 'structure']

Enter Prefix: m

Output: 

['management', 'modification', 'more']

Enter Prefix: ‘’ or ‘ ’

Output: 

[]

Note:  

defaultdict is a container like dictionaries present in the module collections. It is a sub-class of the dict class that returns a dictionary-like object. We use defaultdict because it never raises a KeyError, instead provides a default value for the key that does not exists.


# import required packages

from collections import defaultdict

# class to create a node
class Trie:
 
  def
__init__(self):

       # each node has a dictionary which holds the  

       # information about all of its children

       # is_leaf is used to indicate if it's the last letter for

       # that particular word


      self.children = defaultdict()
      self.is_leaf = False



# class that holds the methods to insert and search

# for suggestions
class TrieNode:

  def __init__(self):
      self.root = self.get_node()


  @staticmethod
  def get_node():


      # creates a new node
      return Trie()


  @staticmethod
  def get_key(ch):


       # returns the key to be stored in the dictionary
      return ord(ch) - ord('a')



   # function to insert the word into the tree

   # :param word: word to be inserted


   def insert(self, word):
      root = self.root
      length = len(word)


      for i in range(length):
          key = self.get_key(word[i])


          if key not in root.children:
              root.children[key] = self.get_node()
          root = root.children.get(key)


      root.is_leaf = True



   # function which matches the prefix to retrieve suggestion
  # :param prefix: prefix to be matched
  # :return: list of matched words


  def prefix_match(self, prefix):
      root = self.root
      length = len(prefix)


      for i in range(length):
          key = self.get_key(prefix[i])


          if not root.children.get(key):
              return []


          else:
              root = root.children.get(key)


      suggestion_list = []
      self.traverse(root, list(prefix), suggestion_list)
      return [''.join(suggestion) for suggestion in suggestion_list]



   # function to traverse down the tree to complete the  

   # word from the given prefix
  # :param root: the node from which traversal starts
  # :param prefix: the entered prefix
  # :param suggestion_list: list to be output to the user


  def traverse(self, root, prefix, suggestion_list):
      if root.is_leaf:
          suggestion_list.append(prefix[:])


      for key in root.children.keys():
          prefix.append(chr(key + ord('a')))
          self.traverse(root.children.get(key), prefix, suggestion_list)
          prefix.pop(-1)


# Driver Program
if __name__ == '__main__':


  # flag to check if the prefix entered is valid or not
  is_invalid_prefix = False

  # Creating empty root node
  trie = TrieNode()


  paragraph = input("Enter Paragraph: ")
  word_list = list(paragraph.lower().split())


   for word in sorted(word_list):


      # inserting word by word
      trie.insert(word)


  prefix = input("Enter Prefix: ")


  if prefix ==
"":
      is_invalid_prefix = True
      print("[]")


  else:
      for char in prefix:
          if char.isspace() or char.isdigit() or char ==
"":
              is_invalid_prefix = True
              print("[]")


  if not is_invalid_prefix:
      # if the entered prefix is valid, fetch suggestions
      print(trie.prefix_match(prefix))








Comments