Inverted Index: Introduction and Problem Statement

Published: November 27, 2012
Assignment: Inverted Index
October 19, 2012



Today, major search engines such as Google and Yahoo use a data structure called an inverted index to match queries against documents and to return the relevant documents in ranked order. An inverted index is essentially a mapping from a word to the positions at which it occurs in a document. Since a word may appear more than once in a document, storing all of its positions, and therefore its frequency, gives an indication of how relevant the document is for that word. If such an inverted index is built for each document in the collection, then when a query is issued it can be looked up in these indexes and the documents can be ranked according to frequency.

Mathematically, an inverted index for a document $D$ and strings $s_1, s_2, \ldots, s_n$ is of the form

$$
\begin{aligned}
s_1 &\to a^1_1, a^1_2, \ldots \\
s_2 &\to a^2_1, a^2_2, \ldots \\
&\;\;\vdots \\
s_n &\to a^n_1, a^n_2, \ldots
\end{aligned}
$$

where $a^k_l$ denotes the $l$-th position of the $k$-th word in the document $D$.

To build this kind of data structure efficiently, tries are used. Tries are a good data structure for strings because searching is very simple, with every leaf node describing one word. To build an inverted index from a set of documents using a trie, the following steps are followed:

• Traverse one document and insert its words into a trie. When a leaf node is reached, assign it a number (in increasing order, starting from 0) representing its location in the index. Add the position of this word to the index at that location.
• For a word that occurs more than once in the document, the second attempt to insert it into the trie finds a leaf node that already contains the word, and the value of that node gives the word's location in the index. Simply go to that location and add another position for the word.
• Continue until the end of the document is reached. You now have a trie and an inverted index for the first document.
• Repeat this procedure for the rest of the documents.
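The construction steps described above can be sketched in Python roughly as follows. This is only a minimal illustration for a single document, not the assignment's reference solution. The names `TrieNode`, `build_inverted_index` and `lookup` are hypothetical, and so are the choices to lowercase the text, to split it on runs of letters and digits, and to count positions as 0-based word offsets. The sketch also stores the index location on the node where a word ends instead of strictly on a leaf, because a word such as "in" ends at an internal node once "index" has also been inserted.

```python
import re


class TrieNode:
    """One trie node; `slot` is set only on a node where a word ends."""

    def __init__(self):
        self.children = {}  # maps a character to the child TrieNode
        self.slot = None    # location of this word's position list in the index


def build_inverted_index(text):
    """Build a trie and an inverted index for a single document.

    Returns (root, index), where index[i] is the list of word positions
    for the word whose end node carries slot i.
    """
    root = TrieNode()
    index = []
    # Assumption: words are lowercase runs of letters and digits.
    words = re.findall(r"[a-z0-9]+", text.lower())
    for position, word in enumerate(words):
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        if node.slot is None:
            # First occurrence: assign the next location, starting from 0.
            node.slot = len(index)
            index.append([])
        # First or repeated occurrence: record this position.
        index[node.slot].append(position)
    return root, index


def lookup(root, index, word):
    """Return the recorded positions of `word`, or [] if it never occurs."""
    node = root
    for ch in word.lower():
        node = node.children.get(ch)
        if node is None:
            return []
    return index[node.slot] if node.slot is not None else []


root, index = build_inverted_index("to be or not to be")
print(lookup(root, index, "to"))  # [0, 4]
print(lookup(root, index, "be"))  # [1, 5]
```

The length of each position list is the word's frequency in the document, which is the quantity the text proposes for ranking. Handling several documents would mean repeating the same procedure per document, as the last step describes.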

Now follow...