The Apostolate

Evaluating Ranked Queries in Limited Time and Memory for Information Retrieval for Distributed Digital Libraries
June Boltzis
LILAC Centre, School of Library Studies, Clyde College, Elgin, Australia
Abstract
Ranking techniques are used to evaluate natural-language queries on text databases. Text databases are an important component of digital libraries. Effective ranking can be costly in memory and time: the database may contain millions of documents and queries can contain large numbers of terms. These information retrieval systems must access large volumes of text, often divided into several collections that may be held on separate machines. In many environments, such as current desktop computers, standard CPU speeds and volumes of memory are more than adequate to rapidly resolve queries, even on databases of many gigabytes of text. Techniques for locating answers to queries must therefore consider identification of probable collections as well as identification of documents that are probable answers, to avoid the situation in which all queries must be answered in full by all servers. In other environments, however, both memory and time are limited: examples include Internet search engines, corporate data servers, online product databases, and, at the other extreme, handheld computers with PCMCIA-slot disk drives. In this paper we show that use of centralised blocked indexes, expressly designed for a multi-collection environment, can meet these objectives and simultaneously reduce overall query processing costs.
1 Introduction
The use of information retrieval systems for management of text data is widespread, and their use is likely to accelerate with the advent of the digital library. All of these techniques reduce the time or memory required to resolve a query. Newspaper archives, library catalogues, and legislation repositories all require access by record content if they are to be useful and effective. However, they do not necessarily bound it.



