The Apostolate


Evaluating Ranked Queries in Limited Time and Memory for Information Retrieval for Distributed Digital Libraries

June Boltzis
LILAC Centre, School of Library Studies
Clyde College, Elgin, Australia

Abstract
Ranking techniques are used to evaluate natural-language queries on text databases. Text databases are an important component of digital libraries. Effective ranking can be costly in memory and time: the database may contain millions of documents and queries can contain large numbers of terms. These information retrieval systems must access large volumes of text, often divided into several collections that may be held on separate machines. In many environments, such as current desktop computers, standard CPU speeds and volumes of memory are more than adequate to rapidly resolve queries, even on databases of many gigabytes of text. Techniques for locating answers to queries must therefore consider identification of probable collections as well as identification of documents that are probable answers, to avoid the situation in which all queries must be answered in full by all servers. In other environments, however, both memory and time are limited: examples include Internet search engines, corporate data servers, online product databases, and, at the other extreme, handheld computers with PCMCIA-slot disk drives. In this paper we show that use of centralised blocked indexes, expressly designed for a multi-collection environment, can meet these objectives and simultaneously reduce overall query processing costs.

1 Introduction

The use of information retrieval systems for management of text data is widespread, and their use is likely to accelerate with the advent of the digital library. All of these techniques reduce the time or memory required to resolve a query. Newspaper archives, library catalogues, and legislation repositories all require access by record content if they are to be useful and effective. However, they do not necessarily bound it....