How the Web Search Engine Works

Published: November 15, 2011

DEFINITION: A web search engine is designed to search for information on the World Wide Web. The search results are generally presented as a list and are often called hits. The information may consist of web pages, images and other types of files. Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained by human editors, search engines operate algorithmically or through a mixture of algorithmic and human input.

INTRODUCTION

Wikipedia defines a search engine as: ‘a program designed to help find information stored on a computer system such as the World Wide Web, or a personal computer. The search engine allows one to ask for content meeting specific criteria (typically those containing a given word or phrase) and retrieving a list of references that match those criteria. Search engines use regularly updated indexes to operate quickly and efficiently.’ In other words, a search engine is a sophisticated piece of software, accessed through a page on a website, that allows you to search the web by entering queries into a search box. The search engine then attempts to match your query against the content of web pages that it has stored (cached) and indexed on its servers in advance of your search.
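The idea of matching a query against pages indexed in advance can be illustrated with a small inverted index. The following is only a minimal sketch, not how any particular search engine is implemented: the function names (tokenize, build_index, search), the example URLs and the page texts are all hypothetical, and a match here simply means that a page contains every word of the query.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the sorted URLs of pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

# Hypothetical pages standing in for content fetched and stored in advance.
pages = {
    "http://example.com/a": "Search engines index web pages in advance",
    "http://example.com/b": "Web directories are maintained by human editors",
    "http://example.com/c": "An index lets a search engine answer queries quickly",
}
index = build_index(pages)
print(search(index, "search index"))
```

Because the index is built before any query arrives, answering a query only requires looking up each word and intersecting the resulting sets, rather than rereading every page. Note that this sketch treats "engine" and "engines" as different words, which is the gap that stemming is meant to close.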

Technical Considerations
|Server platforms supported |Unix, NT, Win'95/98/NT |
|Web servers supported |NCSA HTTPD, CERN HTTPD, OMNI HTTPD, XITAMI, APACHE, PWS, IIS |
|Scalability |Indexing support for multiple web servers within an intranet |
|Technical support |E-mail, mailing list, documentation on web site |
|Main program modules | |
|Source code availability | |
|Ease of installation and maintenance |Often related to the technical expertise available |

Indexing features

|File/document formats supported |HTML, ASCII, PDF, SQL, spreadsheets, WYSIWYG (MS-Word, WP, etc.) |
|Indexing level support |File/directory level, multi-record files |
|Standard formats recognised |MARC, Medline, etc. |
|Customisation of document formats | |
|Stemming |If yes, is this an optional or mandatory feature? |
|Stop words support |If yes, is this an optional or mandatory feature?...
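
To make the stemming and stop-word entries more concrete, the sketch below shows what each feature does to text before it is indexed, and how each can be made optional rather than mandatory. It is an illustration under stated assumptions only: the stop-word list and suffix list are hypothetical and far shorter than a real engine would use, and the suffix stripping is a deliberately crude stand-in for a proper stemming algorithm.

```python
import re

# Hypothetical stop-word list; real engines use longer, configurable lists.
STOP_WORDS = {"a", "an", "and", "by", "in", "is", "of", "on", "or", "the", "to"}

# Suffixes tried in order; a naive stand-in for a real stemmer.
SUFFIXES = ("ing", "ed", "s")

def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text, use_stop_words=True, use_stemming=True):
    """Return the terms to index; both features can be switched off."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if use_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    if use_stemming:
        words = [stem(w) for w in words]
    return words

print(index_terms("Indexing the pages of a search engine"))
print(index_terms("Indexing the pages of a search engine",
                  use_stop_words=False, use_stemming=False))
```

The two keyword arguments mirror the question asked in the table: when a feature is optional, whoever configures the index decides whether common words are dropped and whether word variants are merged; when it is mandatory, that choice is made by the software.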