The CNIDR ISEARCH Text Searching System
Features of the 1.10 Release

Isearch is a software system for searching through large amounts of text.
The system allows a user to very quickly find out what documents are available
that contain certain words.  Unlike older search systems, Isearch does not
use a list of keywords or an abstract; every word of every document can
be checked.  This greatly improves the chances of discovering new
information in old collections.

As a real-world example, CNIDR uses Isearch to index
and search a collection of over 2000 AIDS-related patents issued by the
U.S. Patent and Trademark Office.  This collection of XXX megabytes of
raw text can be searched in less than 1 second.  A researcher looking for
patents containing either the word "needle" or the word "syringe" can submit
the query and get results back about as fast as a desktop machine can display
them.
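
The "needle" or "syringe" query above can be sketched in a few lines of
Python.  This is only an illustration of free-text lookup through a
word-to-document index, not Isearch's actual code or file format; the
function names (tokenize, build_index, search_any) and the three sample
documents are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search_any(index, words):
    """Return the sorted ids of documents containing at least one word."""
    hits = set()
    for word in words:
        # .get() avoids creating empty entries for unknown words.
        hits |= index.get(word.lower(), set())
    return sorted(hits)


# Hypothetical stand-in documents, not real patent text.
documents = {
    "patent-1": "A disposable syringe with a retractable shield.",
    "patent-2": "Method for sterilizing a needle after use.",
    "patent-3": "Assay for detecting antibodies in a blood sample.",
}

index = build_index(documents)
matches = search_any(index, ["needle", "syringe"])
print(matches)  # ['patent-1', 'patent-2']
```

Because the index is built once ahead of time, answering the query only
touches the entries for the two query words; the text of non-matching
documents is never read.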

ISEARCH Features:

*) Searches large collections using a Free-Text search: no reliance on
   keywords, abstracts, or human-generated indexes.
   
*) Handles very large collections: over 1 gigabyte (about 1,000 megabytes)
   collections can be handled on modest servers.  Essentially unlimited
   textbases can be searched with careful layout and planning.

*) Very sophisticated result sorting: The documents most likely to be
   useful are returned first.  Ranking is based on statistical analysis
   of word frequencies and is generalized for a wide variety of subjects
   and user skill levels.
   
*) Fast: documents are machine-indexed before searching, so non-matching
   documents needn't be read in.  Fast enough to make optical media
   a reasonable solution, and extremely responsive with cheap SCSI disks.
   
*) Works well with OCR document storage and retrieval systems: no need
   for people to classify documents, and the statistical ranking method
   is forgiving of OCR errors.  Potentially millions of pages can be
   made searchable for little more than photocopy costs.
   
*) Handles a wide range of document types: accepts text in formats from
   raw ASCII dumps to richly formatted SGML.  Convenient doctype interface
   allows handling of entirely new and unusual formats in a matter of hours.
   Good supply of free and commercial doctypes available from third parties.
   
*) Efficient use of disk resources: Indexes are relatively compact, generally
   smaller than the original collection, and yet contain references to every
   word in the textbase.

*) Text maintenance commands: old documents can be deleted instantly and new
   data can be added without having to re-index the entire collection.
   
*) Portable and Scalable: works well on Unix machines from Linux PCs to Crays.
   Takes advantage of Very Large Memory (VLM) technology on Digital
   AlphaServers. Support for Windows NT is planned for 3Q96.
   
*) Integrates smoothly with World Wide Web (WWW) and ANSI/NISO Z39.50 servers:
   Anyone can search an Isearch textbase using their favorite web browser.
   When used with CNIDR's Isite package, Isearch can be used through a Z39.50
   session to interoperate with library automation software.  Isearch and
   Isite together form a three-tier client-server architecture to allow
   essentially unlimited capacity growth.

*) Easy to customize: The modular, object-oriented structure of Isearch
   means that new features can be added independently of the Isearch core.
   Third party extension is facilitated by using well-defined Application
   Programming Interfaces (APIs) implemented in C++.
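
The result sorting and text maintenance features listed above can be
illustrated with a toy index in Python.  This is a sketch under stated
assumptions, not the Isearch implementation: the ranking formula here is
plain TF-IDF, chosen as a common example of scoring by word frequency
(this document does not specify the statistics Isearch actually uses),
and the class name TinyTextbase and its methods are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyTextbase:
    """Hypothetical toy index: per-document word counts plus postings."""

    def __init__(self):
        self.counts = {}    # doc_id -> Counter of word frequencies
        self.postings = {}  # word -> set of doc ids containing it

    def add(self, doc_id, text):
        """Index one document without re-indexing the others."""
        if doc_id in self.counts:
            self.delete(doc_id)  # replace any stale entries first
        counts = Counter(tokenize(text))
        self.counts[doc_id] = counts
        for word in counts:
            self.postings.setdefault(word, set()).add(doc_id)

    def delete(self, doc_id):
        """Remove one document's entries from the index."""
        for word in self.counts.pop(doc_id, {}):
            docs = self.postings[word]
            docs.discard(doc_id)
            if not docs:
                del self.postings[word]

    def search(self, query):
        """Return (doc_id, score) pairs, best first (TF-IDF, assumed)."""
        total = len(self.counts)
        scores = Counter()
        for word in set(tokenize(query)):
            docs = self.postings.get(word, set())
            if not docs:
                continue
            # Words found in fewer documents carry more weight.
            idf = math.log(1 + total / len(docs))
            for doc_id in docs:
                counts = self.counts[doc_id]
                tf = counts[word] / sum(counts.values())
                scores[doc_id] += tf * idf
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample data for illustration only.
tb = TinyTextbase()
tb.add("a", "needle needle syringe")
tb.add("b", "syringe pump for infusion")
tb.add("c", "blood assay kit")

ranked = [doc_id for doc_id, _ in tb.search("needle syringe")]
print(ranked)  # ['a', 'b']

tb.delete("a")  # only document "a" is touched
after_delete = [doc_id for doc_id, _ in tb.search("needle syringe")]
print(after_delete)  # ['b']
```

Document "a" ranks first because it matches both query words and uses
"needle" twice; deleting it updates only its own postings, so the rest
of the index stays usable without a rebuild.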

