Synonyms: Negative dictionary; Stopwords
Stoplists are lists of words, commonly called stopwords, that are not indexed in an information retrieval system, are not accepted as query terms, or both. A stoplist can be created by sorting the terms in a document collection by frequency of occurrence and designating some number of the most frequent terms as stopwords, or alternatively by adopting one of the published stopword lists. Stoplists may be generic or domain-specific, and they are necessarily language-specific. When a stoplist is applied at indexing time, each word of a document being added to the system is checked against the stoplist (for example, by dictionary lookup or hashing), and words that match are eliminated from further processing. In some systems, stopwords are indexed, but the stoplist is used to remove them from queries before the queries are processed.
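The frequency-based construction and the two filtering modes described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the tokenizer (lowercase runs of ASCII letters), the toy three-document collection, the cutoff of three terms, and all function names (`tokenize`, `build_stoplist`, `remove_stopwords`, `build_index`, `search`) are hypothetical. The stoplist is held in a `frozenset`, so each membership check is a hash lookup.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: lowercase runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def build_stoplist(documents: list[str], n: int) -> frozenset[str]:
    """Designate the n most frequent terms in the collection as stopwords."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return frozenset(term for term, _ in counts.most_common(n))


def remove_stopwords(tokens: list[str], stoplist: frozenset[str]) -> list[str]:
    """Drop every token that matches the stoplist (a hash-based set lookup)."""
    return [t for t in tokens if t not in stoplist]


def build_index(documents: list[str], stoplist: frozenset[str] = frozenset()) -> dict[str, set[int]]:
    """Build an inverted index (term -> set of doc ids), removing stopwords first."""
    index: dict[str, set[int]] = {}
    for doc_id, doc in enumerate(documents):
        for term in remove_stopwords(tokenize(doc), stoplist):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str, stoplist: frozenset[str]) -> set[int]:
    """Return ids of documents containing every non-stopword query term."""
    terms = remove_stopwords(tokenize(query), stoplist)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = [
    "the history of the index",
    "the design of a stoplist",
    "the role of stopwords in a query",
]
stoplist = build_stoplist(docs, 3)  # contains "the", "of", "a"

# Index-time removal: stopwords never enter the index.
filtered_index = build_index(docs, stoplist)
print("the" in filtered_index)  # False

# Query-time removal: every term is indexed, but stopwords are dropped from queries.
full_index = build_index(docs)
print(sorted(full_index["the"]))  # [0, 1, 2]
print(search(full_index, "the role of stopwords", stoplist))  # {2}
```

Building `filtered_index` corresponds to removing stopwords as documents are added; building `full_index` without a stoplist and filtering only the query corresponds to systems that index stopwords but exclude them from queries. A frequency cutoff on a collection this small is only illustrative; in practice the cutoff, or the choice of a published list, would depend on the language and domain.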
Hans Peter Luhn, in pioneering work on automatic abstracting, put forward...