Abstract
Natural language processing contributes to improved retrieval performance by extracting, from natural language text, information about terms and their relationships. This information is far richer than what is obtained with term frequency methods that assume the statistical independence of terms. Once acquired, this linguistic knowledge may be reflected in retrieval and filtering systems either through a modified query or through a modified document incorporating this information. We can expect natural language processing to improve filtering and retrieval performance if, and only if, the application of linguistically derived information increases the ability of the retrieval system to discriminate between documents of differing relevance. While linguistic knowledge may be obtained through purely statistical analysis, humans may extract this same information without massive number-crunching capabilities, and it is likely that, for automated systems, linguistic methods may ultimately be simpler and faster at extracting information that improves retrieval performance than methods that explicitly incorporate higher-order statistical dependencies. We present a model of grammatical parsing and part-of-speech tagging that allows us to make specific claims about the level of retrieval and filtering performance that will be obtained when linguistic knowledge is incorporated. The model provides both upper and lower bounds on performance, corresponding to best-case and worst-case part-of-speech tagging.
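The condition stated above, that linguistic information helps only if it increases the system's ability to discriminate between documents of differing relevance, can be illustrated with a toy example. This is a minimal sketch and not the chapter's model: the hand-tagged documents, the Penn Treebank tags, and the inner-product `score` function are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical hand-tagged toy corpus: (word, part-of-speech) pairs.
# Tags follow Penn Treebank conventions (NN = noun, MD = modal, VB = verb, ...).
DOC_A = [("the", "DT"), ("tin", "NN"), ("can", "NN"), ("was", "VBD"), ("recycled", "VBN")]
DOC_B = [("you", "PRP"), ("can", "MD"), ("recycle", "VB"), ("tin", "NN")]
# The same text as DOC_A, but with "can" wrongly tagged as a modal verb.
DOC_A_MISTAGGED = [("the", "DT"), ("tin", "NN"), ("can", "MD"), ("was", "VBD"), ("recycled", "VBN")]
# The user wants documents about a tin can (noun sense of "can").
QUERY = [("tin", "NN"), ("can", "NN")]


def features(tagged, use_pos):
    """Bag of features: bare words, or word/tag pairs when use_pos is True."""
    return Counter(f"{word}/{tag}" if use_pos else word for word, tag in tagged)


def score(query, doc, use_pos):
    """Simple inner-product match: sum of query count * document count over query features."""
    q, d = features(query, use_pos), features(doc, use_pos)
    return sum(q[f] * d[f] for f in q)  # Counter returns 0 for absent features


cases = [
    ("bare words", DOC_A, False),
    ("word/tag, correct tags", DOC_A, True),
    ("word/tag, doc A mistagged", DOC_A_MISTAGGED, True),
]
for label, doc_a, use_pos in cases:
    print(f"{label}: A={score(QUERY, doc_a, use_pos)}, B={score(QUERY, DOC_B, use_pos)}")
# bare words: A=2, B=2
# word/tag, correct tags: A=2, B=1
# word/tag, doc A mistagged: A=1, B=1
```

With bare words the two documents tie, because term matching cannot tell the noun "can" from the modal "can". Word/tag features separate them, but only when the tags are right; a single tagging error restores the tie. This loosely mirrors the abstract's point that tagging quality bounds the gain linguistic knowledge can deliver.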
He multiplies words without knowledge.
—Job 35:16
Copyright information
© 1998 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Losee, R.M. (1998). Linguistic Knowledge. In: Text Retrieval and Filtering. The Information Retrieval Series, vol 3. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-5705-0_10
DOI: https://doi.org/10.1007/978-1-4615-5705-0_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-7612-5
Online ISBN: 978-1-4615-5705-0
eBook Packages: Springer Book Archive