SIGIR ’94, pp 260–270

A System for Discovering Relationships by Feature Extraction from Text Databases

  • Conference paper

Abstract

A method for accessing text-based information using domain-specific features rather than documents alone is presented. The basis of this approach is the ability to automatically extract features from large text databases and to identify statistically significant relationships or associations between those features. The techniques supporting this approach are discussed, and examples from an application using these techniques, named the Associations System, are illustrated using the Wall Street Journal database. In this particular application, the features extracted are company and person names. The series of tests run on the Associations System demonstrates that feature extraction can be quite accurate and that the relationships generated are reliable. In addition to conventional measures of recall and precision, evaluation measures that will indicate the usefulness of the identified relationships in various domain-specific contexts are currently being studied.

This research was performed at the Center for Intelligent Information Retrieval at the University of Massachusetts at Amherst.
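
As an illustration of the association step summarized in the abstract, the following is a minimal sketch of how co-occurring features might be scored. It assumes the names have already been extracted, one set per document, and it uses pointwise mutual information (PMI) as the association statistic. The abstract does not say which statistic the Associations System uses, so PMI, the function name `association_scores`, the `min_cooccur` threshold, and the sample names are all assumptions made for illustration only.

```python
import math
from collections import Counter
from itertools import combinations


def association_scores(doc_features, min_cooccur=2):
    """Score feature pairs by pointwise mutual information (PMI).

    doc_features: list of sets, one set of extracted names per document.
    Returns {(a, b): pmi} for pairs that co-occur in at least
    min_cooccur documents. Pair keys are alphabetically ordered tuples.
    """
    n_docs = len(doc_features)
    feature_df = Counter()  # number of documents containing each feature
    pair_df = Counter()     # number of documents containing both features

    for features in doc_features:
        feature_df.update(features)
        pair_df.update(combinations(sorted(features), 2))

    scores = {}
    for (a, b), n_ab in pair_df.items():
        if n_ab < min_cooccur:
            continue  # skip pairs with too little evidence
        # PMI = log2( P(a, b) / (P(a) * P(b)) ), using document-level estimates
        scores[(a, b)] = math.log2(n_ab * n_docs / (feature_df[a] * feature_df[b]))
    return scores


# Hypothetical extracted names, one set per document.
docs = [
    {"Acme Corp", "J. Doe"},
    {"Acme Corp", "J. Doe"},
    {"Globex Inc", "R. Roe"},
    {"Acme Corp", "Globex Inc"},
]

scores = association_scores(docs)
for pair, pmi in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(pmi, 3))
```

A positive score means the two names appear together more often than independence would predict. The `min_cooccur` cutoff is a crude guard, since PMI tends to overstate pairs that are observed only once or twice.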




Copyright information

© 1994 Springer-Verlag London Limited

About this paper

Cite this paper

Conrad, J.G., Utt, M.H. (1994). A System for Discovering Relationships by Feature Extraction from Text Databases. In: Croft, B.W., van Rijsbergen, C.J. (eds) SIGIR ’94. Springer, London. https://doi.org/10.1007/978-1-4471-2099-5_27

  • DOI: https://doi.org/10.1007/978-1-4471-2099-5_27

  • Publisher Name: Springer, London

  • Print ISBN: 978-3-540-19889-5

  • Online ISBN: 978-1-4471-2099-5

  • eBook Packages: Springer Book Archive
