A System for Discovering Relationships by Feature Extraction from Text Databases

Conrad, Jack G.; Utt, Mary Hunter

doi:10.1007/978-1-4471-2099-5_27

Conference paper

Abstract

A method for accessing text-based information using domain-specific features rather than documents alone is presented. The basis of this approach is the ability to automatically extract features from large text databases and to identify statistically significant relationships or associations between those features. The techniques supporting this approach are discussed, and examples from an application using these techniques, named the Associations System, are illustrated using the Wall Street Journal database. In this particular application, the features extracted are company and person names. The series of tests run on the Associations System demonstrates that feature extraction can be quite accurate and that the relationships generated are reliable. In addition to conventional measures of recall and precision, evaluation measures are currently being studied that will indicate the usefulness of the relationships identified in various domain-specific contexts.
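
As a rough illustration of the idea of scoring associations between extracted features, the following Python sketch ranks pairs of names by how much more often they co-occur in the same document than chance would predict. The abstract does not say which statistic the Associations System uses; pointwise mutual information over document-level co-occurrence is an assumption made here, and the function name `pmi_associations`, the parameter `min_pair_count`, and the sample names are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_associations(docs, min_pair_count=2):
    """Rank feature pairs by pointwise mutual information (PMI).

    docs: iterable of collections of feature names (for example, the company
    and person names extracted from each document). This is an illustrative
    scoring scheme, not the statistic used in the paper.
    """
    docs = [set(features) for features in docs]
    n_docs = len(docs)
    feature_counts = Counter()
    pair_counts = Counter()
    for features in docs:
        unique = sorted(features)
        feature_counts.update(unique)                 # document frequency per feature
        pair_counts.update(combinations(unique, 2))   # document frequency per pair

    scores = {}
    for (a, b), n_ab in pair_counts.items():
        if n_ab < min_pair_count:
            continue  # skip rare pairs, whose PMI estimates are unstable
        p_ab = n_ab / n_docs
        p_a = feature_counts[a] / n_docs
        p_b = feature_counts[b] / n_docs
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical extracted names, one set per document.
sample_docs = [
    {"Acme Corp", "J. Doe"},
    {"Acme Corp", "J. Doe"},
    {"Acme Corp", "Globex"},
    {"Globex"},
]
ranked = pmi_associations(sample_docs)
```

A positive score means the two names appear together more often than independence would suggest. The `min_pair_count` threshold is a simple guard against pairs seen only once; a real system would more likely apply a significance test.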

This research was performed at the Center for Intelligent Information Retrieval at the University of Massachusetts at Amherst.

Author information

Authors and Affiliations

West Publishing Company, St. Paul, MN, 55164, USA
Jack G. Conrad
Digital Equipment Corporation, Littleton, MA, 01460, USA
Mary Hunter Utt

Editor information

Editors and Affiliations

Department of Computer Science, University of Massachusetts, 01003, Amherst, MA, USA
Bruce W. Croft
Department of Computer Science, University of Glasgow, G12 8RZ, 8–17 Lilybank Gardens, Glasgow, Scotland
C. J. van Rijsbergen

About this paper

Cite this paper

Conrad, J.G., Utt, M.H. (1994). A System for Discovering Relationships by Feature Extraction from Text Databases. In: Croft, B.W., van Rijsbergen, C.J. (eds) SIGIR ’94. Springer, London. https://doi.org/10.1007/978-1-4471-2099-5_27

DOI: https://doi.org/10.1007/978-1-4471-2099-5_27
Publisher Name: Springer, London
Print ISBN: 978-3-540-19889-5
Online ISBN: 978-1-4471-2099-5
eBook Packages: Springer Book Archive