Contextual text representation for unsupervised knowledge discovery in texts

Perrin, Patrick; Petry, Fred

doi:10.1007/3-540-64383-4_21

Conference paper
First Online: 01 January 2005

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1394)

Abstract

This paper studies the role of lexical contextual relations in unsupervised knowledge discovery in full texts. Narrative texts have an inherent structure dictated by the language usage that generates them. We suggest that the relative distance of terms within a text gives sufficient information about its structure and its relevant content. Furthermore, this structure can be used to discover implicit knowledge embedded in the text, and is therefore a good candidate for effectively representing the text content in knowledge elicitation tasks. We qualitatively demonstrate that a useful text structure and content can be systematically extracted by collocational lexical analysis without the need to encode any supplemental sources of knowledge. We present an algorithm that systematically extracts the most relevant facts in the texts and labels them by their overall theme, dictated by local contextual information. It exploits domain-independent lexical frequencies and mutual information measures to find the relevant contextual units in the texts. We report results from experiments on a real-world textual database of psychiatric evaluation reports.
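The abstract names term distance, lexical frequencies, and mutual information as the ingredients of the analysis but does not give the formulas. As a rough illustration of how such a collocational score could be computed, the sketch below uses pointwise mutual information over word pairs that co-occur within a fixed window. The function name `collocation_pmi`, the parameters `window` and `min_count`, the probability estimates, and the toy input are all assumptions made for this example; it is not the authors' algorithm.

```python
import math
from collections import Counter


def collocation_pmi(tokens, window=3, min_count=2):
    """Score ordered word pairs that co-occur within `window` tokens.

    Illustrative only: the window size, the count threshold, and the
    probability estimates are assumptions, not the paper's settings.
    """
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, left in enumerate(tokens):
        # Pair each token with the tokens that follow it inside the window.
        for right in tokens[i + 1 : i + 1 + window]:
            pair_counts[(left, right)] += 1

    n_words = len(tokens)
    n_pairs = sum(pair_counts.values())
    scores = {}
    for (left, right), count in pair_counts.items():
        if count < min_count:
            continue  # rare pairs give unreliable estimates
        p_pair = count / n_pairs
        p_left = word_counts[left] / n_words
        p_right = word_counts[right] / n_words
        # PMI: log of observed co-occurrence over the independence baseline.
        scores[(left, right)] = math.log2(p_pair / (p_left * p_right))
    return scores


# Invented toy text, used only to exercise the function.
text = (
    "patient reports depressed mood . patient denies suicidal ideation . "
    "patient reports depressed mood and poor sleep ."
)
scores = collocation_pmi(text.split())
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

Pairs with a high positive score co-occur more often than their individual frequencies would predict, which is one common way to pick out candidate contextual units without any external knowledge source.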

Author information

Authors and Affiliations

Department of Computer Science, Tulane University, 70118, New Orleans, LA, U.S.A.
Patrick Perrin & Fred Petry

Editor information

Editors and Affiliations

School of Computer Science and Software Engineering, Monash University, 900 Dandenong Road, Caulfield East, Victoria, 3145, Australia
Xindong Wu
Department of Computer Science, The University of Melbourne, Parkville, Victoria, 3052, Australia
Ramamohanarao Kotagiri
School of Computer Science and Engineering, Monash University, Clayton, Victoria, 3168, Australia
Kevin B. Korb

About this paper

Cite this paper

Perrin, P., Petry, F. (1998). Contextual text representation for unsupervised knowledge discovery in texts. In: Wu, X., Kotagiri, R., Korb, K.B. (eds) Research and Development in Knowledge Discovery and Data Mining. PAKDD 1998. Lecture Notes in Computer Science, vol 1394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-64383-4_21

DOI: https://doi.org/10.1007/3-540-64383-4_21
Published: 25 August 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64383-8
Online ISBN: 978-3-540-69768-8
eBook Packages: Springer Book Archive
