Contextual text representation for unsupervised knowledge discovery in texts

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1394)

Abstract

This paper studies the role of lexical contextual relations in unsupervised knowledge discovery in full texts. Narrative texts have an inherent structure dictated by the language usage that generated them. We suggest that the relative distance of terms within a text gives sufficient information about its structure and its relevant content. Furthermore, this structure can be used to discover implicit knowledge embedded in the text, which makes it a good candidate for effectively representing the text content in knowledge elicitation tasks. We qualitatively demonstrate that useful text structure and content can be systematically extracted by collocational lexical analysis, without the need to encode any supplemental sources of knowledge. We present an algorithm that systematically extracts the most relevant facts in the texts and labels them by their overall theme, as dictated by local contextual information. It exploits domain-independent lexical frequencies and mutual information measures to find the relevant contextual units in the texts. We report results from experiments on a real-world textual database of psychiatric evaluation reports.
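The abstract does not say which mutual information measure the algorithm uses. As a rough illustration only, the sketch below scores ordered word pairs that co-occur within a fixed distance using pointwise mutual information, a measure commonly used in collocational analysis. The function name `pmi_within_window`, the `window` parameter, and the probability estimates are assumptions made for this example; they are not taken from the paper.

```python
import math
from collections import Counter


def pmi_within_window(tokens, window=5):
    """Score ordered word pairs co-occurring within `window` tokens.

    Hypothetical helper: a generic pointwise mutual information (PMI)
    estimate, not the algorithm described in the paper.
    """
    n = len(tokens)
    word_counts = Counter(tokens)

    # Count each (left, right) pair where `right` follows `left`
    # by at most `window` positions.
    pair_counts = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            pair_counts[(left, right)] += 1

    total_pairs = sum(pair_counts.values())
    scores = {}
    for (left, right), count in pair_counts.items():
        p_pair = count / total_pairs      # estimated P(left, right)
        p_left = word_counts[left] / n    # estimated P(left)
        p_right = word_counts[right] / n  # estimated P(right)
        # PMI = log2( P(left, right) / (P(left) * P(right)) )
        scores[(left, right)] = math.log2(p_pair / (p_left * p_right))
    return scores


scores = pmi_within_window(["x", "y", "x", "y"], window=1)
```

With this toy input, the pair `("x", "y")` occurs twice and `("y", "x")` once, so the first pair receives the higher score. On real text, pairs with very low counts tend to get inflated PMI values, so a minimum-frequency threshold would normally be applied before ranking.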

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Perrin, P., Petry, F. (1998). Contextual text representation for unsupervised knowledge discovery in texts. In: Wu, X., Kotagiri, R., Korb, K.B. (eds) Research and Development in Knowledge Discovery and Data Mining. PAKDD 1998. Lecture Notes in Computer Science, vol 1394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-64383-4_21

  • DOI: https://doi.org/10.1007/3-540-64383-4_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64383-8

  • Online ISBN: 978-3-540-69768-8

  • eBook Packages: Springer Book Archive
