Text clustering to help knowledge acquisition from documents

  • Eliciting Knowledge from Textual and Other Sources
  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1076)

Abstract

At the early stages of the knowledge acquisition process, interviews with experts produce a large amount of rich but ill-structured text. Knowledge engineers need tools to help them exploit all of this text. We propose the use of a statistical method, top-down hierarchical classification, together with a new interpretation of its results. The initial statistical analysis proposed by M. Reinert [16, 17] gives two kinds of results: first, a segmentation of the texts that reflects their “semantic contexts”, which we use to bring out the structure of the texts; and second, classes of significant terms belonging to these contexts, which can be related to the experts or to their specialities. In this paper, we describe the method, discuss its empirical validity, compare it with similar approaches, and illustrate its uses with examples and results. We conclude with some research directions for extending the exploitation of the analysis results.
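The abstract only names the technique, so the following Python sketch may help readers picture a top-down (divisive) classification of text segments in the general spirit described above. It is a minimal illustration under stated assumptions, not Reinert's ALCESTE implementation and not the exact procedure used in this paper: segments are assumed to be already cut, each segment is reduced to the set of terms it contains, the largest class is repeatedly split in two by ordering its segments on the first correspondence-analysis axis and cutting where the chi-square between the two halves is largest, and the significant terms of each class are ranked with a 2×2 chi-square. The stopword list, all function names (`term_sets`, `first_axis_scores`, `split_once`, `top_down`, `characteristic_terms`) and the toy corpus are hypothetical.

```python
import math
import re
from collections import Counter

# Minimal sketch of top-down classification of text segments; NOT the ALCESTE code.
# Hypothetical stopword list; a real corpus needs a fuller, language-specific one.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


def term_sets(segments):
    """Reduce each (already cut) text segment to the set of terms it contains."""
    return [{w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOPWORDS}
            for s in segments]


def first_axis_scores(rows, vocab, iters=500):
    """Row scores on the first non-trivial correspondence-analysis axis, by power
    iteration on S^T S with S = D_r^(-1/2) (P - r c^T) D_c^(-1/2).
    Assumes every segment contains at least one term."""
    counts = [[1.0 if t in row else 0.0 for t in vocab] for row in rows]
    total = sum(map(sum, counts))
    p = [[x / total for x in row] for row in counts]
    r = [sum(row) for row in p]        # row masses
    c = [sum(col) for col in zip(*p)]  # column masses
    s = [[(p[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(len(vocab))]
         for i in range(len(rows))]
    v = [float(j + 1) for j in range(len(vocab))]  # arbitrary non-constant start
    for _ in range(iters):
        u = [sum(x * y for x, y in zip(row, v)) for row in s]  # u = S v
        v = [sum(s[i][j] * u[i] for i in range(len(s))) for j in range(len(vocab))]  # S^T u
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(x * y for x, y in zip(row, v)) for row in s]
    return [u[i] / math.sqrt(r[i]) for i in range(len(rows))]


def split_chi2(rows, a_idx, b_idx, vocab):
    """Chi-square of the 2 x |vocab| table of term counts in class A vs class B."""
    ca = Counter(t for i in a_idx for t in rows[i])
    cb = Counter(t for i in b_idx for t in rows[i])
    na, nb = sum(ca.values()), sum(cb.values())
    chi2 = 0.0
    for t in vocab:
        col = ca[t] + cb[t]
        for obs, row_total in ((ca[t], na), (cb[t], nb)):
            exp = row_total * col / (na + nb)
            chi2 += (obs - exp) ** 2 / exp
    return chi2


def split_once(rows):
    """Order segments on the first axis, then cut where the split chi-square peaks."""
    vocab = sorted(set().union(*rows))
    scores = first_axis_scores(rows, vocab)
    order = sorted(range(len(rows)), key=lambda i: scores[i])
    k = max(range(1, len(order)),
            key=lambda cut: split_chi2(rows, order[:cut], order[cut:], vocab))
    return order[:k], order[k:]


def top_down(rows, n_classes):
    """Divisive clustering: repeatedly split the largest class in two.
    Assumes n_classes <= number of segments."""
    classes = [list(range(len(rows)))]
    while len(classes) < n_classes:
        classes.sort(key=len)
        largest = classes.pop()
        a, b = split_once([rows[i] for i in largest])
        classes += [[largest[i] for i in a], [largest[i] for i in b]]  # map back to corpus ids
    return classes


def characteristic_terms(rows, cls):
    """Terms over-represented in `cls`, ranked by a 2x2 chi-square on segment presence."""
    inside, n = set(cls), len(rows)
    scored = []
    for t in sorted(set().union(*rows)):
        a = sum(1 for i in inside if t in rows[i])                        # in class, has term
        c = sum(1 for i in range(n) if i not in inside and t in rows[i])  # outside, has term
        b, d = len(inside) - a, n - len(inside) - c                       # in / outside, lacks term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom and a * d > b * c:
            scored.append((n * (a * d - b * c) ** 2 / denom, t))
    return [t for _, t in sorted(scored, reverse=True)]


if __name__ == "__main__":
    corpus = [  # hypothetical toy segments
        "concrete beam load on the bridge", "bridge deck and concrete load",
        "beam of the bridge deck", "tunnel ventilation and smoke",
        "smoke and fire in the tunnel exit", "ventilation fan for tunnel fire",
    ]
    rows = term_sets(corpus)
    for cls in top_down(rows, 2):
        print(sorted(cls), characteristic_terms(rows, cls)[:3])
```

On this toy corpus the first split separates the three bridge segments from the three tunnel segments, and the top-ranked term of each class (`bridge`, `tunnel`) is the kind of significant term a knowledge engineer would then relate back to an expert or a speciality. A practical tool would need refinements omitted here, such as reassigning segments after each cut and coping with much larger, sparser vocabularies.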

All examples come from research supported by the French “Ministère de la Recherche et de l'Espace” under contract no. 92 C 0757 and by the French “Ministère de l'Equipement, des Transports et du Tourisme” under contract no. 93.0003.

References

  1. J.P. Benzecri. L'analyse des Données. Dunod, Paris, 1973.

  2. D. Bourigault. Lexter, a terminology extraction software for knowledge acquisition from texts. In Proceedings of the 9th Knowledge Acquisition for Knowledge Based Systems Workshop, Banff, Canada, 1995.

  3. E. Charniak. Statistical language learning. Bradford books. The MIT Press, Cambridge, Mass., 1993.

  4. J. Chaumier and M. Dejean. L'indexation assistée par ordinateur, principes et méthodes. Documentaliste — Sciences de l'information, 29(1):3–6, 1992.

  5. C. Desjardins, C. Riccardi-Rigault, P. Plante, L. Dumas, and F. Henri. ACTIA. In F. Maurer, editor, 2nd Knowledge Engineering Forum, number SFB 501 Bericht 01/96, Kaiserslautern University, Germany, 1996.

  6. S.K. Fall, T.C. Crawford, S.L. Souders, and M.J. Rabin. Automated knowledge acquisition technics for intelligence analysts. In AAI, volume 1095 of SPIE, pages 66–77, 1989.

  7. B.R. Gaines and M.L.G. Shaw. Using knowledge acquisition and representation tools to support scientific communities. In Proceedings of the Twelfth National Conference on Artificial Intelligence, volume 1, pages 707–712. AAAI Press, 1994.

  8. M.A. Hearst. Multi-paragraph segmentation of expository text. In Proceedings of the 32nd Annual Meeting of the ACL, Las Cruces, NM, June 1994.

  9. P.S. Jacobs. Using statistical methods to improve knowledge-based news categorization. IEEE Expert, 8(2):13–23, April 1993.

  10. K.A. Kaufman, R.S. Michalski, and L. Kerschberg. Knowledge extraction from database: design principles of the INLEN system. In Proceedings of the 6th International Symposium on Methodology for Intelligent Systems, number 542 in LNCS, pages 152–161, Berlin, 1991. Springer-Verlag.

  11. M. Kendall and A. Stuart. Inference and Relationship, volume 2 of The Advanced Theory of Statistics. Charles Griffin and Co. Ltd, 1979.

  12. S. Lapalut. Text clustering to support knowledge acquisition from documents. Technical Report RR-2639, INRIA U.R. de Sophia Antipolis, BP 93, 06902 Sophia Antipolis Cedex, 1995. ftp://ftp.inria.fr/INRIA/publication/RR/RR-2639.ps.gz.

  13. S. Lapalut. How to handle multiple expertise from several experts: a general text clustering approach. In F. Maurer, editor, 2nd Knowledge Engineering Forum, number SFB 501 Bericht 01/96, Kaiserslautern University, Germany, 1996.

  14. B. Moulin and D. Rousseau. Automated knowledge acquisition from regulatory texts. IEEE Expert, 7(5):27–35, October 1992.

  15. M.S. Register and N. Kannan. A hybrid architecture for text classification. In Fourth International Conference on Tools with Artificial Intelligence, TAI'92, pages 286–292, Arlington, VA, USA, 1992. IEEE Computer Society Press.

  16. M. Reinert. Classification descendante hiérarchique pour l'analyse de contenu et traitement statistique de corpus. PhD thesis, Université Paris 6, Paris, 1979.

  17. M. Reinert. Notice du logiciel ALCESTE, version 2.0, 1992.

  18. M.L.G. Shaw and B.R. Gaines. KITTEN: Knowledge initiation and transfer tools for experts and novices. Int. J. Man-Machine Studies, 27:251–280, 1987.

  19. Z.B. Wu, L.S. Hsu, and C.L. Tan. A survey on statistical approaches to natural language processing. Technical report TRA4/92, Department of Information Systems and Computer Science, National University of Singapore, Kent Ridge, Singapore 0511, April 1992.

Editor information

Nigel Shadbolt, Kieron O'Hara, Guus Schreiber

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lapalut, S. (1996). Text clustering to help knowledge acquisition from documents. In: Shadbolt, N., O'Hara, K., Schreiber, G. (eds) Advances in Knowledge Acquisition. EKAW 1996. Lecture Notes in Computer Science, vol 1076. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61273-4_8

  • DOI: https://doi.org/10.1007/3-540-61273-4_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-61273-5

  • Online ISBN: 978-3-540-68391-9

  • eBook Packages: Springer Book Archive
