Text clustering to help knowledge acquisition from documents

Lapalut, Stéphane

doi:10.1007/3-540-61273-4_8

Eliciting Knowledge from Textual and Other Sources
Conference paper
First Online: 01 January 2005

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1076)

Abstract

In the early stages of the knowledge acquisition process, interviews with experts produce a large volume of rich but ill-structured text, and knowledge engineers need tools to help them exploit it. We propose using a statistical method, top-down hierarchical classification, together with a new interpretation of its results. The initial statistical analysis proposed by M. Reinert [16, 17] gives two kinds of results: first, a segmentation of the texts that reflects their “semantic contexts”, which we use to bring out the structure of the texts; and second, classes of significant terms belonging to these contexts, which can be related to the experts or to their specialities. In this paper, we describe the method, its empirical validity, and how it compares with similar approaches, and we illustrate its uses with examples and results. We conclude with some research directions for extending the exploitation of the analysis results.
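To make the two kinds of results concrete, here is a minimal Python sketch of a top-down (divisive) classification of text segments. It is an illustration under stated assumptions, not the ALCESTE implementation and not the procedure used in the paper: each split simply follows the sign of the first correspondence-analysis axis of a segment-by-term presence table, whereas the full method as commonly described also searches for the cut that maximizes a chi-square criterion. The function names and the toy segments (already reduced to content words) are invented for this example.

```python
import math
from collections import Counter

# Invented toy segments, already reduced to content words (assumption).
SEGMENTS = [
    "driver speed road", "driver brakes road",
    "speed brakes road", "driver speed brakes",
    "pedestrian crossing signal", "pedestrian light signal",
    "crossing light signal", "pedestrian crossing light",
]

def segment_term_matrix(segments, min_count=2):
    """Binary presence table: one row per segment, one column per kept term."""
    tokens = [set(s.lower().split()) for s in segments]
    doc_freq = Counter(t for toks in tokens for t in toks)
    vocab = sorted(t for t, n in doc_freq.items() if n >= min_count)
    rows = [[1.0 if t in toks else 0.0 for t in vocab] for toks in tokens]
    return rows, vocab

def first_axis(table):
    """Row scores on the first correspondence-analysis axis (power iteration)."""
    cols = [j for j in range(len(table[0])) if any(row[j] for row in table)]
    total = sum(row[j] for row in table for j in cols)
    if total == 0:
        return [0.0] * len(table)
    r = [sum(row[j] for j in cols) / total for row in table]   # row masses
    c = [sum(row[j] for row in table) / total for j in cols]   # column masses
    # Standardized residuals; rows with no kept term get a zero row.
    S = [[(row[j] / total - ri * cj) / math.sqrt(ri * cj) if ri > 0 else 0.0
          for j, cj in zip(cols, c)] for row, ri in zip(table, r)]
    u = [math.sin(i + 1) for i in range(len(S))]  # deterministic start vector
    for _ in range(200):
        w = [sum(S[i][k] * u[i] for i in range(len(S))) for k in range(len(cols))]
        u = [sum(S[i][k] * w[k] for k in range(len(cols))) for i in range(len(S))]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0:
            break
        u = [x / norm for x in u]
    return u

def top_down_classes(rows, n_classes):
    """Repeatedly split the largest class by the sign of its first axis."""
    classes = [list(range(len(rows)))]
    while len(classes) < n_classes:
        classes.sort(key=len, reverse=True)
        biggest = classes[0]
        axis = first_axis([rows[i] for i in biggest])
        pos = [i for i, x in zip(biggest, axis) if x > 0]
        neg = [i for i, x in zip(biggest, axis) if x <= 0]
        if not pos or not neg:  # the largest class cannot be split further
            break
        classes[0:1] = [pos, neg]
    return classes

def characteristic_terms(rows, vocab, members, top=3):
    """Terms over-represented in a class, ranked by a 2x2 chi-square score."""
    inside, n = set(members), len(rows)
    scored = []
    for j, term in enumerate(vocab):
        a = sum(rows[i][j] for i in inside)                     # present, in class
        b = sum(rows[i][j] for i in range(n) if i not in inside)  # present, outside
        c, d = len(inside) - a, (n - len(inside)) - b           # absent in / outside
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom > 0 and a * d > b * c:
            scored.append((n * (a * d - b * c) ** 2 / denom, term))
    return [term for _, term in sorted(scored, reverse=True)[:top]]

if __name__ == "__main__":
    rows, vocab = segment_term_matrix(SEGMENTS)
    for members in top_down_classes(rows, n_classes=2):
        print(sorted(members), characteristic_terms(rows, vocab, members))
```

In this sketch the classes of segments play the role of the “semantic contexts”, and the terms returned by `characteristic_terms` play the role of the classes of significant terms. Which class comes out first is arbitrary, because the sign of the axis is arbitrary.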

All examples come from research supported by the French “Ministère de la Recherche et de l'Espace” under contract n. 92 C 0757 and the French “Ministère de l'Equipement, des Transports et du Tourisme” under contract n. 93.0003.

References

[1] J.P. Benzécri. L'analyse des données. Dunod, Paris, 1973.
[2] D. Bourigault. Lexter, a terminology extraction software for knowledge acquisition from texts. In Proceedings of the 9th Knowledge Acquisition for Knowledge Based Systems Workshop, Banff, Canada, 1995.
[3] E. Charniak. Statistical Language Learning. Bradford Books. The MIT Press, Cambridge, Mass., 1993.
[4] J. Chaumier and M. Dejean. L'indexation assistée par ordinateur, principes et méthodes. Documentaliste-Sciences de l'information, 29(1):3–6, 1992.
[5] C. Desjardins, C. Riccardi-Rigault, P. Plante, L. Dumas, and F. Henri. ACTIA. In F. Maurer, editor, 2nd Knowledge Engineering Forum, number SFB 501 Bericht 01/96, Kaiserslautern University, Germany, 1996.
[6] S.K. Fall, T.C. Crawford, S.L. Souders, and M.J. Rabin. Automated knowledge acquisition technics for intelligence analysts. AAI, 1095 of SPIE:66–77, 1989.
[7] B.R. Gaines and M.L.G. Shaw. Using knowledge acquisition and representation tools to support scientific communities. In Proceedings of the Twelfth National Conference on Artificial Intelligence, volume 1, pages 707–712. AAAI Press, 1994.
[8] M.A. Hearst. Multi-paragraph segmentation of expository text. In Proceedings of the 32nd Annual Meeting of the ACL, Las Cruces, NM, June 1994.
[9] P.S. Jacobs. Using statistical methods to improve knowledge-based news categorization. IEEE Expert, 8(2):13–23, April 1993.
[10] K.A. Kaufman, R.S. Michalski, and L. Kerschberg. Knowledge extraction from database: design principles of the INLEN system. In Proceedings of the 6th International Symposium on Methodologies for Intelligent Systems, number 542 in LNCS, pages 152–161, Berlin, 1991. Springer-Verlag.
[11] M. Kendall and A. Stuart. Inference and Relationship, volume 2 of The Advanced Theory of Statistics. Charles Griffin and Co Ltd, 1979.
[12] S. Lapalut. Text clustering to support knowledge acquisition from documents. Technical Report RR-2639, INRIA U.R. de Sophia Antipolis, BP 93, 06902 Sophia Antipolis Cedex, 1995. ftp://ftp.inria.fr/INRIA/publication/RR/RR-2639.ps.gz.
[13] S. Lapalut. How to handle multiple expertise from several experts: a general text clustering approach. In F. Maurer, editor, 2nd Knowledge Engineering Forum, number SFB 501 Bericht 01/96, Kaiserslautern University, Germany, 1996.
[14] B. Moulin and D. Rousseau. Automated knowledge acquisition from regulatory texts. IEEE Expert, 7(5):27–35, October 1992.
[15] M.S. Register and N. Kannan. A hybrid architecture for text classification. In Fourth International Conference on Tools with Artificial Intelligence, TAI'92, pages 286–292, Arlington, VA, USA, 1992. IEEE Comput. Soc. Press.
[16] M. Reinert. Classification descendante hiérarchique pour l'analyse de contenu et traitement statistique de corpus. PhD thesis, Université Paris 6, Paris, 1979.
[17] M. Reinert. Notice du logiciel ALCESTE, version 2.0, 1992.
[18] M.L.G. Shaw and B.R. Gaines. KITTEN: Knowledge initiation and transfer tools for experts and novices. Int. J. Man-Machine Studies, 27:251–280, 1987.
[19] Z.B. Wu, L.S. Hsu, and C.L. Tan. A survey on statistical approaches to natural language processing. Technical report TRA4/92, Department of Information Systems and Computer Science, National University of Singapore, Kent Ridge, Singapore 0511, April 1992.

Author information

Authors and Affiliations

Projet ACACIA, INRIA Sophia Antipolis, BP 93, 06 902, Sophia Antipolis Cedex, France
Stéphane Lapalut

Editor information

Nigel Shadbolt, Kieron O'Hara, Guus Schreiber

About this paper

Cite this paper

Lapalut, S. (1996). Text clustering to help knowledge acquisition from documents. In: Shadbolt, N., O'Hara, K., Schreiber, G. (eds) Advances in Knowledge Acquisition. EKAW 1996. Lecture Notes in Computer Science, vol 1076. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61273-4_8

DOI: https://doi.org/10.1007/3-540-61273-4_8
Published: 01 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61273-5
Online ISBN: 978-3-540-68391-9
eBook Packages: Springer Book Archive
