Abstract
At the early stage of the knowledge acquisition process, interviews with experts produce a large amount of rich but ill-structured text. Knowledge engineers need tools to help them exploit these texts. We propose the use of a statistical method, top-down hierarchical classification, together with a new interpretation of its results. The initial statistical analysis, proposed by M. Reinert [16, 17], gives two kinds of results: first, a segmentation of the texts that reflects their “semantic contexts”, which we use to bring out the structure of the texts; and second, classes of significant terms belonging to these contexts, which can be related to the experts or to their specialities. In this paper, we describe the method, its empirical validity and a comparison with similar approaches, and its uses, with examples and results. We conclude with some research directions for extending the exploitation of the analysis results.
All examples come from research supported by the French “Ministère de la Recherche et de l'Espace” under contract no. 92 C 0757 and the French “Ministère de l'Equipement, des Transports et du Tourisme” under contract no. 93.0003.
References
J.P. Benzecri. L'analyse des Données. Dunod, Paris, 1973.
D. Bourigault. Lexter, a terminology extraction software for knowledge acquisition from texts. In Proceedings of the 9th Knowledge Acquisition for Knowledge Based Systems Workshop, Banff, Canada, 1995.
E. Charniak. Statistical language learning. Bradford books. The MIT Press, Cambridge, Mass., 1993.
J. Chaumier and M. Dejean. L'indexation assistée par ordinateur, principes et méthodes. Documentaliste — Sciences de l'information, 29(1):3–6, 1992.
C. Desjardins, C. Riccardi-Rigault, P. Plante, L. Dumas, and F. Henri. ACTIA. In F. Maurer, editor, 2nd Knowledge Engineering Forum, number SFB 501 Bericht 01/96, Kaiserslautern University, Germany, 1996.
S.K. Fall, T.C. Crawford, S.L. Souders, and M.J. Rabin. Automated knowledge acquisition technics for intelligence analysts. AAI, 1095 of SPIE:66–77, 1989.
B.R. Gaines and M.L.G. Shaw. Using knowledge acquisition and representation tools to support scientific communities. In Proceedings of the Twelfth National Conference on Artificial Intelligence, volume 1, pages 707–712. AAAI Press, 1994.
M.A. Hearst. Multi-paragraph segmentation of expository text. In Proceedings of the 32nd Annual Meeting of the ACL, Las Cruces, NM, June 1994.
P.S. Jacobs. Using statistical methods to improve knowledge-based news categorization. IEEE Expert, 8(2):13–23, April 1993.
K.A. Kaufman, R.S. Michalski, and L. Kerschberg. Knowledge extraction from databases: design principles of the INLEN system. In Proceedings of the 6th International Symposium on Methodologies for Intelligent Systems, number 542 in LNCS, pages 152–161, Berlin, 1991. Springer-Verlag.
M. Kendall and A. Stuart. Inference and Relationship, volume 2 of The Advanced Theory of Statistics. Charles Griffin and Co Ltd, 1979.
S. Lapalut. Text clustering to support knowledge acquisition from documents. Technical Report RR-2639, INRIA U.R. de Sophia Antipolis, BP 93, 06902 Sophia Antipolis Cedex, 1995. ftp://ftp.inria.fr/INRIA/publication/RR/RR-2639.ps.gz.
S. Lapalut. How to handle multiple expertise from several experts: a general text clustering approach. In F. Maurer, editor, 2nd Knowledge Engineering Forum, number SFB 501 Bericht 01/96, Kaiserslautern University, Germany, 1996.
B. Moulin and D. Rousseau. Automated knowledge acquisition from regulatory texts. IEEE Expert, 7(5):27–35, October 1992.
M.S. Register and N. Kannan. A hybrid architecture for text classification. In Fourth International Conference on Tools with Artificial Intelligence, TAI'92, pages 286–292, Arlington, VA, USA, 1992. IEEE Comput. Soc. Press.
M. Reinert. Classification descendante hiérarchique pour l'analyse de contenu et traitement statistique de corpus. PhD thesis, Université Paris 6, Paris, 1979.
M. Reinert. Notice du logiciel ALCESTE, version 2.0, 1992.
M.L.G. Shaw and B.R. Gaines. KITTEN: Knowledge initiation and transfer tools for experts and novices. Int. J. Man-Machine Studies, 27:251–280, 1987.
Z.B. Wu, L.S. Hsu, and C.L. Tan. A survey on statistical approaches to natural language processing. Technical Report TRA4/92, Department of Information Systems and Computer Science, National University of Singapore, Kent Ridge, Singapore 0511, April 1992.
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lapalut, S. (1996). Text clustering to help knowledge acquisition from documents. In: Shadbolt, N., O'Hara, K., Schreiber, G. (eds) Advances in Knowledge Acquisition. EKAW 1996. Lecture Notes in Computer Science, vol 1076. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61273-4_8
DOI: https://doi.org/10.1007/3-540-61273-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61273-5
Online ISBN: 978-3-540-68391-9
eBook Packages: Springer Book Archive