Clustering of Texts using Semantic Graphs. Application to Open-ended Questions in Surveys

  • Monica Bécue Bertaut
  • Ludovic Lebart
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


A methodology for the automatic classification of short texts is proposed; typical cases are responses to open-ended questions in sample surveys, and titles or abstracts of papers in documentary databases. It takes into account a graph structure on the variables (elementary text units). This graph may be a semantic graph provided by an external source or a co-occurrence graph built from the corpus itself.
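To make the notion of a co-occurrence graph built from the corpus concrete, here is a minimal Python sketch. It is an illustration under simple assumptions, not the authors' procedure: words are split on whitespace, two words are linked when they appear in the same response, and the function name `cooccurrence_graph`, the `min_count` threshold, and the sample responses are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(texts, min_count=1):
    """Return {(word_a, word_b): count} for word pairs that share a text.

    Assumption: an edge weight is the number of texts containing both words.
    """
    pair_counts = Counter()
    for text in texts:
        # Count each distinct word once per text; sorting gives a
        # canonical (alphabetical) order for every pair.
        words = sorted(set(text.lower().split()))
        pair_counts.update(combinations(words, 2))
    # Keep only edges observed in at least `min_count` texts.
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


# Hypothetical short survey responses.
responses = [
    "good health and family",
    "family and work",
    "health and money",
]
graph = cooccurrence_graph(responses, min_count=2)
print(graph)
```

In this toy corpus the only edges that survive the threshold involve the function word "and", which suggests why such words might be filtered out or handled separately before a graph of this kind is used for clustering.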


Semantic Network, Hierarchical Classification, Function Word, Projection Pursuit, Lexical Unit
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer Japan 1998

Authors and Affiliations

  • Monica Bécue Bertaut (1)
  • Ludovic Lebart (2)
  1. Universitat Politecnica de Catalunya, Barcelona, Spain
  2. Centre National de la Recherche Scientifique, ENST, Paris, France
