Text Categorization prior to Indexing for the CISMeF Health Catalogue

  • Alexandrina Rogozan
  • Aurélie Néveol
  • Stefan J. Darmoni
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2780)


This paper is positioned within the development of an automated indexing system for the CISMeF quality-controlled health gateway. For disambiguation purposes, we wish to perform text categorization prior to indexing. This calls for a global approach, in contrast with classical analytical methods that rely on counts of keywords extracted from the text. Statistical compression models allow us to avoid keyword extraction at this stage. Preliminary results show that, although this method is less precise than other methods at categorizing resources, it can significantly benefit indexing.
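The abstract does not detail the compression models used, so the following is only a hedged illustration of the general principle behind compression-based categorization: build one statistical model per category from raw text, then assign a new document to the category whose model would encode it in the fewest bits. No keywords are extracted; the models see characters only. As an assumption, the sketch substitutes a simple fixed-order character model with add-one smoothing for a full PPM-style compressor (it omits PPM's escape mechanism), and `CharContextModel`, `train_classifier`, `categorize` and the toy training texts are hypothetical names and data, not the authors' implementation.

```python
import math
from collections import defaultdict


class CharContextModel:
    """Hypothetical order-k character model: a simplified stand-in for a
    PPM-style statistical compression model (add-one smoothing, no escapes)."""

    def __init__(self, order=2):
        self.order = order
        # counts[context][symbol]: how often `symbol` followed `context`
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)
        self.alphabet = set()

    def train(self, text):
        # Pad with NUL characters so the first symbols also have a context.
        padded = "\x00" * self.order + text
        for i in range(self.order, len(padded)):
            ctx, sym = padded[i - self.order:i], padded[i]
            self.counts[ctx][sym] += 1
            self.totals[ctx] += 1
            self.alphabet.add(sym)

    def code_length_bits(self, text, alphabet_size):
        """Estimated bits to encode `text`: the sum of -log2 P(symbol | context)."""
        padded = "\x00" * self.order + text
        bits = 0.0
        for i in range(self.order, len(padded)):
            ctx, sym = padded[i - self.order:i], padded[i]
            # .get() avoids creating entries in the defaultdicts while scoring.
            ctx_counts = self.counts.get(ctx, {})
            p = (ctx_counts.get(sym, 0) + 1) / (self.totals.get(ctx, 0) + alphabet_size)
            bits -= math.log2(p)
        return bits


def train_classifier(labelled_docs, order=2):
    """Train one model per category from (label, text) pairs."""
    models = {}
    for label, text in labelled_docs:
        models.setdefault(label, CharContextModel(order)).train(text)
    return models


def categorize(models, text):
    """Return the category whose model gives the shortest code length."""
    # A shared alphabet keeps every model's probabilities on the same scale.
    alphabet = set(text)
    for model in models.values():
        alphabet |= model.alphabet
    size = len(alphabet)
    return min(models, key=lambda label: models[label].code_length_bits(text, size))


# Toy, made-up training data for illustration only.
training = [
    ("cardiology", "heart attack myocardial infarction cardiac arrest heart failure"),
    ("cardiology", "coronary artery disease and heart rhythm disorders"),
    ("dermatology", "skin rash eczema psoriasis dermatitis of the skin"),
    ("dermatology", "acne and skin lesions treated by dermatologists"),
]
models = train_classifier(training)
print(categorize(models, "heart failure"))
print(categorize(models, "skin eczema"))
```

Measuring code length rather than counting terms is what lets such an approach categorize a resource before any keyword or concept extraction has taken place; a real system would use a proper adaptive compressor and far larger training corpora.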


Keywords: Support Vector Machine, Text Categorization, Medical Context, Compression Model, Automatic Indexing
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Alexandrina Rogozan (1)
  • Aurélie Néveol (1, 2)
  • Stefan J. Darmoni (1, 2)
  1. PSI Laboratory, FRE 2645 CNRS – INSA de Rouen, Saint-Etienne-du-Rouvray, France
  2. CISMeF et L@stics, Rouen University Hospital and Rouen Medical School, Rouen, France