Multiple Label Text Categorization on a Hierarchical Thesaurus

  • Francisco J. Ribadas
  • Erica Lloves
  • Victor M. Darriba
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4739)


In this paper we describe our work on the automatic association of relevant topics, taken from a structured thesaurus, with documents written in natural language. The approach we have followed models thesaurus topic assignment as a multiple label classification problem, where the whole set of possible classes is hierarchically organized.
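To make the framing concrete, here is a minimal sketch of one common way to exploit a class hierarchy in multiple label classification: top-down assignment, where a document may receive several topics but a narrower topic is only considered once its broader parent has been accepted. This is a generic illustration, not necessarily the method used in the paper. The thesaurus fragment, the `ROOT` node, the `threshold` value, and the `keyword_score` function (a toy stand-in for a trained per-topic classifier) are all hypothetical.

```python
from typing import Callable, Dict, List, Set

# Hypothetical thesaurus fragment: each topic maps to its narrower topics.
Thesaurus = Dict[str, List[str]]


def assign_topics(
    doc: str,
    thesaurus: Thesaurus,
    score: Callable[[str, str], float],
    root: str = "ROOT",
    threshold: float = 0.5,
) -> Set[str]:
    """Top-down multi-label assignment: several topics may be selected,
    but a topic is only scored if its parent topic was accepted."""
    selected: Set[str] = set()
    frontier = [root]
    while frontier:
        node = frontier.pop()
        for child in thesaurus.get(node, []):
            # Skip topics already accepted via another parent (polyhierarchy).
            if child in selected:
                continue
            if score(doc, child) >= threshold:
                selected.add(child)
                frontier.append(child)
    return selected


def keyword_score(doc: str, topic: str) -> float:
    """Toy stand-in for a trained per-topic classifier (hypothetical):
    fraction of the topic label's words that occur in the document."""
    doc_words = set(doc.lower().split())
    topic_words = topic.lower().split()
    if not topic_words:
        return 0.0
    return sum(w in doc_words for w in topic_words) / len(topic_words)


if __name__ == "__main__":
    thesaurus = {
        "ROOT": ["economy", "health"],
        "economy": ["taxes", "labour"],
        "health": ["hospitals"],
    }
    doc = "new taxes affect the economy and public hospitals"
    print(sorted(assign_topics(doc, thesaurus, keyword_score)))  # ['economy', 'taxes']
```

In the usage example, "hospitals" is never assigned even though the word appears in the document, because its parent "health" was rejected. This shows the trade-off of top-down schemes: the hierarchy prunes the search space, but an error near the root also hides every topic beneath it.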


Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Francisco J. Ribadas (1)
  • Erica Lloves (2)
  • Victor M. Darriba (1)

  1. Departamento de Informática, University of Vigo, Campus de As Lagoas, s/n, 32004 Ourense, Spain
  2. Telémaco, I. D. S., S.L., Parque Tecnológico de Galicia, Ourense, Spain
