COLE Experiments at CLEF 2003 in the Spanish Monolingual Track

  • Jesús Vilares
  • Miguel A. Alonso
  • Francisco J. Ribadas
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3237)

Abstract

In this, our second participation in the CLEF Spanish monolingual track, we have continued to apply Natural Language Processing techniques to the conflation of single-word and multi-word terms. Two conflation approaches have been tested. The first is based on lemmatization of the text, in order to avoid inflectional variation. The second uses syntactic dependencies as complex index terms, in an attempt to solve the problems derived from syntactic variation and thereby obtain more precise terms. These dependencies are obtained through a shallow parser based on cascades of finite-state transducers.
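
The two conflation ideas can be illustrated with a minimal sketch. It is not the system described in the paper: the lexicon `TOY_LEXICON`, the single-letter tags, the function names, and the two tag patterns are all hypothetical, and one regular-expression pass stands in for the real cascade of finite-state transducers. It only shows how inflectional variants of the same sentence might collapse to the same lemma terms and the same head-modifier pairs.

```python
import re

# Hypothetical toy lexicon: surface form -> (lemma, coarse tag).
# Tags: N = noun, A = adjective, V = verb, D = determiner.
TOY_LEXICON = {
    "perro": ("perro", "N"), "perros": ("perro", "N"),
    "negro": ("negro", "A"), "negros": ("negro", "A"),
    "ladra": ("ladrar", "V"), "ladran": ("ladrar", "V"),
    "el": ("el", "D"), "los": ("el", "D"),
}
CONTENT_TAGS = {"N", "A", "V"}


def analyse(text):
    """Return one (lemma, tag) per token; unknown words keep their form with tag 'X'."""
    tokens = re.findall(r"\w+", text.lower())
    return [TOY_LEXICON.get(tok, (tok, "X")) for tok in tokens]


def lemma_terms(text):
    """First idea: index content words by lemma, hiding inflectional variation."""
    return [lemma for lemma, tag in analyse(text) if tag in CONTENT_TAGS]


def dependency_pairs(text):
    """Second idea (much simplified): emit (head, modifier) lemma pairs.

    Each tag is one character, so a position in the tag string is also
    a position in the token list.
    """
    analysed = analyse(text)
    tags = "".join(tag for _, tag in analysed)
    pairs = []
    # Noun immediately followed by an adjective: (noun, adjective).
    for m in re.finditer(r"(?=NA)", tags):
        i = m.start()
        pairs.append((analysed[i][0], analysed[i + 1][0]))
    # Noun, optional adjective, then verb: (verb, subject noun).
    for m in re.finditer(r"NA?V", tags):
        pairs.append((analysed[m.end() - 1][0], analysed[m.start()][0]))
    return pairs


if __name__ == "__main__":
    for sentence in ("Los perros negros ladran", "El perro negro ladra"):
        print(lemma_terms(sentence), dependency_pairs(sentence))
```

Both sample sentences yield identical index terms, which is the effect conflation is meant to achieve. A real system would need a full lexicon, a part-of-speech tagger, and several parsing layers.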

Keywords

Natural Language Processing; Content Word; Cole Experiment; Dependency Pair; Balance Factor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. ftp://ftp.cs.cornell.edu/pub/smart (site visited, October 2003)
  2. http://www.clef-campaign.org (site visited, October 2003)
  3. http://www.itl.nist.gov (site visited, October 2003)
  4. Abney, S.: Partial parsing via finite-state cascades. Natural Language Engineering 2(4), 337–344 (1997)
  5. Arampatzis, A., van der Weide, T., Koster, C., van Bommel, P.: Linguistically motivated information retrieval. In: Encyclopedia of Library and Information Science. Marcel Dekker, Inc., New York (2000)
  6. Barcala, F.M., Vilares, J., Alonso, M.A., Graña, J., Vilares, M.: Tokenization and proper noun recognition for information retrieval. In: Tjoa, A.M., Wagner, R.R. (eds.) Thirteenth International Workshop on Database and Expert Systems Applications, pp. 246–250. IEEE Computer Society Press, Los Alamitos (2002)
  7. Brants, T.: TnT - a statistical part-of-speech tagger. In: Proceedings of the Sixth Applied Natural Language Processing Conference (ANLP 2000), Seattle (2000)
  8. Buckley, C.: Implementation of the SMART information retrieval system. Technical report, Department of Computer Science, Cornell University (1985), Source code available at [1]
  9. Graña, J.: Técnicas de Análisis Sintáctico Robusto para la Etiquetación del Lenguaje Natural. PhD thesis, University of La Coruña, La Coruña, Spain (2000)
  10. Graña, J., Alonso, M.A., Vilares, M.: A common solution for tokenization and part-of-speech tagging: One-pass Viterbi algorithm vs. iterative approaches. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2002. LNCS (LNAI), vol. 2448, pp. 3–10. Springer, Heidelberg (2002)
  11. Graña, J., Barcala, F.M., Alonso, M.A.: Compilation methods of minimal acyclic automata for large dictionaries. In: Watson, B.W., Wood, D. (eds.) CIAA 2001. LNCS, vol. 2494, pp. 135–148. Springer, Heidelberg (2003)
  12. Graña, J., Barcala, F.M., Vilares, J.: Formal methods of tokenization for part-of-speech tagging. In: Gelbukh, A. (ed.) CICLing 2002. LNCS, vol. 2276, pp. 240–249. Springer, Heidelberg (2002)
  13. Graña, J., Chappelier, J.-C., Vilares, M.: Integrating external dictionaries into stochastic part-of-speech taggers. In: Proceedings of the Euroconference Recent Advances in Natural Language Processing (RANLP 2001), Tzigov Chark, Bulgaria, pp. 122–128 (2001)
  14. Hull, D.A., Grefenstette, G., Schulze, B.M., Gaussier, E., Schutze, H., Pedersen, J.O.: Xerox TREC-5 site report: routing, filtering, NLP, and Spanish tracks. In: Proceedings of the Fifth Text REtrieval Conference (TREC-5), pp. 167–180 (1997)
  15. Jacquemin, C., Tzoukermann, E.: NLP for term variant extraction: synergy between morphology, lexicon and syntax. In: Strzalkowski, T. (ed.) Natural Language Information Retrieval. Text, Speech and Language Technology, vol. 7, pp. 25–74. Kluwer Academic Publishers, Dordrecht (1999)
  16. Perez-Carballo, J., Strzalkowski, T.: Natural language information retrieval: progress report. Information Processing and Management 36(1), 155–178 (2000)
  17. Robertson, S.E., Walker, S.: Okapi/Keenbow at TREC-8. In: Voorhees, E., Harman, D.K. (eds.) Proceedings of the Eighth Text REtrieval Conference (TREC-8), pp. 151–161. NIST Special Publication 500-264 (2000)
  18. Rocchio, J.J.: Relevance Feedback in Information Retrieval. In: Salton, G. (ed.) The SMART Retrieval System—Experiments in Automatic Document Processing. Prentice Hall, Englewood Cliffs (1971)
  19. Savoy, J.: Report on CLEF 2002 Experiments: Combining Multiple Sources of Evidence. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds.) Advances in Cross-Language Information Retrieval: Results of the CLEF 2002 Evaluation Campaign. LNCS, vol. 2785, pp. 66–90. Springer, Heidelberg (2003)
  20. Savoy, J., Le Calve, A., Vrajitoru, D.: Report on the TREC-5 experiment: Data fusion and collection fusion. In: Proceedings of TREC’5, pp. 489–502. NIST publication #500-238, Gaithersburg (1997)
  21. Vilares, J., Alonso, M.A.: A Grammatical Approach to the Extraction of Index Terms. In: Angelova, G., Bontcheva, K., Mitkov, R., Nicolov, N. (eds.) Proceedings of International Conference on Recent Advances in Natural Language Processing (RANLP 2003), Borovets, Bulgaria, pp. 500–504 (2003)
  22. Vilares, J., Alonso, M.A., Ribadas, F.J.: COLE experiments at CLEF 2003 Spanish monolingual track. In: Peters, C., Gonzalo, J., Braschler, M., Kluck, M. (eds.) CLEF 2003. LNCS, vol. 3237, pp. 197–206. Springer, Heidelberg (2004), Available at [2]
  23. Vilares, J., Alonso, M.A., Ribadas, F.J., Vilares, M.: COLE experiments in the CLEF 2002 Spanish monolingual track. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds.) Advances in Cross-Language Information Retrieval: Results of the CLEF 2002 Evaluation Campaign. LNCS, vol. 2785, pp. 265–278. Springer, Heidelberg (2003)
  24. Vilares, J., Barcala, F.M., Alonso, M.A.: Using syntactic dependency-pairs conflation to improve retrieval performance in Spanish. In: Gelbukh, A. (ed.) CICLing 2002. LNCS, vol. 2276, pp. 381–390. Springer, Heidelberg (2002)
  25. Vilares, J., Cabrero, D., Alonso, M.A.: Applying productive derivational morphology to term indexing of Spanish texts. In: Gelbukh, A. (ed.) CICLing 2001. LNCS, vol. 2004, pp. 336–348. Springer, Heidelberg (2001)
  26. Vogt, C., Cottrell, G.W.: Fusion via a linear combination of scores. Information Retrieval 1(3), 151–173 (1999)

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Jesús Vilares (1)
  • Miguel A. Alonso (1)
  • Francisco J. Ribadas (2)
  1. Departamento de Computación, Universidade da Coruña, La Coruña, Spain
  2. Escuela Superior de Ingeniería Informática, Universidade de Vigo, Orense, Spain
