Conditional Random Fields for Spanish Named Entity Recognition Using Unsupervised Features

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10022)

Abstract

Unsupervised features based on word representations, such as word embeddings and word collocations, have been shown to significantly improve supervised NER for English. In this work we investigate whether such unsupervised features can also boost supervised NER in Spanish. To do so, we use word representations and collocations as additional features in a linear-chain Conditional Random Field (CRF) classifier. Experimental results (82.44% F-score on the CoNLL-2002 corpus) show that our approach is comparable to some state-of-the-art Deep Learning approaches for Spanish, in particular when using cross-lingual word representations.
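
As a rough illustration of the idea in the abstract, the sketch below shows one way unsupervised word representations and collocations could be added to the usual lexical features of a linear-chain CRF. It is not the authors' implementation: the lookup tables (`WORD_CLUSTERS`, `EMBEDDINGS`, `COLLOCATIONS`), their toy values, the feature names, and the choice of cluster-prefix lengths are all assumptions made for the example. The output is a list of per-token feature dictionaries, a shape that many CRF toolkits can consume.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy resources. A real system would induce these from a large
# unlabeled corpus; the values here are invented for illustration only.
WORD_CLUSTERS: Dict[str, str] = {"madrid": "110100", "barcelona": "110101"}
EMBEDDINGS: Dict[str, List[float]] = {"madrid": [0.12, -0.40]}
COLLOCATIONS: Set[Tuple[str, str]] = {("real", "madrid")}


def token_features(sentence: Sequence[str], i: int) -> Dict[str, float]:
    """Build the feature dictionary for the token at position i."""
    word = sentence[i]
    lower = word.lower()

    # Standard supervised features: identity and simple shape cues.
    feats: Dict[str, float] = {
        "bias": 1.0,
        "word.lower=" + lower: 1.0,
        "word.istitle": float(word.istitle()),
        "word.isdigit": float(word.isdigit()),
    }

    # Unsupervised feature 1 (assumed form): prefixes of a word-cluster
    # bit string, giving coarse-to-fine word classes.
    cluster = WORD_CLUSTERS.get(lower)
    if cluster is not None:
        for length in (2, 4, 6):
            feats[f"cluster[:{length}]={cluster[:length]}"] = 1.0

    # Unsupervised feature 2: each embedding dimension as a real-valued feature.
    vector = EMBEDDINGS.get(lower)
    if vector is not None:
        for dim, value in enumerate(vector):
            feats[f"emb[{dim}]"] = value

    # Unsupervised feature 3: does the token form a known collocation
    # with its left neighbour?
    if i > 0:
        prev = sentence[i - 1].lower()
        feats["prev.lower=" + prev] = 1.0
        if (prev, lower) in COLLOCATIONS:
            feats["collocation.with_prev"] = 1.0
    else:
        feats["BOS"] = 1.0  # beginning of sentence

    if i == len(sentence) - 1:
        feats["EOS"] = 1.0  # end of sentence

    return feats


def sentence_features(sentence: Sequence[str]) -> List[Dict[str, float]]:
    """Feature dictionaries for every token, in order."""
    return [token_features(sentence, i) for i in range(len(sentence))]


example = sentence_features(["Juega", "en", "Real", "Madrid"])
```

Here `example[3]` carries the cluster, embedding, and collocation features for "Madrid", while tokens missing from the lookup tables fall back to the lexical features alone, so the supervised model still works for words unseen in the unlabeled data.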

Notes

  1. http://www.cnts.ua.ac.be/conll2002/ner/
  2. https://code.google.com/archive/p/sofia-ml/
  3. http://github.com/linetcz/spanish-ner
  4. http://crscardellino.me/SBWCE/
  5. http://www.cnts.ua.ac.be/conll2000/chunking/conlleval.txt

Acknowledgments

We are grateful to the Data and Web Science Group at the University of Mannheim. Special thanks to Heiner Stuckenschmidt and Simone Ponzetto for their contributions and comments. This work was supported by the Master Program in Computer Science at Universidad Católica San Pablo and the Peruvian National Fund of Scientific and Technological Development through grant number 011-2013-FONDECYT.

Author information

Corresponding author

Correspondence to Jenny Copara.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Copara, J., Ochoa, J., Thorne, C., Glavaš, G. (2016). Conditional Random Fields for Spanish Named Entity Recognition Using Unsupervised Features. In: Montes y Gómez, M., Escalante, H., Segura, A., Murillo, J. (eds) Advances in Artificial Intelligence - IBERAMIA 2016. IBERAMIA 2016. Lecture Notes in Computer Science, vol 10022. Springer, Cham. https://doi.org/10.1007/978-3-319-47955-2_15

  • DOI: https://doi.org/10.1007/978-3-319-47955-2_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47954-5

  • Online ISBN: 978-3-319-47955-2

  • eBook Packages: Computer Science, Computer Science (R0)
