Searching a Mixed Corpus in the Light of the New Portuguese Orthographic Norm

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7243)

Abstract

A mixed corpus of Portuguese is one in which texts of different origins produce different spelling variants for the same word. A new orthographic norm, intended to bring together the written texts produced in Portugal and Brazil under a more uniform orthography, has been in effect since 2009. But what happens, from the perspective of search, to corpora created before the norm came into practice or during the transition period? Is the information they contain outdated and worthless? Do they need to be converted to the new norm? In the present work we analyse these questions.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Carvalho, G., Falé, I., de Matos, D.M., Rocio, V. (2012). Searching a Mixed Corpus in the Light of the New Portuguese Orthographic Norm. In: Caseli, H., Villavicencio, A., Teixeira, A., Perdigão, F. (eds) Computational Processing of the Portuguese Language. PROPOR 2012. Lecture Notes in Computer Science, vol 7243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28885-2_6

  • DOI: https://doi.org/10.1007/978-3-642-28885-2_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28884-5

  • Online ISBN: 978-3-642-28885-2

  • eBook Packages: Computer Science, Computer Science (R0)
