A Lightweight Regression Method to Infer Psycholinguistic Properties for Brazilian Portuguese

  • Conference paper
Text, Speech, and Dialogue (TSD 2017)

Abstract

Psycholinguistic properties of words have been used in various approaches to Natural Language Processing tasks, such as text simplification and readability assessment. Most of these properties are subjective, so gathering them requires costly and time-consuming surveys. Recent approaches use the limited datasets of psycholinguistic properties to extend them automatically to large lexicons. However, some of the resources used by such approaches are not available for most languages. This study presents a method to infer psycholinguistic properties for Brazilian Portuguese (BP) using regressors built with a light set of features usually available for less-resourced languages: word length, frequency lists, lexical databases composed of school dictionaries, and word embedding models. The correlations between the inferred properties are close to those obtained by related work. The resulting resource contains 26,874 words in BP annotated with concreteness, age of acquisition, imageability and subjective frequency.
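The general idea of regressing a rated property on light features can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain ridge regressor, uses only two of the feature types named above (word length and log frequency, leaving out dictionary and word embedding features), and trains on a handful of invented toy ratings. All names (`TRAIN`, `features`, `fit_ridge`) and all numeric values are hypothetical.

```python
import math

# Toy training data: (word, corpus frequency, rating). All values are invented
# for illustration and do not come from any published norms.
TRAIN = [
    ("casa", 9000, 2.1),
    ("bola", 5000, 2.4),
    ("escola", 4000, 3.0),
    ("amizade", 1200, 4.6),
    ("liberdade", 900, 5.4),
    ("paradigma", 50, 7.2),
]

def features(word, freq):
    """Light features: bias term, word length, log-scaled frequency."""
    return [1.0, float(len(word)), math.log(freq + 1.0)]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ridge(X, y, alpha=0.1):
    """Closed-form ridge regression: (X^T X + alpha * I) w = X^T y."""
    d = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(d)] for i in range(d)]
    for i in range(1, d):  # leave the bias term (index 0) unpenalised
        xtx[i][i] += alpha
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(d)]
    return solve(xtx, xty)

def predict(w, x):
    """Linear prediction: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

X = [features(w, f) for w, f, _ in TRAIN]
y = [r for _, _, r in TRAIN]
weights = fit_ridge(X, y)
train_preds = [predict(weights, x) for x in X]
train_r = pearson(train_preds, y)

# Infer the property for a word that has no human rating (hypothetical frequency).
unseen = predict(weights, features("metamorfose", 30))
print("training correlation:", round(train_r, 3))
print("inferred rating for 'metamorfose':", round(unseen, 2))
```

The correlation printed here is computed on the training words, so it is optimistic; a realistic evaluation would hold out rated words or use cross-validation before extending the ratings to a large lexicon.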

Notes

  1. websites.psychology.uwa.edu.au/school/MRCDatabase/mrc2.html

  2. www.lexicodoportugues.com

  3. celex.mpi.nl/

  4. http://nilc.icmc.usp.br/portlex/index.php/en/psycholinguistic

Acknowledgments

This work was supported by CNPq and FAPESP.

Author information

Corresponding author

Correspondence to Leandro Borges dos Santos.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

dos Santos, L.B., Duran, M.S., Hartmann, N.S., Candido, A., Paetzold, G.H., Aluisio, S.M. (2017). A Lightweight Regression Method to Infer Psycholinguistic Properties for Brazilian Portuguese. In: Ekštein, K., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2017. Lecture Notes in Computer Science, vol 10415. Springer, Cham. https://doi.org/10.1007/978-3-319-64206-2_32

  • DOI: https://doi.org/10.1007/978-3-319-64206-2_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64205-5

  • Online ISBN: 978-3-319-64206-2

  • eBook Packages: Computer Science (R0)
