MBLA Social Corpus

Multipurpose Multidimensional Corpus on Cyber-Language

  • Conference paper
Computational and Corpus-Based Phraseology (EUROPHRAS 2019)

Abstract

Technological advances have allowed fields such as Corpus Linguistics and Computational Linguistics to progress rapidly. However, corpora, an essential tool in these fields, have evolved mainly in size. One example is the Google Ngram project, which has digitized a vast number of books dating from 1505 to the present day, allowing corpus-based studies to be carried out on them. Meanwhile, the continuous evolution of new communication media and social networks has given rise to a new genre, called cyber-language, situated between orality and textuality, for which no specialized corpora exist. We propose to design a tool for creating a large multidimensional corpus based on the social network Twitter, together with a set of specific tools to generate subcorpora, conduct quantitative studies and visualize the stored information, from the perspective of big data manipulation.
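As a rough illustration of what "generating subcorpora" and "conducting quantitative studies" could look like in practice, the following minimal sketch filters a handful of in-memory records by metadata dimensions and counts token frequencies. It is not the tool described in this paper: the record fields (`user`, `lang`, `year`, `text`), the sample data, the tokenizer pattern and the function names `subcorpus` and `token_frequencies` are all hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical in-memory records; field names and contents are illustrative only.
TWEETS = [
    {"user": "a", "lang": "en", "year": 2019, "text": "lol this is gr8 #corpus"},
    {"user": "b", "lang": "es", "year": 2019, "text": "jajaja q bueno #corpus"},
    {"user": "a", "lang": "en", "year": 2018, "text": "omg lol so true"},
]

# Assumed tokenizer: word characters, optionally prefixed by '#' or '@'.
TOKEN_RE = re.compile(r"[#@]?\w+")


def subcorpus(tweets, **dimensions):
    """Keep the records whose metadata matches every requested dimension value."""
    return [
        t for t in tweets
        if all(t.get(key) == value for key, value in dimensions.items())
    ]


def token_frequencies(tweets):
    """Count lower-cased tokens across the text of all given records."""
    counts = Counter()
    for t in tweets:
        counts.update(TOKEN_RE.findall(t["text"].lower()))
    return counts


english = subcorpus(TWEETS, lang="en")
freqs = token_frequencies(english)
print(len(english), "records in the English subcorpus")
print("frequency of 'lol':", freqs["lol"])
```

Adding further keyword arguments, for example `subcorpus(TWEETS, lang="en", year=2019)`, narrows the selection along more than one dimension at once. A corpus of realistic size would need persistent storage and distributed processing rather than Python lists.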

Notes

  1. More information on https://www.ibm.com/internet-of-things.

  2. An API is a set of commands, functions, protocols, and objects that programmers can use to create software or interact with an external system. It provides developers with standard commands for performing common operations, so they do not have to write the code from scratch.

  3. More information on https://developer.twitter.com/en/docs/api-reference-index.

  4. More information on https://www.ibm.com/analytics/hadoop/mapreduce.

  5. More information on http://www.nltk.org/.

References

  1. Michel, J., Shen, Y., et al.: Quantitative analysis of culture using millions of digitized books. Science 331(6014), 176–182 (2011)

  2. Zieba, A.: Google books Ngram viewer in socio-cultural research. Res. Lang. 16, 357–376 (2018). https://doi.org/10.2478/rela-2018-0015

  3. Naveed, A., Aziz, S., Mehfooz, M.: Analysis of cyber language: identifying gender boundaries. Eur. Acad. Res. II(7), 9706–9724 (2014)

  4. Anthony, L., Hardaker, C.: FireAnt (1.1.3) [Computer Software]. Waseda University, Tokyo (2019). http://www.laurenceanthony.net/. Accessed 06 July 2019

  5. Morstatter, F., Pfeffer, J., Liu, H., Carley, K.: Is the sample good enough? Comparing data from Twitter’s Streaming API with Twitter’s Firehose. Association for the Advancement of Artificial Intelligence arXiv:1306.5204 (2013)

  6. Church, K.: Corpus methods in a digitized world, pp. 3–15 (2017). https://doi.org/10.1007/978-3-319-69805-2_1

  7. Maroto, A.: Big Data, Twitter and Music: New paths in research. https://www.researchgate.net/publication/331479188. Accessed 14 Jan 2019

  8. Maroto, A.: El metadiscurso en las redes sociales: Una extensión multidimensional. Análisis de cinco dirigentes políticos de la coalición Ahora Podemos a través de la red social Twitter. https://www.researchgate.net/publication/331479188. Accessed 14 Jan 2019

Author information

Correspondence to Álvaro L. Maroto Conde or Manuel Bermúdez Vázquez.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Maroto Conde, Á.L., Bermúdez Vázquez, M. (2019). MBLA Social Corpus. In: Corpas Pastor, G., Mitkov, R. (eds) Computational and Corpus-Based Phraseology. EUROPHRAS 2019. Lecture Notes in Computer Science, vol 11755. Springer, Cham. https://doi.org/10.1007/978-3-030-30135-4_21

  • DOI: https://doi.org/10.1007/978-3-030-30135-4_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30134-7

  • Online ISBN: 978-3-030-30135-4

  • eBook Packages: Computer Science, Computer Science (R0)