MBLA Social Corpus

Maroto Conde, Álvaro L.; Bermúdez Vázquez, Manuel

doi:10.1007/978-3-030-30135-4_21

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11755)

Included in the following conference series:

International Conference on Computational and Corpus-Based Phraseology

Abstract

Technological advances have allowed fields such as Corpus Linguistics and Computational Linguistics to progress rapidly. Corpora, an essential tool in these fields, have nonetheless evolved mainly in size. The Google Ngram project illustrates this: it has digitized a vast number of books from 1505 to the present day, enabling corpus studies at that scale. Meanwhile, the continuous evolution of new communication media and social networks has given rise to a new genre, cyber-language, situated between orality and textuality, for which no specialized corpora exist. We propose to design a tool for building a large multidimensional corpus from the social network Twitter, together with a set of specific tools to generate subcorpora, conduct quantitative studies, and visualize the stored information, all from a big data perspective.
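As a rough illustration of what generating a subcorpus and running a simple quantitative study could look like, the following minimal Python sketch filters a small in-memory collection of tweets by metadata and counts n-grams. It is not the authors' tool: the `Tweet` record, its fields (`text`, `lang`, `year`), and the helper names `subcorpus` and `ngram_counts` are hypothetical and chosen only for this example, and a real system working at big data scale would rely on distributed storage and processing rather than Python lists.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    # Hypothetical record: these fields are illustrative assumptions,
    # not the schema used by the tool described in the abstract.
    text: str
    lang: str
    year: int


# Keep hashtags and mentions as single tokens (e.g. "#music", "@user").
TOKEN_RE = re.compile(r"[#@]?\w+")


def tokenize(text):
    """Lowercase a tweet and split it into word-like tokens."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def subcorpus(corpus, **criteria):
    """Return the tweets whose metadata equals every given criterion."""
    return [
        tweet
        for tweet in corpus
        if all(getattr(tweet, key) == value for key, value in criteria.items())
    ]


def ngram_counts(corpus, n=1):
    """Count n-grams (as tuples of tokens) within each tweet of the corpus."""
    counts = Counter()
    for tweet in corpus:
        tokens = tokenize(tweet.text)
        # Zip n shifted copies of the token list to form consecutive n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts
```

Calling `subcorpus(corpus, lang="en", year=2019)` would keep only English tweets from 2019, and `ngram_counts(selection, n=2).most_common(10)` would list the ten most frequent bigrams in that selection.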

Notes

1. More information at https://www.ibm.com/internet-of-things.
2. An API is a set of commands, functions, protocols, and objects that programmers can use to create software or interact with an external system. It provides developers with standard commands for performing common operations, so they do not have to write the code from scratch.
3. More information at https://developer.twitter.com/en/docs/api-reference-index.
4. More information at https://www.ibm.com/analytics/hadoop/mapreduce.
5. More information at http://www.nltk.org/.

Author information

Authors and Affiliations

University of Córdoba, Córdoba, Spain
Álvaro L. Maroto Conde & Manuel Bermúdez Vázquez

Corresponding authors

Correspondence to Álvaro L. Maroto Conde or Manuel Bermúdez Vázquez.

Editor information

Editors and Affiliations

University of Malaga, Malaga, Spain
Gloria Corpas Pastor
University of Wolverhampton, Wolverhampton, UK
Ruslan Mitkov

About this paper

Cite this paper

Maroto Conde, Á.L., Bermúdez Vázquez, M. (2019). MBLA Social Corpus. In: Corpas Pastor, G., Mitkov, R. (eds) Computational and Corpus-Based Phraseology. EUROPHRAS 2019. Lecture Notes in Computer Science, vol 11755. Springer, Cham. https://doi.org/10.1007/978-3-030-30135-4_21

DOI: https://doi.org/10.1007/978-3-030-30135-4_21
Published: 18 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30134-7
Online ISBN: 978-3-030-30135-4
eBook Packages: Computer Science, Computer Science (R0)