Entropy-Based Model for Estimating Veracity of Topics from Tweets

Paryani, Jyotsna; T.K., Ashwin Kumar; George, K. M.

doi:10.1007/978-3-319-67077-5_40

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10449)

Included in the following conference series:

International Conference on Computational Collective Intelligence

Abstract

Micro-blogging sites such as Twitter have grown tremendously in size and importance because they allow users to share experiences and opinions on issues as they occur. Since tweets can cover a wide range of domains, many applications analyze them for knowledge extraction and prediction. As the popularity and size of social media increase, the veracity of the data itself becomes a concern. Applications that process social media data usually assume that all information on social media is truthful and reliable. Data integrity, data authenticity, trusted origin, and trustworthiness are some of the aspects of trustworthy data. This paper proposes an entropy-based model to estimate the veracity of topics in social media from the vantage point of truthfulness. Two existing big data veracity models, namely the OTC model (Objectivity, Truthfulness, and Credibility) and the DGS model (Diffusion, Geographic, and Spam indices), are compared with the proposed model. The proposed model is a bag-of-words model based on keyword distribution, while OTC depends on word sentiment and DGS depends on tweet distribution and content. For analysis, data from three domains (flu, food poisoning, and politics) were used. Our experiments suggest that the approach followed for model definition affects the resulting measures when ranking topics, while all measures can place the topics on a veracity spectrum.
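The abstract does not give the model's formula, so the following is only a minimal sketch of what an entropy measure over a keyword distribution could look like. It computes the Shannon entropy of keyword occurrences in a set of tweets and normalizes it to the range [0, 1]. The function name `keyword_entropy`, the whitespace tokenization, and the normalization by the keyword-list size are all assumptions made for illustration, not the authors' definition.

```python
import math
from collections import Counter


def keyword_entropy(tweets, keywords):
    """Normalized Shannon entropy of keyword occurrences across tweets.

    Hypothetical helper: tokenization and normalization are illustrative
    assumptions, not the method defined in the paper.
    """
    keyword_set = {k.lower() for k in keywords}
    # Bag-of-words count of each keyword over all tweets.
    counts = Counter(
        token
        for tweet in tweets
        for token in tweet.lower().split()
        if token in keyword_set
    )
    total = sum(counts.values())
    if total == 0 or len(keyword_set) < 2:
        return 0.0
    # Shannon entropy in bits of the observed keyword distribution.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # Divide by the maximum possible entropy so scores are comparable
    # across topics with keyword lists of different sizes.
    return entropy / math.log2(len(keyword_set))


if __name__ == "__main__":
    sample = ["flu fever", "flu cough"]
    print(keyword_entropy(sample, ["flu", "fever", "cough"]))
```

A score near 1 means the keywords appear about equally often, while a score near 0 means a single keyword dominates. How such a score would map onto a veracity ranking is defined in the full paper, not here.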

Author information

Authors and Affiliations

Computer Science Department, Oklahoma State University, Stillwater, OK, USA
Jyotsna Paryani, Ashwin Kumar T.K. & K. M. George

Corresponding author

Correspondence to K. M. George.

Editor information

Editors and Affiliations

Department of Information Systems, Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wrocław, Poland
Ngoc Thanh Nguyen
Department of Computer Science, University of Cyprus, Nicosia, Cyprus
George A. Papadopoulos
Department of Information Systems, Gdynia Maritime University, Gdynia, Poland
Piotr Jędrzejowicz
Department of Information Systems, Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wrocław, Poland
Bogdan Trawiński
Department of Information Systems, University of Münster, Münster, Germany
Gottfried Vossen

About this paper

Cite this paper

Paryani, J., T.K., A.K., George, K.M. (2017). Entropy-Based Model for Estimating Veracity of Topics from Tweets. In: Nguyen, N., Papadopoulos, G., Jędrzejowicz, P., Trawiński, B., Vossen, G. (eds) Computational Collective Intelligence. ICCCI 2017. Lecture Notes in Computer Science, vol 10449. Springer, Cham. https://doi.org/10.1007/978-3-319-67077-5_40

DOI: https://doi.org/10.1007/978-3-319-67077-5_40
Published: 07 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67076-8
Online ISBN: 978-3-319-67077-5
eBook Packages: Computer Science, Computer Science (R0)
