Multi-aspect Entity-Centric Analysis of Big Social Media Archives

  • Conference paper
Research and Advanced Technology for Digital Libraries (TPDL 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10450)

Abstract

Social media archives serve as important historical information sources, so meaningful methods for analyzing and exploring them are of great value to historians, sociologists and other interested parties. In this paper, we propose an entity-centric approach to analyzing social media archives and define measures for studying how entities are reflected in social media in different time periods and under different aspects (such as popularity, attitude, controversiality, and connectedness with other entities). A case study using a large Twitter archive spanning 4 years illustrates the insights that can be gained from such an entity-centric multi-aspect analysis.

Notes

  1. http://www.internetlivestats.com/twitter-statistics/ (June 21, 2017).

  2. http://spark.apache.org/.

  3. https://github.com/iosifidisvasileios/Large-Scale-Entity-Analysis.

  4. http://l3s.de/~iosifidis/tpdl2017/. For each tweet, the dataset includes the following information: ID, user (encrypted), post date, extracted entities, and positive and negative sentiment values. The text of the tweets is not provided for copyright reasons.

Acknowledgements

The work was partially funded by the European Commission for the ERC Advanced Grant ALEXANDRIA under grant No. 339233.

Author information

Corresponding author

Correspondence to Vasileios Iosifidis.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Fafalios, P., Iosifidis, V., Stefanidis, K., Ntoutsi, E. (2017). Multi-aspect Entity-Centric Analysis of Big Social Media Archives. In: Kamps, J., Tsakonas, G., Manolopoulos, Y., Iliadis, L., Karydis, I. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science, vol 10450. Springer, Cham. https://doi.org/10.1007/978-3-319-67008-9_21

  • DOI: https://doi.org/10.1007/978-3-319-67008-9_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67007-2

  • Online ISBN: 978-3-319-67008-9

  • eBook Packages: Computer Science, Computer Science (R0)
