Social Media Analytics, Types and Methodology

Part of the book series: Learning and Analytics in Intelligent Systems (LAIS, volume 1)

Abstract

The rapid growth of Social Media Networks (SMN) initiated a new era for data analytics. We use various data mining and machine learning algorithms to analyze the different types of data generated within these complex networks, attempting to produce usable knowledge. Descriptive analytics uses data aggregation and mining techniques to provide insight into the past or present, describing patterns, trends, and incidents, and tries to answer the question “What is happening, or what has happened?”. Diagnostic analytics offers a set of techniques that act as tracking and monitoring tools, aiming to understand “Why is something happening, or why did it happen?”. Predictive analytics combines a variety of forecasting techniques and statistical models to produce insights about the future, hopefully answering “What could happen?”. Prescriptive analytics uses simulation and optimization methodologies and techniques to generate a decision-support mechanism, answering the question “What should we do?”. To perform any type of analysis, we first need to identify the correct sources of information. Then, we need APIs to initiate data extraction. Once data are available, cleaning and preprocessing are performed, which involve dealing with noise, outliers, missing values, and duplicate data, as well as aggregation, discretization, feature selection, feature extraction, and sampling. The next step is analysis: depending on the Social Media Analytics (SMA) task, the choice of techniques and methodologies varies (e.g. similarity, clustering, classification, link prediction, ranking, recommendation, information fusion). Finally, it falls to human judgment to meaningfully interpret the output of the analysis step and draw valuable knowledge from it.
This chapter discusses these concepts, elaborating on and categorizing various mining tasks (supervised and unsupervised), while presenting the process and the steps required to analyze data retrieved from the Social Media (SM) ecosystem.
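
The cleaning and analysis steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's own method: the record schema (`id`, `text`, `likes`), the `mad_factor` threshold, the median-based imputation, and the two-level `popularity` label are all assumptions chosen for illustration. The sketch removes duplicates, fills missing values, filters outliers, discretizes a numeric field, and then computes a Jaccard similarity between two texts as one example of a similarity measure.

```python
from statistics import median


def preprocess(posts, mad_factor=10):
    """Clean post records shaped like {'id', 'text', 'likes'} (hypothetical schema).

    Assumes at least one record has an observed 'likes' value.
    """
    # Duplicate data: keep only the first record seen for each id.
    seen, unique = set(), []
    for post in posts:
        if post["id"] not in seen:
            seen.add(post["id"])
            unique.append(dict(post))  # copy so the input is not mutated

    # Missing values: impute missing 'likes' with the median of observed values.
    observed = [p["likes"] for p in unique if p["likes"] is not None]
    fill = median(observed)
    for p in unique:
        if p["likes"] is None:
            p["likes"] = fill

    # Outliers: drop records far from the median, measured in units of the
    # median absolute deviation (MAD). The filter is skipped when the MAD is 0.
    center = median([p["likes"] for p in unique])
    mad = median([abs(p["likes"] - center) for p in unique])
    if mad > 0:
        unique = [p for p in unique
                  if abs(p["likes"] - center) <= mad_factor * mad]

    # Discretization: map the numeric count to a coarse two-level label.
    for p in unique:
        p["popularity"] = "high" if p["likes"] >= center else "low"
    return unique


def jaccard(text_a, text_b):
    """Jaccard similarity between the word sets of two texts."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0
```

In practice the imputation rule, the outlier threshold, and the number of bins would depend on the data source and the SMA task. The values here only show where each preprocessing decision enters the pipeline.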

Author information

Corresponding author

Correspondence to Christos Tjortjis.

Copyright information

© 2019 Springer Nature Switzerland AG

Cite this chapter

Koukaras, P., Tjortjis, C. (2019). Social Media Analytics, Types and Methodology. In: Tsihrintzis, G., Virvou, M., Sakkopoulos, E., Jain, L. (eds) Machine Learning Paradigms. Learning and Analytics in Intelligent Systems, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-030-15628-2_12
