Social Media Analytics, Types and Methodology

  • Paraskevas Koukaras
  • Christos Tjortjis
Part of the Learning and Analytics in Intelligent Systems book series (LAIS, volume 1)


The rapid growth of Social Media Networks (SMN) initiated a new era for data analytics. We use various data mining and machine learning algorithms to analyze the different types of data generated within these complex networks, attempting to produce usable knowledge. Descriptive analytics uses data aggregation and mining techniques to provide insight into the past or present, describing patterns, trends and incidents, and tries to answer the question “What is happening, or what has happened?”. Diagnostic analytics comprises techniques that act as tracking and monitoring tools, aiming to understand “Why is something happening, or why did it happen?”. Predictive analytics combines forecasting techniques and statistical models to produce insights about the future, aiming to answer “What could happen?”. Prescriptive analytics uses simulation and optimization methodologies and techniques to provide a decision support mechanism, answering the question “What should we do?”. To perform any type of analysis, we first need to identify the correct sources of information. Then, we need APIs to initiate data extraction. Once data are available, cleaning and preprocessing are performed; these involve dealing with noise, outliers, missing values and duplicate data, as well as aggregation, discretization, feature selection, feature extraction and sampling. The next step is analysis, where the choice of techniques and methodologies depends on the Social Media Analytics (SMA) task (e.g. similarity, clustering, classification, link prediction, ranking, recommendation, information fusion). Finally, it is up to human judgment to meaningfully interpret and draw valuable knowledge from the output of the analysis step.
This chapter discusses these concepts, elaborating on and categorizing various mining tasks (supervised and unsupervised), while presenting the required process and its steps for analyzing data retrieved from the Social Media (SM) ecosystem.
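
As a rough illustration of the cleaning, preprocessing and analysis steps named above, the following Python sketch removes duplicates, fills a missing value, drops an outlier and then computes a simple similarity score. It is not taken from the chapter: the `posts` records, the field names, the median-based imputation, the `factor` threshold and the choice of Jaccard similarity are all assumptions made for the example, and the data stand in for whatever an SM API would actually return.

```python
from statistics import median

# Hypothetical records standing in for posts already extracted through an SM API.
posts = [
    {"id": 1, "user": "ann", "likes": 12, "tags": {"ml", "data"}},
    {"id": 1, "user": "ann", "likes": 12, "tags": {"ml", "data"}},      # duplicate
    {"id": 2, "user": "bob", "likes": None, "tags": {"ml", "sports"}},  # missing value
    {"id": 3, "user": "cat", "likes": 9, "tags": {"data", "ml"}},
    {"id": 4, "user": "dan", "likes": 5000, "tags": {"sports"}},        # outlier
]


def deduplicate(records):
    """Keep the first record seen for each id."""
    seen, unique = set(), []
    for record in records:
        if record["id"] not in seen:
            seen.add(record["id"])
            unique.append(record)
    return unique


def impute_likes(records):
    """Replace missing like counts with the median of the known ones."""
    fill = median(r["likes"] for r in records if r["likes"] is not None)
    return [
        {**r, "likes": fill if r["likes"] is None else r["likes"]}
        for r in records
    ]


def drop_outliers(records, factor=10):
    """Drop records whose like count exceeds factor times the median (illustrative rule)."""
    limit = factor * median(r["likes"] for r in records)
    return [r for r in records if r["likes"] <= limit]


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Preprocessing, then one simple analysis task (similarity between users' tag sets).
clean = drop_outliers(impute_likes(deduplicate(posts)))
ann, bob, cat = clean
print([r["id"] for r in clean])
print(jaccard(ann["tags"], cat["tags"]), jaccard(ann["tags"], bob["tags"]))
```

In practice each step would be chosen to fit the data and the SMA task at hand; the thresholds and the similarity measure here only show where such choices enter the process.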


Keywords: Social media networks; Social media analytics; Social media; Data mining; Machine learning; Supervised/unsupervised learning



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Science & Technology, International Hellenic University, Moudania, Thermi, Greece
