Abstract
The rapid growth of Social Media Networks (SMN) has opened a new era for data analytics. Various data mining and machine learning algorithms are used to analyze the different types of data generated within these complex networks, with the aim of producing usable knowledge. Descriptive analytics uses data aggregation and mining techniques to provide insight into the past or present, describing patterns, trends and incidents, and tries to answer the question “What is happening, or what has happened?”. Diagnostic analytics offers a set of techniques that act as tracking and monitoring tools, aiming to understand “Why is something happening, or why did it happen?”. Predictive analytics combines a variety of forecasting techniques and statistical models to produce insights about the future, ideally answering “What could happen?”. Prescriptive analytics uses simulation and optimization methodologies to generate a decision-support mechanism, answering the question “What should we do?”. To perform any type of analysis, we first need to identify the correct sources of information. Then, we need APIs to initiate data extraction. Once data are available, cleaning and preprocessing are performed; these involve dealing with noise, outliers, missing values and duplicate data, as well as aggregation, discretization, feature selection, feature extraction and sampling. The next step is the analysis itself, where the choice of techniques and methodologies varies with the Social Media Analytics (SMA) task (e.g. similarity, clustering, classification, link prediction, ranking, recommendation, information fusion). Finally, human judgment is needed to interpret the output of the analysis step meaningfully and draw valuable knowledge from it.
This chapter discusses these concepts, elaborates on and categorizes the various mining tasks (supervised and unsupervised), and presents the process, step by step, required to analyze data retrieved from the Social Media (SM) ecosystem.
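As an illustration of the cleaning and preprocessing step named in the abstract, the following is a minimal sketch, not the chapter's own method. It assumes a small hand-made list of records standing in for posts returned by some social media API; the field names (`id`, `user`, `likes`) and the helper functions are hypothetical. It covers three of the listed concerns: duplicate data, missing values (median imputation) and outliers (Tukey's interquartile-range rule, one common choice among many).

```python
from statistics import median, quantiles

# Hypothetical records, standing in for posts pulled from a social media API.
posts = [
    {"id": 1, "user": "ann", "likes": 12},
    {"id": 2, "user": "bob", "likes": None},   # missing value
    {"id": 2, "user": "bob", "likes": None},   # duplicate record
    {"id": 3, "user": "cat", "likes": 15},
    {"id": 4, "user": "dan", "likes": 9},
    {"id": 5, "user": "eve", "likes": 5000},   # suspiciously large value
    {"id": 6, "user": "fay", "likes": 11},
]


def deduplicate(records):
    """Keep the first record seen for each id."""
    seen, unique = set(), []
    for rec in records:
        if rec["id"] not in seen:
            seen.add(rec["id"])
            unique.append(rec)
    return unique


def impute_median(records, field):
    """Replace missing values of `field` with the median of the observed ones."""
    observed = [r[field] for r in records if r[field] is not None]
    fill = median(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


def drop_outliers(records, field, k=1.5):
    """Drop records outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    values = [r[field] for r in records]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in records if low <= r[field] <= high]


# Order matters here: duplicates are removed first so they do not bias the
# median, and missing values are filled before the quartiles are computed.
cleaned = drop_outliers(impute_median(deduplicate(posts), "likes"), "likes")

for rec in cleaned:
    print(rec)
```

Whether an extreme value is noise or a genuinely viral post depends on the analytics task, so in practice the outlier rule and its threshold `k` would be chosen with the later analysis step in mind.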
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Koukaras, P., Tjortjis, C. (2019). Social Media Analytics, Types and Methodology. In: Tsihrintzis, G., Virvou, M., Sakkopoulos, E., Jain, L. (eds) Machine Learning Paradigms. Learning and Analytics in Intelligent Systems, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-030-15628-2_12
DOI: https://doi.org/10.1007/978-3-030-15628-2_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15627-5
Online ISBN: 978-3-030-15628-2
eBook Packages: Intelligent Technologies and Robotics (R0)