Abstract
This chapter gives a general overview, through examples and case studies drawn from research experience, intended to foster description and debate on effectiveness in Big Data environments. The early-stage case studies concern: research publishing and research impact; literature, narrative and foundational emotional tracking; and social media, here Twitter, with a social science orientation. Central relevance and importance are given to the following aspects of analytical methodology: context, which leads to making use of semantics; focus, which motivates homology between fields of analytical orientation; resolution scale, which can incorporate a concept hierarchy and aggregation in general; and acknowledgment of all that is implied by the expression "correlation is not causation". Application areas are: quantitative and qualitative assessment; narrative analysis and assessing impact; and baselining and contextualizing, both statistically and in related aspects such as visualization.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Murtagh, F. (2019). Text Mining and Big Textual Data: Relevant Statistical Models. In: Petrucci, A., Racioppi, F., Verde, R. (eds) New Statistical Developments in Data Science. SIS 2017. Springer Proceedings in Mathematics & Statistics, vol 288. Springer, Cham. https://doi.org/10.1007/978-3-030-21158-5_4
DOI: https://doi.org/10.1007/978-3-030-21158-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21157-8
Online ISBN: 978-3-030-21158-5
eBook Packages: Mathematics and Statistics (R0)