Science Mapping and the Identification of Topics: Theoretical and Methodological Considerations

Part of the book series: Springer Handbooks ((SHB))

Abstract

This chapter focusses on the drivers behind advances in science mapping and topic detection as commonly applied in scientometrics. It identifies three such drivers: technological innovation resulting in increased computational power, the improved community detection approaches available today, and advancements within scientometrics itself in how documents are actually linked, through citations or lexical approaches. We show that the first two are the main drivers, with the last lagging somewhat behind. Network science has, however, identified severe methodological issues in the application of these community detection techniques; two of them, the resolution limit and the degeneracy problem, are described. The last section shows how scientometricians take different approaches to creating global maps of science and how these approaches arrive at comparable results at higher levels of granularity. The validity of more fine-grained clusters and topics, however, suffers strongly from the problems discussed, which raises serious questions about the applicability of these global techniques to analyses with a strong local focus.

Corresponding author

Correspondence to Bart Thijs.

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Thijs, B. (2019). Science Mapping and the Identification of Topics: Theoretical and Methodological Considerations. In: Glänzel, W., Moed, H.F., Schmoch, U., Thelwall, M. (eds) Springer Handbook of Science and Technology Indicators. Springer Handbooks. Springer, Cham. https://doi.org/10.1007/978-3-030-02511-3_9
