Science Mapping and the Identification of Topics: Theoretical and Methodological Considerations

Thijs, Bart

doi:10.1007/978-3-030-02511-3_9

Part of the book series: Springer Handbooks (SHB)

Abstract

This chapter focusses on the drivers behind advances in the mapping of science and the detection of topics as commonly applied in scientometrics. It identifies three such drivers: technological innovation resulting in increased computational power, the improved community detection approaches available today, and advancements in scientometrics itself with respect to the actual linking of documents through citations or lexical approaches. We show that the first two are the main drivers, with the last one lagging somewhat behind. Next, we describe two severe methodological issues that network science has identified in the application of community detection techniques: the resolution limit and the degeneracy problem. The last section shows how different approaches enable scientometricians to create global maps of science. These approaches arrive at comparable results at higher levels of granularity, but the validity of more fine-grained clusters and topics suffers strongly from the problems discussed, which raises serious questions about the applicability of these global techniques when they are used with a strong local focus.

Author information

Authors and Affiliations

ECOOM, KU Leuven, Leuven, Belgium
Bart Thijs

Corresponding author

Correspondence to Bart Thijs.

Editor information

Editors and Affiliations

ECOOM and Faculty of Economics and Business, KU Leuven, Leuven, Belgium
Wolfgang Glänzel
Amsterdam, The Netherlands
Henk F. Moed
Competence Center Policy – Industry – Innovation, Fraunhofer Institute for Systems and Innovation Research ISI, Karlsruhe, Germany
Ulrich Schmoch
Faculty of Science and Engineering, University of Wolverhampton, Wolverhampton, UK
Mike Thelwall

About this chapter

Cite this chapter

Thijs, B. (2019). Science Mapping and the Identification of Topics: Theoretical and Methodological Considerations. In: Glänzel, W., Moed, H.F., Schmoch, U., Thelwall, M. (eds) Springer Handbook of Science and Technology Indicators. Springer Handbooks. Springer, Cham. https://doi.org/10.1007/978-3-030-02511-3_9

DOI: https://doi.org/10.1007/978-3-030-02511-3_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02510-6
Online ISBN: 978-3-030-02511-3
eBook Packages: Economics and Finance, Economics and Finance (R0)
