Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation

Zitt, Michel

doi:10.1007/s11192-014-1482-5

Published: 07 December 2014

Scientometrics, volume 102, pages 2223–2245 (2015)

Abstract

In this position paper, we comment on various approaches to the delineation of scientific fields or domains, a typical prerequisite for a wide class of bibliometric studies. There is growing evidence that this meso-level, lying between the micro targets of typical IR and the large disciplines handled by macro-level bibliometric studies, benefits fully from hybrid approaches. Firstly, delineation tasks gain from combining the a priori thinking of traditional IR, which typically involves clearly targeted expectations, with the a posteriori thinking of bibliometric mapping, where decisions build on an external structuring of the domain in a wider context. The combination of the two ways of thought is far from new: IR increasingly builds on bibliometric networks for query expansion, and bibliometrics builds on IR for evaluating and refining its outcomes. Secondly, delineation benefits from the multi-network perspective, which gives different representations of scientific topics; these representations usually converge all the more as the objects are dense and well separated. Focusing on two basic networks, words and citations, we discuss various sequences or combinations of operations. Bibliometrics and IR, especially when properly combined in multi-network approaches, provide an efficient toolbox for studies of domain delimitation. It should be recalled, however, that the context of such studies is often loaded with policy stakes that call for cautious supervision and consultation processes.

Notes

Among the Thomson Reuters nomenclatures, the "subject categories" of the SCI classification allow for overlaps, mostly in terms of journals or journal sections.
For a typology of IR models and the perspective of the "cognitive actor", see Ingwersen and Järvelin (2005).
Assume that articles B and A share a theoretical background, and that C and A share a domain of application. In bibliographic coupling, B and C may both be attracted by A on quite different semantic aspects, although B and C have no epistemic relation to each other. The argument is already found in Martyn (1964). Even when mitigated by statistical aggregation, it expresses the cost associated with the statistical efficiency of bibliometric clustering. Hard clustering, simple and fast, worsens this limitation, whereas an overlapping technique might classify A once with B and once with C. IR scholars have warned against the holistic character of several mapping techniques, a source of noise, including for query expansion purposes.
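The A/B/C argument in the preceding note can be illustrated with a minimal Python sketch. The reference lists below are invented for illustration (T* stands for theoretical works, D* for application-domain works), and the helper names `coupling_strength`, `hard_assign` and `overlapping_assign` are hypothetical, not taken from the paper. Coupling strength is simply the count of shared references, in the spirit of Kessler (1963).

```python
# Invented reference lists: T* = theoretical background, D* = application domain.
refs = {
    "A": {"T1", "T2", "D1", "D2"},
    "B": {"T1", "T2", "T3"},
    "C": {"D1", "D2", "D3"},
}


def coupling_strength(x, y):
    """Number of references shared by papers x and y."""
    return len(refs[x] & refs[y])


def hard_assign(paper, seeds):
    """Hard clustering: attach `paper` to the single most strongly coupled seed.

    On ties, max() keeps the first seed, so one of the two links is lost.
    """
    return max(seeds, key=lambda s: coupling_strength(paper, s))


def overlapping_assign(paper, seeds, min_shared=1):
    """Overlapping clustering: attach `paper` to every sufficiently coupled seed."""
    return [s for s in seeds if coupling_strength(paper, s) >= min_shared]


strengths = {
    pair: coupling_strength(*pair)
    for pair in [("A", "B"), ("A", "C"), ("B", "C")]
}
hard = hard_assign("A", ["B", "C"])
overlapping = overlapping_assign("A", ["B", "C"])
```

With these toy data, A is coupled to both B and C while B and C share nothing; the hard assignment keeps only one of the two links, and the overlapping assignment keeps both.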
We are indebted to an anonymous referee for stressing this point.
Even Latourian citations or negative citations do not add much noise to co-citation topics.
High precision is expected from a strategy focused on strong forms (heavy intersections), with a check of the specific words and references of the native clusters c and w; high recall is expected from a strategy based on the full content of c and w clusters with strong overlaps. Intermediate strategies can focus on intuitive groupings of areas along the diagonal sequence.
For example, by ruling out papers without a given number or proportion of specific references.
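The last two notes can be read together as a simple set-based procedure. The sketch below is a simplified illustration under stated assumptions, not the author's method: a citation cluster c and a word cluster w are modelled as plain sets of paper identifiers, the high-precision strategy is reduced to their intersection, the high-recall strategy to their union, and the final filter drops papers lacking a minimum number or share of field-specific references. All identifiers, thresholds and function names are hypothetical.

```python
def high_precision(c_cluster, w_cluster):
    """Strong forms only: papers found in both the citation and the word cluster."""
    return c_cluster & w_cluster


def high_recall(c_cluster, w_cluster):
    """Full content of two strongly overlapping clusters."""
    return c_cluster | w_cluster


def filter_by_specific_refs(papers, paper_refs, specific_refs,
                            min_count=1, min_share=0.0):
    """Keep papers citing enough field-specific references.

    min_count: minimum number of specific references cited.
    min_share: minimum proportion of specific references in the reference list.
    """
    kept = set()
    for paper in papers:
        cited = paper_refs.get(paper, set())
        hits = len(cited & specific_refs)
        share = hits / len(cited) if cited else 0.0
        if hits >= min_count and share >= min_share:
            kept.add(paper)
    return kept


# Toy data (invented): r* are field-specific references, x* are not.
c = {"p1", "p2", "p3"}
w = {"p2", "p3", "p4"}
paper_refs = {
    "p1": {"r1", "r2"},
    "p2": {"r1", "x1"},
    "p3": {"x1", "x2"},
    "p4": {"r2", "x3", "x4", "x5"},
}
specific = {"r1", "r2"}

core = high_precision(c, w)
broad = high_recall(c, w)
filtered = filter_by_specific_refs(broad, paper_refs, specific,
                                   min_count=1, min_share=0.5)
```

In practice the two thresholds would presumably be tuned with experts; the values used here only serve the toy example.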

References

Agrawal, R., Imieliński, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. Proceedings of the 1993 ACM SIGMOD, 207.
Ahlgren, P., & Colliander, C. (2009). Document-document similarity approaches and science mapping: Experimental comparison of five approaches. Journal of Informetrics, 3(1), 49–63.
Archambault, E., Beauchesne, O. H., & Caruso, J. (2011). Towards a multilingual, comprehensive and open scientific journal ontology. In Proceedings of the 13th ISSI Conference, Durban, South Africa.
Barabasi, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.
Bassecoulard, E., & Zitt, M. (1999). Indicators in a research institute: A multi-level classification of scientific journals. Scientometrics, 44(3), 323–345.
Benzecri, J. P. (1973). La place de l’a priori. Encyclopaedia Universalis, 17, Organum, 11–24.
Benzecri, J. P., et al. (1981). Pratique de l’analyse des données : Linguistique et lexicologie. Paris: Dunod.
Bergstrom, C. (2007). Eigenfactor: Measuring the value and prestige of scholarly journals. College & Research Libraries News, 68(5). www.ala.org/ala/acrl/acrlpubs/crlnews/backissues2007/may2007/eigenfactor.cfm.
Blair, D. C. (2003). Information retrieval and the philosophy of language. Annual Review of Information Science and Technology, 37, 3–50.
Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), 10008.
Börner, K., Chen, C. M., & Boyack, K. W. (2003). Visualizing knowledge domains. Annual Review of Information Science and Technology, 37, 179–255.
Börner, K., Glänzel, W., Scharnhorst, A., & van den Besselaar, P. (2011). Modeling science: studying the structure and dynamics of science. Scientometrics, 89, 347–348.
Bornmann, L., & Daniel, H. D. (2008). What do citation counts measure? A review of studies on citing behavior. Journal of Documentation, 64(1), 45–80.
Boyack, K. W., & Klavans, R. (2010). Co-citation analysis, bibliographic coupling, and direct citation: Which citation approach represents the research front most accurately? JASIST, 61(12), 2389–2404.
Boyack, K., & Klavans, R. (2013). Creation of a highly detailed, dynamic, global model and map of science, forthcoming in JASIST. doi:10.1002/asi.22990.
Boyack, K., Small, H., & Klavans, R. (2013). Improving the accuracy of co-citation clustering using full text. JASIST, 64(9), 1759–1767.
Braam, R. R., Moed, H. F., & Van Raan, A. F. J. (1991). Mapping of science by combined co-citation and word analysis. I. Structural aspects. JASIS, 42(4), 233–251.
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1–7), 107–117.
Cadot, M., & Lelu, A. (2011). Combining explicitness and classifying performance via MIDOVA lossless representation for qualitative datasets. International Journal on Advances in Software, 5(1–2), 1–16.
Callahan, A., Hockema, S., & Eysenbach, G. (2010). Contextual co-citation: Augmenting co-citation analysis and its applications. JASIST, 61(6), 1130–1143.
Callon, M., Courtial, J. P., Turner, W. A., & Bauin, S. (1983). From translations to problematic networks: An introduction to co-word analysis. Social Science Information, 22(2), 191–235.
Callon, M., Courtial, J. P., & Laville, F. (1991). Co-word analysis as a tool for describing the network of interactions between basic and technological research: The case of polymer chemistry. Scientometrics, 22(1), 155–205.
Carayol, N., & Roux, P. (2009). Knowledge flows and the geography of networks: A strategic model of small world formation. Journal of Economic Behavior & Organization, 71(2), 414–427.
Carpineto, C., & Romano, G. (2012). A survey of automatic query expansion in information retrieval. ACM Computing Surveys, 44(1), 1.
Chavalarias, D., & Cointet, J. P. (2013). Phylomemetic patterns in science evolution—The rise and fall of scientific fields. PLoS ONE, 8(2), e54847.
Chen, C. M. (2006). CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. JASIS, 57(3), 359–377.
Chen, C. M., Ibekwe-Sanjuan, F., & Hou, J. (2010). The structure and dynamics of co-citation clusters: A multiple-perspective co-citation analysis. JASIST, 61(7), 1386–1409.
Cronin, B. (1984). The citation process; The role and significance of citations in scientific communication (p. 103). London: Taylor Graham.
de Beaver, D., & Rosen, R. (1979). Studies in scientific collaboration. Part II. Scientific co-authorship, research productivity and visibility in the French scientific elite, 1799–1830. Scientometrics, 1(2), 133–149.
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. JASIST, 41(6), 391–407.
Elkiss, A., Shen, S., Fader, A., Erkan, G., States, D., & Radev, D. (2008). Blind men and elephants: What do citation summaries tell us about a research article? JASIST, 59(1), 51–62.
Eom, Y. H., & Fortunato, S. (2011). Characterizing and modeling citation dynamics. PLoS ONE, 6(9), e24926. doi:10.1371/journal.pone.0024926.
Garfield, E. (1967). Primordial concepts, citation indexing and historio-bibliography. Journal of Library History, 2, 235–249.
Garfield, E., & Sher, I. H. (1993). Keywords Plus(TM): Algorithmic derivative indexing. JASIST, 44(5), 298–299.
Garfield, E., Pudovkin, A. I., & Istomin, V. S. (2003). Why do we need algorithmic historiography? JASIST, 54(5), 400–412.
Gilbert, G. N. (1977). Referencing as persuasion. Social Studies of Science, 7, 113–122.
Gilbert, N. (1997). A simulation of the structure of academic science. Sociological Research Online, 2(2), 3. http://www.socresonline.org.uk/socresonline/2/2/3.html.
Glänzel, W., & Czerwon, H. J. (1996). A new methodological approach to bibliographic coupling and its application to the national, regional and institutional level. Scientometrics, 37(2), 195–221.
Glänzel, W., & Schubert, A. (2003). A new classification of the science fields and subfields designed for scientometric evaluation purposes. Scientometrics, 56(3), 357–367.
Gläser, J., Lange, S., Laudel, G., & Schimank, U. (2010). The Limits of Universality: How field-specific epistemic conditions affect authority relations and their consequences. In R. Whitley, J. Gläser, & L. Engwall (Eds.), Reconfiguring knowledge production: Changing authority relationships in the sciences and their consequences for intellectual innovation (pp. 291–324). Oxford: Oxford University Press.
Ingwersen, P. (1996). Cognitive perspectives of information retrieval interaction: Elements of a cognitive IR theory. Journal of Documentation, 57(6), 715–740.
Ingwersen, P., & Järvelin, K. (2005). The turn: Integration of information seeking and retrieval in context (p. 436). Berlin: Springer.
Janssens, F., Glänzel, W., & De Moor, B. (2008). A hybrid mapping of information science. Scientometrics, 75(3), 607–631.
Jardine, N., & van Rijsbergen, C. J. (1971). The use of hierarchical clustering in information retrieval. Information Storage and Retrieval, 7, 217–240.
Kessler, M. M. (1963). Bibliographic coupling between scientific papers. American Documentation, 14, 10–25.
Kostoff, R. N., delRio, J. A., Humenik, J. A., Garcia, E. O., & Ramirez, A. M. (2001). Citation mining: Integrating text mining and bibliometrics for research user profiling. JASIST, 52(13), 1148–1156.
Larivière, V., Archambault, E., & Gingras, Y. (2008). Long-term variations in the aging of scientific literature: from exponential growth to steady-state science (1900–2004). JASIST, 59(2), 288–296.
Larsen, B. (2002). Exploiting citation overlaps for information retrieval: Generating a boomerang effect from the network of scientific papers. Scientometrics, 54(2), 155–178.
Latour, B. (1987). Science in action: How to follow Scientists and Engineers through society. Cambridge: Harvard University Press.
Laurens, P., Zitt, M., & Bassecoulard, E. (2010). Delineation of the genomics field by hybrid citation-lexical methods: Interaction with experts and validation process. Scientometrics, 82(3), 647–662.
Lelu, A. (1994). Clusters and factors: Neural algorithms for a novel representation of huge and highly multidimensional data sets. In E. Diday & Y. Lechevallier (Eds.), New approaches in classification and data analysis (pp. 241–248). Berlin: Springer.
Leydesdorff, L., & Cozzens, S. E. (1993). The delineation of specialties in terms of journals using the dynamic journal set of the Science Citation Index. Scientometrics, 26, 133–154.
Liu, S., & Chen, C. M. (2013). The differences between latent topics in abstracts and citation contexts of citing papers. JASIST, 64(3), 627–639.
Liu, X., Yu, S., Janssens, F., Glänzel, W., Moreau, Y., & De Moor, B. (2010). Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database. JASIST, 61(6), 1105–1119.
Marshakova, I. V. (1973). Document coupling system based on references taken from Science Citation Index (in Russian). Nauchno-Tekhnicheskaya Informatsiya, Ser. 2 6.3.
Martyn, J. (1964). Bibliographic coupling. Journal of Documentation, 20(4), 236.
McCain, K. W. (1983). The author co-citation structure of macroeconomics. Scientometrics, 5(5), 277–289.
McCain, K. W. (1989). Descriptor and citation retrieval in the medical behavioral sciences literature: Retrieval overlaps and novelty distribution. JASIS, 40(2), 110–114.
Morris, S. A., Yen, G., Wu, Z., & Asnake, B. (2003). Time line visualization of research fronts. JASIST, 54(5), 413–422.
Mullins, N. C., Hargens, L. L., Hecht, P. K., & Kick, E. L. (1977). The group structure of co-citation clusters: A comparative study. American Sociological Review, 42, 552–562.
Mutschke, P., & Quan-Haase, A. (2001). Collaboration and cognitive structures in social science research fields: Towards socio-cognitive analysis in information systems. Scientometrics, 52(3), 487–502.
Mutschke, P., Mayr, P., Schaer, P., & Sure, Y. (2011). Science models as value-added services for scholarly information systems. Scientometrics, 89, 349–364.
Narin, F., Pinski, G., & Gee, H. H. (1976). Structure of the biomedical literature. Journal of the American Society for Information Science, 27(1), 25–45.
Narin, F., & Noma, E. (1985). Is technology becoming science? Scientometrics, 7(3), 369–381.
Noyons, E. C. M. (1999). Bibliometric mapping as a science policy and research management tool. Leiden: Leiden University DSWO Press.
Palacios-Huerta, I., & Volij, O. (2004). The measurement of intellectual influence. Econometrica, 72(3), 963–977.
Pao, M. L. (1993). Term and citation retrieval: A field study. Information Processing and Management, 29(1), 95–112.
Papadimitriou, C., Raghavan, P., Tamaki, H., & Vempala, S. (1998). Latent semantic indexing: A probabilistic analysis. In PODS: Proceedings of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (pp. 159–168).
Pinski, G., & Narin, F. (1976). Citation influence for journal aggregates of scientific publications: Theory, with application to the literature of physics. Information Processing and Management, 12, 297–312.
Polanco, X., Grivel, L., & Royauté, J. (1995). How to do things with terms in informetrics: Terminological variation and stabilization as science watch indicators. In M. Koenig (Ed.), Proceedings of the 5th ISSI International Conference (River Forest, IL, June 7–10, 1995) (pp. 435–444). Medford, NJ: Learned Information.
Price, D. J. de Solla. (1965). Networks of scientific papers. Science, 149(3683), 510–515.
Price, D. J. de Solla. (1976). A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science, 27(5), 292–306.
Rafols, I., Porter, A. L., & Leydesdorff, L. (2010). Science overlay maps: A new tool for research policy and library management. JASIS, 61(9), 1871–1887.
Ritchie, A., Robertson, S., & Teufel, S. (2008). Comparing citation context for information retrieval. In CIKM '08: Proceedings of the 17th ACM Conference on Information and Knowledge Management (pp. 213–222).
Rocchio, J. (1971). Relevance feedback in information retrieval. In G. Salton (Ed.), The smart retrieval system: Experiments in automatic document processing (pp. 313–323). Englewood Cliffs, NJ: Prentice-Hall.
Ross, N. C. M., & Wolfram, D. (2000). End user searching on the Internet: An analysis of term pair topics submitted to the Excite search engine. JASIST, 51(10), 949–958.
Rosvall, M., & Bergstrom, C. (2008). Maps of information flows reveal structures in complex networks. PNAS, 105, 1118.
Roth, C., & Cointet, J. P. (2010). Social and semantic coevolution in knowledge networks. Social Networks, 32(1), 16–29.
Salton, G., & Buckley, C. (1990). Improving retrieval performance by relevance feedback. JASIST, 41(4), 288–297.
Scharnhorst, A., Börner, K., & van den Besselaar, P. (Eds.). (2012). Models of science dynamics: Encounters between complexity theory and information sciences (Understanding Complex Systems). Berlin: Springer.
Small, H. (1973). Co-citation in the scientific literature: A new measure of the relationship between two documents. JASIS, 24(4), 265–269.
Small, H. (1980). Co-citation context analysis and the structure of paradigms. Journal of Documentation, 36(3), 183–196.
Small, H. (2011). Interpreting maps of science using citation context sentiments: A preliminary investigation. Scientometrics, 87(2), 373–388.
Teufel, S., Siddharthan, A., & Tidhar, D. (2006). Automatic classification of citation function. In EMNLP '06: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing.
van den Besselaar, P., & Heimeriks, G. (2006). Mapping research topics using word-reference co-occurrences: A method and an exploratory case study. Scientometrics, 68(3), 377–393.
Waltman, L., & van Eck, N. (2012). A new methodology for constructing a publication-level classification system of science. JASIS, 63(12), 2378–2392.
Watts, C., & Gilbert, N. (2011). Does cumulative advantage affect collective learning in science? An agent-based simulation. Scientometrics, 89(1), 437–463.
White, H. D., & Griffith, B. C. (1981). Author co-citation: A literature measure of intellectual structure. JASIS, 32(3), 163–172.
Zitt, M., & Bassecoulard, E. (1996). Reassessment of co-citation methods for science indicators: Effect of methods improving recall rates. Scientometrics, 37(2), 223–244.
Zitt, M., & Bassecoulard, E. (2006). Delineating complex scientific fields by an hybrid lexical-citation method: An application to nanosciences. Information Processing and Management, 42(6), 1513–1531.
Zitt, M., Ramanana-Rahary, S., & Bassecoulard, E. (2005). Relativity of citation performance and excellence measures: From cross-field to cross-scale effects of field-normalisation. Scientometrics, 63(2), 373–401.
Zitt, M., Lelu, A., & Bassecoulard, E. (2011). Hybrid citation-word representations in science mapping: Portolan charts of research fields? JASIST, 62(1), 19–39. doi:10.1002/asi.21440.
Zitt, M., & Small, H. (2008). Modifying the journal impact factor by fractional citation weighting: The audience factor. JASIST, 59(11), 1856–1860.

Acknowledgments

The author thanks Alain Lelu, Université de Franche-Comté and Loria, Nancy, Elise Bassecoulard, formerly Inra-Lereco, and anonymous referees, for helpful remarks; and Patricia Laurens and Antoine Schoen, ESIEE, Marne la Vallée, for permission to use the genomics map from our previous joint work.

Author information

Authors and Affiliations

Lereco U1134, SAE2 Department, INRA, Rue de la Géraudière, BP71627, 44316, Nantes Cedex 03, France
Michel Zitt

Corresponding author

Correspondence to Michel Zitt.

About this article

Cite this article

Zitt, M. Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation. Scientometrics 102, 2223–2245 (2015). https://doi.org/10.1007/s11192-014-1482-5

Received: 23 October 2014
Published: 07 December 2014
Issue Date: March 2015
DOI: https://doi.org/10.1007/s11192-014-1482-5
