Spatiotemporal analysis of tropical disease research combining Europe PMC and affiliation mapping web services
Tropical medicine appeared as a distinct sub-discipline in the late nineteenth century, during a period of rapid European colonial expansion in Africa and Asia. After a dramatic drop following World War II, research on tropical diseases has received more attention and research funding in the twenty-first century.
We used Apache Taverna to integrate Europe PMC and MapAffil web services into a spatiotemporal analysis workflow that takes a list of PubMed queries and returns a list of publication years and author affiliations geoparsed to latitudes and longitudes. The results could then be visualized in the Quantum Geographic Information System (QGIS).
Our workflows automatically matched 253,277 affiliations to geographical coordinates for the first authors of 379,728 papers on tropical diseases in a single execution. The bibliometric analyses show how research output in tropical diseases follows major historical shifts in the twentieth century and the renewed interest in and funding for tropical disease research in the twenty-first century. They show the effects of disease outbreaks, WHO eradication programs, vaccine developments, wars, refugee migrations, and peace treaties.
Literature search and geoparsing web services can be combined in scientific workflows that perform complete spatiotemporal bibliometric analyses of research in tropical medicine. The workflows and datasets are freely available and can be used to reproduce or refine the analyses, test specific hypotheses, or look into particular diseases or geographic regions. This work exceeds all previously published bibliometric analyses on tropical diseases in both scale and spatiotemporal range.
ADS: Astrophysics Data System
ENA: European Nucleotide Archive
MAG: Microsoft Academic Graph
MeSH: Medical Subject Headings
NIH: National Institutes of Health
NLM: National Library of Medicine
QGIS: Quantum Geographic Information System (now QGIS)
REST: REpresentational State Transfer
SOAP: Simple Object Access Protocol (now SOAP)
VOS: Visualization Of Similarities
WHO: World Health Organization
XML: Extensible Markup Language
Tropical medicine first appeared as a distinct sub-discipline and professional specialization toward the end of the nineteenth century, and its heyday coincided with European colonialism in Africa and Asia around this time. After a decline in the decades following World War II, recent years have seen increasing attention and significant funding to combat tropical diseases in an increasingly globalized world. In this paper, we attempt to visualize these and other aspects of the history of tropical medicine through spatiotemporal bibliometric analyses.
This is not the first bibliometric venture into the history of research on tropical diseases. In 2006, Falagas et al. published two studies [1, 2] on parasitology and tropical medicine research, respectively, over the 9-year period 1995–2003, identifying Oceania countries as the most productive when adjusting for both gross national income per capita and population. The authors also noted that the number of publications on parasitology from Latin America, the Caribbean, and Asia doubled between 1995 and 2003, but that the production from African countries remained low despite many of the diseases being endemic there. Later, Ramos et al. published bibliometric analyses of Chagas disease research 1940–2009 and leishmaniasis research 1945–2010, identifying Brazil as the most productive country in the first decade of the twenty-first century when looking at the first-author affiliations. More recently, Zyoud et al. published a spatiotemporal bibliometric analysis of publications on dengue 1872–2015, noting both the most productive countries in the field and a considerable increase in dengue-related publications in the last decade. For Sub-Saharan Africa [6, 7, 8] and Latin America, biomedical research, including neglected infectious diseases and the relation between disease burden and clinical trials, has been assessed by bibliometric methods. The field has seen rapid development in recent years, and an updated analysis of research output on tropical diseases is therefore motivated. What are the global and historical trends in tropical medicine research, and how do recent outbreaks, attention, and funding compare in these contexts? What else can be learned from broad, spatiotemporal bibliometric analyses?
Here, we also show how to use scientific workflows and freely available web services for spatiotemporal bibliometric analyses. Scientific workflows integrate specialized software, databases, or services into an overall data flow. They are particularly well suited for multi-step analyses using different types of software tools. The workflows are reusable for similar purposes and make analyses reproducible. Using web services and online databases, the workflows always access the latest information. Technical details on how the literature and geoparsing web services are accessed and the returned data parsed are abstracted and tucked away in workflow components, allowing less experienced users to focus on the overall workflow logic and scientific hypothesis. To our knowledge, this is the first time literature and geoparsing web services have been integrated this way. The bibliometric analyses were done in Taverna workflows available on myExperiment.
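To illustrate the kind of detail that such a workflow component hides, the following minimal Python sketch builds a Europe PMC search request and extracts publication years and affiliation strings from a JSON response. It is not the Taverna workflow used in this work. The REST endpoint, the parameter names, and the response fields (resultList, result, pubYear, affiliation) are assumptions about the Europe PMC REST interface and should be checked against its documentation; the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the Europe PMC REST search service.
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query, page_size=100, cursor_mark="*"):
    """Return a search URL for one page of results (parameter names assumed)."""
    params = {
        "query": query,
        "format": "json",
        "resultType": "core",
        "pageSize": page_size,
        "cursorMark": cursor_mark,
    }
    return EUROPE_PMC_SEARCH + "?" + urlencode(params)


def extract_year_and_affiliation(response):
    """Collect (year, affiliation) pairs from a decoded JSON response.

    Records lacking either field are skipped, since they cannot be
    placed in both time and space.
    """
    rows = []
    for record in response.get("resultList", {}).get("result", []):
        year = record.get("pubYear")
        affiliation = record.get("affiliation")
        if year and affiliation:
            rows.append((int(year), affiliation))
    return rows
```

The affiliation strings returned this way would still need to be passed to a geoparser such as MapAffil to obtain latitudes and longitudes.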
For a spatiotemporal analysis of the scientific literature, in particular using PubMed and other open resources, it is often necessary to parse the author affiliation information. We performed this geoparsing using MapAffil, a tool specifically developed to parse the author affiliation strings in PubMed. MapAffil correctly identifies cities (or similar localities) and assigns the city-center geocodes to about 98% of affiliations in PubMed. The remaining 2% largely lack place information (e.g., only the name of a multi-location institution is given), while errors and unresolved ambiguities are rare.
Geographical information can be visualized using different software tools, including from within Taverna using the rworldmap [13, 15] or RQGIS R packages. Here, we used the standalone Quantum Geographic Information System (QGIS) desktop software version 2.18.0 and directly imported the coordinates from the searchPublications_and_MapAffil workflow in Fig. 2 as a delimited text layer in QGIS and overlaid these on a world map. For co-authorship analysis, we used VOSviewer version 1.6.5 and projected the collaborative network onto the same world map, using latitudes and longitudes from MapAffil for all co-author affiliations rather than first authors only. Collaborative clusters were extracted using resolution = 0.3 and minimum size = 100. These parameters determine the sensitivity for separating clusters and how many nodes are required to form a unique cluster.
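A delimited text layer is simply a text table with one point per row. As a hedged sketch of what such a file could look like, the Python function below serializes (year, latitude, longitude) tuples as comma-separated text with a header row. The column names, the four-decimal rounding, and the function name are illustrative assumptions, not the format produced by the workflow itself.

```python
import csv
import io


def points_to_delimited_text(rows):
    """Return CSV text with a header and one line per valid point.

    `rows` is an iterable of (year, latitude, longitude) tuples.
    Points outside the valid coordinate ranges are dropped.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["year", "latitude", "longitude"])
    for year, lat, lon in rows:
        if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
            writer.writerow([year, f"{lat:.4f}", f"{lon:.4f}"])
    return buffer.getvalue()
```

When such a file is loaded as a delimited text layer, the longitude column would be chosen as the X field and the latitude column as the Y field.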
Results and discussion
Fewer publications on a particular research topic or disease do not imply neglect. Though not exclusively a tropical disease, smallpox was successfully eradicated in 1980. This is clearly seen in the research output, where a period of higher output, with 309 ± 42 publications per year during the WHO Smallpox Eradication Programme 1966–1980, is followed by a period of lower output, with 129 ± 19 publications per year between 1981 and 1995. With increasing concerns about bioterrorism in the early 2000s, the number of publications increased dramatically, reaching a maximum of 756 publications in 2003. Similar trends can be observed for polio, with an increased research output from 1952, the year the first successful vaccine was developed, reaching a local maximum of 326 publications in 1957, and then falling as the incidence declined rapidly following mass vaccination in developed countries, until reaching a steady level of ~ 150 publications/year from the mid-1960s until the mid-1980s.
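Per-period summaries of the form "mean ± spread" can be computed from yearly publication counts as in the following Python sketch. It assumes that the spread is the sample standard deviation, which the text does not state; the function name is hypothetical.

```python
from statistics import mean, stdev


def summarize_period(counts_by_year, first_year, last_year):
    """Return (mean, sample standard deviation) of yearly counts.

    `counts_by_year` maps a publication year to a number of papers.
    Years in the inclusive range that are missing from the mapping
    are ignored rather than counted as zero.
    """
    counts = [
        counts_by_year[year]
        for year in range(first_year, last_year + 1)
        if year in counts_by_year
    ]
    return mean(counts), stdev(counts)
```

Comparing two periods, for example 1966–1980 against 1981–1995, then amounts to calling the function twice on the same mapping with different year ranges.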
The 10 most researched tropical diseases
Here, we used simple search queries and disabled the synonym lookup options in the Europe PMC web services. This will result in the inclusion of a few unrelated publications; for example, one paper from 1958, 18 years before the first report of Ebola hemorrhagic fever, on the geographic distribution of endemic goiter, including the areas watered by the Ebola river. Topic disambiguation is possible using Medical Subject Headings (MeSH). For example, Ramos et al. in their work looked for the MeSH terms “Leishmania” or “leishmaniasis.” Using MeSH may also bridge publications that exclusively refer to a disease by an alternate name, such as leprosy as Hansen’s disease or schistosomiasis as bilharziasis or Katayama fever, though care should be taken that all synonyms are specific and that searches for all diseases are expanded to a similar “depth.” Text-mining methods can also be used to disambiguate topics de novo but will only be usefully accurate for full-text articles. Regardless of query specification, some relevant articles will always be missed, and some less relevant ones included, in large datasets.
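One simple way to bridge alternate disease names without relying on automatic synonym lookup is to list the names explicitly in the query. The Python sketch below joins quoted names with OR; it assumes that the search service accepts quoted phrases and the OR operator, and the function name is hypothetical. It is not the query construction used in this work, which relied on simple single-term queries.

```python
def expand_query(names):
    """Build a query matching any of the given disease names.

    Each name is quoted so that multi-word names are searched as
    phrases; several names are combined with OR inside parentheses.
    """
    terms = [f'"{name}"' for name in names]
    if len(terms) == 1:
        return terms[0]
    return "(" + " OR ".join(terms) + ")"
```

Listing the names by hand keeps the expansion "depth" visible and comparable across diseases, at the cost of having to curate each list.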
This paper illustrates how literature search and geoparsing web services can be combined in scientific workflows for reproducible, shareable, and reusable spatiotemporal bibliometric analyses. We have demonstrated this using research on 10 tropical diseases, as these exhibit characteristic and interpretable spatiotemporal patterns. Other resources that could, in principle, be combined in similar workflows include, for example, genomic, molecular, and epidemiological data, though geographical mapping of disease is a challenging but rapidly progressing field in itself [26, 27, 28]. The European Nucleotide Archive (ENA) and UniProt are extensively linked with publications in Europe PMC. These database links can also be traversed using the searchPublications and getDatabaseLinks web services from Europe PMC and RESTful web services from UniProt.
Research output on tropical diseases has some correlation with disease burden, in particular when comparing countries of similar resources and research output. Shared colonial history and language are also important factors. The Ebola example suggests that the research community now reacts faster and more strongly to outbreaks of diseases in Sub-Saharan Africa than in past decades.
All work was performed on open data using freely available tools, including Taverna Workbench, Europe PMC, MapAffil web services, and QGIS. The two workflows are available from myExperiment for anyone who wishes to repeat or modify our analyses, without the need to download any bibliographic databases. The workflows and results are also available on the Open Science Framework (osf.io/dtkep/).
The authors would like to thank Prof. André M. Deelder for many helpful suggestions and careful reading of the manuscript and Dr. Cathelijn J. F. Waaijer for additional recommendations.
VIT was funded by the US NIH P01AG039347.
Availability of data and materials
The two workflows used and the data produced and analyzed in this work are available on Open Science Framework (https://osf.io/dtkep/) and the workflows on myExperiment (https://www.myexperiment.org/workflows/4980.html and https://www.myexperiment.org/workflows/4981.html).
MP conceived the study, built the workflows, performed the analyses, and wrote the manuscript. VIT adapted the MapAffil geoparser for this work, assisted in the analyses, and co-wrote the manuscript. Both authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 8. Breugelmans JG, Makanga MM, Cardoso AL, Mathewson SB, Sheridan-Jones BR, Gurney KA, Mgone CS. Bibliometric assessment of European and Sub-Saharan African research output on poverty-related and neglected infectious diseases from 2003 to 2011. PLoS Negl Trop Dis. 2015;9:e0003997.
- 11. Wolstencroft K, Haines R, Fellows D, Williams A, Withers D, Owen S, Soiland-Reyes S, Dunlop I, Nenadic A, Fisher P, et al. The Taverna workflow suite: designing and executing workflows of web services on the desktop, web or in the cloud. Nucleic Acids Res. 2013;41:W557–61.
- 12. Torvik VI. MapAffil: a bibliographic tool for mapping author affiliation strings to cities and their geocodes worldwide. Dlib Mag. 2015;21:11-2. doi: 10.1045/november2015-torvik.
- 15. South A. rworldmap: a new R package for mapping global data. The R Journal. 2011;3:35–43.
- 16. Muenchow J, Schratz P. Integrating R with QGIS. 2017.
- 17. Quantum GIS Development Team. Quantum GIS Geographic Information System. Open Source Geospatial Foundation Project. 2017. http://qgis.osgeo.org.
- 19. Agarwal S, Lincoln M, Cai H, Torvik VI. Patci – a tool for identifying scientific articles cited by patents. GSLIS Research Showcase; 2014.
- 21. Programme W-GL. Global leprosy strategy 2016-2020: accelerating towards a leprosy-free world. New Delhi: WHO; 2016.
- 22. Shope R, Baker RH, Buck A, Heyneman D, Krogstad DJ, Western KA, Hornbeak H. Epidemiology and control of vector-borne diseases in Egypt and Israel. In Report by external scientific committee on NIAID research contracts NOI-Al-22667/8. Bethesda: NIAID; 1985.
- 23. Safonova M, Sokolov M. The construction of the academic world-system: regression and social network approaches to analysis of international academic ties. In: Gorraiz J, Schiebel E, Gumpenberger C, editors. 14th International Society of Scientometrics and Informetrics Conference Vienna, Austria. Hörlesberger M: Moed H. AIT Austrian Institute of Technology GmbH; 2013. p. 389–403.
- 24. Gonzalez-Alcaide G, Park J, Huamani C, Ramos JM. Dominance and leadership in research activities: collaboration between countries of differing human development is reflected through authorship order and designation as corresponding authors in scientific publications. PLoS One. 2017;12:e0182513.
- 26. Hay SI, Battle KE, Pigott DM, Smith DL, Moyes CL, Bhatt S, Brownstein JS, Collier N, Myers MF, George DB, Gething PW. Global mapping of infectious disease. Philosophical Transactions of the Royal Society B-Biological Sciences. 2013;368.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.