Encyclopedia of Social Network Analysis and Mining

Living Edition
Editors: Reda Alhajj, Jon Rokne

Plug-and-Play Macroscopes: Network Workbench (NWB), Science of Science Tool (Sci2), and Epidemiology Tool (EpiC)

  • Katy Börner
Living reference work entry
DOI: https://doi.org/10.1007/978-1-4614-7163-9_306-1

Synonyms

Glossary

Macroscope

From the Greek macros, or “great,” and skopein, or “to observe,” inspired by de Rosnay’s (1975) futurist science writings on macroscope tools

Data Mining

Extracting implicit information using sophisticated data analysis capabilities and statistical algorithms to discover new patterns and correlations in large datasets

Information Visualization

Process of transforming data and information that are not inherently spatial into a visual form that allows the user to observe and understand the information (source: Gershon and Eick, First Symposium on Information Visualization)

Definition

The article features three exemplary tools that use the very same core architecture, i.e., algorithm plugins can be exchanged among the tools.

Network Workbench Tool (NWB)

  • Creation year: 2006.

  • Authors: Primary investigators are Katy Börner, Albert-László Barabási, Santiago Schnell, Alessandro Vespignani, Stanley Wasserman, and Eric A. Wernert. Developers are Weixia (Bonnie) Huang, Russell J. Duhon, Micah W. Linnemeier, Patrick Phillips, Chintan Tank, Joseph Biberstine, Timothy Kelley, Duygu Balcan, Mariano Beiró, Bruce W. Herr II, Santo Fortunato, Ben Markines, Felix Terkhorn, Heng Zhang, Megha Ramawat, César A. Hidalgo, Ramya Sabbineni, and Vivek Thakre. Tutorials were developed by Ann McCranie, Alessandro Vespignani, and Katy Börner.

  • General tool.

  • Copyright: free and open, Apache 2.0 license.

  • Type: program.

  • Size limits/scalability/time: Some analysis algorithms scale extremely well. Interactive visualizations scale less well.

  • Platforms: All major operating systems are supported.

  • Programming language: Core architecture is written in Java. Algorithm plugins are written in Java, Python, C, C++, and Fortran.

  • Orientation: Interdisciplinary use by more than 100,000 users.

  • Web site: http://nwb.cns.iu.edu.

  • Cite as: NWB Team. (2006). Network Workbench Tool. Indiana University, Northeastern University, and University of Michigan, http://nwb.cns.iu.edu.

Science of Science Studies Tool (Sci2)

  • Creation year: 2009.

  • Authors: Primary investigators are Katy Börner, Indiana University, and Kevin W. Boyack, SciTech Strategies Inc. Developers are Micah W. Linnemeier, Patrick A. Phillips, Chintan Tank, Joseph Biberstine, Chin Hua Kong, Russell J. Duhon, Thomas G. Smith, and David M. Coe. Many algorithm plugins were derived from the Network Workbench Tool (http://nwb.cns.iu.edu).

  • General tool.

  • Copyright: free and open, Apache 2.0 license.

  • Type: program and Web service.

  • Size limits/scalability/time: Some analysis algorithms scale extremely well. Most visualizations are rendered into PostScript files and scale very well.

  • Platforms: All major operating systems are supported.

  • Programming language: Core architecture is written in Java. Algorithm plugins are written in Java, Python, C, C++, and Fortran.

  • Orientation: Science of science studies by researchers and science policy makers in more than 75 countries. Adopted by the National Science Foundation, the National Institutes of Health, the US Department of Agriculture, and the National Oceanic and Atmospheric Administration in the USA and the Biotechnology and Biological Sciences Research Council in the UK.

  • Web site: http://sci2.cns.iu.edu.

  • Cite as: Sci2 Team. (2009). Science of Science (Sci2) Tool. Indiana University and SciTech Strategies, http://sci2.cns.iu.edu.

Epidemiology Tool (EpiC)

  • Creation year: 2009.

  • Authors: Primary investigators are Katy Börner, Alessandro Vespignani, and Jim Sherman. Developers are Micah W. Linnemeier, Patrick Phillips, Chintan Tank, Joseph Biberstine, Chin Hua Kong, David M. Coe, and Russell J. Duhon.

  • General tool.

  • Copyright: free and open, Apache 2.0 license.

  • Type: program.

  • Size limits/scalability/time: Some analysis algorithms scale extremely well. Interactive visualizations scale less well.

  • Platforms: All major operating systems are supported.

  • Programming language: Core architecture is written in Java. Algorithm plugins are written in Java, Python, C, and C++. There exists a bridge to R.

  • Orientation: Epidemiologists.

  • Web site: http://epic.cns.iu.edu.

  • Cite as: EpiC Team. (2009). EpiC Tool. Indiana University, http://epic.cns.iu.edu.

Introduction

Decision making in science, industry, and politics, as well as in daily life, requires that we make sense of datasets representing the structure and dynamics of complex systems. Macroscopes provide a “vision of the whole,” helping us “synthesize” the related elements and enabling us to detect patterns, trends, and outliers, while granting access to myriad details. Rather than make things larger or smaller, macroscopes let us observe what is at once too great, slow, or complex for the human eye and mind to notice and comprehend.

While microscopes and telescopes are physical instruments, macroscopes resemble continuously changing bundles of software plugins. Macroscopes make it easy to select and combine not only algorithm and tool plugins but also interface plugins, workflow support, logging, scheduling, and other plugins needed for scientifically rigorous yet effective work.

They make it easy to share plugins via email, flash drives, or online. To use new plugins, simply copy the files into the plugin directory, and they appear in the tool menu ready for use. Sharing algorithm components, tools, or novel interfaces becomes as easy as sharing images on Flickr or videos on YouTube. Assembling custom tools is as quick as compiling your custom music collection.
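The "copy the files into the plugin directory" idea can be sketched in a few lines. The following Python example is only an illustration of the principle, not the OSGi/CIShell mechanism used by the tools: the function name `discover_plugins` and the convention that each plugin file defines `MENU_LABEL` and `run` are assumptions made for this sketch.

```python
import importlib.util
import tempfile
from pathlib import Path


def discover_plugins(plugin_dir):
    """Load every *.py file in plugin_dir and return {menu label: callable}.

    Assumed convention (hypothetical, not CIShell's): a plugin module
    defines MENU_LABEL (a string) and run(data) (a callable).
    """
    menu = {}
    for path in sorted(Path(plugin_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Files that do not follow the assumed convention are ignored.
        if hasattr(module, "MENU_LABEL") and callable(getattr(module, "run", None)):
            menu[module.MENU_LABEL] = module.run
    return menu


with tempfile.TemporaryDirectory() as plugin_dir:
    # "Installing" a plugin amounts to placing one file in the directory.
    Path(plugin_dir, "node_count.py").write_text(
        'MENU_LABEL = "Analysis > Node Count"\n'
        "def run(edges):\n"
        "    return len({node for edge in edges for node in edge})\n"
    )
    menu = discover_plugins(plugin_dir)
    print(sorted(menu))
    print(menu["Analysis > Node Count"]([("a", "b"), ("b", "c")]))
```

The host application never names the plugin in its own code; it only scans the directory, which is what makes sharing a plugin equivalent to sharing a file.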

Key Points

Today, most scientific communities develop their own custom programming libraries, tools, and online services. Commonly, data formats, core system architectures, and workflow design strategies are incompatible among tools making it hard to use algorithms and tools across disciplinary boundaries; see Fig. 1.
Fig. 1

Algorithms, tools, and services developed by different scientific communities are often incompatible

Börner’s team has been working on plug-and-play cyberinfrastructures since 2000, when they started to design the Information Visualization Cyberinfrastructure (IVC) (http://iv.cns.iu.edu). Her team developed the Cyberinfrastructure Shell (CIShell) (http://cishell.org), which builds upon the Open Services Gateway Initiative (OSGi) (http://www.osgi.org), a standardized, component-oriented computing environment for networked services that has been widely used in industry for more than 10 years.

CIShell is an open source software specification for the integration and utilization of datasets, algorithms, tools, and computing resources (Herr II et al. 2008). It supports (1) algorithm writers, writing and disseminating their algorithms in their favorite programming language while retaining intellectual rights after distribution; (2) data holders, easily disseminating their data for use by others; (3) application writers, designing applications from custom sets of algorithms and datasets that interoperate seamlessly; and finally (4) researchers, educators, and practitioners, using existing datasets and algorithms to further science. At its heart, the CIShell specification defines how datasets and algorithms can be integrated into a common pool, so that they can be accessed generically from tools written to take advantage of the entities contained in the pool. To get an algorithm (written in any programming language) into the pool, the user will run through a wizard-driven template that will get all the information (including, among other things, what citation users should utilize when citing the use of the algorithm) needed to package their contribution. The resulting packaged file can then be plugged directly into any tool based on CIShell and used immediately. That is, CIShell provides “sockets” into which existing and new datasets, algorithms, and tools can be plugged making it possible to share plugins across disciplinary boundaries and to easily compile custom tools; see Fig. 2.
Fig. 2

OSGi/CIShell core architecture makes it possible to share plugins across disciplinary boundaries and to easily compile custom tools
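The notion of a common pool that tools access generically can be illustrated with a small registry. This is a conceptual sketch in Python rather than the Java/OSGi service registry that CIShell builds on; the names `AlgorithmEntry`, `AlgorithmPool`, the format labels `edge_list` and `table`, and the citation string are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AlgorithmEntry:
    name: str
    in_format: str    # data format the algorithm accepts
    out_format: str   # data format the algorithm produces
    citation: str     # how users should cite the algorithm
    run: Callable


class AlgorithmPool:
    """Pool that a tool can query without knowing any algorithm in advance."""

    def __init__(self):
        self._entries: Dict[str, AlgorithmEntry] = {}

    def register(self, entry: AlgorithmEntry) -> None:
        self._entries[entry.name] = entry

    def applicable_to(self, data_format: str) -> List[str]:
        # A tool can offer exactly the algorithms that fit the selected data.
        return sorted(e.name for e in self._entries.values()
                      if e.in_format == data_format)

    def citation_for(self, name: str) -> str:
        return self._entries[name].citation

    def run(self, name: str, data):
        entry = self._entries[name]
        return entry.out_format, entry.run(data)


def node_degree(edges):
    """Count how many edges touch each node of an undirected edge list."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return dict(degree)


pool = AlgorithmPool()
pool.register(AlgorithmEntry(
    name="Node Degree",
    in_format="edge_list",
    out_format="table",
    citation="Hypothetical citation text supplied by the algorithm author",
    run=node_degree,
))

print(pool.applicable_to("edge_list"))
print(pool.run("Node Degree", [("a", "b"), ("b", "c")]))
print(pool.citation_for("Node Degree"))
```

Keeping the citation next to the algorithm mirrors the point made above: the contributor's packaging step records how the algorithm should be cited, so every tool that uses the pool can surface that information.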

Today, CIShell/OSGi are at the core of several so-called plug-and-play macroscopes (Börner 2011) that serve different research communities.

Historical Background

NWB

The Network Workbench (NWB) is a scalable toolkit for network analysis, modeling, and visualization used by more than 100,000 researchers, educators, and practitioners interested in the study of biomedical, social and behavioral science, physics, and other networks. Built using OSGi/CIShell at its core, it provides a one-stop portal for the exchange of relevant algorithms, tools, and tutorials across scientific boundaries. Users of the NWB have online access to major network datasets or can upload their own networks. They are able to perform network analysis with some of the most effective algorithms available. In addition, they are able to generate, run, and validate network models to advance their understanding of the structure and dynamics of particular networks. NWB provides advanced visualization tools to interactively explore and understand specific networks, as well as their interaction with other types of networks.

A major computer science challenge was the development of an algorithm integration framework that supports the easy integration and dissemination of existing and new algorithms and can deal with the multitude of network data formats in existence today. Another challenge was the design and implementation of an easy-to-use, menu-based online portal interface for interactive algorithm selection, data manipulation, and user and session management. The OSGi/CIShell core architecture and the easy-to-use graphical user interface aim to address these challenges and have been evaluated in diverse research projects and educational settings in biology, social and behavioral science, and physics research. In addition, all NWB plugins and data formats have been well documented and are available as open source code for easy duplication and reuse, supporting a direct transfer of knowledge and results from the fields of specialist network research to a wider scientific community. This is expected to enhance and encourage the empirical analysis and model validation of networks, eventually accelerating the development of network science research.

Sci2

Recent progress in data analysis and visualization and the mapping of science (Börner 2010; Börner et al. 2003; Shiffrin and Börner 2004) make it possible to study and communicate the structure of science in a dynamic and interactive fashion. Examples are maps that show the structure of papers, proteins, and genes relevant for melanoma research (Boyack et al. 2004), bursts of activity in biomedical research (Mane and Börner 2004), or visualizations of job market data harvested from RSS feeds (Zoss et al. 2010). The international Mapping Science exhibit (http://scimaps.org) provides more examples of the state of the art in analyzing and mapping science.

The Science of Science (Sci2) Tool supports the temporal, geospatial, topical, and network analysis and visualization of datasets at the micro (individual), meso (local), and macro (global) levels. Users of the tool can access datasets online or load their own, extract homogeneous and heterogeneous networks from tabular data, perform different types of analysis with some of the most effective algorithms available, and use different visualizations to interactively explore and understand specific datasets. More than 170 different algorithms are now available in the Sci2 Tool and can be used to quickly identify temporal activity patterns, geospatial bursts of activity or properties, and structures of unweighted and weighted, undirected and directed networks. Specifically, the Sci2 Tool is used to calculate a wide range of network measures, such as the number of collaborators (e.g., coauthors, coinvestigators) and their productivity (e.g., number of papers and citations). In addition, it can be applied to answer questions such as the following: Do established fields have a higher author age while new emerging areas have a younger author population? How do the age distribution of these fields, their productivity, and impact, as well as the inflows and outflows of authors and ideas (measured via paper citations across areas), change over time? The Sci2 Tool is open source and workflows are documented at a level of detail that supports rapid replication of results (Börner and Polley 2014).
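To make the step from tabular data to network measures concrete, the following Python sketch extracts a weighted coauthor network from a small table and derives the number of collaborators, papers, and citations per author. It does not reproduce any Sci2 plugin; the toy records, the `|` author separator, and the function names are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy publication table (invented data, one row per paper).
papers = [
    {"title": "Paper A", "authors": "Ada|Ben|Cy", "citations": 10},
    {"title": "Paper B", "authors": "Ada|Ben", "citations": 4},
    {"title": "Paper C", "authors": "Dee", "citations": 1},
]


def extract_coauthor_network(rows, sep="|"):
    """Return (edge weights, papers per author, citations per author)."""
    edge_weights = Counter()     # (author, author) -> number of joint papers
    paper_counts = Counter()
    citation_counts = Counter()
    for row in rows:
        authors = sorted({a.strip() for a in row["authors"].split(sep) if a.strip()})
        for author in authors:
            paper_counts[author] += 1
            citation_counts[author] += row["citations"]
        # Every pair of authors on the same paper is linked by an undirected edge.
        for pair in combinations(authors, 2):
            edge_weights[pair] += 1
    return edge_weights, paper_counts, citation_counts


def collaborator_counts(edge_weights, authors):
    """Number of distinct coauthors per author (unweighted degree)."""
    neighbors = defaultdict(set)
    for a, b in edge_weights:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {author: len(neighbors[author]) for author in authors}


edges, n_papers, n_citations = extract_coauthor_network(papers)
collaborators = collaborator_counts(edges, n_papers)
for author in sorted(n_papers):
    print(author, collaborators[author], n_papers[author], n_citations[author])
```

Single-author papers contribute to productivity counts but add no edges, so an author such as "Dee" appears with zero collaborators rather than disappearing from the results.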

EpiC

The EpiC Tool aims to support the modeling of epidemic processes of different kinds and at various levels, from the scale of the single individual to the global scale passing through community/city/country scales. A variety of different initial approaches were developed and introduced, from the most basic models to describe social contagion processes to multiscale approaches that explicitly incorporate several sources of complexity. In order to make this type of research accessible to nonexperts, two key features need to be addressed in the development of the algorithms: (i) the flexibility of the simulation algorithms and (ii) the ease of interface with the appropriate datasets. Currently, the EpiC Tool provides easy access to contact network models/compartmental approaches based on simple population structure, real-time visualization of temporal evolution of population numbers, and an R-bridge for statistical analysis of results.
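As an example of what a compartmental approach on a simple population structure looks like, the sketch below integrates the standard susceptible-infected-recovered (SIR) model with small Euler steps. It is a textbook model chosen for illustration and is not taken from the EpiC code base; the function name `simulate_sir` and all parameter values are assumptions.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Deterministic SIR model in a single well-mixed population.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I

    Returns one (day, S, I, R) tuple per simulated day, starting at day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(0, s, i, r)]
    steps_per_day = round(1 / dt)
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - recoveries
            r += recoveries
        history.append((day, s, i, r))
    return history


# Illustrative parameters only: transmission rate 0.5/day, recovery rate 0.1/day.
history = simulate_sir(population=10_000, initial_infected=10,
                       beta=0.5, gamma=0.1, days=60)
peak_day, _, peak_infected, _ = max(history, key=lambda row: row[2])
print(f"Peak of about {peak_infected:.0f} infected on day {peak_day}")
```

The per-day compartment sizes in `history` are the kind of population numbers whose temporal evolution a tool would plot; a metapopulation model would couple many such populations through travel flows.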

Tool Functionality

The three tools discussed here serve the needs of rather different scientific communities. However, all three have a graphical user interface similar to the NWB tool interface shown in Fig. 3: The menu structure is arranged such that a workflow runs from left to right. The File menu on the left allows a user to load data in a number of formats, which can then be prepared, preprocessed, analyzed, and finally visualized. Users also have the option of modeling new networks or finding help online. The Console window documents all operations performed on the data. The Data Manager window lists all loaded and derived data files; icons are used to indicate file type, e.g., table, tree, graph, or database. The Scheduler indicates status of algorithm runs.
Fig. 3

Graphical user interface of the NWB tool with different windows
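The interplay of the Data Manager and the Console described above can be sketched as a small session object in which every loaded or derived dataset is recorded together with its parent, and every operation is logged. The class `Session` and its methods are hypothetical and written in Python for brevity; they are not part of the NWB, Sci2, or EpiC code.

```python
class Session:
    """Tracks loaded and derived datasets and logs each operation."""

    def __init__(self):
        self.data_manager = []   # one dict per dataset, in creation order
        self.console = []        # human-readable log of operations

    def _add(self, label, kind, data, parent, message):
        self.data_manager.append(
            {"label": label, "kind": kind, "data": data, "parent": parent}
        )
        self.console.append(message)
        return len(self.data_manager) - 1   # index of the new dataset

    def load(self, label, kind, data):
        return self._add(label, kind, data, None, f"Loaded {label} ({kind})")

    def apply(self, algorithm_name, func, parent_index, out_label, out_kind):
        parent = self.data_manager[parent_index]
        result = func(parent["data"])
        message = f"Ran {algorithm_name} on {parent['label']}"
        return self._add(out_label, out_kind, result, parent_index, message)


session = Session()
raw = session.load("edges.csv", "table", [("a", "b"), ("b", "c"), ("a", "b")])
# Preprocessing step: remove duplicate edges.
clean = session.apply("Remove Duplicates", lambda rows: sorted(set(rows)),
                      raw, "unique edges", "graph")
# Analysis step: count the nodes of the cleaned graph.
count = session.apply("Node Count",
                      lambda rows: len({n for edge in rows for n in edge}),
                      clean, "node count", "table")

for line in session.console:
    print(line)
print(session.data_manager[count]["data"])
```

Because each derived entry stores the index of its parent, the list doubles as a provenance record of the left-to-right workflow.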

The data readers and writers of the three tools can be interchanged, totaling support for 30 different data input formats and 35 different output formats, including diverse image file formats. Algorithm and tool plugins can be interchanged as well and total more than 180 data preparation, preprocessing, analysis, modeling, and visualization plugins. A number of existing tools were made available as plugins: Gnuplot, a command-line-driven graphing utility (http://www.gnuplot.info), and the GUESS (Adar 2006) and Cytoscape (Cytoscape Consortium 2008; Shannon et al. 2003) visualization tools.

In addition, there are well-documented bridges to the R statistical package (http://www.r-project.org/) and the Gephi visualization tool (Bastian et al. 2009; Gephi 2012). Log files exist for workflows and error logs. Extensive online documentation and information on e-mail lists can be accessed via http://cishell.org. The Sci2 Tool offers an “Ask an Expert” feature at https://sci2.cns.iu.edu/user/ask.php that streamlines online help.

Key Applications

NWB

The Network Workbench Tool has been used in Computational Social Science to study large-scale social networks such as Wikipedia (Holloway et al. 2007; see Fig. 4a) and the intercitation patterns in 113 years of Physical Review data (Börner et al. 2006; Herr II et al. 2008; see Fig. 4b); in Computational Scientometrics to study science by scientific means (Boyack et al. 2007; Shiffrin and Börner 2004; see also the Sci2 Tool); and in Computational Economics to understand whether the type of product that a country exports matters for subsequent economic performance (Hausmann et al. 2007; Hidalgo et al. 2007; see Fig. 4c).
Fig. 4

NWB visualizations of large-scale social networks such as Wikipedia (a), intercitation patterns in 113 years of Physical Review data (b), and product coexport networks (c)

In Computational Proteomics, the tool has been used to answer questions such as “What relationships exist between protein targets of all drugs and all disease-gene products in the human protein–protein interaction network?” (Yildirim et al. 2007) or “Is the intrinsic disorder of proteins the cause of the scale-free architecture of protein–protein interaction networks?” (Schnell et al. 2007).

Sci2

The Science of Science Tool is used by researchers and science policy makers in more than 75 countries; see Fig. 5 for sample visualizations. Among others, it is in active use at the National Science Foundation, the National Institutes of Health, the US Department of Agriculture (Kosecki et al. 2011), the National Oceanic and Atmospheric Administration (Belter 2012), and the James S. McDonnell Foundation (Bruer 2010) in the USA and the Biotechnology and Biological Sciences Research Council in the UK. A recent review by Cobo et al. (2011) compares different tools used to study and map science.
Fig. 5

Sci2 visualizations of hierarchical networks (a), geospatial collaboration patterns (b), science map overlays (c), word bursts (d), and horizontal timelines (e) showing funding by the National Science Foundation

EpiC

The EpiC Tool is currently under development but will be used in Computational Epidemics to study reaction–diffusion processes and metapopulation models in heterogeneous networks, increasing our understanding of the impact of air travel on the global spread of epidemics (Colizza et al. 2007a, b, c). Figure 6 visualizes modeling results for the global spreading of emerging infectious diseases. Detailed knowledge of the worldwide population distribution and movement patterns of individuals by air travel is explicitly incorporated into the model to describe the spatiotemporal evolution of epidemics in our highly interconnected and globalized world. These maps are used in the identification, design, and implementation of appropriate intervention strategies aimed at possible containment.
Fig. 6

EpiC visualization that illustrates the global spreading of emerging infectious diseases

Future Directions

Recent work resulted in a Web services compatible CIShell v2.0 (http://cishell.org).

In collaboration with commercial partners, we are developing online services that bring a subset of the 180+ plugins to the Web for usage by different communities.

In addition, a number of other projects in the USA and Europe recently adopted OSGi and/or CIShell:
  • Cytoscape (http://cytoscape.org) led by Trey Ideker at the University of California, San Diego, is an open-source bioinformatics software platform for visualizing molecular interaction networks and integrating these interactions with gene expression profiles and other state data (Shannon et al. 2003).

  • MAEviz (https://wiki.ncsa.uiuc.edu/display/MAE/Home) managed by Jong Lee at NCSA is an open-source, extensible software platform which supports seismic risk assessment based on the Mid-America Earthquake (MAE) Center research.

  • Taverna Workbench (http://taverna.org.uk), developed by the myGrid team (http://mygrid.org.uk) and led by Carol Goble at the University of Manchester, UK, is a free software tool for designing and executing workflows (Hull et al. 2006). Taverna allows users to integrate many different software tools, including over 30,000 Web services.

  • TEXTrend (http://textrend.org) led by George Kampis at Eötvös Loránd University, Budapest, Hungary, supports natural language processing (NLP), classification/mining, and graph algorithms for the analysis of business and governmental text corpuses with an inherently temporal component.

  • DynaNets (http://www.dynanets.org) coordinated by Peter M. A. Sloot at the University of Amsterdam, the Netherlands, develops algorithms to study evolving networks.

  • SISOB (http://sisob.lcc.uma.es), an Observatory for Science in Society Based in Social Models, will develop tools to measure and predict the social impact of research.

As the functionality of OSGi-based software frameworks improves and the number and diversity of dataset and algorithm plugins increase, the capabilities of custom tools will expand.

Cross-References

Notes

Acknowledgments

Thanks go to the NWB, Sci2, and EpiC teams and particularly to David E. Polley for compiling references and proofreading a draft of the paper. This work is supported in part by the National Science Foundation under IIS-0513650, SBE-0738111, and IIS-0513650, the National Institutes of Health under RM-07-004 and U01 GM098959, the James S. McDonnell Foundation, the Cyberinfrastructure for Network Science Center, and the School of Informatics and Computing both at Indiana University. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

References

  1. Adar E (2006) GUESS: a language and interface for graph exploration. In: Grinter R, Rodden T, Aoki P, Cutrell E, Jeffries R, Olso G (eds) Paper presented at the proceedings of the SIGCHI conference on human factors in computing systems. ACM, New York, pp 791–800
  2. Bastian M, Heymann S, Jacomy M (2009) Gephi: an open source software for exploring and manipulating networks. In: International AAAI conference on weblogs and social media, San Jose
  3. Belter C (2012) Visualizing networks of scientific research. Information Today, Inc. (vol May/June)
  4. Börner K (2010) Atlas of science: visualizing what we know. MIT Press, Cambridge
  5. Börner K (2011) Plug-and-play macroscopes. Commun ACM 54(3):60–69
  6. Börner K, Polley DE (2014) Visual insights: a practical guide to making sense of data. MIT Press, Cambridge, MA
  7. Börner K, Chen C, Boyack KW (2003) Visualizing knowledge domains. In: Cronin B (ed) Annual review of information science and technology, vol 37. American Society for Information Science and Technology, Medford, pp 179–255
  8. Börner K, Penumarthy S, Meiss M, Ke W (2006) Mapping the diffusion of information among major US research institutions. Scientometrics 68(3):416–426
  9. Boyack KW, Mane KK, Börner K (2004) Mapping Medline papers, genes and proteins related to melanoma research. In: IEEE international conference on information visualization (IV2004). IEEE Computer Society, London, pp 965–971
  10. Boyack KW, Klavans R, Paley WB, Börner K (2007) Mapping, illuminating, and interacting with science. In: International conference on computer graphics and interactive techniques: ACM SIGGRAPH 2007 Sketches (vol The Viz Biz, article 2), San Diego, 5–9 Aug
  11. Bruer JT (2010) Can we talk? How the cognitive neuroscience of attention emerged from neurobiology and psychology, 1980–2005. Scientometrics 83(3):751–764
  12. Cobo MJ, López-Herrera AG, Herrera-Viedma E, Herrera F (2011) Science mapping software tools: review, analysis, and cooperative study among tools. J Am Soc Inf Sci Technol 62(7):1382–1402
  13. Colizza V, Barrat A, Barthélemy M, Valleron A-J, Vespignani A (2007a) Modeling the worldwide spread of pandemic influenza: baseline case and containment interventions. PLoS Med 4(1):95–110
  14. Colizza V, Barrat A, Barthelemy M, Vespignani A (2007b) Epidemic modeling in complex realities. C R Biol 330(4):364–374
  15. Colizza V, Pastor-Satorras R, Vespignani A (2007c) Reaction-diffusion processes and metapopulation models in heterogeneous networks. Nat Phys 3:276–282
  16. Cytoscape Consortium (2008) Cytoscape http://www.cytoscape.org/index.php. Accessed 14 Sept 2009
  17. de Rosnay J (1975) Le Macroscope. Vers une Vision Globale (du Seuil ed.). Harper & Row Publishers, Inc, New York
  18. Gephi (2012) Gephi, an open source graph visualization and manipulation software. https://gephi.org/. Accessed 8 Aug 2012
  19. Hausmann R, Hidalgo CA, Bustos S, Cosica M, Chung S, Jimenez J, Simoes A, Yildirim MA (2007) The atlas of economic complexity: mapping paths to prosperity. MIT Media Lab
  20. Herr II BW, Duhon RJ, Hardy EF, Penumarthy S, Börner K (2008) 113 years of physical review: using flow maps to show temporal and topical citation patterns. In: Proceedings of the 12th information visualization conference (IV 2008), London, UK, pp 421–426
  21. Hidalgo CA, Klinger B, Barabási A-L, Hausmann R (2007) The product space conditions the development of nations. Science 317(5837):482–487
  22. Holloway T, Bošicevic M, Börner K (2007) Analyzing and visualizing the semantic coverage of wikipedia and its authors. Complexity 12(3):30–40
  23. Hull D, Wolstencroft K, Stevens R, Goble C, Pocock MR, Li P, Oinn T (2006) Taverna: a tool for building and running workflows of services. Nucl Acids Res 34(Web Server Issue):W729–W732. http://nar.oxfordjournals.org/cgi/screenpdf/34/suppl_2/W729. Accessed 25 Sept 2009
  24. Kosecki S, Shoemaker R, Baer CK (2011) Scope, characteristics, and use of the US department of agriculture’s intramural research. Scientometrics 88(3):707–728
  25. Mane KK, Börner K (2004) Mapping topics and topic bursts in PNAS. Proc Natl Acad Sci U S A 101(Suppl 1):5183–5185
  26. Schnell S, Fortunato S, Roy S (2007) Is the intrinsic disorder of proteins the cause of the scale-free architecture of protein-protein interaction networks. Proteomics 7(6):961–964
  27. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Daniel R, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13(11):2498–2504. http://genome.cshlp.org/content/13/11/2498.full.pdf+html. Accessed 25 Sept 2009
  28. Shiffrin RM, Börner K (2004) Mapping knowledge domains. PNAS 101(Suppl 1):5183–5185
  29. Yildirim MA, Kwan-II G, Cusick ME, Barabási A-L, Vidal M (2007) Drug-target network. Nat Biotechnol 25(10):1119–1126
  30. Zoss A, Conover M, Börner K (2010) Where are the academic jobs? Interactive exploration of job advertisements in geospatial and topical space. In: Chai S-K, Salerno J, Mabry PL (eds) Paper presented at the advances in social computing: third international conference on social computing, behavioral modeling and prediction, SPB10, Bethesda, 30–31 Mar. Springer, pp 238–247

Recommended Reading

Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  1. School of Informatics and Computing, Cyberinfrastructure for Network Science Center, Indiana University, Bloomington, USA

Section editors and affiliations

  • Vladimir Batagelj (1)

  1. Department of Mathematics, University of Ljubljana, Ljubljana, Slovenia