IMMAN: an R/Bioconductor package for Interolog protein network reconstruction, mapping and mining analysis
Reconstruction of protein-protein interaction networks (PPINs) has been riddled with controversy for decades. In particular, false-negative and false-positive interactions complicate the process further. Moreover, the lack of a standard PPIN limits comparative studies and leads to incompatible outcomes. An evolution-based concept, the interolog, which refers to interacting orthologous protein sets, paves the way toward an optimal benchmark.
Here, we provide an R package, IMMAN, as a tool for reconstructing an Interolog Protein Network (IPN) by integrating several protein-protein interaction networks (PPINs). Users can unify different PPINs to mine networks that are conserved across species. IMMAN is designed to retrieve IPNs with different degrees of conservation, supporting the prediction of protein functions from their networks.
An IPN consists of evolutionarily conserved nodes and their associated edges and has a low false-positive rate. It can therefore be considered a gold standard network for biological network analysis, relative to the PPINs from which it is derived.
Keywords: Protein-protein interaction networks (PPINs), Interolog protein network (IPN), Bioconductor, Network biology
IPN: Interolog Protein Network
PPIN: Protein-protein Interaction Network
A tremendous number of molecular-level interactions have become accessible through technological advances, supporting efforts to model cellular and molecular processes [1, 2]. Among these, protein-protein interactions (PPIs) are notable because they provide a functional and structural description of the executive molecules, i.e. proteins. Nevertheless, PPI detection and prediction technologies still struggle to reduce false-positive and false-negative interactions [4, 5, 6]. Accordingly, data integration remains the most effective overall solution, despite improvements in experimental and computational methods. STRING [7], the BioNetBuilder Cytoscape app, IMP 2.0, PINALOG, HIPPIE [11] and BIPS use this approach to reconstruct and refine PPI networks (PPINs). In other work, an evolutionarily conserved network with shared nodes and fewer false-positive links, the Interolog Protein Network (IPN), was introduced as a benchmark for the evaluation of clustering algorithms [13]. An IPN separates the interactions that arose during evolution from those that were retained, and helps to uncover the remnants of the ancestral PPIN [13, 14, 15, 16, 17]. In this study, we present IMMAN, a package that integrates several PPINs and mines IPNs. IMMAN is free and available as an R/Bioconductor package and as a Java program.
- Step 1.
First, the amino acid sequence of each protein in the input list is automatically retrieved from the UniProt database.
- Step 2.
In the second step, IMMAN infers the orthologous proteins. To this end, the Needleman-Wunsch algorithm is employed to compute pairwise sequence similarities. Reciprocal best hits are then retrieved and passed to the next step to increase the chance of discovering orthologous pairs. The user can adjust the parameters of the alignment algorithm as well as the sequence similarity cutoff for orthology detection.
- Step 3.
In this step, the nodes of the IPN are specified. Each node of the network is defined as an orthologous protein set (OPS), i.e. a set of mutually orthologous proteins drawn from the species involved in the analysis.
- Step 4.
In the fourth step, the PPIN of each species is extracted separately, restricted to the proteins that constitute the OPSs, i.e. the IPN nodes. The PPINs are retrieved from the STRING database, and the user can adjust the minimal confidence score of the STRING networks.
- Step 5.
Finally, the edges of the interolog network are extracted. To this end, for every pair of OPSs i and j, IMMAN counts the protein pairs (p_ik, p_jk) such that p_ik and p_jk are connected in the PPIN of species k. If this number exceeds a predefined cutoff (the coverage cutoff), an edge is placed between the two nodes. The coverage cutoff can also be specified by the user to tune the degree of conservation.
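The logic of Steps 2 to 5 can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the IMMAN implementation or its R interface. All function names, the toy sequences, the unit match/mismatch/gap scores (used in place of a substitution matrix) and the raw-score cutoff are assumptions made for this example. Sequence retrieval from UniProt (Step 1) and network retrieval from STRING are replaced by small hard-coded inputs, and only two hypothetical species are used, so each reciprocal best hit pair directly forms one OPS.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def best_hits(seqs_a, seqs_b, cutoff):
    """For each protein in seqs_a, its best-scoring protein in seqs_b (score >= cutoff)."""
    hits = {}
    for id_a, seq_a in seqs_a.items():
        score, id_b = max((nw_score(seq_a, s), i) for i, s in seqs_b.items())
        if score >= cutoff:
            hits[id_a] = id_b
    return hits


def reciprocal_best_hits(seqs_a, seqs_b, cutoff):
    """Step 2: pairs (a, b) in which each protein is the other's best hit."""
    fwd = best_hits(seqs_a, seqs_b, cutoff)
    rev = best_hits(seqs_b, seqs_a, cutoff)
    return {(a, b) for a, b in fwd.items() if rev.get(b) == a}


def interolog_edges(ops_list, ppins, min_score, coverage):
    """Steps 4-5: connect two OPSs when their members interact in more than
    `coverage` species, using only interactions scoring at least `min_score`."""
    kept = {
        sp: {frozenset((p, q)) for p, q, s in edges if s >= min_score}
        for sp, edges in ppins.items()
    }
    result = []
    for i, j in combinations(range(len(ops_list)), 2):
        support = sum(
            frozenset((ops_list[i][sp], ops_list[j][sp])) in kept[sp] for sp in kept
        )
        if support > coverage:  # "exceeds the cutoff", as worded in Step 5
            result.append((i, j))
    return result


# Toy data for two hypothetical species, A and B (not real proteins).
species_a = {"a1": "MKTAYIAK", "a2": "GSHMLEDP", "a3": "PPPPWWWW"}
species_b = {"b1": "MKTAYLAK", "b2": "GSHMLQDP", "b3": "CCCCCCCC"}

# Step 3: with two species, each reciprocal best hit pair is one OPS (IPN node).
pairs = sorted(reciprocal_best_hits(species_a, species_b, cutoff=4))
ops_list = [{"A": a, "B": b} for a, b in pairs]

# Step 4 input: per-species interactions as (protein, protein, confidence score).
ppins = {
    "A": [("a1", "a2", 900), ("a1", "a3", 950)],
    "B": [("b1", "b2", 700), ("b2", "b3", 400)],
}

print(ops_list)
print(interolog_edges(ops_list, ppins, min_score=500, coverage=1))
```

With these inputs the sketch yields two OPSs, (a1, b1) and (a2, b2), joined by a single interolog edge, because the corresponding proteins interact in both species. Raising `min_score` to 800 discards the interaction in species B, so the support drops to one species and the edge disappears. This mirrors the trade-off between network size and conservation that the thresholds control.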
Although the size of the IPN is tunable through several thresholds, missing some edges is the cost of discovering true positives, which is the ideal in PPI studies given their inherent inconsistency [6, 20]. Function prediction is a prominent question in molecular biology, and this approach paves the way for it on the basis of evolutionary mechanisms. All routine network biology analyses of PPINs become more reliable when performed on the IPN. For instance, finding modules within the IPN helps us to understand how evolution provides and preserves the cellular mechanisms of species that characterize a given biological process. Likewise, ranking the influence of IPN nodes with centrality measures can shed light on the detailed mechanisms of evolutionary processes.
The authors would like to thank Dr. Mehdi Sadeghi for his valuable comments and discussions.
This work has been supported by grant No. BS 1395_0_01 provided by the School of Biological Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran.
Availability of data and materials
Project name: IMMAN.
Project home page: https://bioconductor.org/packages/IMMAN
Archived version: 1.2.0.
Operating system(s): Platform independent.
Programming language: R.
Other requirements: None.
MJ and MM conceived and initiated the project and provided direction and feedback on the final results. PN wrote the basic R code, gathered the datasets used in the package and drafted the manuscript. AS developed and implemented the method and improved the R code. MA built the R package IMMAN and participated in revising the code and the manuscript. SJT developed and implemented the Java version of the procedure. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 5. Hart GT, Ramani AK, Marcotte EM. How complete are current yeast and human protein-interaction networks? Genome Biology. 2006.
- 7. Szklarczyk D, Morris JH, Cook H, Kuhn M, Wyder S, Simonovic M, Santos A, Doncheva NT, Roth A, Bork P, et al. The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible. Nucleic Acids Res. 2016:gkw937.
- 11. Schaefer MH, Fontaine JF, Vinayagam A, Porras P, Wanker EE, Andrade-Navarro MA. HIPPIE: Integrating protein interaction networks with experiment based quality scores. PLoS One. 2012;7(2). https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0031826.
- 13. Jafari M, Mirzaie M, Sadeghi M. Interlog protein network: an evolutionary benchmark of protein interaction networks for the evaluation of clustering algorithms. BMC Bioinformatics. 2015;16(1):319.
- 16. Nguyen PV, Srihari S, Leong HW. Identifying conserved protein complexes between species by constructing interolog networks. BMC Bioinformatics. 2013;14(Suppl 16):S8.
- 18. Ellson J, Gansner E, Koutsofios L, North SC, Woodhull G. Graphviz: open source graph drawing tools. In: International Symposium on Graph Drawing. Springer; 2001. p. 483–484.
- 19. Csardi G, Nepusz T. The igraph software package for complex network research. InterJournal, Complex Systems. 2006;1695(5):1–9.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.