Construction and analysis of an interologous protein–protein interaction network of Camellia sinensis leaf (TeaLIPIN) from RNA–Seq data sets

  • Gagandeep Singh
  • Vikram Singh
  • Vikram Singh (corresponding author)
Original Article


Key message

An interologous PPI network of tea leaf is constructed by developing a reference transcriptome assembly and using experimentally validated PPIs from plants. Key regulatory proteins are proposed and potential TFs are predicted.


Worldwide, tea (Camellia sinensis) is the most consumed beverage, primarily because of the taste, flavour, and aroma of its newly formed leaves, and it has been used as an important ingredient in several traditional medicinal systems because of its antioxidant properties. For this medicinally and commercially important plant, the design principles of gene-regulatory and protein–protein interaction (PPI) networks at the sub-cellular level are largely uncharacterized. In this work, we report a tea leaf interologous PPI network (TeaLIPIN) consisting of 11,208 nodes and 197,820 interactions. A reference transcriptome assembly was first developed from all 44 samples of 6 publicly available leaf transcriptomes (1,567,288,290 raw reads). By inferring high-confidence interactions among the potential proteins coded by these transcripts, using known experimental information about PPIs in 14 plants, an interologous PPI network was constructed and its modular architecture was explored. By comparing this network with 10,000 realizations of two types of corresponding random networks (Erdős–Rényi and Barabási–Albert models) across three network centrality metrics, we predict 2750 bottleneck proteins (p values < 0.01). Of these, 247 are deduced to have transcription factor domains using in-house developed HMM models of known plant TFs, and these were also mapped to the draft tea genome to search for their probable loci of origin. Co-expression analysis of the TeaLIPIN proteins was also performed, and the top-ranking modules are elaborated. We believe that the proposed novel methodology can easily be adopted to develop and explore interactomes in other plant species by making use of available transcriptomic data.
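The interolog inference step summarized above can be illustrated with a minimal Python sketch. It assumes that orthology between tea proteins and reference plant proteins has already been established by some means (the abstract does not say how), and it does not model any confidence filtering. The function name `infer_interologs`, the protein identifiers, and the toy data are hypothetical and are not taken from the TeaLIPIN pipeline.

```python
from itertools import product


def infer_interologs(orthologs, reference_ppis):
    """Transfer known reference interactions onto query proteins.

    orthologs: dict mapping a query protein ID to a set of reference protein IDs
               (hypothetical input; any orthology method could produce it).
    reference_ppis: iterable of (ref_a, ref_b) experimentally supported pairs.
    Returns a set of frozenset({query_a, query_b}) predicted interactions.
    Self-interactions are skipped in this sketch.
    """
    # Invert the ortholog map: reference protein -> set of query proteins.
    ref_to_query = {}
    for query, refs in orthologs.items():
        for ref in refs:
            ref_to_query.setdefault(ref, set()).add(query)

    predicted = set()
    for ref_a, ref_b in reference_ppis:
        queries_a = ref_to_query.get(ref_a, ())
        queries_b = ref_to_query.get(ref_b, ())
        # An interaction is transferred only when BOTH partners have orthologs.
        for qa, qb in product(queries_a, queries_b):
            if qa != qb:
                predicted.add(frozenset((qa, qb)))
    return predicted


if __name__ == "__main__":
    # Toy data: IDs are made up for illustration only.
    toy_orthologs = {
        "tea1": {"AT1"},
        "tea2": {"AT2"},
        "tea3": {"AT2", "OS5"},
        "tea4": {"OS9"},
    }
    toy_reference_ppis = [("AT1", "AT2"), ("OS5", "OS7")]
    for pair in sorted(tuple(sorted(p)) for p in infer_interologs(toy_orthologs, toy_reference_ppis)):
        print(pair)
```

In this toy case the pair (OS5, OS7) contributes nothing because OS7 has no tea ortholog, which is the defining constraint of interolog transfer: an interaction is carried over only when both partners are conserved.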


Camellia sinensis (Tea); Protein–protein interaction (PPI) network; RNA–Seq data; Leaf transcriptome; Interolog; KEGG pathways; Transcription factors (TFs)



We would like to thank Central University of Himachal Pradesh for providing us computational infrastructure.

Author contribution statement

VS (third author) conceptualized the research framework and supervised the work. GS and VS (second author) performed all the computational studies. All the authors analyzed the data and interpreted the results. GS and VS (third author) wrote and finalized the manuscript.

Compliance with ethical standards

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this article.

Supplementary material

299_2019_2440_MOESM1_ESM.rar (80.4 mb)
S1 Supplementary File: Sequences of all the final assembled transcripts. S2 Supplementary File: Annotation details of assembled sequences. S3 Supplementary File: TeaLIPIN interactions. S4 Supplementary File: Detailed list of all the identified functional modules, pathways analysis of key proteins, and transcription factors identified in key proteins. S5 Supplementary File: List of modules identified by weighted gene co-expression network analysis (WGCNA) and pathway enrichment of selected modules. S6 Supplementary File: Mapping of TeaLIPIN proteins on draft genome of tea and available proteomic data at NCBI (RAR 82353 kb)



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Centre for Computational Biology and Bioinformatics, School of Life Sciences, Central University of Himachal Pradesh, Dharamshala, India
