SCL-Epred: a generalised de novo eukaryotic protein subcellular localisation predictor

  • Original Article
Amino Acids

Abstract

Knowledge of the subcellular location of a protein provides valuable information about, among other things, its function, its possible interactions with other proteins and its drug targetability. Experimental determination of a protein's location in the cell is expensive, time-consuming and open to human error. Fast and accurate predictors of subcellular location have an important role to play if the abundance of sequence data now available is to be fully exploited. In the post-genomic era, the genomes of many diverse organisms are available. Many of these organisms are important in human and veterinary disease and fall outside the well-studied plant, animal and fungal groups. We have developed a general eukaryotic subcellular localisation predictor (SCL-Epred) that assigns eukaryotic proteins to three classes which are particularly important for determining the drug targetability of a protein: secreted proteins, membrane proteins, and proteins that are neither secreted nor membrane proteins. The algorithm powering SCL-Epred is an N-to-1 neural network trained on very large non-redundant sets of protein sequences. SCL-Epred performs well on training data, achieving a Q of 86% and a generalised correlation of 0.75 in tenfold cross-validation on a set of 15,202 redundancy-reduced protein sequences. When benchmarked on a large redundancy-reduced independent test set of 562 proteins, the three-class accuracy of SCL-Epred and LocTree2, and in particular of a consensus predictor combining both methods, surpasses that of other widely used predictors. SCL-Epred is publicly available at http://distillf.ucd.ie/distill/.
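The two cross-validation figures quoted above summarise a three-class confusion matrix. The Python sketch below shows one way such figures could be computed; it is not the authors' evaluation code. It assumes that Q is the fraction of proteins assigned to their true class, and that the generalised correlation is the chi-square-based coefficient described by Baldi et al. (2000), GC = sqrt(Σ_ij (z_ij − e_ij)² / e_ij / (N(K − 1))), where z_ij counts proteins of class i predicted as class j and e_ij is the count expected if predictions were independent of the true class. The label encoding (0 = secreted, 1 = membrane, 2 = neither) and all function names are hypothetical.

```python
import math

# Hypothetical label encoding for the three SCL-Epred classes:
# 0 = secreted, 1 = membrane, 2 = neither secreted nor membrane.


def confusion_matrix(true_labels, predicted_labels, k=3):
    """Return z, where z[i][j] counts proteins of true class i predicted as class j."""
    z = [[0] * k for _ in range(k)]
    for t, p in zip(true_labels, predicted_labels):
        z[t][p] += 1
    return z


def q_accuracy(z):
    """Q: fraction of all proteins that are assigned to their true class."""
    n = sum(sum(row) for row in z)
    return sum(z[i][i] for i in range(len(z))) / n


def generalised_correlation(z):
    """Chi-square-based generalised correlation (assumed definition, after Baldi et al. 2000)."""
    k = len(z)
    n = sum(sum(row) for row in z)
    true_sizes = [sum(z[i]) for i in range(k)]                       # x_i: true class sizes
    pred_sizes = [sum(z[i][j] for i in range(k)) for j in range(k)]  # y_j: predicted class sizes
    chi2 = 0.0
    for i in range(k):
        for j in range(k):
            e = true_sizes[i] * pred_sizes[j] / n  # expected count under independence
            if e > 0:  # an empty row or column implies z[i][j] == 0, so the cell adds nothing
                chi2 += (z[i][j] - e) ** 2 / e
    return math.sqrt(chi2 / (n * (k - 1)))


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 2]
    preds = [0, 0, 1, 1, 2, 2]
    z = confusion_matrix(truth, preds)
    print(f"Q = {q_accuracy(z):.3f}, GC = {generalised_correlation(z):.3f}")
    # prints "Q = 0.667, GC = 0.645"
```

Under this definition GC equals 1 for a perfect diagonal matrix and 0 when the observed counts match those expected under independence (for example, when every protein is assigned to the same class), which is why it complements Q when class sizes are unbalanced.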

References

  • Altschul S, Madden T, Schäffer A, Zhang J, Zhang Z, Miller W, Lipman D (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25(17):3389–3402

  • Bakheet T, Doig A (2009) Properties and identification of human protein drug targets. Bioinformatics 25(4):451–457

  • Baldi P, Brunak S, Chauvin Y, Andersen C, Nielsen H (2000) Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics 16(5):412–424

  • Bender A, van Dooren G, Ralph S, McFadden G, Schneider G (2003) Properties and prediction of mitochondrial transit peptides from Plasmodium falciparum. Mol Biochem Parasitol 132:59–66

  • Bendtsen J, Jensen L, Blom N, von Heijne G, Brunak S (2004) Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng Des Sel 17(4):349–356

  • Boeckmann B, Bairoch A, Apweiler R, Blatter M, Estreicher A, Gasteiger E, Martin M, Michoud K, O’Donovan C, Phan I, Pilbout S, Schneider M (2003) The Swiss-Prot protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 31:365–370

  • Brayton K, Lau A, Herndon D, Hannick L, Kappmeyer L, Berens S, Bidwell S, Brown W, Crabtree J, Fadrosh D et al (2007) Genome sequence of Babesia bovis and comparative analysis of apicomplexan hemoprotozoa. PLoS Pathog 3(10):e148

  • Burki F, Shalchian-Tabrizi K, Minge M, Skjæveland A, Nikolaev S, Jakobsen K, Pawlowski J (2007) Phylogenomics reshuffles the eukaryotic supergroups. PLoS One 2(8):e790

  • Choo K, Tan T, Ranganathan S (2009) A comprehensive assessment of N-terminal signal peptides prediction methods. BMC Bioinformatics 10(Suppl 15):S2

  • Chou K, Shen H (2010) A new method for predicting the subcellular localization of eukaryotic proteins with both single and multiple sites: Euk-mPLoc 2.0. PLoS One 5(4):e9931

  • Emanuelsson O, Nielsen H, Brunak S, von Heijne G (2000) Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol 300(4):1005–1016

  • Foth B, Ralph S, Tonkin C, Struck N, Fraunholz M, Roos DS, Cowman A, McFadden G (2003) Dissecting apicoplast targeting in the malaria parasite Plasmodium falciparum. Science 299:705

  • Frank K, Sippl M (2008) High-performance signal peptide prediction based on sequence alignment techniques. Bioinformatics 24(19):2172–2176

  • Gardner M, Bishop R, Shah T, de Villiers E, Carlton J, Hall N, Ren Q, Paulsen I, Pain A, Berriman M et al (2005) Genome sequence of Theileria parva, a bovine pathogen that transforms lymphocytes. Science 309(5731):134

  • Garg A, Raghava G (2008) ESLpred2: improved method for predicting subcellular localization of eukaryotic proteins. BMC Bioinformatics 9(1):503

  • Garg A, Bhasin M, Raghava G (2005) Support vector machine-based method for subcellular localization of human proteins using amino acid compositions, their order, and similarity search. J Biol Chem 280(15):14427–14432

  • Gellin B, Soave R (1992) Coccidian infections in AIDS. Toxoplasmosis, cryptosporidiosis, and isosporiasis. Med Clin N Am 76(1):205

  • Goldberg T, Hamp T, Rost B (2012) LocTree2 predicts localization for all domains of life. Bioinformatics 28(18):i458–i465

  • Horton P, Park K, Obayashi T, Fujita N, Harada H, Adams-Collier C, Nakai K (2007) WoLF PSORT: protein localization predictor. Nucleic Acids Res 35:W585–W587

  • Jia P, Qian Z, Zeng Z, Cai Y, Li Y (2007) Prediction of subcellular protein localization based on functional domain composition. Biochem Biophys Res Commun 357(2):366–370

  • Kaundal R, Raghava G (2009) RSLpred: an integrative system for predicting subcellular localization of rice proteins combining compositional and evolutionary information. Proteomics 9(9):2324–2342

  • Keeling P, Burger G, Durnford D, Lang B, Lee R, Pearlman R, Roger A, Gray M (2005) The tree of eukaryotes. Trends Ecol Evol 20(12):670–676

  • Mariani V, Kiefer F, Schmidt T, Haas J, Schwede T (2011) Assessment of template based protein structure predictions in CASP9. Proteins 79(Suppl 10):37–58

  • Mooney C, Wang Y, Pollastri G (2011) SCLpred: protein subcellular localization prediction by N-to-1 neural networks. Bioinformatics 27(20):2812–2819

  • Murray C, Rosenfeld L, Lim S, Andrews K, Foreman K, Haring D, Fullman N, Naghavi M, Lozano R, Lopez A (2012) Global malaria mortality between 1980 and 2010: a systematic analysis. Lancet 379(9814):413–431

  • Nakai K, Horton P (1999) PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends Biochem Sci 24(1):34–35

  • Yu N, Wagner J, Laird M, Melli G, Rey S, Lo R, Sahinalp S, Ester M, Foster L et al (2010) PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes. Bioinformatics 26(13):1608–1615

  • Nielsen H, Engelbrecht J, Brunak S, von Heijne G (1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng 10(1):1–6

  • Petersen T, Brunak S, von Heijne G, Nielsen H (2011) SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods 8(10):785–786

  • Pierleoni A, Martelli PL, Fariselli P, Casadio R (2006) BaCelLo: a balanced subcellular localization predictor. Bioinformatics 22(14):e408–e416

  • Pierleoni A, Martelli P, Casadio R (2011) MemLoci: predicting subcellular localization of membrane proteins in Eukaryotes. Bioinformatics 27(9):1224–1230

  • Pollastri G, McLysaght A (2005) Porter: a new, accurate server for protein secondary structure prediction. Bioinformatics 21(8):1719–1720

  • Shatkay H, Höglund A, Brady S, Blum T, Dönnes P, Kohlbacher O (2007) SherLoc: high-accuracy prediction of protein subcellular localization by integrating text and protein sequence data. Bioinformatics 23(11):1410–1417

  • Suzek B, Huang H, McGarvey P, Mazumder R, Wu C (2007) UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23(10):1282

  • Tamura T, Akutsu T (2007) Subcellular location prediction of proteins using support vector machines with alignment of block sequences utilizing amino acid composition. BMC Bioinformatics 8(1):466

  • Volpato V, Adelfio A, Pollastri G (2013) Accurate prediction of protein enzymatic class by N-to-1 neural networks. BMC Bioinformatics 14(Suppl 1):S11

  • Yu C, Chen Y, Lu C, Hwang J (2006) Prediction of protein subcellular localization. Proteins 64(3):643–651

  • Yuan Z, Teasdale R (2002) Prediction of Golgi Type II membrane proteins based on their transmembrane domains. Bioinformatics 18(8):1109–1115

  • Zuegge J, Ralph S, Schmuker M, McFadden G, Schneider G (2001) Deciphering apicoplast targeting signals—feature extraction from nuclear-encoded precursors of Plasmodium falciparum apicoplast proteins. Gene 280:19–26

Acknowledgments

The work was funded through a Science Foundation Ireland principal investigator grant (08/IN.1/B1864) to D. C. Shields and a Science Foundation Ireland research frontiers grant (10/RFP/GEN2749) to G. Pollastri. The authors wish to acknowledge UCD IT Services, and in particular the Phaeton administrators, for the provision of computational facilities and support. We thank Tatyana Goldberg from the Rost Lab at TU Munich for providing LocTree2 predictions.

Author information

Corresponding author

Correspondence to Gianluca Pollastri.

About this article

Cite this article

Mooney, C., Cessieux, A., Shields, D.C. et al. SCL-Epred: a generalised de novo eukaryotic protein subcellular localisation predictor. Amino Acids 45, 291–299 (2013). https://doi.org/10.1007/s00726-013-1491-3

  • DOI: https://doi.org/10.1007/s00726-013-1491-3
