Abstract
Knowledge of the subcellular location of a protein provides valuable information about its function, its possible interactions with other proteins and its drug targetability, among other things. The experimental determination of a protein's location in the cell is expensive, time-consuming and open to human error. Fast and accurate predictors of subcellular location have an important role to play if the abundance of sequence data now available is to be fully exploited. In the post-genomic era, genomes from many diverse organisms are available. Many of these organisms are important in human and veterinary disease and fall outside the well-studied plant, animal and fungi groups. We have developed a general eukaryotic subcellular localisation predictor (SCL-Epred) which assigns eukaryotic proteins to three classes that are particularly important for determining the drug targetability of a protein: secreted proteins, membrane proteins and proteins that are neither secreted nor membrane. The algorithm powering SCL-Epred is an N-to-1 neural network trained on very large non-redundant sets of protein sequences. SCL-Epred performs well on its training data, achieving a Q of 86% and a generalised correlation of 0.75 when tested in tenfold cross-validation on a set of 15,202 redundancy-reduced protein sequences. The three-class accuracy of SCL-Epred and LocTree2, and in particular of a consensus predictor comprising both methods, surpasses that of other widely used predictors when benchmarked on a large redundancy-reduced independent test set of 562 proteins. SCL-Epred is publicly available at http://distillf.ucd.ie/distill/.
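The abstract reports two summary figures, Q and a generalised correlation, without stating how they are computed. The sketch below shows one common way to derive both from a three-class confusion matrix. It assumes that Q is the percentage of correctly classified proteins and that the generalised correlation is the chi-square-based coefficient often used for multi-class predictors; neither definition is confirmed by the text here. The function names and the example counts are hypothetical and are not taken from the paper.

```python
from math import sqrt

def q_accuracy(confusion):
    """Percentage of correctly classified examples (assumed meaning of Q)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 100.0 * correct / total

def generalised_correlation(confusion):
    """Chi-square-based generalised correlation (assumed definition).

    confusion[i][j] counts examples of true class i predicted as class j.
    GC = sqrt(chi2 / (N * (K - 1))), where the expected count of each cell
    is row_total * column_total / N.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    chi2 = 0.0
    for i in range(k):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty classes to avoid division by zero
                chi2 += (confusion[i][j] - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))

# Hypothetical counts; rows and columns are ordered as
# secreted, membrane, neither secreted nor membrane.
example = [
    [80, 10, 10],
    [12, 75, 13],
    [8, 9, 83],
]
print(f"Q  = {q_accuracy(example):.1f}%")
print(f"GC = {generalised_correlation(example):.2f}")
```

Under this definition a perfect predictor scores 1 and a predictor whose output is independent of the true class scores 0, so the coefficient is less sensitive to class imbalance than Q alone.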
Acknowledgments
The work was funded through a Science Foundation Ireland principal investigator grant (08/IN.1/B1864) to D. C. Shields and a Science Foundation Ireland research frontiers grant (10/RFP/GEN2749) to G. Pollastri. The authors wish to acknowledge UCD IT Services, and in particular the Phaeton administrators, for the provision of computational facilities and support. We thank Tatyana Goldberg from the Rost Lab at TU Munich for providing LocTree2 predictions.
Cite this article
Mooney, C., Cessieux, A., Shields, D.C. et al. SCL-Epred: a generalised de novo eukaryotic protein subcellular localisation predictor. Amino Acids 45, 291–299 (2013). https://doi.org/10.1007/s00726-013-1491-3
DOI: https://doi.org/10.1007/s00726-013-1491-3