Peptide Bioinformatics: Peptide Classification Using Peptide Machines

  • Zheng Rong Yang
Part of the Methods in Molecular Biology™ book series (MIMB, volume 458)


Peptides scanned from whole protein sequences are the core information for many peptide bioinformatics research subjects, such as functional site prediction, protein structure identification, and protein function recognition. In these applications, a computer model is normally used to assign a peptide to one of a set of given categories, so they are referred to as peptide classification applications. Among machine learning approaches, which include neural networks, peptide machines have demonstrated excellent performance in many applications compared with conventional methods. This chapter discusses the basic concepts of peptide classification, commonly used feature extraction methods, three peptide machines, and some important issues in peptide classification.
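As a rough illustration of what "scanning" peptides from a whole protein sequence can look like, the sketch below slides a fixed-length window along a sequence and then turns each peptide into a numeric vector. It is only a minimal sketch under stated assumptions: the function names `scan_peptides` and `one_hot`, the window length, and the example sequence are hypothetical, and the orthogonal (one-hot) encoding shown is one widely used feature extraction scheme, not necessarily the one the chapter recommends.

```python
from typing import List

# The 20 standard amino acids in one-letter code (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def scan_peptides(sequence: str, window: int) -> List[str]:
    """Return every contiguous peptide of length `window` in `sequence`."""
    if window <= 0:
        raise ValueError("window must be positive")
    # A sequence shorter than the window yields an empty list.
    return [sequence[i:i + window] for i in range(len(sequence) - window + 1)]


def one_hot(peptide: str) -> List[int]:
    """Encode a peptide as a flat 20-bits-per-residue orthogonal vector."""
    vector: List[int] = []
    for residue in peptide:
        bits = [0] * len(AMINO_ACIDS)
        # str.index raises ValueError for residues outside the alphabet.
        bits[AMINO_ACIDS.index(residue)] = 1
        vector.extend(bits)
    return vector


if __name__ == "__main__":
    # Hypothetical sequence fragment, used only for demonstration.
    for peptide in scan_peptides("MKTAYIAK", 4):
        print(peptide, sum(one_hot(peptide)))
```

Each encoded peptide could then be passed, together with its category label, to whatever classifier is being trained.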


Key words: bioinformatics, peptide classification, peptide machines.



Copyright information

© Humana Press, a part of Springer Science + Business Media, LLC 2008

Authors and Affiliations

  • Zheng Rong Yang
    • 1
  1. School of Engineering, Computer Science and Mathematics, University of Exeter, Exeter, EX4 4QF, UK
