DomSVR: domain boundary prediction with support vector regression from sequence information alone

  • Original Article
Amino Acids

Abstract

Protein domains are the fundamental structural and functional units of proteins. Knowledge of domain boundaries helps in understanding the evolution, structure and function of proteins, and also plays an important role in protein classification. In this paper, we propose a support vector regression-based method for identifying protein domain boundaries, using novel input profiles extracted from the AAindex database. Our method achieves an average sensitivity of ∼36.5% and an average specificity of ∼81% for multi-domain protein chains, which is better overall than the performance of published approaches to domain boundary identification. Because it uses sequence information alone, the method is also simpler and faster.
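To make the two ideas in the abstract concrete, the sketch below shows (a) how a per-residue property profile can be derived from a sequence with a sliding window, and (b) how sensitivity and specificity can be computed from per-residue boundary labels. It is an illustration only, not the DomSVR implementation: the Kyte-Doolittle hydropathy scale stands in for the AAindex-derived profiles, the window size is arbitrary, the function names `window_profile` and `sensitivity_specificity` are hypothetical, and the per-residue definitions of the two metrics are an assumption, since the abstract does not state how they were evaluated. In a full pipeline, profiles like these would be the inputs to a support vector regressor.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in property
# index (assumption: the paper's actual AAindex-derived profiles differ).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_profile(seq, scale=KD, half_window=2):
    """Return one smoothed property value per residue.

    Each value is the mean of the scale over a window centred on the
    residue; the window is truncated at the sequence ends.
    """
    values = [scale[aa] for aa in seq]
    profile = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        profile.append(sum(values[lo:hi]) / (hi - lo))
    return profile


def sensitivity_specificity(true_labels, pred_labels):
    """Per-residue sensitivity and specificity (assumed definitions).

    Labels are 1 for a boundary residue and 0 otherwise.
    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    """
    pairs = list(zip(true_labels, pred_labels))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


if __name__ == "__main__":
    print(window_profile("MKVLAAGIVG", half_window=1))
    print(sensitivity_specificity([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0]))
```

A regression target, such as a per-residue boundary score, would be paired with each windowed feature vector during training; that step is omitted here because the abstract does not describe it.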

Acknowledgments

This work was supported in part by grant 2 G12 RR003048 from the RCMI program, Division of Research Infrastructure, National Center for Research Resources, NIH and the Mordecai Wyatt Johnson program of Howard University. This work was also supported in part by the Singapore MOE ARC Tier-2 funding grant T208B2203 and the National Science Foundation of China (No. 60803107). CL’s work was supported by NSF (CCF-0845888).

Author information

Corresponding author

Correspondence to Peng Chen.

About this article

Cite this article

Chen, P., Liu, C., Burge, L. et al. DomSVR: domain boundary prediction with support vector regression from sequence information alone. Amino Acids 39, 713–726 (2010). https://doi.org/10.1007/s00726-010-0506-6

  • DOI: https://doi.org/10.1007/s00726-010-0506-6
