DomSVR: domain boundary prediction with support vector regression from sequence information alone

Chen, Peng; Liu, Chunmei; Burge, Legand; Li, Jinyan; Mohammad, Mahmood; Southerland, William; Gloster, Clay; Wang, Bing

doi:10.1007/s00726-010-0506-6

Original Article
Published: 18 February 2010

Amino Acids, volume 39, pages 713–726 (2010)

Peng Chen¹,², Chunmei Liu¹, Legand Burge¹, Jinyan Li², Mahmood Mohammad³, William Southerland⁴, Clay Gloster⁵ & Bing Wang⁶

Abstract

Protein domains are the fundamental structural and functional units of proteins. Knowledge of protein domain boundaries helps in understanding the evolution, structure and function of proteins, and also plays an important role in protein classification. In this paper, we propose a support vector regression-based method for identifying protein domain boundaries from novel input profiles extracted from the AAindex database. Our method achieves an average sensitivity of ∼36.5% and an average specificity of ∼81% on multi-domain protein chains, which is better overall than the performance of published approaches to domain boundary identification. Because it uses sequence information alone, our method is also simpler and faster.
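To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows how a sequence might be turned into per-residue window profiles from a single amino acid index, and how residue-level sensitivity and specificity could be computed. Everything specific here is an assumption made for illustration: the choice of the Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101), the window half-width, the zero padding, the function names `window_profiles` and `sensitivity_specificity`, and the metric definitions TP/(TP+FN) and TN/(TN+FP). The abstract does not state which indices, window size, or metric definitions the paper uses, and the regression step itself is omitted.

```python
# Illustrative sketch only: the index, window size, padding, and metric
# definitions below are assumptions, not taken from the DomSVR paper.

# Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101), used here
# purely as an example of a per-residue property index.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_profiles(sequence, half_width=2, pad_value=0.0):
    """Return one feature vector per residue.

    Each vector holds the property values of the residues in a window of
    2 * half_width + 1 positions centred on that residue. Positions past
    the chain ends, and unknown residue codes, are filled with pad_value.
    """
    values = [HYDROPATHY.get(res, pad_value) for res in sequence.upper()]
    n = len(values)
    profiles = []
    for i in range(n):
        window = []
        for j in range(i - half_width, i + half_width + 1):
            window.append(values[j] if 0 <= j < n else pad_value)
        profiles.append(window)
    return profiles


def sensitivity_specificity(true_labels, predicted_labels):
    """Residue-level sensitivity TP/(TP+FN) and specificity TN/(TN+FP).

    Labels are 1 for a boundary residue and 0 for a non-boundary residue.
    A rate whose denominator is zero is reported as 0.0.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec
```

In a pipeline of this kind, the vectors returned by `window_profiles` would be passed to a regressor that outputs a per-residue boundary score, and a threshold on that score would give the predicted labels passed to `sensitivity_specificity`. Both the thresholding step and the pairing with a regressor are assumptions here.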

Acknowledgments

This work was supported in part by grant 2 G12 RR003048 from the RCMI program, Division of Research Infrastructure, National Center for Research Resources, NIH and the Mordecai Wyatt Johnson program of Howard University. This work was also supported in part by the Singapore MOE ARC Tier-2 funding grant T208B2203 and the National Science Foundation of China (No. 60803107). CL’s work was supported by NSF (CCF-0845888).

Author information

Authors and Affiliations

Department of Systems and Computer Science, Howard University, 2400 Sixth Street, NW, Washington, DC, 20059, USA
Peng Chen, Chunmei Liu & Legand Burge
Bioinformatics Research Center, School of Computer Engineering, Nanyang Technological University, Singapore, 639798, Singapore
Peng Chen & Jinyan Li
Department of Mathematics, Howard University, 2400 Sixth Street, NW, Washington, DC, 20059, USA
Mahmood Mohammad
Department of Biochemistry, Howard University, 2400 Sixth Street, NW, Washington, DC, 20059, USA
William Southerland
Department of Electrical and Computer Engineering, Howard University, 2400 Sixth Street, NW, Washington, DC, 20059, USA
Clay Gloster
School of Electrical Engineering and Information, Anhui University of Technology, Hudong Road 59, Ma’anshan, 243002, Anhui, People’s Republic of China
Bing Wang

Corresponding author

Correspondence to Peng Chen.

About this article

Cite this article

Chen, P., Liu, C., Burge, L. et al. DomSVR: domain boundary prediction with support vector regression from sequence information alone. Amino Acids 39, 713–726 (2010). https://doi.org/10.1007/s00726-010-0506-6

Received: 23 September 2009
Accepted: 25 January 2010
Published: 18 February 2010
Issue Date: August 2010
DOI: https://doi.org/10.1007/s00726-010-0506-6
