Protein Secondary Structure Prediction Using Machine Learning

Saha, Sriparna; Ekbal, Asif; Sharma, Sidharth; Bandyopadhyay, Sanghamitra; Maulik, Ujjwal

doi:10.1007/978-3-642-32063-7_7

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 182)

Abstract

Protein structure prediction is an important step toward understanding protein structure and function. Accurate prediction of protein secondary structure, in particular, helps in understanding protein folding. Many applications, such as drug discovery, require predicting the secondary structure of unknown proteins. In this paper we report our first attempt at secondary structure prediction and approach it as a sequence classification problem: the task is to assign a sequence of labels (i.e. helix, sheet, and coil) to a given protein sequence. We propose an ensemble technique based on two stochastic supervised machine learning algorithms, namely the Maximum Entropy Markov Model (MEMM) and the Conditional Random Field (CRF). We identify and implement a set of features that mostly capture contextual information. The proposed approach is evaluated on a benchmark dataset, and its performance is encouraging enough to warrant further exploration. The highest predictive accuracy obtained is 61.26%, with a segment overlap score (SOV) of 52.30%.
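
The sequence-labelling framing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the window size, the feature names, the one-letter labels H/E/C, and the toy sequence are all assumptions made for illustration. It shows one plausible form of contextual features (the residues in a window around each position) and the per-residue accuracy, i.e. the percentage of positions whose predicted label matches the observed one. The segment overlap score (SOV) is not computed here.

```python
from typing import Dict, List

# Assumed one-letter labels: H = helix, E = sheet (strand), C = coil.
LABELS = ("H", "E", "C")


def window_features(sequence: str, i: int, half_width: int = 2) -> Dict[str, str]:
    """Contextual features for residue i: the residues in a window around it."""
    features = {}
    for offset in range(-half_width, half_width + 1):
        j = i + offset
        # Positions that fall outside the chain get a padding symbol.
        residue = sequence[j] if 0 <= j < len(sequence) else "-"
        features[f"res[{offset:+d}]"] = residue
    return features


def sequence_to_features(sequence: str, half_width: int = 2) -> List[Dict[str, str]]:
    """One feature dict per residue, the usual input shape for a sequence labeller."""
    return [window_features(sequence, i, half_width) for i in range(len(sequence))]


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted label equals the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("label sequences must have the same length")
    if not observed:
        raise ValueError("label sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    toy_sequence = "MKVLA"  # hypothetical five-residue chain
    for position, feats in enumerate(sequence_to_features(toy_sequence, half_width=1)):
        print(position, feats)
    # Hypothetical predicted vs. observed labels for a four-residue stretch.
    print(per_residue_accuracy("HHEC", "HHCC"))  # 75.0
```

Feature dictionaries of this shape could be passed to any CRF or MEMM implementation that accepts per-position attributes. How the two models' outputs are combined in the ensemble is not stated in the abstract, so that step is left out of the sketch.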

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna, India
Sriparna Saha, Asif Ekbal & Sidharth Sharma
Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
Sanghamitra Bandyopadhyay
Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
Ujjwal Maulik

Corresponding author

Correspondence to Sriparna Saha.

Editor information

Editors and Affiliations

(MIR Labs), Scientific Network for Innovation and, Machine Intelligence Research Labs, MIR Labs Campus, Auburn, 98071, Washington, USA
Ajith Abraham
Indian Institute of Information Technology and Management, Technopark Campus, Trivandrum 695581, India
Sabu M Thampi

Cite this paper

Saha, S., Ekbal, A., Sharma, S., Bandyopadhyay, S., Maulik, U. (2013). Protein Secondary Structure Prediction Using Machine Learning. In: Abraham, A., Thampi, S. (eds) Intelligent Informatics. Advances in Intelligent Systems and Computing, vol 182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32063-7_7

DOI: https://doi.org/10.1007/978-3-642-32063-7_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32062-0
Online ISBN: 978-3-642-32063-7
eBook Packages: Engineering, Engineering (R0)
