Abstract
We recently developed an unsupervised Bayesian inference methodology to automatically infer a dictionary of protein supersecondary structures (Subramanian et al., IEEE data compression conference proceedings (DCC), 340–349, 2017). Specifically, this methodology uses the information-theoretic framework of the minimum message length (MML) criterion for hypothesis selection (Wallace, Statistical and inductive inference by minimum message length, Springer Science & Business Media, New York, 2005). The best dictionary of supersecondary structures is the one that yields the most (lossless) compression of the source collection of folding patterns represented as tableaux, which are matrix representations that capture the essence of protein folding patterns (Lesk, J Mol Graph 13:159–164, 1995). This book chapter outlines our MML methodology for inferring the supersecondary structure dictionary. The inferred dictionary is available at http://lcb.infotech.monash.edu.au/proteinConcepts/scop100/dictionary.html.
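The selection rule stated in the abstract (prefer the dictionary that gives the shortest lossless encoding of the collection) can be illustrated with a toy two-part message length, where the first part states the dictionary and the second part states the data given that dictionary. The sketch below is only an illustration under loud assumptions: folding patterns are reduced to strings over a hypothetical two-letter alphabet (`H` for helix, `E` for strand) rather than tableaux, and the function `message_length`, the end-of-entry cost, the uniform code over code words, and the greedy longest-match parse are all invented for this example. It is not the encoding used in the chapter.

```python
import math


def message_length(dictionary, sequences, alphabet):
    """Toy two-part message length in bits: I(dictionary) + I(data | dictionary).

    Assumes every dictionary entry is a non-empty string over `alphabet`.
    """
    bits_per_symbol = math.log2(len(alphabet))

    # First part: spell out each entry symbol by symbol, and charge one
    # extra symbol's worth of bits as an end-of-entry marker (toy assumption).
    first_part = sum((len(entry) + 1) * bits_per_symbol for entry in dictionary)

    # Second part: each code word is either a dictionary entry or a single
    # symbol, all assumed equally likely (uniform code, toy assumption).
    bits_per_codeword = math.log2(len(dictionary) + len(alphabet))

    # Greedy parse: at each position take the longest matching entry,
    # falling back to a single symbol when no entry matches.
    entries = sorted(dictionary, key=len, reverse=True)
    n_codewords = 0
    for seq in sequences:
        i = 0
        while i < len(seq):
            match = next((e for e in entries if seq.startswith(e, i)), None)
            i += len(match) if match else 1
            n_codewords += 1

    return first_part + n_codewords * bits_per_codeword


# Hypothetical data: four copies of the same strand-helix-strand repeat.
sequences = ["EHEEHEEHE"] * 4
candidates = [[], ["EHE"], ["EHE", "HHH"]]

# Pick the candidate dictionary with the shortest total message.
best = min(candidates, key=lambda d: message_length(d, sequences, "HE"))
print(best)
```

In this toy setting the single-entry dictionary wins: an empty dictionary pays nothing up front but encodes every symbol separately, while adding an entry that never occurs in the data increases both the cost of stating the dictionary and the cost of every code word. This is the trade-off between hypothesis complexity and fit that the MML criterion quantifies.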
References
Lesk AM (1995) Systematic representation of protein folding patterns. J Mol Graph 13:159–164
Konagurthu AS, Lesk AM, Allison L (2012) Minimum message length inference of secondary structure from protein coordinate data. Bioinformatics 28(12):i97–i105
Subramanian R, Allison L, Stuckey PJ, Garcia De La Banda M, Abramson D, Lesk AM, Konagurthu AS (2017) Statistical compression of protein folding patterns for inference of recurrent substructural themes. In: IEEE data compression conference proceedings (DCC), pp 340–349
Fox NK, Brenner SE, Chandonia JM (2013) SCOPe: Structural Classification of Proteins—extended, integrating SCOP and ASTRAL data and classification of new structures. Nucleic Acids Res 42(D1):D304–D309
Kamat AP, Lesk AM (2007) Contact patterns between helices and strands of sheet define protein folding patterns. Proteins 66:869–876
Konagurthu AS, Lesk AM (2010) Cataloging topologies of protein folding patterns. J Mol Recognit 23(2):253–257
Konagurthu AS, Stuckey PJ, Lesk AM (2008) Structural search and retrieval using a tableau representation of protein folding patterns. Bioinformatics 24(5):645–651
Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423 and 623–656
Wallace CS (2005) Statistical and inductive inference by minimum message length. Springer Science & Business Media, New York
Allison L (2018) Coding Ockham’s Razor. Springer, Cham
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Konagurthu, A.S. et al. (2019). Information-Theoretic Inference of an Optimal Dictionary of Protein Supersecondary Structures. In: Kister, A. (eds) Protein Supersecondary Structures. Methods in Molecular Biology, vol 1958. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-9161-7_6
DOI: https://doi.org/10.1007/978-1-4939-9161-7_6
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-9160-0
Online ISBN: 978-1-4939-9161-7
eBook Packages: Springer Protocols