Abstract
This paper explores diversity and conservation in the ‘landscape’ of random variation of protein tertiary structures, using quantitative feature-vector models of the major types of functionally important 3D structural motifs. For this purpose, I deploy a recently developed multidimensional copula simulation method based on nonparametric regression (NPR). Apart from improving the accuracy of multidimensional random sample generation, the simulation provides additional insight into diversity in the protein structural landscape in terms of random variation in the feature vector. It shows the relative importance of several features, with biological implications, in the conservation of motifs. Mapping this landscape into a distance-preserving 2D eigenspace also shows consistent demarcation of the different motif classes and preservation of their characteristic patterns in this 2D space.
Acknowledgements
The author would like to thank Srijit Chakrabarty for implementing the initial version of the author’s algorithm as part of his MSc project. The version used in this work has been developed further with significant modifications.
Appendix
(Re: Legend of Figs. 1, 2, 3, 4, 5 and 6)
Let θij(v) denote the coefficient of correlation (rank or partial, as the case may be) between Xi and Xj in the validation sample, and let θij(g) denote the same coefficient in the generated sample. Let η(i,j) = |θij(v) − θij(g)|, for i, j = 1, …, 11.
As the correlation matrices are symmetric, so is the matrix of their differences, i.e., η(i,j) = η(j,i). Each diagonal element of a correlation matrix equals 1 (and the estimates for both samples agree with this up to the 7th decimal place), so η(i,i) = 0 for each i. These properties are satisfied in our results for both the methods. Therefore only the values below the diagonal (i.e., η(i,j) for i = j+1, …, 11 and j = 1, …, 10) are shown in Figs. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12.
In these bar diagrams, the integers 1 to 10 along the X-axis indicate the suffix-labels (i, j) = (2,1), …, (11,1), respectively; the integers 11 to 19 indicate the suffix-labels (3,2), …, (11,2), respectively; and so on, until the integers 53 and 54 indicate the suffix-labels (10,9) and (11,9), respectively, and the integer 55 indicates the suffix-label (11,10).
The numerical values of the corresponding η(i,j) are shown along the Y-axis.
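The labelling convention described above can be illustrated with a short Python sketch. It is not the author's code: the function names (`abs_corr_difference`, `below_diagonal_labels`, `bar_heights`) and the 3 × 3 toy matrices are assumptions introduced purely for illustration, and the correlation matrices are taken as given rather than estimated from any sample.

```python
def abs_corr_difference(theta_v, theta_g):
    """Return eta[i][j] = |theta_v[i][j] - theta_g[i][j]| for two square matrices."""
    n = len(theta_v)
    return [[abs(theta_v[i][j] - theta_g[i][j]) for j in range(n)]
            for i in range(n)]


def below_diagonal_labels(n=11):
    """List the 1-based suffix-labels (i, j) with i > j, column by column.

    The k-th entry (counting from 1) corresponds to integer k on the X-axis.
    """
    return [(i, j) for j in range(1, n) for i in range(j + 1, n + 1)]


def bar_heights(eta):
    """Flatten the below-diagonal entries of eta in X-axis order."""
    n = len(eta)
    return [eta[i - 1][j - 1] for (i, j) in below_diagonal_labels(n)]


# For 11 features there are 11 * 10 / 2 = 55 below-diagonal pairs.
labels = below_diagonal_labels(11)
print(len(labels))                                   # 55
print(labels[0], labels[9], labels[10], labels[18])  # (2, 1) (11, 1) (3, 2) (11, 2)
print(labels[52], labels[53], labels[54])            # (10, 9) (11, 9) (11, 10)

# Toy 3 x 3 correlation matrices (made-up values, not data from the paper).
toy_v = [[1.0, 0.5, 0.25],
         [0.5, 1.0, 0.75],
         [0.25, 0.75, 1.0]]
toy_g = [[1.0, 0.25, 0.25],
         [0.25, 1.0, 0.5],
         [0.25, 0.5, 1.0]]
toy_heights = bar_heights(abs_corr_difference(toy_v, toy_g))
print(toy_heights)                                   # [0.25, 0.0, 0.25]
```

Because η is symmetric with a zero diagonal, the 55 values returned by `bar_heights` for an 11 × 11 matrix carry all the information in the full difference matrix.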
Cite this article
Joshi, R.R. Diversity and motif conservation in protein 3D structural landscape: exploration by a new multivariate simulation method. J Mol Model 24, 76 (2018). https://doi.org/10.1007/s00894-018-3614-y
DOI: https://doi.org/10.1007/s00894-018-3614-y