Abstract
The sequence parameters underlying halophilic adaptation are still not fully understood. To understand the molecular basis of protein hypersaline adaptation, a detailed analysis was carried out to investigate the likely association of protein sequence attributes with halophilic adaptation. A two-stage strategy was implemented. In the first stage, a supervised machine learning classifier was built, giving an overall accuracy of 86 % on stratified tenfold cross-validation and 90 % on a blind test set, both better than previously reported results. The second stage consisted of statistical analysis of sequence features and possible extraction of halophilic molecular signatures. The results showed that halophilic proteins are characterized by lower average charge, lower K content, and lower S content. A statistically significant preference/avoidance list of sequence parameters is also reported, giving insights into the molecular basis of halophilic adaptation: D, Q, E, H, P, T and V are significantly preferred, while N, C, I, K, M, F and S are significantly avoided. Among amino acid physicochemical groups, the small, polar, charged, acidic and hydrophilic groups are preferred over other groups. Halophilic proteins also showed a preference for higher average flexibility and higher average polarity, and an avoidance of higher average positive charge, average bulkiness and average hydrophobicity. Some interesting trends observed in dipeptide counts are also reported. Further, a systematic statistical comparison was undertaken to gain insights into the distribution of sequence features across different residue structural states. The current analysis may make the mechanism of halophilic adaptation clearer, which can in turn aid the rational design of halophilic proteins.
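The sequence attributes named in the abstract (single-residue content, physicochemical group content and dipeptide counts) can be illustrated with a minimal Python sketch. The function names, the `ACIDIC` group definition and the toy sequence are hypothetical choices made for this illustration; they are not taken from the study, whose exact feature definitions are not given here.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Illustrative group definition (an assumption, not the study's grouping).
ACIDIC = frozenset("DE")


def composition(seq):
    """Return the fraction of each standard residue in seq."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def group_fraction(seq, group):
    """Return the summed fraction of all residues belonging to group."""
    comp = composition(seq)
    return sum(comp[aa] for aa in group)


def dipeptide_counts(seq):
    """Count overlapping dipeptides made only of standard residues."""
    seq = seq.upper()
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    return Counter(p for p in pairs if all(c in AMINO_ACIDS for c in p))


if __name__ == "__main__":
    toy = "DDEK"  # hypothetical toy sequence, not real data
    print(composition(toy)["K"])        # K content
    print(group_fraction(toy, ACIDIC))  # acidic group content
    print(dipeptide_counts(toy))        # overlapping dipeptide counts
```

Per-protein vectors of this kind could then be used as input to a classifier or compared between halophilic and non-halophilic sets with a significance test. Which classifier and which test the study used is not stated in the abstract.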
Ethics declarations
Conflict of interest
The author declares that there is no conflict of interest.
Additional information
Handling Editor: S. C. E. Tosatto.
Cite this article
Nath, A. Insights into the sequence parameters for halophilic adaptation. Amino Acids 48, 751–762 (2016). https://doi.org/10.1007/s00726-015-2123-x
DOI: https://doi.org/10.1007/s00726-015-2123-x