Using Chou’s Five-steps Rule to Classify and Predict Glutathione S-transferases with Different Machine Learning Algorithms and Pseudo Amino Acid Composition

Mohabatkar, Hassan; Ebrahimi, Samira; Moradi, Mohammad

doi:10.1007/s10989-020-10087-7

Published: 13 June 2020

Volume 27, pages 309–316 (2021)
International Journal of Peptide Research and Therapeutics

Hassan Mohabatkar¹,
Samira Ebrahimi¹ &
Mohammad Moradi¹

Abstract

Glutathione S-transferases (GSTs) are detoxification enzymes found in a variety of living organisms, including bacteria, fungi, plants and animals. These multifunctional enzymes play important roles in the biosynthesis of steroids and prostaglandins, in apoptosis regulation, and in stress signaling. In this study, we designed a method to independently predict the structures of animal, fungal and plant GSTs using Chou’s pseudo-amino acid composition (PseAAC) concept. Four machine learning algorithms were used: Support Vector Machine (SVM), Random Forests (RF), Covariance Discrimination (CD) and Optimized Evidence-Theoretic K-Nearest Neighbor (OET-KNN). In our results, Random Forests gave the best prediction for animal GSTs, with an accuracy of 0.9339, and SVM gave the best results for fungal and plant GSTs, with accuracies of 0.8982 and 0.9655, respectively. Our study provides an effective prediction method for GSTs based on the concept of PseAAC and four different machine learning algorithms.
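To make the PseAAC idea concrete, the following is a minimal sketch of a type 1 pseudo-amino acid composition vector in Python. It is not the authors' pipeline: the abstract does not state which tool, physicochemical properties, or parameter values were used. The function name `pseaac`, the defaults `lam=3` and `weight=0.05`, and the use of the Kyte-Doolittle hydropathy index as the only property are assumptions made for illustration (Chou's original formulation combines hydrophobicity, hydrophilicity and side-chain mass).

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index. ASSUMPTION: a single stand-in property;
# the properties used in the study are not given in the abstract.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(table):
    """Rescale a property table to zero mean and unit variance over the 20 residues."""
    mu = mean(table.values())
    sd = pstdev(table.values())
    return {aa: (value - mu) / sd for aa, value in table.items()}


def pseaac(seq, lam=3, weight=0.05, props=(KYTE_DOOLITTLE,)):
    """Return a (20 + lam)-dimensional PseAAC vector for a protein sequence.

    lam and weight are hypothetical defaults, not values from the paper.
    """
    seq = seq.upper()
    if any(aa not in AMINO_ACIDS for aa in seq):
        raise ValueError("sequence contains a non-standard residue")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")

    norm = [standardize(p) for p in props]

    # Conventional amino acid composition: 20 normalized frequencies.
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

    # Sequence-order correlation factors theta_1 .. theta_lam: the mean
    # squared property difference between residues that are k positions apart.
    thetas = []
    for k in range(1, lam + 1):
        theta = mean(
            mean((p[a] - p[b]) ** 2 for p in norm)
            for a, b in zip(seq, seq[k:])
        )
        thetas.append(theta)

    # Joint normalization so that all 20 + lam components sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


if __name__ == "__main__":
    vector = pseaac("MKTAYIAKQRQISFVKSHFSRQ")
    print(len(vector))
```

Each sequence is mapped to a fixed-length vector regardless of its length, which is what allows classifiers such as SVM or Random Forests to be trained on proteins of different sizes.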

Acknowledgements

Support of this study by the University of Isfahan is acknowledged.

Author information

Authors and Affiliations

Department of Biotechnology, Faculty of Biological Science and Technology, University of Isfahan, Isfahan, Iran
Hassan Mohabatkar, Samira Ebrahimi & Mohammad Moradi

Corresponding author

Correspondence to Hassan Mohabatkar.

Ethics declarations

Conflict of interest

There is no conflict to declare.

Informed Consent

There were no human participants, and consent was not required.

Research involving Human and/or Animals Participants

No humans or animals participated in this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mohabatkar, H., Ebrahimi, S. & Moradi, M. Using Chou’s Five-steps Rule to Classify and Predict Glutathione S-transferases with Different Machine Learning Algorithms and Pseudo Amino Acid Composition. Int J Pept Res Ther 27, 309–316 (2021). https://doi.org/10.1007/s10989-020-10087-7

Accepted: 08 June 2020
Published: 13 June 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s10989-020-10087-7
