3 Biotech, 8:68

Classifying nitrilases as aliphatic and aromatic using machine learning technique

  • Nikhil Sharma
  • Ruchi Verma
  • Savitri
  • Tek Chand Bhalla (corresponding author)
Original Article

Abstract

ProCos (Protein Composition Server, script version), a machine learning tool, was used to classify nitrilases as aliphatic or aromatic. Two feature vectors were used to train the algorithm: pseudo-amino acid composition (PAAC) and five-factor solution score (5FSS). Both separated the nitrilases into two groups, aliphatic and aromatic. With PAAC, the classifier achieved a maximum sensitivity of 100.00%, specificity of 90.00%, accuracy of 95.00% and a Matthews correlation coefficient (MCC) of about 0.90. With 5FSS, it achieved a sensitivity of 96.00%, specificity of 84.00%, accuracy of 90.00% and an MCC of about 0.81. The combined content of the aliphatic amino acids Ala (A), Gly (G), Leu (L), Ile (I), Val (V), Met (M) and Pro (P) was higher in aliphatic nitrilases (42.7) than in aromatic nitrilases (40.1). Conversely, the combined content of the aromatic amino acids Tyr (Y), Trp (W), His (H) and Phe (F) was higher in aromatic nitrilases (12.7) than in aliphatic nitrilases (10.7). This approach will help in predicting whether a nitrilase is aromatic or aliphatic from its amino acid sequence. The scripts are available on GitHub by searching for the keyword ‘Nitrilase’ or at ‘https://github.com/rover2380/Nitrilase.git’.
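The quantities reported in the abstract can be illustrated with a minimal Python sketch. It is not the authors' ProCos script. The helper names (`group_percent`, `binary_metrics`), the toy sequence and the confusion-matrix counts are hypothetical and chosen only for illustration. Expressing residue-group content as a percentage of sequence length is also an assumption, since the abstract does not state the unit.

```python
from math import sqrt

# Residue groups as listed in the abstract (one-letter codes).
ALIPHATIC = set("AGLIVMP")
AROMATIC = set("YWHF")


def group_percent(sequence, residues):
    """Percentage of positions in `sequence` occupied by any residue in `residues`.

    Assumption: content is expressed as a percentage of sequence length.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(aa in residues for aa in seq) / len(seq)


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts.

    Assumes each class has at least one example, so the first two ratios are defined.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    toy_sequence = "AAGLYWKDEH"  # hypothetical 10-residue sequence, not a real nitrilase
    print(group_percent(toy_sequence, ALIPHATIC))
    print(group_percent(toy_sequence, AROMATIC))

    # Hypothetical counts with balanced classes (10 positives, 10 negatives).
    print(binary_metrics(tp=10, fn=0, tn=9, fp=1))
```

With the hypothetical balanced counts above (10 true positives, 0 false negatives, 9 true negatives, 1 false positive), the formulas give a sensitivity of 100%, specificity of 90%, accuracy of 95% and an MCC of about 0.90. This is the same combination reported for PAAC, although the actual test-set counts are not given in the abstract.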

Keywords

Aliphatic nitrilase; Aromatic nitrilase; Amino acid composition; Protein composition server (ProCos)

Notes

Acknowledgements

The authors are thankful to the Department of Biotechnology, New Delhi for the continuous support to the Bioinformatics Centre, Himachal Pradesh University, Summer Hill, Shimla, India.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary material

Supplementary material 1: 13205_2018_1102_MOESM1_ESM.docx (DOCX, 34 kb)

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Nikhil Sharma (1)
  • Ruchi Verma
  • Savitri (2)
  • Tek Chand Bhalla (2, corresponding author)

  1. Bioinformatics Centre, Himachal Pradesh University, Shimla, India
  2. Department of Biotechnology, Himachal Pradesh University, Shimla, India
