Quantitative Biology, Volume 5, Issue 1, pp 90–98

Construction of precise support vector machine based models for predicting promoter strength

  • Hailin Meng
  • Yingfei Ma
  • Guoqin Mai
  • Yong Wang
  • Chenli Liu
Research Article

Abstract

Background

Predicting the strength of a prokaryotic promoter from its sequence is of great importance both in fundamental life-science research and in applied synthetic biology. Much progress has been made in building quantitative models for strength prediction; in particular, the introduction of machine learning methods such as artificial neural networks (ANNs) has significantly improved prediction accuracy. The support vector machine (SVM), one of the most important machine learning methods, is more powerful at learning from small-sample datasets and is therefore expected to work well on this problem.

Methods

To test this, we constructed SVM-based models to quantitatively predict promoter strength. A library of 100 promoter sequences with their strength values was randomly divided into two datasets: a training set (⩾10 sequences) for model training and a test set (⩾10 sequences) for model testing.
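The random division described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `split_library` and `one_hot`, the random seed, the placeholder library of random 20-base sequences, and the one-hot feature encoding are all assumptions, since the abstract does not state how sequences were encoded.

```python
import random

def split_library(library, n_train, seed=0):
    """Randomly split (sequence, strength) pairs into training and test sets.

    Both sets must hold at least 10 sequences, as stated in the abstract.
    """
    n_test = len(library) - n_train
    if n_train < 10 or n_test < 10:
        raise ValueError("both sets need at least 10 sequences")
    shuffled = list(library)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

def one_hot(sequence):
    """Encode a DNA sequence as a flat 0/1 vector, 4 entries per base (A, C, G, T).

    Assumed encoding for illustration only; unknown bases map to all zeros.
    """
    alphabet = "ACGT"
    vector = []
    for base in sequence.upper():
        vector.extend(1.0 if base == letter else 0.0 for letter in alphabet)
    return vector

# Hypothetical stand-in for the 100-promoter library (random data, not real promoters).
rng = random.Random(42)
library = [
    ("".join(rng.choice("ACGT") for _ in range(20)), rng.random())
    for _ in range(100)
]

# 90 training sequences leaves the minimum allowed test set of 10.
train, test = split_library(library, n_train=90)
train_features = [one_hot(seq) for seq, _ in train]
```

With a library of 100 sequences, the two size constraints allow training sets of 10 to 90 sequences, so varying `n_train` over that range reproduces the kind of training-size sweep the abstract refers to.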

Results

The results indicate that prediction performance increases with the size of the training set, with the best performance achieved at a training-set size of 90 sequences. After optimization of the model parameters, a high-performance model was obtained, with a high squared correlation coefficient both for fitting the training set (R² > 0.99) and for predicting the test set (R² > 0.98); both values are better than those of the ANN model from our previous work.
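For reference, a squared correlation coefficient between measured and predicted strengths can be computed as below. This sketch assumes the metric is the squared Pearson correlation; the abstract does not give the formula, and the function name `squared_correlation` and the example values are hypothetical.

```python
def squared_correlation(observed, predicted):
    """Return the squared Pearson correlation between two equal-length lists."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    # Sums of products of deviations from the means.
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov * cov / (var_obs * var_pred)

# Made-up strengths, only to show the call.
r2 = squared_correlation([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
```

Note that this quantity measures linear association only: predictions that are a scaled or shifted copy of the measurements still score 1, so it is usually read alongside an error measure.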

Conclusions

Our results demonstrate that SVM-based models can be employed for the quantitative prediction of promoter strength.

Keywords

support vector machine; model; quantitative prediction; promoter strength; machine learning

Notes

Acknowledgements

This work was financially supported by NSFC (Nos. 31471270, 31301017, 31670056 and 31300686), 973 Program (No. 2014CB745202), 863 Program (No. SS2015AA020936), the Guangdong Natural Science Funds for Distinguished Young Scholar (No. S2013050016987), the Science and Technology Planning Project of Guangdong Province (Nos. 2014B020201001 and 2014A030304008), Natural Science Foundation of Guangdong Province (No. 2015A030310317), the Guangzhou Science and Technology Scheme (Nos. 201508020091 and 201508020092), and Shenzhen grants (Nos. KQTD2015033117210153, JCYJ20140610152828703, KQJSCX20160301144623, CXZZ20140901004122088, JCYJ20150521144321007 and JCYJ20140901003939019).

Supplementary material

40484_2017_96_MOESM1_ESM.pdf (73 kb)
Dataset S1

Copyright information

© Higher Education Press and Springer-Verlag GmbH 2017

Authors and Affiliations

  1. Bioengineering Research Center, Guangzhou Institute of Advanced Technology, Chinese Academy of Sciences, Guangzhou, China
  2. Center for Synthetic Biology Engineering Research, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  3. Chinese Academy of Sciences Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
