Developing an Automated Machine Learning Approach to Test Discontinuity in DNA for Detecting Tuberculosis

  • Conference paper
Proceedings of the Twelfth International Conference on Management Science and Engineering Management (ICMSEM 2018)

Abstract

Discontinuity in long DNA sequences causes harmful diseases such as tuberculosis (TB). Given the exponential growth of big-data environments in the 21st century, knowing the precise positions of breaks in DNA sequences is essential for many reasons, including advanced medical intervention. This study designs an automated framework to assess the break positions in long DNA sequences that are responsible for TB, and then tests it empirically by analyzing a large DNA dataset from the National Center for Biotechnology Information (NCBI) database. The method combines a range of data cleansing and deep neural network tools suited to big-data settings. The findings indicate that the proposed approach detects DNA sequence breaks for TB better than other methods, by resolving a sample-size issue in the training dataset and by recursively dividing the whole dataset into segments of a certain length to detect the breaks. It also provides faster predictive analysis with more accurate and reliable outcomes.
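
The recursive division step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the segment length, the splitting rule, or the detector. The names `split_recursive`, `flag_segments`, and `has_break` are hypothetical, halving at the midpoint is an assumed splitting rule, and a simple GC-content comparison stands in for the paper's deep neural network purely so that the example runs.

```python
from typing import Callable, List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in seq (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def has_break(seq: str, threshold: float = 0.5) -> bool:
    """Hypothetical stand-in detector (NOT the paper's neural network):
    flags a segment whose two halves differ strongly in GC fraction."""
    mid = len(seq) // 2
    return abs(gc_fraction(seq[:mid]) - gc_fraction(seq[mid:])) > threshold


def split_recursive(seq: str, max_len: int, offset: int = 0) -> List[Tuple[int, str]]:
    """Recursively halve seq until every piece has length <= max_len.
    Returns (start_position, segment) pairs in left-to-right order."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    if len(seq) <= max_len:
        return [(offset, seq)]
    mid = len(seq) // 2
    left = split_recursive(seq[:mid], max_len, offset)
    right = split_recursive(seq[mid:], max_len, offset + mid)
    return left + right


def flag_segments(
    seq: str, max_len: int, detector: Callable[[str], bool] = has_break
) -> List[Tuple[int, int]]:
    """Apply the detector to each segment and return the (start, end)
    ranges of the segments it flags as containing a break."""
    return [
        (start, start + len(segment))
        for start, segment in split_recursive(seq, max_len)
        if detector(segment)
    ]


# Toy sequence: the composition changes abruptly at position 12.
print(flag_segments("AAAAAAAAAAAAGGGG", max_len=8))  # prints [(8, 16)]
```

In this sketch a break that falls exactly on a segment boundary would go unnoticed, so a practical version would likely need overlapping segments or a second pass across boundaries; the abstract does not say how the paper handles this.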

Acknowledgements

The authors sincerely thank the anonymous reviewers for their valuable and stimulating comments, which were used to improve this final version. We also acknowledge the support of the Data Science Research Unit (DSRU) at Charles Sturt University, Australia.

Author information

Corresponding author

Correspondence to A. Rahman.

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Rahman, A., Nimmy, S.F., Sarowar, G. (2019). Developing an Automated Machine Learning Approach to Test Discontinuity in DNA for Detecting Tuberculosis. In: Xu, J., Cooke, F., Gen, M., Ahmed, S. (eds) Proceedings of the Twelfth International Conference on Management Science and Engineering Management. ICMSEM 2018. Lecture Notes on Multidisciplinary Industrial Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-93351-1_23
