Developing an Automated Machine Learning Approach to Test Discontinuity in DNA for Detecting Tuberculosis

Rahman, A.; Nimmy, S. F.; Sarowar, G.

doi:10.1007/978-3-319-93351-1_23

Part of the book series: Lecture Notes on Multidisciplinary Industrial Engineering (LNMUINEN)

Included in the following conference series:

International Conference on Management Science and Engineering Management

Abstract

Discontinuities in long DNA sequences give rise to harmful diseases such as tuberculosis (TB). Given the 21st century's exponential growth of big-data environments, knowing the precise break positions in DNA sequences is essential for many reasons, including advanced medical intervention. This study designs an automated framework to locate the break positions in long DNA sequences that are responsible for TB, and then tests it empirically by analyzing a large DNA dataset from the National Center for Biotechnology Information (NCBI) database. The method combines a range of data cleansing and deep neural network tools suited to big-data settings. The findings indicate that the proposed approach detects DNA sequence breaks for TB better than other methods, by resolving a sample size issue in the training dataset and by recursively dividing the whole dataset into segments of a certain length to detect the breaks. It also provides faster predictive analysis with more accurate and reliable outcomes.
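The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of recursively dividing a long sequence into bounded-length segments and then comparing neighbouring segments. It is not the authors' framework: the deep neural network is replaced here by a simple GC-content difference as a stand-in score, and every name (`split_recursively`, `gc_fraction`, `candidate_breaks`, `max_len`, `threshold`) is hypothetical.

```python
from typing import Callable, List, Tuple


def split_recursively(start: int, end: int, max_len: int) -> List[Tuple[int, int]]:
    """Halve the index range [start, end) until each piece has at most max_len bases."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    if end - start <= max_len:
        return [(start, end)]
    mid = (start + end) // 2
    return split_recursively(start, mid, max_len) + split_recursively(mid, end, max_len)


def gc_fraction(segment: str) -> float:
    """Stand-in score (an assumption): fraction of G or C bases in a segment."""
    if not segment:
        return 0.0
    return sum(base in "GC" for base in segment) / len(segment)


def candidate_breaks(
    sequence: str,
    max_len: int,
    threshold: float,
    score: Callable[[str], float] = gc_fraction,
) -> List[int]:
    """Return boundaries where the score jumps by more than threshold between neighbours."""
    segments = split_recursively(0, len(sequence), max_len)
    breaks = []
    for (s1, e1), (s2, e2) in zip(segments, segments[1:]):
        if abs(score(sequence[s1:e1]) - score(sequence[s2:e2])) > threshold:
            breaks.append(s2)  # position where the second segment starts
    return breaks


if __name__ == "__main__":
    # Toy sequence: an AT-rich half followed by a GC-rich half.
    print(candidate_breaks("ATATATATGCGCGCGC", max_len=4, threshold=0.5))
```

In a real pipeline the `score` argument would be the place to plug in a trained model's output for each segment; the recursion only controls how the long sequence is cut into pieces of manageable length.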

Acknowledgements

The authors sincerely thank the anonymous reviewers for their valuable and stimulating comments, which were used to improve this final version. We also acknowledge all support from the Data Science Research Unit (DSRU) at Charles Sturt University, Australia.

Author information

Authors and Affiliations

Data Science Research Unit, School of Computing and Mathematics, Charles Sturt University, Wagga, NSW, Australia
A. Rahman
Notre Dame University, Dhaka, Bangladesh
S. F. Nimmy
East West University, Dhaka, Bangladesh
G. Sarowar

Corresponding author

Correspondence to A. Rahman.

Editor information

Editors and Affiliations

Sichuan University, Chengdu, China
Jiuping Xu
Monash University, Department of Management, Melbourne, VIC, Australia
Fang Lee Cooke
Fuzzy Logic Systems Institute, Tokyo University of Science, Tokyo, Japan
Mitsuo Gen
Department of Mathematics and Statistics, McMaster University, Hamilton, ON, Canada
Syed Ejaz Ahmed

Cite this paper

Rahman, A., Nimmy, S.F., Sarowar, G. (2019). Developing an Automated Machine Learning Approach to Test Discontinuity in DNA for Detecting Tuberculosis. In: Xu, J., Cooke, F., Gen, M., Ahmed, S. (eds) Proceedings of the Twelfth International Conference on Management Science and Engineering Management. ICMSEM 2018. Lecture Notes on Multidisciplinary Industrial Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-93351-1_23

DOI: https://doi.org/10.1007/978-3-319-93351-1_23
Published: 26 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93350-4
Online ISBN: 978-3-319-93351-1
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)
