Abstract
Discontinuity in long DNA sequences can lead to harmful diseases such as tuberculosis (TB). Given the 21st century's exponential growth of big-data environments, knowing the precise break positions in DNA sequences is essential for many reasons, including advanced medical intervention. This study designs an automated framework to assess the break positions in long DNA sequences that are responsible for TB, and then tests it empirically by analyzing a large DNA dataset from the National Center for Biotechnology Information (NCBI) database. The method combines a range of data cleansing and deep neural network tools suited to big-data settings. Findings reveal that the proposed approach outperforms other methods in detecting DNA sequence breaks for TB by resolving a sample-size issue in the training dataset and recursively dividing the whole dataset into segments of a certain length to detect the breaks. It also provides faster predictive analysis with more accurate and reliable outcomes.
Acknowledgements
The authors sincerely thank the anonymous reviewers for their valuable comments and suggestions, which helped improve this final version. We also acknowledge the support of the Data Science Research Unit (DSRU) at Charles Sturt University, Australia.
Copyright information
© 2019 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Rahman, A., Nimmy, S.F., Sarowar, G. (2019). Developing an Automated Machine Learning Approach to Test Discontinuity in DNA for Detecting Tuberculosis. In: Xu, J., Cooke, F., Gen, M., Ahmed, S. (eds) Proceedings of the Twelfth International Conference on Management Science and Engineering Management. ICMSEM 2018. Lecture Notes on Multidisciplinary Industrial Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-93351-1_23
DOI: https://doi.org/10.1007/978-3-319-93351-1_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93350-4
Online ISBN: 978-3-319-93351-1
eBook Packages: Intelligent Technologies and Robotics (R0)