Abstract
The proliferation of genomic, diagnostic, medical, and other forms of biological data has led to biological data being categorized as big data. Low-cost sequencing machinery, now available even in small research labs, generates large volumes of data that need to be mined for useful biological features and knowledge. In this paper, we use a NoSQL approach to handle the repeat information of the entire human genome. A total of 12 million repeats were extracted from the human genome and stored in MongoDB, a popular NoSQL database. A web application has been developed to query the database with ease. It is evident that bioinformaticians are shifting their database development approach from the traditional relational model to newer approaches such as NoSQL in order to handle massive amounts of biological data.
Acknowledgements
The authors would like to thank Ms. Kranthi Chennamsetti, Centre for Bioinformatics Research and SRKR, for her help in extracting microsatellites from the human genome. This work is supported by SERB, Department of Science and Technology (DST), India (Grant ID: ECR/2016/000346).
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chigurupati, S., Vegesna, K., Siva Rama Krishna Boddu, L.V., Nookala, G.K.M., Mudunuri, S.B. (2019). NO SQL Approach for Handling Bioinformatics Data Using MongoDB. In: Abraham, A., Dutta, P., Mandal, J., Bhattacharya, A., Dutta, S. (eds) Emerging Technologies in Data Mining and Information Security. Advances in Intelligent Systems and Computing, vol 813. Springer, Singapore. https://doi.org/10.1007/978-981-13-1498-8_25
DOI: https://doi.org/10.1007/978-981-13-1498-8_25
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1497-1
Online ISBN: 978-981-13-1498-8
eBook Packages: Intelligent Technologies and Robotics (R0)