Abstract
Bioinformatics is an interdisciplinary subject that closely links computer science, mathematics, and molecular biology. The volume of biological information keeps growing tremendously. Molecular biologists specialize in solving bioinformatics problems such as storing, analyzing, and retrieving biological data by applying the algorithms and techniques of computer science. This review is written from the computer science perspective. The fundamental terminology of bioinformatics and its definitions are essential for understanding bioinformatics in depth. We describe the three main components of bioinformatics and its data types, which serve as the input formats for tools and software. Real-life bioinformatics databases, which are important for analyzing algorithms, are also discussed. We then present applications of bioinformatics in various areas. Because bioinformatics is a fusion of many disciplines, it faces many research issues and challenges, among which the computational and biological ones are particularly significant. Today, a tremendous amount of biological data is being generated; as a result, bioinformatics has emerging research trends in big data, machine learning, and deep learning, which are presented last.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Goel, P., Padole, M. (2019). Bioinformatics: An Application in Information Science. In: Bapi, R., Rao, K., Prasad, M. (eds) First International Conference on Artificial Intelligence and Cognitive Computing. Advances in Intelligent Systems and Computing, vol 815. Springer, Singapore. https://doi.org/10.1007/978-981-13-1580-0_22
DOI: https://doi.org/10.1007/978-981-13-1580-0_22
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1579-4
Online ISBN: 978-981-13-1580-0
eBook Packages: Intelligent Technologies and Robotics (R0)