Abstract
Technology for gathering biomic data now far outpaces the tools available to extract useful information and knowledge from it, a predicament for pressing demands of our time such as personalized medicine. We propose a new family of data structures to represent omics data in a way that is more anchored in biological reality, together with algorithms that are more consistent with it, so that DNA itself can be used to process the data, extract useful knowledge, and organize and store it as needed. These structures enable much more efficient processing of genomic and proteomic data and can serve as the foundation of a truly universal Genomic Positioning System (GenIS). The power of this approach is illustrated by applications to two important problems in biology: a new universal set of biomarkers, and methods for phylogenetic analysis and for species identification and classification. We show that certain metrics on these representations can be used to obtain ab initio, from genomic data alone (possibly including full genomes) and in a matter of minutes or hours, well-established and accepted phylogenies crafted in biology over the last 50 years (such as the 16S rRNA-based phylogenies). We also show how the same representation can be used to solve recognition problems associated with genomic data, including in particular species identification, and to store large genomes in compact representations while preserving the ability to query them efficiently. Finally, we sketch other applications to be explored in the future, including objective criteria for biological taxonomies that could yield a truly universal and comprehensive “Atlas of Life”, as it is or as it could be on Earth.
Acknowledgements
Many thanks to the High Performance Computing Center (HPC) at the University of Memphis for the computing time used to compute the digital signatures.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Garzon, M.H., Mainali, S. (2017). Towards a Universal Genomic Positioning System: Phylogenetics and Species IDentification. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2017. Lecture Notes in Computer Science, vol 10209. Springer, Cham. https://doi.org/10.1007/978-3-319-56154-7_42
DOI: https://doi.org/10.1007/978-3-319-56154-7_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56153-0
Online ISBN: 978-3-319-56154-7
eBook Packages: Computer Science (R0)