Towards a Universal Genomic Positioning System: Phylogenetics and Species IDentification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10209)

Abstract

The technology for gathering biomic data now far outpaces the tools available to extract useful information and knowledge from it, a serious obstacle to meeting current demands such as personalized medicine. We propose a new family of data structures that represent omics data in a way more anchored in biological reality, processed by algorithms more consistent with that reality, so that DNA itself can be used to process the data, extract useful knowledge, and organize and store it as needed. These structures enable much more efficient processing of genomic and proteomic data and can serve as the foundation of a truly universal Genomic Positioning System (GenIS). The power of this approach is illustrated by applications to two important problems in biology: a new universal set of biomarkers, and methods for phylogenetic analysis and for species identification and classification. We show that certain metrics on these representations can be used to obtain ab initio, from genomic data alone (possibly including full genomes) and in a matter of minutes or hours, well-established and accepted phylogenies that biology has crafted over the last 50 years (such as the 16S rRNA-based phylogenies). We also show how the same representation can be used to solve recognition problems associated with genomic data, in particular species identification, and to store large genomes in compact representations while preserving the ability to query them efficiently. Finally, we sketch other applications to be explored in the future, including objective criteria for producing biological taxonomies, with the aim of a truly universal and comprehensive “Atlas of Life”, as it is or as it could be on Earth.

Acknowledgements

Many thanks to the High Performance Computing Center (HPC) at the University of Memphis for the computing time used to produce the digital signatures.

Author information

Correspondence to Max H. Garzon.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Garzon, M.H., Mainali, S. (2017). Towards a Universal Genomic Positioning System: Phylogenetics and Species IDentification. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2017. Lecture Notes in Computer Science, vol 10209. Springer, Cham. https://doi.org/10.1007/978-3-319-56154-7_42

  • DOI: https://doi.org/10.1007/978-3-319-56154-7_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56153-0

  • Online ISBN: 978-3-319-56154-7

  • eBook Packages: Computer Science, Computer Science (R0)
