Multidimensional Scaling for Genomic Data

  • Audrone Jakaitiene
  • Mara Sangiovanni
  • Mario R. Guarracino
  • Panos M. Pardalos
Chapter
Part of the Springer Optimization and Its Applications book series (SOIA, volume 107)

Abstract

Scientists working with genomic data face the challenge of analyzing and understanding an ever-increasing amount of data. Multidimensional scaling (MDS) refers to the representation of high-dimensional data in a low-dimensional space that preserves the similarities between data points. Metric MDS algorithms aim to embed the points so that inter-point distances are as close as possible to the input dissimilarities. The computational complexity of most metric MDS methods exceeds O(n²), which restricts their application to large genomic data (n ≫ 10⁶). Non-metric MDS might be considered instead: it embeds inter-point distances using only the relative order of the input dissimilarities. A non-metric MDS method has lower complexity than a metric one, although it does not preserve the true relationships. However, if the input dissimilarities are unreliable, too difficult to measure, or simply unavailable, non-metric MDS is the appropriate algorithm. In this paper, we give an overview of both metric and non-metric MDS methods and of their application to genomic data analyses.
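
To make the metric case concrete, the following is a minimal sketch of classical (Torgerson-style) MDS in Python with NumPy. It is an illustration under stated assumptions, not the implementation of any method surveyed in the chapter: the function names `classical_mds` and `pairwise_distances`, the six random test points, and the choice of a dense eigendecomposition are all assumptions made for this example. It also shows where the scaling problem comes from, since the full n × n dissimilarity matrix has to be stored and decomposed.

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed n points in k dimensions from an n x n distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-centre the squared distances to obtain a Gram (inner-product) matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # eigh returns eigenvalues in ascending order; keep the k largest.
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]
    # Clip tiny negative eigenvalues caused by rounding before taking the root.
    w_top = np.clip(w[idx], 0.0, None)
    return V[:, idx] * np.sqrt(w_top)

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical toy data: six points that really live in two dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
D = pairwise_distances(X)

Y = classical_mds(D, k=2)
# Largest deviation between the embedded distances and the input distances.
max_err = np.abs(pairwise_distances(Y) - D).max()
```

Because the toy points are exactly two-dimensional, the embedded distances should match the input up to floating-point error; with real dissimilarities the match is only approximate. The dense matrix `B` needs O(n²) memory and its eigendecomposition costs more than that in time, which is why this direct approach is impractical for n in the millions.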

Keywords

metric multidimensional scaling; non-metric multidimensional scaling; data mining; genomic data 

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Audrone Jakaitiene (1), corresponding author
  • Mara Sangiovanni (2)
  • Mario R. Guarracino (2)
  • Panos M. Pardalos (3)

  1. System Analysis Department, Institute of Mathematics and Informatics, Vilnius University, Vilnius, Lithuania
  2. High Performance Computing and Networking Institute, National Research Council, Naples, Italy
  3. Center for Applied Optimization, University of Florida, Gainesville, USA
