Handling High-Dimension (High-Feature) MicroRNA Data

  • Yue Hu
  • Wenjun Lan
  • Daniel Miller
Part of the Methods in Molecular Biology book series (MIMB, volume 1617)


MicroRNA sequence and microarray data are usually described by a large number of features, which makes them high-dimensional. As a consequence, the curse of dimensionality often becomes a problem. High-dimensional variables are difficult to process and hard to interpret. In addition, because the sample size is usually limited, each additional variable increases the statistical error introduced during data processing. To reduce the number of variables, a degenerated k-mer method has been suggested. To enhance statistical robustness, the gapped k-mer method was introduced. The last part of this chapter describes some traditional supervised and unsupervised mathematical methods that are used to reduce the dimensionality of the data.
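As a rough illustration of the k-mer features mentioned above, the following Python sketch counts contiguous k-mers and gapped k-mers in a single RNA sequence. It is a minimal sketch under stated assumptions and not the chapter's implementation: the function names, the toy sequence, and the use of '-' to mark gap positions are all hypothetical, and the gapped k-mer is taken here to mean a window of length l in which k positions are kept and the remaining l - k positions are masked.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k):
    """Count every contiguous k-mer in seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def gapped_kmer_counts(seq, l, k):
    """Count gapped k-mers (hypothetical helper).

    Assumption: a gapped k-mer is a window of length l in which k
    positions are informative and the other l - k are masked with '-'.
    """
    counts = Counter()
    for i in range(len(seq) - l + 1):
        window = seq[i:i + l]
        # Enumerate every choice of k informative positions in the window.
        for keep in combinations(range(l), k):
            masked = "".join(
                window[j] if j in keep else "-" for j in range(l)
            )
            counts[masked] += 1
    return counts


# Toy RNA sequence, made up for illustration only.
toy_seq = "AUGCAUGCAU"

plain = kmer_counts(toy_seq, 3)
gapped = gapped_kmer_counts(toy_seq, l=3, k=2)

print(plain)
print(gapped)
```

A contiguous k-mer representation over the four-letter RNA alphabet has 4^k possible features, so the feature count grows quickly with k while the number of samples stays fixed. In this sketch each window contributes to several masked patterns, so an individual gapped feature tends to be observed more often than an individual full-length k-mer, which is the intuition behind the robustness claim.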

Key words

High-dimension, miRNA, Degenerated k-mer, Gapped k-mer, Dimension decreasing



Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  1. College of Bioengineering, Qilu University of Technology, Jinan, People’s Republic of China
  2. School of Bioengineering, Qilu University of Technology, Jinan, People’s Republic of China
  3. School of Computing, University of South Alabama, Mobile, USA
