Journal of Systems Science and Complexity, Volume 28, Issue 5, pp 1212–1230

On Eigen-matrix translation method for classification of biological data

  • Hao Jiang
  • Yushan Qiu
  • Xiaoqing Cheng
  • Waiki Ching

Abstract

Driven by the challenge of integrating large amounts of experimental data, classification techniques have emerged as major and popular tools in computational biology and bioinformatics research. Machine learning methods, especially kernel methods with Support Vector Machines (SVMs), are very popular and effective. From the perspective of the kernel matrix, a technique called Eigen-matrix translation has been introduced for protein data classification. The Eigen-matrix translation strategy has many useful properties that deserve further exploration. This paper investigates the major role of Eigen-matrix translation in classification. The authors propose that its importance lies in the dimension reduction of predictor attributes within the data set, which is particularly important when the number of features is very large. The authors show by numerical experiments on real biological data sets that the proposed framework is crucial and effective in improving classification accuracy. It can therefore serve as a novel perspective for future research on dimension reduction problems.

Keywords

Classification; dimension reduction; eigen-matrix translation; glycan data; kernel method (KM); support vector machine (SVM)

Copyright information

© Institute of Systems Science, Academy of Mathematics and Systems Science, CAS and Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Hao Jiang (1), corresponding author
  • Yushan Qiu (2)
  • Xiaoqing Cheng (2)
  • Waiki Ching (2)

  1. Department of Mathematics, School of Information, Renmin University of China, Beijing, China
  2. Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
