A New Approach to Detect Splice-Sites Based on Support Vector Machines and a Genetic Algorithm

  • Jair Cervantes
  • De-Shuang Huang
  • Xiaoou Li
  • Wen Yu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8259)

Abstract

This paper presents a method for imbalanced splice-site classification problems. The proposed method generates artificial instances that are added to the dataset, and it uses a genetic algorithm to admit only those instances that improve classification performance. Experimental results show that the proposed algorithm detects splice sites more accurately than other implementations on skewed datasets.
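
The idea of generating artificial minority instances and letting a genetic algorithm keep only the helpful ones can be illustrated with a minimal sketch. The abstract does not specify how instances are generated, how fitness is measured, or which genetic operators are used, so everything below is an assumption: `interpolate` (a SMOTE-like interpolation), the geometric-mean fitness, the bit-mask encoding, and all function names are hypothetical. To keep the sketch dependency-free, a nearest-centroid classifier stands in for the SVM.

```python
import random

def interpolate(minority, n_new, rng):
    """Assumed generator: new points on segments between random minority pairs."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def predict(pos_c, neg_c, x):
    """Stand-in classifier (nearest centroid), used here in place of an SVM."""
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_c))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_c))
    return 1 if d_pos <= d_neg else 0

def fitness(mask, pos, neg, synthetic, val):
    """Assumed fitness: geometric mean of per-class recall on validation data."""
    chosen = [s for s, keep in zip(synthetic, mask) if keep]
    pos_c, neg_c = centroid(pos + chosen), centroid(neg)
    hits, totals = {0: 0, 1: 0}, {0: 0, 1: 0}
    for x, y in val:
        totals[y] += 1
        hits[y] += predict(pos_c, neg_c, x) == y
    return ((hits[1] / totals[1]) * (hits[0] / totals[0])) ** 0.5

def select_instances(pos, neg, synthetic, val, generations=30, pop_size=20, seed=0):
    """Evolve a 0/1 mask saying which synthetic instances enter the training set."""
    rng = random.Random(seed)
    n = len(synthetic)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [0] * n  # "add nothing" baseline, so the result never scores below it
    score = lambda m: fitness(m, pos, neg, synthetic, val)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]  # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)   # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n)] ^= 1  # single bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=score)

# Toy skewed data: 200 negatives, 10 positives, balanced validation set.
rng = random.Random(1)
neg = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
pos = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(10)]
val = [([rng.gauss(0, 1), rng.gauss(0, 1)], 0) for _ in range(50)]
val += [([rng.gauss(2, 1), rng.gauss(2, 1)], 1) for _ in range(50)]
synthetic = interpolate(pos, 20, rng)
mask = select_instances(pos, neg, synthetic, val)
baseline = fitness([0] * len(synthetic), pos, neg, synthetic, val)
best = fitness(mask, pos, neg, synthetic, val)
print(f"kept {sum(mask)} of {len(mask)} synthetic instances; "
      f"g-mean {baseline:.3f} -> {best:.3f}")
```

Because the best masks are carried over unchanged each generation and the empty mask is seeded into the initial population, the selected mask scores at least as well on the validation set as adding no synthetic instances at all. In a real setting the fitness would be computed with the actual classifier on held-out data.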

Keywords

  • SVM
  • Skewed datasets
  • Classification
  • DNA splice sites

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Jair Cervantes (1)
  • De-Shuang Huang (2)
  • Xiaoou Li (3)
  • Wen Yu (4)

  1. Posgrado e Investigación, UAEM-Texcoco, Mexico
  2. Department of Control Science & Engineering, Tongji University, Shanghai, China
  3. Departamento de Computación, CINVESTAV-IPN, Mexico City, Mexico
  4. Departamento de Control Automático, CINVESTAV-IPN, Mexico City, Mexico
