Multimedia Tools and Applications

Volume 78, Issue 10, pp 13047–13066

A combined support vector machine-FCGS classification based on the wavelet transform for Helitrons recognition in C.elegans

  • Rabeb Touati
  • Imen Messaoudi
  • Afef Elloumi Oueslati
  • Zied Lachiri

Abstract

Helitrons, an important subclass of the class II transposable elements (TEs), have been identified in diverse eukaryotic genomes. They are mobile elements with a great impact on genome evolution. To date, there is no systematic classification model for helitrons, which motivated us to build an efficient automatic model to identify these sequences. This paper focuses on discriminating between helitrons and non-helitrons using the Support Vector Machine (SVM). We test all the SVM kernels, and the highest accuracy rates are obtained with the optimal kernel parameters (d, c and σ). We also introduce two methods for representing genomic sequences as features for the classification task: (i) temporal and spectral features extracted from the Frequency Chaos Game Signal of order 2 (FCGS2), and (ii) features extracted from the Continuous Wavelet Transform (CWT) applied to the FCGS2 signals. The dataset covers two classes of DNA in C. elegans: helitrons, and repetitive DNA sequences that contain microsatellites and do not form helitrons. The classification results show that the wavelet energy feature is more effective than the FCGS2 features for helitron recognition. Our system achieves a high recognition rate (global accuracy) of 92.27%.
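
To make the two main ingredients of the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that an FCGS2 signal replaces each overlapping dinucleotide with its relative frequency in the sequence itself, and that d, c and σ denote the polynomial degree, the polynomial offset and the Gaussian width (c could equally denote the SVM cost parameter, which the abstract does not specify). The function names and the toy sequence are hypothetical.

```python
import math
from collections import Counter


def fcgs2_signal(seq):
    """Map a DNA string to a numeric signal of dinucleotide frequencies.

    Assumption: frequencies are estimated on the sequence itself; the
    paper may estimate them on a larger reference (e.g. a chromosome).
    """
    seq = seq.upper()
    # Overlapping words of length 2 (order-2 coding).
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    total = len(pairs)
    return [counts[p] / total for p in pairs]


def linear_kernel(x, y):
    """Plain dot product between two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, d=2, c=1.0):
    """Polynomial kernel (x.y + c)^d; c as offset is an assumption."""
    return (linear_kernel(x, y) + c) ** d


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


# Toy sequence, for illustration only (not taken from the dataset).
signal = fcgs2_signal("ACGTACGTAA")
```

In practice the features fed to the kernels would be the temporal, spectral or wavelet energy features computed from such a signal, and d, c and σ would be tuned by searching for the values that maximize classification accuracy.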

Keywords

Helitrons; Repetitive DNA; Microsatellites; C. elegans; FCGS2 coding; SVM; Features; Continuous wavelet transform; Kernel tricks

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. SITI Laboratory, National School of Engineers of Tunis (ENIT), University Tunis El Manar, Tunis, Tunisia
  2. Higher Institute of Information Technologies and Communications, Industrial Computing Department, University of Carthage, Carthage, Tunisia
  3. National School of Engineers of Carthage (ENICarthage), Electrical Engineering Department, University of Carthage, Carthage, Tunisia
