A Numerical Representation Method for a DNA Sequence Using Gray Code Method

  • M. Raman Kumar
  • Vaegae Naveen Kumar
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1057)


The rapid growth of genomic data in public databases calls for advanced computational tools that can perform gene analysis quickly. Such tools can be devised with the aid of genomic signal processing, in which the pivotal task is numerical mapping: the string of nucleotides is transformed into a discrete numerical sequence by assigning an optimum mathematical descriptor to each nucleotide. The descriptor must be compatible with the later stages of the genomic application in order to achieve high efficiency. In this work, a simple numerical mapping method is proposed in which the optimum descriptor value is obtained by applying the Gray code concept. The proposed method is evaluated on the benchmark databases HRM195 and ASP67 for the identification of protein-coding regions. Compared with similar methods, it exhibits improved exon prediction efficiency in terms of accuracy and equal error rate.
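As a rough illustration of what a Gray-code-based numerical mapping could look like, the sketch below assigns each nucleotide the 2-bit reflected binary (Gray) code of its index and then measures the strength of the three-base periodicity with a single DFT coefficient. The abstract does not give the descriptor values or the detection stage used in the paper, so the nucleotide order `ASSUMED_ORDER`, the helper names, and the DFT-based measure are all assumptions made for illustration only.

```python
import cmath

# Hypothetical nucleotide order; the abstract does not state the paper's assignment.
ASSUMED_ORDER = "ACGT"


def gray_code(n):
    """Return the reflected binary (Gray) code of a non-negative integer."""
    return n ^ (n >> 1)


# Under the assumed order: A -> 0b00 (0), C -> 0b01 (1), G -> 0b11 (3), T -> 0b10 (2).
GRAY_MAP = {base: gray_code(i) for i, base in enumerate(ASSUMED_ORDER)}


def map_sequence(seq):
    """Map a DNA string to a discrete numerical sequence (list of ints)."""
    return [GRAY_MAP[base] for base in seq.upper()]


def period3_power(signal):
    """Squared DFT magnitude at normalized frequency 1/3, after mean removal.

    This is a commonly used period-3 measure, not necessarily the paper's.
    """
    n = len(signal)
    mean = sum(signal) / n
    coeff = sum(
        (x - mean) * cmath.exp(-2j * cmath.pi * i / 3)
        for i, x in enumerate(signal)
    )
    return abs(coeff) ** 2


if __name__ == "__main__":
    numeric = map_sequence("ACGTTGCA")
    print(numeric)
    print(period3_power(map_sequence("ACG" * 4)))
```

With this assumed order, adjacent nucleotides in `ASSUMED_ORDER` receive codes that differ in exactly one bit, which is the defining property of a Gray code. A different ordering would give different numerical values and possibly different prediction behaviour.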


Exon identification; Genomic signal processing; Gene encoding; Gray code; Numerical mapping; Three-base periodicity



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. School of Electronics Engineering, Vellore Institute of Technology, Vellore, India
