# Descriptors of 2D-dynamic graphs as a classification tool of DNA sequences

## Abstract

A new tool for the classification of DNA sequences is introduced. The method is based on 2D-dynamic graphs and their descriptors. Using descriptors built from the centers of mass, the moments of inertia, and the angles between the *x* axis and the principal axis of inertia of the 2D-dynamic graphs, one can obtain classification diagrams in which similar sequences are clustered in separate areas.

## Keywords

Similarity/dissimilarity analysis of DNA sequences, Moments of inertia, Center of mass, Descriptors, Graphical representations of DNA sequences

## 1 Introduction

Similarity/dissimilarity analysis of DNA sequences is an important topic in many problems of biology and medicine. The very popular alignment methods often do not give sufficiently detailed information. For example, they do not distinguish which bases (A, C, T, or G) have been aligned. Recently, we proposed some corrections to these methods that may enrich the information derived from them [1]. Alternatively, one may use methods called *Graphical Representations*. These approaches allow for both visual and numerical comparison of the objects. A variety of graphical representations have been created in recent years. Of particular interest are 2D methods, which are easy to visualize; see, for example, [2–23]. Reviews of graphical representation methods may be found in [1, 24, 25].

The 2D graphical representations of DNA sequences have been used in many applications, including phylogenetic relationships of coronavirus gene sequences [26], long-range palindromes [27], characterization of avian flu neuraminidase genes [28], and the study of plant germplasm identifiers [29], among others. First-order descriptors have also been used to propose a scale for grading the toxicity of chemicals [30] and for the classification of SNP genes [31]. In this paper we propose to use the improved descriptors of our 2D-dynamic representation model to construct classification diagrams.

The present study on the classification of DNA sequences continues our earlier works [1, 9, 32], in which we constructed classification diagrams based on other methods. In these diagrams different groups of sequences are located in different parts of the plots. We have also shown that, using these diagrams, sequences which differ by only one base can be distinguished [1].

## 2 Methods

In the present work we use a graphical representation of DNA sequences that we call the 2D-dynamic representation. In this method a DNA sequence is represented by a 2D graph described in [8]. The name of the representation is borrowed from another area of science and reflects the form of the descriptors (numerical characteristics) representing these graphs. The sequence is represented by material-like points which are treated as rigid bodies, as in Newtonian dynamics. We have proposed several descriptors related to the 2D-dynamic graphs: centers of mass [8], moments of inertia of the graphs [8], moments of the mass-density distribution [33, 34], and angles between the *x* axis and the principal axis of inertia of the graphs [35]. These descriptors were the basis for the creation of similarity measures between the sequences. We have also performed a similarity/dissimilarity analysis using mass-overlaps of the 2D-dynamic graphs [35].

We have used a similar methodology, based on distribution moments, for classification studies in other areas of science, such as molecular physics [36, 37, 38, 39, 40], astrophysics [41, 42], and dynamics [43, 44].

In the present work we construct the classification diagrams using descriptors built from the coordinates of the center of mass \((\mu _x, \mu _y)\), the principal moments of inertia (\(I_{11}, I_{22}\)), and the angle between the *x* axis and the principal axis of inertia of the 2D-dynamic graph (\(\alpha \)).

The eigenvalues of the tensor of inertia are called the *principal moments of inertia*. They are equal to the moments of inertia associated with rotations about the principal axes. The principal axes are defined by the eigenvectors of the tensor of inertia.
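For orientation, the standard rigid-body expressions for a planar set of point masses \(m_i\) located at \((x_i, y_i)\) can be written as follows. This is a sketch in textbook notation, assuming coordinates measured from the center of mass; the symbols \(I_{xx}, I_{yy}, I_{xy}\) and \(\omega _k\) are introduced here for illustration and the conventions of [8, 35] may differ in detail.

\[
\mu _x = \frac{\sum _i m_i x_i}{\sum _i m_i}, \qquad
\mu _y = \frac{\sum _i m_i y_i}{\sum _i m_i}
\]

\[
\hat{I} =
\begin{pmatrix}
I_{xx} & I_{xy} \\
I_{xy} & I_{yy}
\end{pmatrix},
\qquad
\begin{aligned}
I_{xx} &= \sum _i m_i (y_i - \mu _y)^2, \\
I_{yy} &= \sum _i m_i (x_i - \mu _x)^2, \\
I_{xy} &= -\sum _i m_i (x_i - \mu _x)(y_i - \mu _y)
\end{aligned}
\]

\[
\hat{I}\, \omega _k = I_{kk}\, \omega _k, \qquad k = 1, 2
\]

Here the eigenvalues \(I_{11}, I_{22}\) play the role of the principal moments and the eigenvectors \(\omega _k\) give the directions of the principal axes.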

The descriptors are related to particular properties of the graphs, and their interpretation is intuitive and analogous to that in dynamics. The moments of inertia are associated with rotations about the principal axes. If the mass is concentrated close to the axis of rotation, the moment of inertia is small and it is easier to accelerate the spinning of the body. If the mass is dispersed, the moment of inertia is large and accelerating the spin is more difficult. Thus, these descriptors carry information about the concentration of mass around the axes.
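The relation between mass concentration and the moments can be illustrated numerically. The following is a minimal sketch, not the implementation used in [8, 35]: the function name `inertia_descriptors` is hypothetical, the two moments are returned in ascending order, and the angle is taken for the principal axis with the smaller moment. Which axis and which ordering the original works use is an assumption here.

```python
import math


def inertia_descriptors(points):
    """Return (smaller moment, larger moment, angle) for planar point masses.

    `points` is a list of (x, y, m) tuples. Coordinates are shifted to the
    center of mass before the inertia tensor is built (assumed convention).
    """
    total = sum(m for _, _, m in points)
    mu_x = sum(m * x for x, _, m in points) / total
    mu_y = sum(m * y for _, y, m in points) / total

    # Components of the 2x2 inertia tensor about the center of mass.
    ixx = sum(m * (y - mu_y) ** 2 for _, y, m in points)
    iyy = sum(m * (x - mu_x) ** 2 for x, _, m in points)
    ixy = -sum(m * (x - mu_x) * (y - mu_y) for x, y, m in points)

    # Closed-form eigenvalues of the symmetric matrix [[ixx, ixy], [ixy, iyy]].
    mean = (ixx + iyy) / 2
    radius = math.hypot((ixx - iyy) / 2, ixy)
    i_small, i_large = mean - radius, mean + radius

    # Angle between the x axis and the principal axis of the smaller moment
    # (assumption: this is the axis along which the graph is elongated).
    alpha = 0.5 * math.atan2(-2 * ixy, iyy - ixx)
    return i_small, i_large, alpha


# Three unit masses on the line y = x: all mass lies on one principal axis,
# so the moment about that axis vanishes and the axis is tilted by 45 degrees.
print(inertia_descriptors([(0, 0, 1), (1, 1, 1), (2, 2, 1)]))
```

In this toy case the mass sits exactly on one principal axis, so the corresponding moment is zero, while the moment about the perpendicular axis collects the whole spread of the points.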

The location of the center of mass of the 2D-dynamic graph depends on the numbers of the particular bases in the sequence. Each base is represented by a 2D unit vector in the \((x,y)\) plane: A = (\(-\)1,0), G = (1,0), C = (0,1), T = (0,\(-\)1). Since the graph is obtained as a walk in 2D space [8], its location depends on the relative numbers of the particular bases. Thus if, for example, the number of A bases is larger than the number of G bases, the graph is shifted towards negative *x* values by the appropriate amount.
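The walk and the resulting shift of the center of mass can be sketched as follows. The unit vectors are those given above; the remaining details are assumptions made for illustration and should be checked against [8]: the walk starts at the origin, every step deposits a unit mass at the point reached, and masses add up when a point is visited more than once. The names `dynamic_graph` and `center_of_mass` are hypothetical.

```python
from collections import defaultdict

# Unit steps assigned to the bases, as stated in the text.
STEPS = {"A": (-1, 0), "G": (1, 0), "C": (0, 1), "T": (0, -1)}


def dynamic_graph(sequence):
    """Return {(x, y): mass} for the 2D walk generated by `sequence`.

    Assumptions: the walk starts at (0, 0), each step adds unit mass at the
    point reached, and repeated visits accumulate mass. Symbols other than
    A, C, G, T raise KeyError.
    """
    masses = defaultdict(int)
    x = y = 0
    for base in sequence.upper():
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        masses[(x, y)] += 1
    return dict(masses)


def center_of_mass(masses):
    """Return (mu_x, mu_y) of the mass distribution built by dynamic_graph."""
    total = sum(masses.values())
    mu_x = sum(m * x for (x, _), m in masses.items()) / total
    mu_y = sum(m * y for (_, y), m in masses.items()) / total
    return mu_x, mu_y


# Two A bases and one G base: the graph lies entirely at negative x.
graph = dynamic_graph("AAG")
print(graph, center_of_mass(graph))
```

For the toy sequence `AAG` the walk visits \((-1,0)\), \((-2,0)\), and \((-1,0)\) again, so the center of mass lies at negative *x*, in line with the excess of A over G.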

In the present work we consider diagrams \(D_k^{\gamma } - D_l^{\beta }, k \ne l\). We show that using these diagrams one can classify different groups of DNA sequences. We also use a diagram \(\alpha -I_{22}\) for some detailed classification (see subsequent section).

## 3 Results and discussion

The descriptors for histone H4 coding sequences and alpha globin coding sequences of different species have been calculated using the values of the centers of mass, the moments of inertia, and the angles \(\alpha \) between the *x* axis and the principal axes of the 2D-dynamic graphs obtained in our earlier works [8, 35]. The descriptors have been used for the construction of the classification diagrams.

Summarizing, the variety of graphical representations of DNA sequences offers an opportunity to consider different properties of the sequences and to compare different aspects of similarity. It is interesting that ideas brought from different areas of science can be combined and, as a result, reveal various aspects of similarity of DNA sequences. In particular, the Four-Component Spectral Representation [9] only visually resembles a molecular spectrum. Likewise, 2D-dynamic graphs are not real dynamical objects. However, using methods and terminology from other fields, one can obtain a convenient and intuitive tool for the classification of DNA sequences.

## References

1. D. Bielińska-Wąż, J. Math. Chem. **49**, 2345 (2011)
2. X. Guo, M. Randić, S.C. Basak, Chem. Phys. Lett. **350**, 106 (2001)
3. B. Liao, M. Tan, K. Ding, Chem. Phys. Lett. **414**, 296 (2005)
4. Y. Liu, X. Guo, L. Pan, S. Wang, J. Chem. Inf. Comput. Sci. **42**, 529 (2002)
5. G. Huang, B. Liao, Y. Li, Z. Liu, Chem. Phys. Lett. **462**, 129 (2008)
6. G. Huang, B. Liao, Y. Li, Y. Yu, Biophys. Chem. **143**, 55 (2009)
7. C. Li, J. Wang, Internet Electron. J. Mol. Des. **1**, 000 (2003)
8. D. Bielińska-Wąż, T. Clark, P. Wąż, W. Nowak, A. Nandy, Chem. Phys. Lett. **442**, 140 (2007)
9. D. Bielińska-Wąż, J. Math. Chem. **47**, 41 (2010)
10. Z.-J. Zhang, Bioinformatics **25**, 1112 (2009)
11. M. Randić, M. Vračko, N. Lerš, D. Plavsić, Chem. Phys. Lett. **368**, 1 (2003)
12. P.A. Scholes, *The Oxford Companion to Music*, 10th edn. (Oxford University Press, Oxford, UK, 1986)
13. C. Li, J. Wang, Comb. Chem. High Throughput Screen. **6**, 795 (2003)
14. J. Song, H. Tang, J. Biochem. Biophys. Methods **63**, 228 (2005)
15. B. Liao, T. Wang, J. Comput. Chem. **25**, 1364 (2004)
16. J. Wang, Y. Zhang, Chem. Phys. Lett. **423**, 50 (2006)
17. Y. Yao, T. Wang, Chem. Phys. Lett. **398**, 318 (2004)
18. M. Randić, Chem. Phys. Lett. **456**, 84 (2008)
19. H.I. Jefrey, Nucleic Acids Res. **18**, 2163 (1990)
20. H.I. Jefrey, J. Comput. Graph. **16**, 25 (1992)
21. M. Randić, M. Vračko, J. Zupan, M. Novič, Chem. Phys. Lett. **373**, 558 (2003)
22. M. Randić, Chem. Phys. Lett. **386**, 468 (2004)
23. M. Randić, N. Lerš, D. Plavsić, S.C. Basak, A.T. Balaban, Chem. Phys. Lett. **407**, 205 (2005)
24. A. Nandy, M. Harle, S.C. Basak, ARKIVOC **ix**, 211 (2006)
25. H. González-Diaz, L. Santana, E. Uriarte, Curr. Top. Med. Chem. **7**, 1025 (2007)
26. B. Liao, Y. Liu, R. Li, W. Zhu, Chem. Phys. Lett. **421**, 313 (2006)
27. S. Larionov, A. Loskutov, E. Ryadchenko, Chaos **18**, 013105 (2008)
28. A. Nandy, S.C. Basak, B.D. Gute, J. Chem. Inf. Model. **47**, 945 (2007)
29. I. Wiesner, D. Wiesnerová, Biologia Plantarum **54**, 353 (2010)
30. A. Nandy, S.C. Basak, J. Chem. Inf. Comput. Sci. **40**, 915 (2000)
31. A. Nandy, P. Nandy, S.C. Basak, Internet Electron. J. Mol. Des. **1**, 367 (2002)
32. D. Bielińska-Wąż, S. Subramaniam, J. Theor. Biol. **266**, 667 (2010)
33. D. Bielińska-Wąż, W. Nowak, P. Wąż, A. Nandy, T. Clark, Chem. Phys. Lett. **443**, 408 (2007)
34. D. Bielińska-Wąż, P. Wąż, W. Nowak, A. Nandy, S.C. Basak, in *AIP Conference Proceedings 963*, ed. by T.E. Simos, G. Maroulis (New York, 2007), pp. 28–30
35. D. Bielińska-Wąż, P. Wąż, T. Clark, Chem. Phys. Lett. **445**, 68 (2007)
36. D. Bielińska-Wąż, P. Wąż, S.C. Basak, Eur. Phys. J. B **50**, 333 (2006)
37. D. Bielińska-Wąż, P. Wąż, S.C. Basak, J. Math. Chem. **42**, 1003 (2007)
38. D. Bielińska-Wąż, P. Wąż, J. Math. Chem. **43**, 1287 (2008)
39. D. Bielińska-Wąż, W. Nowak, Ł. Pepłowski, P. Wąż, S.C. Basak, R. Natarajan, J. Math. Chem. **43**, 1560 (2008)
40. D. Bielińska-Wąż, P. Wąż, T. Clark, T. Puzyn, Ł. Pepłowski, W. Nowak, J. Math. Chem. **51**, 857 (2013)
41. P. Wąż, D. Bielińska-Wąż, A. Pleskacz, A. Strobel, Acta Phys. Pol. B **39**, 1993 (2008)
42. P. Wąż, D. Bielińska-Wąż, A. Strobel, A. Pleskacz, Acta Astron. **60**, 283 (2010)
43. P. Wąż, D. Bielińska-Wąż, Acta Phys. Pol. A **116**, 987 (2009)
44. P. Wąż, D. Bielińska-Wąż, Acta Phys. Pol. A **123**, 647 (2013)

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.