Multiple Sequence Alignment Based Upon Statistical Approach of Curve Fitting

  • Vineet Jha
  • Mohit Mazumder
  • Hrishikesh Bhuyan
  • Ashwani Jha
  • Abhinav Nagar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5909)

Abstract

The main objective of our work is to align multiple sequences using a statistical approach instead of a heuristic one. We propose a novel method for aligning multiple sequences that treats DNA sequences as lines rather than as strings, with each character representing a point on the line. The DNA sequences are aligned so that they overlap as much as possible, which yields the maximum number of matching characters; these matches serve as the seeds of the alignment. The proposed algorithm first finds the seeds in the sequences being aligned and then grows the alignment using a statistical curve-fitting approach based on standard deviation.
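
The abstract does not spell out the exact procedure, so the following Python sketch illustrates only one possible reading, for the pairwise case and under stated assumptions. It assumes that seeds are exact shared words of a fixed length (the hypothetical parameter `k`), that each seed is a point (i, j) in the plane spanned by the two sequences, and that "curve fitting using standard deviation" means fitting an ordinary least-squares line through the seeds and discarding seeds whose residual exceeds `n_sd` standard deviations. The names `find_seeds`, `fit_line` and `filter_seeds` are hypothetical. This is not the authors' algorithm: it does not handle more than two sequences and does not implement the alignment-growth step.

```python
from statistics import mean, pstdev

# A seed is a point (i, j): a shared word starts at a[i] and at b[j].
Seed = tuple[int, int]


def find_seeds(a: str, b: str, k: int = 4) -> list[Seed]:
    """Return (i, j) for every length-k word occurring at both a[i:i+k] and b[j:j+k]."""
    # Index every length-k word of b by its start positions.
    index: dict[str, list[int]] = {}
    for j in range(len(b) - k + 1):
        index.setdefault(b[j:j + k], []).append(j)
    # Look up every length-k word of a in that index.
    return [(i, j)
            for i in range(len(a) - k + 1)
            for j in index.get(a[i:i + k], [])]


def fit_line(seeds: list[Seed]) -> tuple[float, float]:
    """Ordinary least-squares fit j = slope * i + intercept through the seed points."""
    xs = [i for i, _ in seeds]
    ys = [j for _, j in seeds]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need seeds at two or more distinct positions in a")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def filter_seeds(seeds: list[Seed], n_sd: float = 1.0) -> list[Seed]:
    """Keep seeds whose residual from the fitted line is within n_sd standard deviations."""
    slope, intercept = fit_line(seeds)
    residuals = [j - (slope * i + intercept) for i, j in seeds]
    sd = pstdev(residuals)  # population standard deviation of the residuals
    # Small tolerance so exactly collinear seeds (sd == 0) are all kept despite float rounding.
    return [s for s, r in zip(seeds, residuals) if abs(r) <= n_sd * sd + 1e-9]
```

For example, `find_seeds("ACGTTGCA", "TTACGTTGCA", k=4)` returns `[(0, 2), (1, 3), (2, 4), (3, 5), (4, 6)]`, five seeds on the single diagonal j = i + 2, and `filter_seeds` keeps all of them. If a stray seed `(2, 9)` is added to the nine diagonal seeds `(0, 0)` through `(8, 8)`, `filter_seeds(..., n_sd=1.0)` discards it and keeps the diagonal. A straight line is the simplest curve to fit; a real aligner would also need to handle gaps, which shift the diagonal.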

Keywords

Multiple Sequence Alignment, Sequence Alignment, Word Method, Statistically Optimized Algorithm, Comparative Genome Analysis, Cross Referencing, Evolutionary Relationship

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Vineet Jha, Mohit Mazumder, Hrishikesh Bhuyan, Ashwani Jha, Abhinav Nagar
    • InSilico Biosolution, North Guwahati, India
