DNA Multiple Sequence Alignment Guided by Protein Domains: The MSA-PAD 2.0 Method

  • Bachir Balech
  • Alfonso Monaco
  • Michele Perniola
  • Monica Santamaria
  • Giacinto Donvito
  • Saverio Vicario
  • Giorgio Maggi
  • Graziano Pesole
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1746)

Abstract

Multiple sequence alignment (MSA) is a fundamental component of many DNA sequence analyses, including metagenomics studies and phylogeny inference. When guided by protein profiles, DNA multiple alignments achieve higher precision and robustness. Here we present details on the use of the upgraded version of MSA-PAD (2.0), a DNA multiple sequence alignment framework able to align DNA sequences coding for single or multiple protein domains, guided by PFAM or user-defined annotations. MSA-PAD has two alignment strategies, called “Gene” and “Genome,” which account for coding domain order and genomic rearrangements, respectively. The present version adds new options that allow the MSA to be guided by protein profiles provided by the user. This allows MSA-PAD 2.0 to run faster and to use custom protein profiles that may not be present in the PFAM database, according to the user’s interest. MSA-PAD 2.0 is currently freely available as a Web application at https://recasgateway.cloud.ba.infn.it/.
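
The general idea of guiding a DNA alignment with protein-level information can be illustrated with a minimal Python sketch. This is not the MSA-PAD implementation: it only shows the common "codon threading" step, in which each coding DNA sequence is mapped onto an already aligned row of its protein translation, so that every residue column becomes a codon column. The function name `thread_codons` and the toy sequences are hypothetical, and the sketch assumes each DNA sequence is in frame and carries no stop codon.

```python
def thread_codons(aligned_protein: str, dna: str, gap: str = "-") -> str:
    """Map an ungapped, in-frame coding DNA sequence onto its gapped protein row.

    Each residue consumes one codon (3 nt); each protein gap becomes 3 DNA gaps.
    """
    residues = sum(1 for c in aligned_protein if c != gap)
    if len(dna) < 3 * residues:
        raise ValueError("DNA is too short for the aligned protein")

    out = []
    pos = 0  # current offset into the ungapped DNA sequence
    for c in aligned_protein:
        if c == gap:
            out.append(gap * 3)
        else:
            out.append(dna[pos:pos + 3])
            pos += 3
    return "".join(out)


# Toy input (hypothetical): a protein-level alignment and the matching coding DNA.
protein_alignment = {"seq1": "MK-L", "seq2": "MKAL"}
coding_dna = {"seq1": "ATGAAACTG", "seq2": "ATGAAAGCTCTG"}

dna_alignment = {
    name: thread_codons(row, coding_dna[name])
    for name, row in protein_alignment.items()
}

for name, row in dna_alignment.items():
    print(name, row)
```

Because gaps are inserted only in multiples of three, the reading frame is preserved in the resulting DNA alignment, which is the main practical benefit of aligning at the protein level first.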

Key words

Genomic rearrangement, Multiple sequence alignment, Conserved protein domains, Phylogeny, Sequence assignment

Notes

Acknowledgment

This work was supported by the Italian nodes of Lifewatch and ELIXIR Research Infrastructures.

Copyright information

© Springer Science+Business Media, LLC 2018

Authors and Affiliations

  • Bachir Balech (1)
  • Alfonso Monaco (2)
  • Michele Perniola (2)
  • Monica Santamaria (1)
  • Giacinto Donvito (2)
  • Saverio Vicario (3)
  • Giorgio Maggi (2, 4)
  • Graziano Pesole (1, 5)

  1. Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, CNR, Bari, Italy
  2. National Institute of Nuclear Physics, Bari, Italy
  3. Institute of Atmospheric Pollution Research, CNR, c/o Department of Physics, University of Bari “A. Moro”, Bari, Italy
  4. Politecnico di Bari, Bari, Italy
  5. Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari “A. Moro”, Bari, Italy
