Alignment and Mapping

  • Chapter
Phylogenomics

Abstract

Alignments represent hypotheses of positional homology between the nucleotides or amino acids of sequences. Global alignments infer positional homology across all sites of the aligned sequences, whereas local alignments optimize positional homology only for substrings of the sequences. Pairwise alignments can contain matches, mismatches and gaps, which can be used to define scoring functions for comparing the quality of any two pairwise alignments. The Needleman-Wunsch algorithm finds the optimal global pairwise alignment and consists of matrix initialization, matrix filling and traceback. Local alignments are used to find similarities (and putative homologies) between two sequences, for example in database searches. Optimal local alignments are recovered by the Smith-Waterman algorithm. Faster database searches can be conducted using BLAST, a local alignment tool based on a seed-and-extend approach. Multiple alignments are computed with heuristic approaches. The most popular are progressive alignments, which use a series of pairwise alignment operations and a phylogenetic guide tree to construct the multiple sequence alignment. Masking and excluding unreliably aligned positions can improve the signal-to-noise ratio of the data. Noisy alignment positions can be inferred by identifying conserved blocks, by using model-based approaches, or by investigating the consistency of alignments with respect to the parameters used. A specific alignment problem is the mapping of sequence reads to reference sequences. Most mapping algorithms are based either on a seed-and-extend approach or on methods related to the Burrows-Wheeler transform, the latter being more memory efficient and less time-consuming. Specific mapping approaches exist to recover splice junctions or methylation patterns. Finally, whole genomes can be aligned using methods broadly classified into hierarchical and local approaches, to recover syntenic regions across genomes.
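
The three phases of the Needleman-Wunsch algorithm named in the abstract (matrix initialization, matrix filling and traceback) can be illustrated with a minimal Python sketch. The function name `needleman_wunsch`, the linear gap penalty, and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration and are not taken from the chapter. When several alignments are equally optimal, the sketch returns only one of them.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment.

    Scoring values are illustrative assumptions; gaps use a linear penalty.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1] (1-based matrix indices).
        return match if a[i - 1] == b[j - 1] else mismatch

    # 1. Initialization: the first row and column hold cumulative gap penalties.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # 2. Filling: each cell takes the best of a diagonal (match/mismatch),
    #    vertical (gap in b) or horizontal (gap in a) move.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # 3. Traceback: walk from the bottom-right cell back to the origin,
    #    following a move that could have produced each cell's score.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("ACGT", "AGT")
    print(total)
    print(top)
    print(bottom)
```

Under these assumed scores, aligning `ACGT` with `AGT` places a single gap opposite the `C`. A local (Smith-Waterman style) variant would differ mainly in clamping cell scores at zero and starting the traceback from the highest-scoring cell instead of the bottom-right corner.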

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Bleidorn, C. (2017). Alignment and Mapping. In: Phylogenomics. Springer, Cham. https://doi.org/10.1007/978-3-319-54064-1_6
