1 Introduction

The silkworm, Bombyx mori, known to humankind for more than 4700 years, is a highly domesticated and well-characterized genetic tool (Tazima 1962), second only to the fruit fly, Drosophila. Since the advent of recombinant DNA technology in breeding, the silkworm has become the lepidopteran molecular model system (Goldsmith 1995). In Japan, information about improved races has been available since as early as 1680 (Yokoyama 1973). Until, and even after, the Mendelian era, breeding was often referred to as an art because a new breed was developed on the basis of the breeder’s experience and ingenuity in correlating phenotypic expression with the genetic makeup of the individual. Later, systematic breeding began using the principles of traditional genetics. Although it was known that genes govern phenotypic expression, there were no tools for accessing the genes themselves. Characterization of the silkworm genome developed rapidly because of its importance as a lepidopteran model for breeding and genetic studies, for isolating valuable genes and promoters, and for comparative genomics (Goldsmith et al. 2005; Goldsmith 2006). This work included molecular linkage maps, BAC libraries, large EST databases, and whole-genome shotgun sequences (Goldsmith 2006). In contrast to traditional genetics, where individual organisms are used for crossing and genetic studies, modern genomic studies involve test tubes, micropipettes, gels, and liquid media. Along with this “liquid genetics,” computational biology and bioinformatics provide insight into the genetic molecules and their expression.

The comparative increase in silk yield over 90 years of conventional breeding in Japan (Kuribayashi 1992) (Table 1.1) indicates that cocoon production per 50 disease-free layings (dfls) has increased by 386.5% and raw silk percentage by 291.3%.

Table 1.1 Increase in quantitative productivity over 90 years in Japan through conventional breeding

Thereafter, the increase in quantitative traits has plateaued. Employing biotechnological tools has become a necessity to achieve a marked quantum jump beyond this level. The DNA molecular architecture of the silkworm, Bombyx mori, as reported to date (Gage 1974; Yasukochi 1998; Wu et al. 1999; Wang et al. 2005), is summarized in Table 1.2.

Table 1.2 DNA molecular architecture of Bombyx mori (Wang et al. 2005; Gage 1974; Yasukochi 1998; Wu et al. 1999)

Breeders are contemplating the design of breeding programs that use this molecular information. Doing so involves many terms and processes routinely used in molecular biotechnology. The terms and processes related to DNA marker-assisted selection (MAS) breeding are discussed in this article.

2 Quantitative Trait Locus (QTL)

A quantitative trait locus (Beavis et al. 1991) is a region of DNA that is associated with a particular phenotypic trait. The QTLs for a single trait are often found on different chromosomes. Knowing the number of QTLs that explain variation in a phenotypic trait reveals the genetic architecture of that trait. For example, it may suggest that plant height is controlled by many genes of small effect or by a few genes of large effect.

Another use of QTLs is to identify candidate genes underlying a trait. Once a region of DNA is identified as contributing to a phenotype, it can be sequenced. The DNA sequence of any genes in this region can then be compared to a database of DNA for genes whose function is already known. Advancement in quantitative genetics and work on QTL facilitated linking of certain markers to the gene of focus, thereby increasing the accuracy of breeding values.

In a recent development, classical QTL analyses are combined with gene expression profiling, e.g., by DNA microarrays. Such expression QTLs (e-QTLs) describe cis- and trans-controlling elements for the expression of genes that are often disease associated. Knowledge of QTLs therefore becomes essential for marker-assisted selection (MAS) in breeding programs (Nagaraju and Goldsmith 2002).

3 Reverse Genetics

Reverse genetics is an approach to discovering the function of a gene that proceeds in the opposite direction to so-called forward genetics or classical genetics. While forward genetics seeks to find the genetic basis of a phenotype or trait, reverse genetics seeks to find the possible phenotypes that may derive from a specific genetic sequence obtained by DNA sequencing. Owing to modern techniques of DNA sequencing, vast amounts of genomic sequence data have become available, and many genetic sequences are discovered in advance of other information. Reverse genetics attempts to connect a given genetic sequence with specific effects on the organism. The workflow can run counter to the direction of the central dogma: starting from a protein, the investigator works back to its mRNA and then to cDNA.

4 Genome Annotation

Genome annotation (Ghedin et al. 2004) is the process of marking the genes and other biological features in a DNA sequence using a genome annotation software system (White 1995). The software system is used to find the genes (places in the DNA sequence that encode a protein), the transfer RNAs, and other features and to make initial assignments of function to those genes. Advanced genome annotation software systems work on the same principle, but the programs available for analysis of genomic DNA are constantly changing and improving.

5 Extraction of DNA

Both plasmid and chromosomal DNA molecules can be isolated from cells. In both cases, the cell membrane is solubilized by the application of detergent. The resultant lysate is then treated with enzymes and/or heat to remove various contaminants from the desired DNA. The DNA molecules are collected by precipitating them with ethanol.

6 Cutting DNA into Small Fragments

Restriction enzymes, found in bacterial cells, function to protect those cells from infection by bacteriophage particles. They carry out this function by cleaving the invading phage DNA. The DNA of the host cell is protected by the attachment of methyl groups to some of its bases. Restriction enzymes bind to specific sequences of bases known as recognition sequences. Each enzyme has a single specific recognition sequence; e.g., the restriction enzyme EcoRI (pronounced “eco-R-one”; restriction enzyme I of E. coli) cuts DNA wherever the base sequence GAATTC is found.

Once the enzyme has bound to the specific base sequence, it cleaves the DNA backbone, thus breaking the molecule into fragments. The most useful enzymes, known as Type II restriction endonucleases, cleave the DNA molecule at a predictable site within the recognition sequence itself.

Each restriction enzyme, on cleaving, leaves one of two characteristic types of ends (termini) on the fragments: (1) blunt ends, in which there is no overhanging single-strand tail, or (2) staggered ends, in which a single-strand tail overhangs in either the 3′ or the 5′ direction.

Staggered ends with complementary tails tend to hydrogen bond to one another and are thus called “sticky ends.” Sticky ends make it possible to join two DNA fragments together, regardless of the source of the DNA. The enzyme DNA ligase completes the sugar-phosphate backbone between the newly joined (ligated) fragments. When the joined fragments come from at least two different sources, the resultant ligated molecule is called recombinant DNA (rDNA).
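The cutting behavior described above can be illustrated with a minimal Python sketch. It assumes a linear, single top-strand sequence and uses the well-known EcoRI cut position (between G and A, leaving a 5′ AATT overhang); the test sequence and the function names `find_sites` and `digest` are hypothetical and chosen only for illustration.

```python
# EcoRI recognizes GAATTC and cuts between G and A (G^AATTC),
# which leaves a 5' AATT single-strand overhang (a "sticky end").
SITE = "GAATTC"
CUT_OFFSET = 1  # cut after the first base of the recognition sequence


def find_sites(seq, site=SITE):
    """Return the 0-based start position of every occurrence of the site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, site=SITE, cut_offset=CUT_OFFSET):
    """Cut a linear top-strand sequence at every site and return the fragments."""
    fragments = []
    previous = 0
    for pos in find_sites(seq, site):
        cut = pos + cut_offset
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


# Hypothetical sequence containing two EcoRI sites.
dna = "TTAGGAATTCCGATACGGAATTCAAT"
sites = find_sites(dna)
fragments = digest(dna)
print(sites)      # [4, 17]
print(fragments)  # ['TTAGG', 'AATTCCGATACGG', 'AATTCAAT']
```

Every fragment downstream of a cut begins with AATT, the bases that form the overhang on the top strand. A real digest also depends on methylation status and reaction conditions, which this sketch ignores.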

7 DNA Sequencing

The base sequence of DNA molecules can be determined using a variety of techniques (Fig. 1.1).

Fig. 1.1 Electropherogram printout from automated sequencer showing part of a DNA sequence (Source: Wikimedia 2003)

7.1 The Maxam-Gilbert technique

The Maxam-Gilbert technique (Maxam and Gilbert 1977) depends on the selective degradation of specific bases within the molecule to be sequenced. The degradation produces a large population of DNA fragments, which are separated by PAGE. The resulting band pattern, revealed by autoradiography, is used to determine the base sequence. Simpler and more advanced methods have since been developed from this protocol.

7.2 The Sanger dideoxy DNA sequencing technique

The Sanger dideoxy DNA sequencing technique (Sanger et al. 1977) relies on the interruption of DNA synthesis. By adding a carefully calculated amount of modified dideoxynucleotides to the reaction mixture of DNA subunits, it is possible to halt DNA chain elongation at various base sites, resulting in a population of DNA fragments of differing lengths. This approach is also known as the chain termination method; “dye-primer sequencing” is a variant of it that uses fluorescently labeled primers.
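The principle of reading a sequence from terminated fragments can be sketched in a few lines of Python. This is an idealized illustration, not a model of the chemistry: it assumes that termination occurs exactly once at every position of the newly synthesized strand and that fragment lengths are measured without error. The function names and the example strand are hypothetical.

```python
def termination_lengths(new_strand):
    """Group fragment lengths by the dideoxynucleotide that ended each fragment.

    A fragment of length n ends with the base at position n of the new strand,
    so each of the four "lanes" collects the lengths terminated by one base.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(position)
    return lanes


def read_sequence(lanes):
    """Rebuild the strand by reading fragments from shortest to longest."""
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            base_at_length[length] = base
    return "".join(base_at_length[n] for n in sorted(base_at_length))


strand = "GATTCGA"  # hypothetical newly synthesized strand
lanes = termination_lengths(strand)
print(lanes)                 # {'A': [2, 7], 'C': [5], 'G': [1, 6], 'T': [3, 4]}
print(read_sequence(lanes))  # GATTCGA
```

Sorting the fragments by length corresponds to the separation by size on a gel; the lane in which each length appears identifies the base at that position.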

7.3 Pyrosequencing

Pyrosequencing is a method of DNA sequencing based on the “sequencing by synthesis” principle developed initially by Pal Nyren and coworkers (1985–1990) (Babak et al. 2004). The method is based on a chemiluminescent enzymatic reaction, which is triggered when a molecular recognition event occurs. Essentially, the method allows sequencing of a single strand of DNA by synthesizing the complementary strand along it. Each time a nucleotide, A, C, G, or T, is incorporated into the growing chain, a cascade of enzymatic reactions is triggered which results in a light signal. It is a method primarily used for sequencing of short stretches of DNA, SNP detection, and methylation analysis. Such analyses are crucial for biological research, genetics, and some medical and forensic applications. Pyrosequencing is fully automated, reliable, and accurate, and large numbers of samples can be analyzed in a short time. Pyrosequencing methods have been pursued to reduce costs relative to other automated sequencing methods.

7.4 Shotgun Sequencing

Shotgun sequencing (Anderson 1981) is used for sequencing long DNA strands. Since the chain termination method of DNA sequencing can only be used for fairly short strands, it is necessary to divide longer sequences up and then assemble the results to give the overall sequence. In chromosome walking, this division is done by progressing through the entire strand, piece by piece; shotgun sequencing uses a faster but more complex process to assemble random pieces of the sequence.

In shotgun sequencing, DNA is broken up randomly into numerous small segments, which are sequenced using the chain termination method to obtain reads. Multiple overlapping reads for the target DNA are obtained by performing several rounds of this fragmentation and sequencing. Computer programs then use the overlapping ends of different reads to assemble them into a contiguous sequence (Table 1.3).

Table 1.3 A simplified example of two rounds of shotgun sequencing (Anderson 1981)

For example, consider the following two rounds of shotgun reads:

In this extremely simplified example, the four reads can be assembled into the original sequence using the overlap of their ends to align and order them. However, assembly of complex genomes is additionally complicated by the abundance of repetitive sequence, meaning similar short reads could come from completely different parts of the sequence. Many overlapping reads for each segment of the original DNA are necessary to overcome these difficulties and accurately assemble the sequence.
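The assembly step can be sketched with a simple greedy overlap merger in Python. This is only a toy illustration under stated assumptions: reads are error free, all come from the same strand, and the sequence contains no repeats; production assemblers use far more elaborate methods. The 17-base sequence, the way it is split into two rounds of reads, and the function names are hypothetical and are not the contents of Table 1.3.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest end overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: the reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical target, fragmented at different points in two rounds.
original = "ATGGCGTACCTTGAGCA"
round_1 = [original[:10], original[10:]]
round_2 = [original[:5], original[5:]]
contigs = greedy_assemble(round_1 + round_2)
print(contigs)  # ['ATGGCGTACCTTGAGCA']
```

Neither round alone can be assembled, because its two reads are adjacent but do not overlap. Only the combination of both rounds provides the overlapping ends needed to order the reads.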

7.5 Whole-Genome Shotgun Sequencing

Whole-genome shotgun sequencing (Edwards et al. 1990; Edwards and Caskey 1991; Fleischmann 1995), also known as double-barrel shotgun sequencing, is an application of pairwise end sequencing (Roach et al. 1995), in which a fragment of DNA is sequenced from both ends. Although sequencing both ends of the same fragment and keeping track of the paired data is more cumbersome than sequencing a single end of two distinct fragments, the knowledge that the two sequences are oriented in opposite directions and lie about one fragment length apart is valuable in reconstructing the sequence of the original target fragment. To apply the strategy, high-molecular-weight DNA is sheared into random fragments, size-selected (usually 2, 10, 50, and 150 kb), and cloned into an appropriate vector. The clones are then sequenced from both ends using the chain termination method, yielding two short sequences. Each sequence is called an end-read or read, and two reads from the same clone are referred to as mate pairs. Since the chain termination method can usually produce reads only between 500 and 1000 bases long, mate pairs will rarely overlap in all but the smallest clones.

The original sequence is reconstructed from the reads using sequence assembly software. First, overlapping reads are collected into longer composite sequences known as contigs. Contigs can be linked together into scaffolds by following connections between mate pairs. The distance between contigs can be inferred from the mate pair positions if the library size is known and has a narrow window of deviation.

Coverage is the average number of reads representing a given nucleotide in the reconstructed sequence. It can be calculated from the length of the original genome (G), the number of reads (N), and the average read length (L) as NL/G. For example, a hypothetical genome with 2000 base pairs reconstructed from 8 reads with an average length of 500 nucleotides will have 2× coverage.
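The coverage formula NL/G can be checked directly. The short Python sketch below reproduces the worked example from the text; the function name `coverage` is a hypothetical choice.

```python
def coverage(num_reads, avg_read_length, genome_length):
    """Average number of reads covering each nucleotide: N * L / G."""
    return num_reads * avg_read_length / genome_length


# Worked example from the text: G = 2000 bp, N = 8 reads, L = 500 nt.
example = coverage(num_reads=8, avg_read_length=500, genome_length=2000)
print(example)  # 2.0, i.e., 2x coverage
```

Coverage is an average: some positions will be represented by more reads and others by fewer, or by none at all.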

With this approach, it is possible to sequence the whole genome at once using large arrays of sequencers, which makes the whole process much more efficient than more traditional approaches.

8 Gene Libraries

Gene libraries store genetic information. These libraries are composed of fragments of donor DNA which are protected by insertion into cloning vectors. Recombinant cloning vectors from a library, inserted into host cells, allow replication of the donor genetic material. Donor genetic information comes primarily from two sources: (1) Genomic DNA: the entire complement of DNA in a donor cell. Eukaryotic genes in genomic DNA consist of noncoding introns interspersed among coding exons. (2) cDNA (complementary DNA): genes in cDNA form are created using reverse transcriptase to synthesize a DNA copy of an mRNA template. Since the mRNA template has already undergone posttranscriptional modification, it consists solely of exons. Therefore, cDNA differs from genomic DNA in that it is composed of exons alone.
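The difference between a genomic copy of a gene and its cDNA can be illustrated with a minimal Python sketch. The toy gene, its exon coordinates, and the function name `splice` are hypothetical; the sketch only removes the intron and does not model transcription, capping, or polyadenylation.

```python
def splice(genomic, exons):
    """Join exon intervals (0-based, end-exclusive) into an intron-free sequence."""
    return "".join(genomic[start:end] for start, end in exons)


# Hypothetical toy gene: two exons (upper case) separated by one intron (lower case).
exon_1, intron, exon_2 = "ATGGCC", "gtaagtttcag", "GATTAA"
genomic = exon_1 + intron + exon_2

# Exon coordinates within the genomic sequence.
exons = [
    (0, len(exon_1)),
    (len(exon_1) + len(intron), len(genomic)),
]

cdna_like = splice(genomic, exons)
print(len(genomic), len(cdna_like))  # 23 12
print(cdna_like)                     # ATGGCCGATTAA
```

The intron-free sequence is shorter than the genomic one, which is why a cDNA clone of a eukaryotic gene is smaller than the corresponding genomic clone.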

9 Preparation of cDNA

cDNA reduces the effective size of eukaryotic genes since it represents only the exon portion of the gene. The lack of introns in eukaryotic cDNA allows its successful expression by bacterial cells, as these cells are unable to remove noncoding introns from the mRNA transcribed from a genomic DNA fragment.

The use of cDNA makes it easier to identify a particular target. By isolating mRNA from appropriate cells (e.g., mRNA complementary to fibroin gene from silk gland cells), the probability of locating a target molecule increases.

mRNA is isolated from a host cell by passing a preparation of nucleic acid over a cellulose column which has been treated with short lengths of thymine deoxyribonucleotides. This type of column is called an oligo-dT cellulose column. The poly(A) tail which characterizes mRNA molecules binds to the complementary thymine nucleotides attached to the cellulose of the column. In this way the mRNA molecules remain within the column, while other nucleic acids pass through it. The bound mRNA molecules can then be chemically released from the cellulose. cDNA is synthesized from mRNA templates using the retroviral enzyme reverse transcriptase.

10 DNA Fingerprinting (or Genetic Fingerprinting or DNA Testing or DNA Typing or DNA Profiling)

DNA fragments produced by restriction endonucleases can be separated on the basis of size by the technique of gel electrophoresis. Agarose gel electrophoresis (AGE) separates DNA fragments which differ from one another by at least 30–50 nucleotide pairs in length. Polyacrylamide gel electrophoresis (PAGE) separates fragments which differ even by a single nucleotide.

Not all of the DNA contains useful information. A large amount of DNA is not translated into useful proteins; it is called “noncoding” or “junk” DNA. Changes persist within these regions of junk DNA because they make no contribution to the health or survival of the organism; they therefore remain in the genome and may be inherited by the next generation. If a change occurs within a coding sequence of DNA (an essential gene) and prevents it from working properly, the organism will probably not survive, effectively removing that altered gene from the population. For this reason, random variations crop up in the noncoding (junk) DNA sequences as often as once in every 200 DNA letters.

The base sequences of the chromosomes of different individuals of the same species closely resemble one another. However, some differences, called polymorphisms, do exist. Each individual has enough polymorphic sites to make its DNA unique. The analysis of various polymorphisms in any one organism will provide a unique profile which can be used to identify it. This is called DNA fingerprinting which was described by Sir Alec Jeffreys in 1985 (Jeffreys et al. 1991).

11 Protection of DNA Fragments

Donor genetic sequences, either genomic or cDNA, must be protected from degradation and transported into appropriate host cells by cloning vectors. Cloning vectors are lengths of DNA which generally have three properties: (1) unique recognition sequences, (2) selectable visible markers, and (3) the ability to replicate. The three main types of cloning vectors are (1) bacterial plasmids, (2) bacteriophage chromosomes, and (3) cosmids; these and related vectors and plasmid types are described below:

  1. Plasmid vectors often contain genes for antibiotic resistance and genes that govern their transmission from cell to cell during the process of conjugation. Most plasmids contain a polylinker or multiple cloning site (MCS), which is a short region containing several commonly used restriction sites, allowing the easy insertion of DNA fragments at this location. The replication of plasmids can either be linked to the replication of the host chromosome (“stringent plasmids”) or be independent of it (“relaxed plasmids”). The number of relaxed plasmids per host cell can be increased by the technique known as amplification.

  2. A bacteriophage (or phage) vector is a virus that can infect bacteria. Bacteriophage vectors have linear double-stranded DNA molecules which are flanked by complementary single-stranded sequences of bases known as “cos sites.” The cos sites can bind to one another, thus making the phage chromosome circular. Phage particles inject their DNA into host bacterial cells. The phage DNA either immediately directs the host bacterial cell to synthesize new phage particles, in the lytic process, or becomes relatively inert through incorporation into the host chromosome, in the lysogenic process.

  3. Shuttle vectors have both yeast and bacterial origins of replication and can therefore be maintained in both cell types. This property has allowed the identification of genes within the yeast cell itself. In this system, a yeast DNA library is made and propagated in E. coli cells. When sufficient plasmid DNA is available, the plasmids are isolated from the E. coli cells and introduced into yeast cells with known mutations. Donor DNA fragments of about 4000 and 20,000 base pairs can be incorporated into plasmid and phage vectors, respectively.

  4. Cosmid vectors are hybrid vectors consisting of the phage cos site incorporated into a plasmid molecule. Cosmid vectors can accept donor DNA fragments of 35,000–45,000 base pairs in length. The endonuclease used to cut the donor DNA also linearizes the cosmid in such a way as to leave the cos site intact. The cosmid and donor fragments are then mixed and allowed to ligate. Some ligation products consist of a donor fragment flanked by two cosmids and therefore have two cos sites. A sequence located between two cos sites can be packaged into a phage particle ready to infect a host cell. The donor DNA can thus be inserted into the host cell via the process of infection.

  5. An episome is a plasmid that can integrate itself into the chromosomal DNA of the host organism (Fig. 1.2). Therefore, it can stay intact for a long time, be duplicated with every cell division of the host, and become a basic part of its genetic makeup. This term is no longer commonly used for plasmids, since it is now clear that a region of homology with the chromosome such as a transposon makes a plasmid into an episome. In mammalian systems, the term episome refers to a circular DNA (such as a viral genome) that is maintained by non-covalent tethering to the host cell chromosome.

  6. The F-plasmid, also known as the fertility plasmid or the F-factor, is found in bacteria and allows bacterial conjugation (in which genetic information is exchanged) between different bacterial cells. A bacterial cell is described as F+ (male) when it contains this plasmid and F− (female) when it does not. The F-plasmid is also an episome and can integrate into the cell’s circular genome; in this instance, the cell is described as Hfr. When an F+ cell conjugates with an F− cell, the result is two F+ cells, both capable of transmitting the plasmid further by conjugation. In the case of Hfr, the result is one Hfr and one F− cell. The F-plasmid has also been engineered to carry inserted foreign DNA; such a vector is called a fosmid.

  7. Fosmids are similar to cosmids but are based on the bacterial F-plasmid. The cloning vector is limited in that a host (usually E. coli) can contain only one fosmid molecule. This low copy number offers higher stability than comparable high-copy-number cosmids. Fosmid clones were used to help assess the accuracy of the public human genome sequence.

  8. R-plasmid, resistance plasmid, is a conjugative factor in bacterial cells that promotes resistance to agents such as antibiotics, metal ions, ultraviolet radiation, and bacteriophages.

  9. A bacterial artificial chromosome (BAC) is a DNA construct, based on a fertility plasmid (or F-plasmid), used for transforming and cloning in bacteria, usually E. coli. F-plasmids play a crucial role because they contain partition genes that promote the even distribution of plasmids after bacterial cell division. The usual insert size of a bacterial artificial chromosome is 150 kbp, with a range from 100 to 300 kbp. A similar cloning vector, called a PAC (P1-derived artificial chromosome), has also been produced from the DNA of bacteriophage P1.

Fig. 1.2 (Above) Bacterial cell with chromosomal DNA of bacteria and plasmid DNA. (Below) Plasmid DNA integrated with the chromosomal DNA

BACs are often used to sequence the genomes of organisms in genome projects, for example, the Human Genome Project. A short piece of the organism’s DNA is amplified as an insert in a BAC and then sequenced. Finally, the sequenced parts are assembled, resulting in the genomic sequence of the organism.

BACs can carry both the gene and various promoter sequences which can often show the genes’ true expression level. They are transferred over to the organisms by electroporation/transformation or transfection with a suitable virus or microinjection. BACs can also be utilized to detect genes or large sequences of interest and then used to map them onto the human chromosome using BAC arrays.

12 Contig

In shotgun DNA sequencing projects, a contig (from contiguous) (Staden 1979) is a set of overlapping DNA segments derived from a single genetic source. A contig in this sense can be used to deduce the original DNA sequence of the source. A contig map depicts the relative order of a linked library of contigs representing a complete chromosome segment.

13 Expressed Sequence Tag (EST)

An expressed sequence tag (EST) is a subsequence of a transcribed, spliced nucleotide sequence (either protein coding or not). ESTs are intended as a way to identify gene transcripts and are instrumental in gene discovery and gene sequence determination. The identification of ESTs has proceeded rapidly, with approximately 42 million ESTs available in public databases (e.g., GenBank, March 2007, all species).

An EST is produced by one-shot sequencing (single pass) of a cloned mRNA (i.e., sequencing several hundred base pairs from an end of a cDNA clone taken from a cDNA library). The resulting sequence is a relatively low-quality fragment whose length is limited by current technology to approximately 500–800 nucleotides. Because these clones consist of DNA that is complementary to mRNA, the ESTs represent portions of expressed genes. They may be present in the database as either cDNA/mRNA sequence or as the reverse complement of the mRNA, the template strand.
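Because an EST may be stored in either orientation, a comparison between two ESTs, or between an EST and a reference sequence, usually has to consider the reverse complement as well. The following Python sketch shows one way to do that, assuming error-free sequences containing only A, C, G, and T; the function names and the example sequences are hypothetical.

```python
# Translation table that maps each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def matches_either_strand(est, reference):
    """True if the EST, or its reverse complement, occurs in the reference."""
    return est in reference or reverse_complement(est) in reference


reference = "TTATGGCCGATTAAGC"  # hypothetical cDNA/mRNA-sense sequence
est_sense = "ATGGCC"            # stored in the same sense as the mRNA
est_antisense = reverse_complement(est_sense)

print(est_antisense)                                    # GGCCAT
print(matches_either_strand(est_sense, reference))      # True
print(matches_either_strand(est_antisense, reference))  # True
```

Real ESTs are single-pass, relatively low-quality reads, so in practice an alignment that tolerates mismatches would be used rather than exact substring matching.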

ESTs can be mapped to specific chromosome locations using physical mapping techniques, such as radiation hybrid mapping or FISH (fluorescent in situ hybridization). Alternatively, if the genome of the organism that originated the EST has been sequenced, one can align the EST sequence to that genome.

ESTs are tools for refining the predicted transcripts of the genes to which they map, which leads to prediction of their protein products and eventually of their function. Moreover, the situation in which the ESTs are obtained (tissue, organ, or disease state, e.g., cancer) gives information on the conditions in which the corresponding gene is active. ESTs contain enough information to permit the design of precise probes for DNA microarrays, which can then be used to determine gene expression.

To generate ESTs, a cDNA library is first created from a cell or tissue, and then hundreds or thousands of clones are picked from the library and sequenced in a single pass, without validation and without obtaining the full-length sequence. Sequencing usually starts from the poly(A) tail end, although clones can also be sequenced from the 5′ end or from the middle.

14 Transposons

Transposons are sequences of DNA that can move around to different positions within the genome of a single cell, a process called transposition. In the process, they can cause mutations and change the amount of DNA in the genome. Transposons are also called “jumping genes” or “mobile genetic elements.” There are a variety of mobile genetic elements, and they can be grouped based on their mechanism of transposition. Class I mobile genetic elements, or retrotransposons, move in the genome by being transcribed to RNA and then back to DNA by reverse transcriptase, while class II mobile genetic elements move directly from one position to another within the genome using a transposase to “cut and paste” them within the genome. Transposons are very useful to researchers as a means to alter DNA inside of a living organism. Transposons make up a large fraction of genome sizes which is evident through the C-values of eukaryotic species. For example, about 45% of the human genome is composed of transposons and their defunct remnants.

15 Cloning

Donor genetic sequences of cDNA are cloned to obtain recombinant vectors. Once a recombinant vector has been synthesized, it must be inserted into the target host cell (e.g., a bacterial cell) in order to allow replication of the exogenous genetic material (Fig. 1.3) (Watson 2007).

Fig. 1.3 DNA to mRNA, mRNA to cDNA. ESTs are parts of cDNA

Treatment of the linearized cloning vector with alkaline phosphatase removes the 5′ phosphate group necessary for the religation of the cleaved ends. Thus, pretreatment of the cleaved vector with this enzyme ensures that only recombinant vectors result when a preparation of cleaved vector molecules is incubated with a preparation of donor DNA fragments in the presence of DNA ligase, because the ends of the treated vector are unable to ligate to one another.

Cells which are able to take up exogenous DNA, including recombinant plasmids, are called competent cells. Some cell types are naturally competent, while other cell types are not. E. coli cells must be treated with calcium chloride and heat shock or a similar procedure to transform them to a competent state. Cells which have taken up recombinant plasmid DNA are called transformed cells. The efficiency of transformation is generally extremely low and ranges from 0.1 to 10%.

E. coli cells are often chosen as host cells for a number of reasons. The E. coli chromosome has been well characterized, and many of its polypeptide products have been identified. E. coli cells are easy to grow in the laboratory and have a short generation time of 20–60 min.

Neither type of gene library (genomic DNA or cDNA) has an index; a library must therefore be screened with specific probe molecules to find and identify target sequences.

15.1 Identification of Transformed Cell

Transformed cells can often be identified with the use of a selective medium which interacts in some way with a selectable marker located on the cloning vector. Typically the cloning vector carries a gene which confers drug resistance on a transformed cell, thus allowing only transformed cells to grow in a selective medium containing that particular drug. Non-transformed cells cannot grow in the selective medium.

The presence of an inserted donor gene in the cloning vector can sometimes be determined by insertional inactivation. This is the loss of a gene product due to insertion of foreign DNA in a recognition sequence located within a particular gene.

The basic screening techniques involve transferring the host cells or the phage vector to a solid substrate, usually a nitrocellulose membrane (or nylon or PVDF), by blotting and then treating the sheet with a labeled probe that is complementary to the target sequence. Probes can be cDNA, genomic DNA, RNA, or antibody molecules. Specific probes are chosen based on the information known about the target sequence. Probes will hybridize to complementary molecules on the substrate. Hybridized probe/target complexes are visualized by autoradiography. Alternatively, mRNA probes can be removed from the target molecule and used to direct protein synthesis in order to identify the original gene sequence.

16 DNA Probe

  1. DNA to be made radioactive (radiolabeled) is put into a tube.

  2. Nicks, or breaks in one strand, are introduced into the DNA to be radiolabeled. At the same time, individual nucleotides are added to the nicked DNA, one of which, C, is radioactive (Fig. 1.4).

  3. DNA polymerase is added to the tube with the nicked DNA and the individual nucleotides. The DNA polymerase is immediately attracted to the nicks in the DNA and attempts to repair the DNA, starting from the 5′ end and moving toward the 3′ end.

  4. The DNA polymerase begins repairing the nicked DNA. It removes the existing nucleotides ahead of it and places new nucleotides, taken from the individual nucleotides mixed in the tube, behind it. Whenever a G base is read in the lower strand, a radioactive C base is placed in the new strand. In this fashion, the nicked strand, as it is repaired by the DNA polymerase, is made radioactive by the inclusion of radioactive C bases.

  5. The nicked DNA is then heated, splitting the two strands of DNA apart. This creates single-stranded radioactive and nonradioactive pieces. The radioactive DNA, now called a probe, is ready for use.

Fig. 1.4 Stages in synthesizing a probe

17 Blotting

Southern blotting is the analysis of DNA sequences with either a DNA or an RNA probe. Northern blotting is the analysis of RNA sequences with a DNA or RNA probe. Western blotting is the analysis of proteins with an antibody probe. Transformed host cells can be maintained for a long period of time by storage at very cold temperatures, making gene libraries reusable. Frozen cells thawed many years after their original storage behave as if they were freshly prepared. Thus, gene libraries can be created, conserved, probed, regenerated, and re-probed many times.

17.1 The Southern Blot

The Southern blot is a method of enhancing the result of an agarose gel electrophoresis by marking specific DNA sequences. The method is named after its inventor, the British biologist Edwin M. Southern (Southern 1975). This set a convention whereby other blot methods were named by analogy to indicate variants of the technique, e.g., Northern blot, Western blot, and Southwestern blot (Peters 1993).

DNA strands are broken into fragments by restriction endonucleases. The fragments are then electrophoresed on a gel to separate the cut DNA by size. If fragments are larger than 15 kb, the gel may be treated before blotting with a dilute acid, such as dilute HCl, which depurinates the DNA. This breaks the DNA into smaller pieces that transfer more efficiently than larger fragments.
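As a rough illustration of digestion followed by size separation, the sketch below cuts a linear sequence at every EcoRI site (G^AATTC) and lists the fragment sizes from largest to smallest, the order in which they would appear from the wells downward on a gel. The sequence and the function name `digest` are hypothetical.

```python
# Minimal sketch: digest a linear DNA sequence and order fragments by size.
# Assumptions: EcoRI (recognition site GAATTC, cut after the first G) and a
# made-up test sequence; only one strand is tracked.
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut the sequence at every occurrence of the recognition site."""
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments

dna = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 5
fragments = digest(dna)
# Larger fragments migrate more slowly, so they stay closer to the wells.
sizes = sorted((len(f) for f in fragments), reverse=True)
print(sizes)
```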

DNA bands are transferred to a membrane. The gel from the DNA electrophoresis is treated with an alkaline solution (typically containing sodium hydroxide) to denature the double-stranded DNA into single strands. Denaturation is necessary so that the DNA will stick to the membrane and be hybridized by the probe. Since the gel is brittle, it cannot withstand further processing. Therefore, the bands are transferred to a firm supporting sheet (substrate or membrane) (Towbin et al. 1979). A sheet of nitrocellulose, nylon, or polyvinylidene fluoride (PVDF, also known as KYNAR®) membrane is placed on top of the gel. Pressure is applied evenly to the gel (either using suction or by placing a stack of paper towels and a weight on top of the membrane and gel). This causes the DNA to move from the gel onto the membrane by capillary action, where it sticks. The membrane is then baked (in the case of nitrocellulose) or exposed to ultraviolet radiation (nylon) to permanently cross-link the DNA to the membrane.

The sheet is then treated with a hybridization probe, an isolated DNA molecule with a specific sequence that pairs with its complementary sequence among the fragments bound to the membrane. The probe DNA is labeled so that it can be detected, usually by incorporating radioactivity or by tagging the molecule with a fluorescent or chromogenic dye. In some cases, the hybridization probe may be made from RNA rather than DNA.

After hybridization, excess probe is washed from the membrane, and the pattern of hybridization is visualized on X-ray film by autoradiography in the case of a radioactive or fluorescent probe or by development of color on the membrane itself if a chromogenic detection is used. When making use of hybridization in the laboratory, DNA must first be denatured, usually by using heat or chemicals. Denaturing is a process by which the hydrogen bonds of the original double-stranded DNA are broken, leaving a single strand of DNA whose bases are available for hydrogen bonding.

Once the DNA has been denatured, a single-stranded radioactive probe can be used to see if the denatured DNA contains a sequence similar to that on the probe. The denatured DNA is put into a plastic bag along with the probe and some saline liquid; the bag is then shaken to allow sloshing. If the probe finds a fit, it will bind to the DNA (Fig. 1.5).

Fig. 1.5 Hybridization of radioactive probe with the single-stranded DNA

The fit of the probe to the DNA does not have to be exact. Sequences of varying homology (Fig. 1.6) can stick to the DNA even if the fit is poor; the poorer the fit, the fewer the hydrogen bonds between the probe and the denatured DNA. The ability of low-homology probes to bind to DNA can be manipulated by varying the temperature of the hybridization reaction or the amount of salt in the hybridization solution.

Fig. 1.6 Varying degree of homology between the DNA strand and the probe
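The idea of a probe fitting its target more or less well can be made concrete with a small sketch. It slides the reverse complement of a probe along a single-stranded target and reports the best position and the fraction of bases that pair. The sequences and function names are hypothetical, and real hybridization also depends on temperature and salt, which this sketch ignores.

```python
# Minimal sketch: score how well a probe fits a single-stranded target.
# Assumption: fit is measured simply as the fraction of paired bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def best_match(target: str, probe: str) -> tuple[int, float]:
    """Return (position, fraction of paired bases) for the best binding site."""
    wanted = reverse_complement(probe)  # what the probe pairs with perfectly
    best_pos, best_pairs = 0, -1
    for i in range(len(target) - len(wanted) + 1):
        window = target[i:i + len(wanted)]
        pairs = sum(a == b for a, b in zip(window, wanted))
        if pairs > best_pairs:
            best_pos, best_pairs = i, pairs
    return best_pos, best_pairs / len(wanted)

target = "TTGACCGTAAGC"
print(best_match(target, "TACGGT"))  # probe with a perfect fit
print(best_match(target, "TACGAT"))  # probe with one mismatch
```

A probe with a lower paired fraction forms fewer hydrogen bonds and is washed off more easily under stringent conditions.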

17.2 The Northern Blot

The Northern blot (Alwine et al. 1977) is a technique to study gene expression. It is similar to the Southern blot procedure, with the fundamental difference that RNA, rather than DNA, is the substance being analyzed by electrophoresis and detection with a hybridization probe. A notable difference in the procedure (as compared with the Southern blot) is the addition of formaldehyde in the agarose gel, which acts as a denaturant. As in the Southern blot, the hybridization probe may be made from DNA or RNA. A variant of the procedure known as the reverse Northern blot was occasionally used. In this procedure, the substrate nucleic acid (i.e., affixed to the membrane) is a collection of isolated DNA fragments, and the probe is RNA extracted from a tissue and radioactively labeled.

DNA microarrays, which came into widespread use in the early 2000s, resemble the reverse procedure in that they involve isolated DNA fragments affixed to a substrate and hybridization with a probe made from cellular RNA. Thus, the reverse procedure enabled the one-at-a-time study of gene expression by Northern analysis to evolve into gene expression profiling, in which the expression of many of the genes in an organism may be monitored.

17.3 The Western Blot (Immunoblot)

A Western blot (immunoblot) (Burnette 1981) is a method to detect protein in a given sample of tissue homogenate or extract. It uses gel electrophoresis to separate denatured proteins by mass. The proteins are then transferred out of the gel and onto a membrane (nitrocellulose or PVDF), where they are “probed” using antibodies specific to the protein (Renart et al. 1979). As a result, researchers can examine the size, processing, or amount of protein in a given sample and compare several groups.

Other related techniques include using antibodies to detect proteins in tissues (immunohistochemistry) and cells (immunocytochemistry) or the use of antibody to separate proteins by precipitation (immunoprecipitation).

17.3.1 Tissue Preparation

Typically, samples are taken from either tissue or from cell culture. The samples are cooled or frozen rapidly. They are homogenized using sonication or mechanical force or simply lysed using high-salt buffers (150 mM). The resulting “whole-cell homogenate” or “whole-cell fraction” can be used as is or subjected to centrifugation in a series of steps to isolate cytosolic (cell interior), nuclear, and membrane fractions. The prepared sample is then assayed for protein concentration so that a consistent amount of protein can be taken from each different sample.
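Taking a consistent amount of protein from each sample is a matter of simple arithmetic: the volume to load equals the target mass divided by the measured concentration. The sketch below assumes hypothetical concentrations and a hypothetical target of 20 µg per lane.

```python
# Minimal sketch: volume of each lysate needed to load the same protein mass.
# Assumptions: the sample names, concentrations, and 20 ug target are made up.
def loading_volume_ul(target_ug: float, conc_ug_per_ul: float) -> float:
    """Volume (uL) that contains target_ug of protein."""
    return target_ug / conc_ug_per_ul

concentrations = {"control": 2.0, "treated_1": 4.0, "treated_2": 5.0}  # ug/uL
volumes = {name: loading_volume_ul(20.0, c) for name, c in concentrations.items()}
print(volumes)
```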

17.3.2 Protein Sample Preparation

Samples are boiled for 1 to 5 min in a buffer solution (e.g., Laemmli’s buffer, known as “sample buffer”) containing a buffer substance (normally Tris base), a dye, a sulfhydryl compound (typically beta-mercaptoethanol or dithiothreitol (DTT)) for reducing disulfide bonds, an anionic lipophilic detergent (sodium dodecyl sulfate, SDS), and glycerol to increase the buoyant density of the sample. The boiling denatures the proteins, unfolding them completely. The SDS then surrounds the protein with a negative charge, and the beta-mercaptoethanol prevents the reformation of disulfide bonds. The glycerol increases the density of the sample relative to the upper buffer in the gel tank and thus facilitates loading, as the samples sink to the bottom of the gel pockets.

17.3.3 Separation of Protein Fractions

The proteins of the sample are separated according to molecular weight using polyacrylamide gel electrophoresis. It is also possible to use a 2-D gel, which spreads the proteins from a single sample out in two dimensions: proteins are separated according to isoelectric point (pI, the pH at which they have a neutral net charge) in the first dimension and according to molecular weight in the second dimension.

In order to make the proteins accessible to antibody detection, they are moved from within the gel onto a membrane of nitrocellulose or PVDF. The membrane is placed face to face with the gel, and current is applied to large plates on either side. The charged proteins move from within the gel onto the membrane while maintaining the organization they had within the gel. As a result of this “blotting” process, the proteins are exposed on a thin surface layer for detection. Both varieties of membrane are chosen for their nonspecific protein-binding properties (i.e., they bind all proteins equally well). Protein binding is based upon hydrophobic interactions, as well as charged interactions between the membrane and protein. Nitrocellulose membranes are more economical than PVDF but are far more fragile and do not stand up well to repeated probing.

Since the membrane has been chosen for its ability to bind protein and both antibodies and the target are proteins, steps must be taken to prevent interactions between the membrane and the antibody used for detection of the target protein. Blocking of nonspecific binding is achieved by placing the membrane in a dilute solution of protein, typically bovine serum albumin (BSA) or nonfat dry milk, with a minute percentage of detergent such as Tween 20. The protein in the dilute solution attaches to the membrane in all places where the target proteins have not attached. Thus, when the antibody is added, there is no room on the membrane for it to attach other than on the binding sites of the specific target protein. This reduces “noise” in the final product of the Western blot, leading to clearer results and fewer false positives.

During the detection process, the membrane is “probed” for the protein of interest with antibodies that are linked to a reporter enzyme, which drives a colorimetric or photometric signal. This usually takes place in two steps, although a one-step method is now also available for certain applications.

17.3.4 Two-Step Method

  1. First Step: Primary Antibody. Antibodies are generated when a host species or immune cell culture is exposed to the protein of interest (or a part thereof). Normally a part of the immune response, here they are harvested and used as sensitive and specific detection tools that bind the protein directly, hence the name primary antibody.

    After blocking, a dilute solution of primary antibody (~0.5–5 μg/mL) is incubated with the membrane with gentle agitation. Typically, the solution consists of buffered saline with a small percentage of detergent and sometimes powdered milk or BSA. The antibody solution and the membrane can be sealed and incubated together for 30 min to overnight. Incubation can also be carried out at different temperatures, with warmer temperatures being associated with more binding, both specific to the target protein (called “signal”) and nonspecific (called “noise”).

  2. Second Step: Secondary Antibody. After rinsing to remove unbound primary antibody, the membrane is exposed to another antibody, directed at a species-specific portion of the primary antibody. This is known as a secondary antibody and, due to its targeting properties, is generally referred to as “anti-mouse,” “anti-goat,” etc. Antibodies come from animal sources (or animal-sourced hybridoma cultures); an anti-mouse secondary antibody will bind to just about any mouse-sourced primary antibody. The secondary antibody is usually linked to biotin or to a reporter enzyme such as alkaline phosphatase or horseradish peroxidase. This step confers an advantage in that several secondary antibodies will bind to one primary antibody, providing enhanced signal.

17.3.5 One-Step Method

This requires a probe antibody which both recognizes the protein of interest and contains a detectable label, probes which are often available for known protein tags. The primary probe is incubated with the membrane in a manner similar to that for the primary antibody in a two-step process and then is ready for direct detection after a series of wash steps.

After the unbound probes are washed away, the Western blot is ready for detection of the probes that are labeled and bound to the protein of interest. In practical terms, not all Westerns reveal protein at only one band in a membrane. Size approximations are made by comparing the stained bands to those of the marker or ladder loaded during electrophoresis. The process is repeated for a structural protein, such as actin or tubulin, that should not change between samples. The amount of target protein is indexed to the structural protein to control for loading differences between groups. This practice corrects for the amount of total protein on the membrane in case of errors or incomplete transfers.
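Indexing the target protein to a structural protein amounts to dividing one band intensity by the other and then comparing groups. The sketch below uses made-up densitometry values and hypothetical function names.

```python
# Minimal sketch: normalize target band intensity to a loading control.
# Assumption: the intensity values are invented densitometry readings.
def normalized_signal(target_intensity: float, control_intensity: float) -> float:
    """Target band intensity expressed relative to the loading control."""
    return target_intensity / control_intensity

bands = {
    "control": {"target": 1000.0, "actin": 2000.0},
    "treated": {"target": 1500.0, "actin": 1500.0},
}
normalized = {g: normalized_signal(b["target"], b["actin"]) for g, b in bands.items()}
fold_change = normalized["treated"] / normalized["control"]
print(normalized, fold_change)
```

Here the raw target band is only 1.5 times stronger in the treated lane, but once the uneven actin loading is taken into account, the normalized change is twofold.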

17.3.6 Colorimetric Detection

The colorimetric detection method depends on incubation of the Western blot with a substrate that reacts with the reporter enzyme (such as peroxidase) that is bound to the secondary antibody. This converts the soluble dye into an insoluble form of a different color that precipitates next to the enzyme and thereby stains the nitrocellulose membrane. Development of the blot is then stopped by washing away the soluble dye. Protein levels are evaluated through densitometry or spectrophotometry. ELISPOT (the enzyme-linked immunosorbent spot) and ELISA (the enzyme-linked immunosorbent assay) are popular examples.

17.3.7 Chemiluminescence

Chemiluminescent detection methods depend on incubation of the Western blot with a substrate that will luminesce when exposed to the reporter on the secondary antibody. The light is then detected by photographic film and more recently by CCD cameras which capture a digital image of the Western blot. The image is analyzed by densitometry, which evaluates the relative amount of protein staining and quantifies the results in terms of optical density. Newer software allows further data analysis such as molecular weight analysis if appropriate standards are used. The “enhanced chemiluminescent” (ECL) detection is considered to be among the most sensitive detection methods for blotting analysis.

17.3.8 Radioactive Detection

Radioactive labels do not require enzyme substrates. Instead, medical X-ray film is placed directly against the Western blot; the film develops as it is exposed to the label and shows dark regions that correspond to the protein bands of interest. The importance of radioactive detection is declining because it is very expensive, carries high health and safety risks, and can be replaced by ECL.

17.3.9 Fluorescent Detection

The fluorescently labeled probe is excited by light, and the resulting emission is detected by a photo sensor, such as a CCD camera equipped with appropriate emission filters, which captures a digital image of the Western blot and allows further data analysis such as molecular weight analysis and quantitative Western blot analysis. Fluorescence is considered to be among the most sensitive detection methods for blotting analysis.

18 Polymorphism in Repeated Sequences

On chromosomes, there are sequences of repeated DNA nucleotides. The number of repeats can vary from about one to thirty and differs from individual to individual. These sequences are called variable number tandem repeats (VNTRs). Within the VNTRs, there are sites where an enzyme can cut the DNA, and the location of these sites also varies from individual to individual. Cutting with the enzyme yields DNA fragments of different lengths, which are called restriction fragment length polymorphisms (RFLPs) (Williums et al. 1990).
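A small calculation shows why repeat number translates into fragment length. The sketch assumes, for simplicity, that the two cut sites flank the repeat array, and the flank sizes, repeat unit length, and repeat counts are all hypothetical.

```python
# Minimal sketch: fragment length as a function of tandem repeat number.
# Assumptions: restriction sites flank the repeat array; all numbers are made up.
def fragment_length(left_flank: int, unit_len: int, repeats: int, right_flank: int) -> int:
    """Length (bp) of the fragment released by cutting on both sides of the array."""
    return left_flank + unit_len * repeats + right_flank

individual_a = fragment_length(50, 16, 5, 30)   # 5 copies of a 16-bp unit
individual_b = fragment_length(50, 16, 12, 30)  # 12 copies of the same unit
print(individual_a, individual_b)
```

The two individuals share the same flanking sequence, yet their fragments differ by 112 bp, which is easily resolved on a gel.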

19 DNA Markers

Markers are DNA sequences that can be identified by a simple assay, allowing the presence or absence of neighboring stretches of genome to be inferred. Markers may be short, such as a single-base-pair change (single nucleotide polymorphism), or long, such as a DNA fragment generated by restriction digestion. DNA markers can be identified directly, e.g., by DNA sequencing, or indirectly, as in the case of allozymes.

19.1 Types of Markers

  1. Non-PCR-based markers

    a. RFLP

  2. PCR-based markers

    a. RAPD

    b. AFLP

    c. Minisatellites

    d. Microsatellites (SSR, STR, SSLPs)

    e. SNP

    f. Sequence-tagged sites (SCARs, CAPS, ISSRs)

    g. Diversity arrays

    h. LCN-DNA

19.1.1 RFLP

DNA polymorphism can be identified by the ability of various restriction enzymes to cleave DNA in the vicinity of the polymorphism into variable sizes of DNA fragments. This is called Restriction Fragment Length Polymorphisms (RFLPs). When DNA fingerprinting first began, RFLP (Zabeau and Vos 1993) analysis was used. Now it has been almost completely replaced with newer PCR-based techniques. RFLP analysis is performed by using a restriction enzyme to cut the DNA into fragments which are separated into bands by agarose gel electrophoresis. These bands of DNA are transferred by Southern blotting from the agarose gel to a nylon membrane. This is treated with a radioactively labeled DNA probe which binds to certain specific DNA sequences on the membrane. The excess DNA probe is then washed off. An X-ray film placed next to the nylon membrane detects the radioactive pattern. This film is then developed to make a visible pattern of bands called a DNA fingerprint. By using multiple probes targeting various polymorphisms in successive X-ray images, a fairly high degree of discrimination was possible.

Advantages of RFLPs: they show a higher level of polymorphism than isozymes, allow a larger number of loci to be identified, behave as codominant markers (so that homozygosity and heterozygosity can be distinguished), and are selectively neutral. They are stable and reproducible.

Disadvantages of RFLPs: the exact sizes of the bands are unknown, and comparison to a molecular weight ladder is purely qualitative. RFLP is a very time-consuming method that requires a relatively large quantity of good-quality DNA, and one has to work with radioisotopes. A short probe may also reveal too many polymorphisms.

19.1.2 PCR-Based Markers

With the invention of the polymerase chain reaction (PCR) (Mullis 1998; Wolfe and Liston 1998; Wolfe et al. 1998), DNA fingerprinting took huge strides forward in both discriminating power and ability to recover information from very small starting samples. PCR involves the amplification of specific regions of DNA using a cycling of temperature and a thermostable polymerase enzyme along with sequence-specific primers of DNA. Systems such as the HLA-DQ alpha reverse dot blot strips grew to be very popular due to their ease of use and the speed with which a result could be obtained; however, they were not as discriminating as RFLPs. A large number of protocols that are rapid and require only a small quantity of DNA have been developed.

19.1.2.1 RAPD

RAPD (Random Amplification of Polymorphic DNA) is a type of PCR reaction, but the segments of DNA that are amplified are random. The scientist performing RAPD creates several arbitrary, short primers (8–12 nucleotides) and then proceeds with the PCR using a large template of genomic DNA, hoping that fragments will amplify. By resolving the resulting patterns, a semi-unique profile can be generated from a RAPD reaction.

No knowledge of the DNA sequence for the targeted gene is required, as the primers will bind somewhere in the sequence, but it is not certain exactly where. This makes the method popular for comparing the DNA of biological systems that have not had the attention of the scientific community or in a system in which relatively few DNA sequences are compared (it is not suitable for forming a DNA data bank). Due to the fact that it relies on a large, intact DNA template sequence, it has some limitations in the use of degraded DNA samples. Its resolving power is much lower than targeted, species-specific DNA comparison methods, such as Short Tandem Repeats.

Advantages of RAPDs: they are more polymorphic than RFLPs, simple, quick, and selectively neutral. Disadvantages: they are dominant markers, so heterozygous individuals cannot be scored, and reproducibility is limited.
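The dominant, presence-or-absence nature of RAPD bands can be illustrated with a simplified model. A single arbitrary primer yields a product only where it can anneal in opposite orientations on the two strands within an amplifiable distance. The primer, the templates, and the `max_len` limit below are hypothetical, and mismatched annealing is ignored.

```python
# Minimal sketch: predicted RAPD products for one arbitrary primer.
# Assumptions: exact primer matches only; hypothetical primer and templates.
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def find_all(seq: str, motif: str) -> list[int]:
    """Start positions of every (possibly overlapping) occurrence of motif."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq)]

def rapd_amplicons(template: str, primer: str, max_len: int = 3000) -> list[int]:
    """Sizes of products bounded by the primer on one strand and on the other."""
    forward_sites = find_all(template, primer)
    reverse_sites = find_all(template, reverse_complement(primer))
    sizes = []
    for f in forward_sites:
        for r in reverse_sites:
            size = r + len(primer) - f
            if r > f and size <= max_len:
                sizes.append(size)
    return sorted(sizes)

primer = "GATCCGTA"
template_a = "AA" + primer + "C" * 20 + reverse_complement(primer) + "TT"
# Individual B carries a point mutation in the second priming site.
template_b = template_a.replace("TACGGATC", "TACGAATC")
print(rapd_amplicons(template_a, primer))  # one band
print(rapd_amplicons(template_b, primer))  # no band
```

Because the band is simply present or absent, an individual carrying one amplifiable and one mutated copy would show the same band as a homozygote, which is why heterozygotes cannot be scored.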

19.1.2.2 AFLP

AFLP-PCR, amplified fragment length polymorphism-polymerase chain reaction (Vos et al. 1995), is a highly sensitive method for detecting polymorphisms in DNA. The procedure of this technique is divided into three steps: (1) digestion of total cellular DNA with one or more restriction enzymes and ligation of restriction half-site-specific adaptors to all restriction fragments, (2) selective amplification of some of these fragments with two PCR primers that have corresponding adaptor and restriction site-specific sequences, and (3) electrophoretic separation of amplicons on a gel matrix, followed by visualization of the band pattern.

AFLP relied on variable number tandem repeat (VNTR) polymorphisms to distinguish various alleles, which were separated on a polyacrylamide gel using an allelic ladder (as opposed to a molecular weight ladder). Bands could be visualized by silver staining the gel. As with all PCR-based methods, highly degraded DNA or very small amounts of DNA may cause allelic dropout (so that a heterozygote is mistaken for a homozygote) or other stochastic effects. In addition, because the analysis is done on a gel, alleles with very high repeat numbers may bunch together at the top of the gel, making them difficult to resolve. AFLP analysis can be highly automated and allows for easy creation of phylogenetic trees based on comparing individual samples of DNA. A variation of AFLP is TE Display, used to detect transposable element mobility.

Advantages: No sequence information required, reliable, highly sensitive and very large number of polymorphisms per reaction, highly reproducible (repeatable), selective neutrality. Disadvantages: Null allele not detected, proprietary technology.

19.1.2.3 Minisatellite

A minisatellite is a section of DNA that consists of a short series of bases (10–100 bp); minisatellites occur at more than 1000 locations in the genome. The series usually contains the same core sequence “GGGCAGGAXG” (where X can be any one of A, T, G, or C). This sequence encourages chromosomes to swap DNA. When this happens, mistakes are frequent, so that minisatellites at over 1000 locations in the genome end up with slightly different numbers of repeats, thereby making them unique. Due to their high level of polymorphism, minisatellites were extensively used for DNA fingerprinting as well as for genetic markers. Minisatellites have also been implicated as regulators of gene expression (e.g., at the levels of transcription, alternative splicing, or imprint control) or as part of bona fide open reading frames. Minisatellites have also been associated with chromosome fragile sites and are proximal to a number of recurrent translocation breakpoints.
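Searching a sequence for the core motif is a straightforward pattern match once the variable position X is expanded to any of the four bases. The test sequence below is hypothetical.

```python
# Minimal sketch: locate the minisatellite core motif GGGCAGGAXG in a sequence.
# Assumption: the example sequence is made up for illustration.
import re

CORE = "GGGCAGGAXG"  # X stands for any of A, C, G, T

def core_positions(seq: str, core: str = CORE) -> list[int]:
    """Start positions of the core motif, with X treated as a wildcard base."""
    pattern = re.compile(core.replace("X", "[ACGT]"))
    return [m.start() for m in pattern.finditer(seq)]

seq = "TTGGGCAGGATGCC" + "ACGT" + "GGGCAGGACG"
positions = core_positions(seq)
print(positions)
```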

19.1.2.4 Microsatellites: STR, SSR, or SSLPs

The most prevalent method of DNA fingerprinting used today is based on PCR and uses Short Tandem Repeats (STR), also called Microsatellites, Simple Sequence Repeats (SSR), or Simple Sequence Length Polymorphisms (SSLPs) (Gupta et al. 1994; Tautz 1989; Zietkiewicz et al. 1994). The repeat units used most often are mono-, di-, tri-, or tetra-nucleotides, e.g., AAAAAAA would be referred to as (A)7, GTGTGTGTGTGT as (GT)6, CTGCTGCTGCTG as (CTG)4, and ACTCACTCACTCACTCACTC as (ACTC)5. Microsatellites are inherited in a Mendelian fashion. They are typically neutral and codominant and are used as molecular markers. Because different individuals have different numbers of repeat units, these regions of DNA can be used to discriminate between individuals. These STR loci are targeted with sequence-specific primers and are amplified using PCR. The resulting DNA fragments are then separated and detected using capillary electrophoresis (CE) or gel electrophoresis (PAGE). The polymorphisms displayed at each STR region are by themselves very common; typically each polymorphism will be shared by around 5–20% of individuals. When looking at multiple loci, it is the combination of these polymorphisms, unique to an individual, that makes this method discriminating as an identification tool. The more STR regions that are tested in an individual, the more discriminating the test becomes.

DNA is denatured at a high temperature, separating the double strand. Annealing of primers and the extension of nucleotide sequences along opposite strands are effected at lower temperatures. This process results in production of enough DNA to be visible on agarose or acrylamide gels; only small amounts of DNA are needed for amplification as thermocycling in this manner creates an exponential increase in the replicated segment. With the advance of PCR technology, primers that flank microsatellite loci are simple and quick to use, but the development of correctly functioning primers is a critical process.

19.1.2.4.1 Development of Microsatellite Primers
  1. To search for specific microsatellite markers in particular regions of a genome, for example, within a particular exon of a gene, primers can be designed manually. This involves searching the genomic DNA sequence for microsatellite repeats, which can be done visually or with automated tools such as RepeatMasker. Once the potentially useful microsatellites are identified (removing non-useful ones such as those with random inserts within the repeat region), the flanking sequences can be used to design oligonucleotide primers that will amplify the specific microsatellite repeat in a PCR.

  2. Random microsatellite primers can be developed by cloning random segments of DNA from the focal species. These are inserted into a plasmid or phage vector, which is in turn introduced into Escherichia coli bacteria. Colonies are then grown and screened with fluorescently labeled oligonucleotide sequences that will hybridize to a microsatellite repeat, if present on the DNA segment. If positive clones are obtained from this procedure, the DNA is sequenced, and PCR primers are chosen from sequences flanking such regions to determine a specific locus. This process involves significant trial and error on the part of researchers, as microsatellite repeat sequences must be predicted and primers that are randomly isolated may not display significant polymorphism. Microsatellite loci are widely distributed throughout the genome and can be isolated from semi-degraded DNA of older specimens, as all that is needed is a suitable substrate for amplification through PCR.
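The automated search described in the first approach can be approximated with a regular expression. The sketch below is not a substitute for a dedicated tool. It reports perfect repeats of 1–4 bp units together with the flanking sequence from which primers would be designed. The thresholds, flank length, and example sequence are hypothetical.

```python
# Minimal sketch: scan a sequence for perfect microsatellites and their flanks.
# Assumptions: perfect repeats only; thresholds and sequence are made up.
import re

def find_ssrs(seq: str, min_repeats: int = 4, max_unit: int = 4, flank: int = 10) -> list[dict]:
    """Return unit, repeat count, start, and flanking sequence for each repeat."""
    # A lazily matched unit of 1..max_unit bases followed by further copies.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        hits.append({
            "unit": unit,
            "count": len(m.group(0)) // len(unit),
            "start": m.start(),
            "left_flank": seq[max(0, m.start() - flank):m.start()],
            "right_flank": seq[m.end():m.end() + flank],
        })
    return hits

seq = "ACGTTGCATG" + "CA" * 8 + "TTGACCGTAG"
ssrs = find_ssrs(seq)
print(ssrs)
```

The flanking sequences returned for each hit are the regions in which primers would be placed; repeats interrupted by random inserts are not reported as a single hit, in line with the filtering step described above.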

Microsatellites have proved to be versatile molecular markers, particularly for population analysis, but they are not without limitations. Advantages: High level of polymorphism, high locus specificity, easy and fast to run, robust and reproducible. Disadvantages: May only be used for intraspecific and intragenomic alleles, time-consuming and expensive to develop.

19.1.2.5 SNP (Single Nucleotide Polymorphism)

A SNP (pronounced “snip”) is a DNA sequence variation that occurs when a single nucleotide (A, T, C, or G) in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual). For example, two sequenced DNA fragments from different individuals, AAGCCTA and AAGCTTA, differ in a single nucleotide. In this case, there are said to be two alleles: C and T. Almost all common SNPs have only two alleles.

Within a population, SNPs can be assigned a minor allele frequency: the proportion of chromosomes in the population that carry the less common variant. SNPs may fall within the coding sequences of genes, the noncoding regions of genes, or the intergenic regions between genes. SNPs within a coding sequence will not necessarily change the amino acid sequence of the protein that is produced, due to degeneracy of the genetic code. A SNP in which both forms lead to the same polypeptide sequence is termed synonymous (sometimes called a silent mutation); if a different polypeptide sequence is produced, it is non-synonymous. SNPs that are not in protein-coding regions may still have consequences for gene splicing, transcription factor binding, or the sequence of noncoding RNA.
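Both ideas, locating a SNP and computing its minor allele frequency, can be sketched briefly. Minor allele frequency is taken here in its usual sense, the share of all sampled chromosomes that carry the less common allele. The sample of ten chromosomes is hypothetical.

```python
# Minimal sketch: find single-base differences and compute a minor allele frequency.
# Assumptions: aligned sequences of equal length; the chromosome sample is made up.
from collections import Counter

def snp_positions(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Positions (0-based) at which two aligned sequences differ."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

def minor_allele_frequency(alleles: list[str]) -> float:
    """Share of sampled chromosomes carrying the less common allele."""
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0  # monomorphic in this sample
    return min(counts.values()) / sum(counts.values())

differences = snp_positions("AAGCCTA", "AAGCTTA")
sample = ["C"] * 8 + ["T"] * 2  # alleles observed on ten chromosomes
maf = minor_allele_frequency(sample)
print(differences, maf)
```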

19.1.2.6 STS (Sequence Tagged Sites)

From the sequence information, oligonucleotide primers 18–20 nucleotides long are synthesized that are complementary to each end of the RAPD product or the clone. These new primers are then used to amplify DNA by PCR. Two results can occur. First, the amplification products from different DNAs (e.g., two parents differing at a disease resistance locus) may be polymorphic in size. Alternatively, the amplification product may be monomorphic (of the same size). In that case, it is necessary to cut the products with various restriction enzymes to identify polymorphisms. SCAR, CAPS, and ISSRs are grouped under this category.

19.1.2.6.1 SCAR (Sequence Characterized Amplified Region)

The RAPD band from the + allele (i.e., the allele that gives the band) is cloned and sequenced, and longer primers, a1 and a2, are synthesized. Because they contain longer sequences, they are likely to be specific for the desired locus and will not amplify any other loci. However, it is also likely that a1 and a2 will amplify both the + and − alleles at that locus. If so, both alleles can be amplified and their sequences compared to find other mutations, internal to the a1 and a2 primers, that are polymorphic between the two alleles. If such a polymorphism can be identified, it should then be possible to synthesize new primers that amplify a region of the + allele but not the − allele. Primers of 16–24 bp designed from the ends of cloned RAPD markers are used. This technique converts a band that is prone to difficulties in interpretation and/or reproducibility into a very reliable marker.

Advantages: Simpler pattern than RAPDs, robust and reproducible, Mendelian inheritance, sometimes convertible to codominant markers. Disadvantages: Require a small degree of sequence knowledge, require effort and expense in designing specific primers for each locus.

19.1.2.6.2 CAPS (Cleaved Amplified Polymorphic Sequence)

This method is based on the design of specific primers, amplification of DNA fragments, and generation of smaller, possibly variable fragments by means of a restriction enzyme. The technique aims to convert an amplified band that does not show variation into a polymorphic one. In this method, a band, DNA fragment, gene sequence, or another type of marker is identified as important. Either the band is detected through PCR, cut out of the gel, cloned, and sequenced, or the fragment sequence is already available. Specific primers are designed from the fragment sequence. The newly designed primers are used to amplify the template DNA. The PCR product is subjected to digestion by a panel of restriction enzymes. Polymorphism may be identified with some of the enzymes.

Advantages: Robust assay, because of long primers; codominant markers; can be compared with library markers. Disadvantages: Require at least small amount of sequence knowledge, effort, and expense to produce specific primers.
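The conversion of a monomorphic band into a polymorphic one can be illustrated with two hypothetical alleles that give PCR products of the same size but differ by one base inside an EcoRI site (GAATTC, cut after the first G). The sequences and function name are assumptions for illustration only.

```python
# Minimal sketch: CAPS scoring of two same-sized PCR products.
# Assumptions: hypothetical 30-bp amplicons; EcoRI as the restriction enzyme.
def fragment_sizes(amplicon: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Sizes of the pieces left after cutting at every recognition site."""
    sizes = []
    start = 0
    pos = amplicon.find(site)
    while pos != -1:
        cut = pos + cut_offset
        sizes.append(cut - start)
        start = cut
        pos = amplicon.find(site, pos + 1)
    sizes.append(len(amplicon) - start)
    return sorted(sizes, reverse=True)

allele_a = "ACGTACGTACGT" + "GAATTC" + "TGCATGCATGCA"  # site present
allele_b = "ACGTACGTACGT" + "GAGTTC" + "TGCATGCATGCA"  # site lost by one base change

bands_a = fragment_sizes(allele_a)
bands_b = fragment_sizes(allele_b)
heterozygote = sorted(set(bands_a) | set(bands_b), reverse=True)
print(bands_a, bands_b, heterozygote)
```

Because a heterozygote shows both the uncut band and the two cut fragments, the three genotypes can be told apart, which is what makes CAPS markers codominant.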

19.1.2.6.3 ISSRs (Inter-Simple Sequence Repeats)

ISSR techniques (Wolfe et al. 1998; Bornet and Branchard 2001) are nearly identical to RAPD techniques except that ISSR primer sequences are designed from microsatellite regions and the annealing temperatures used are higher than those used for RAPD markers. ISSRs are the regions found between microsatellite repeats, and the technique is based on PCR amplification of these inter-microsatellite sequences. It targets multiple loci because of the known abundance of repeat sequences spread all over the genome. In this method, a typical PCR is performed in which the primers are designed from a microsatellite repeat sequence and extended one to several bases into the flanking sequence as anchor points. Different alternatives are possible: only one primer is used; two primers of similar character are used; or a microsatellite-sequence-anchored primer is combined with a random primer (i.e., of the kind used for RAPD).

Advantages: Do not require prior sequence information. Variation within unique regions of the genome may be found at several loci simultaneously. Microsatellite sequence specific. Very useful for DNA fingerprinting, especially of closely related species. Disadvantage: Only dominant markers can be identified.

19.1.2.7 DArT (Diversity Array Technology)

DArT can detect and type DNA variation at several hundred genomic loci in parallel without relying on sequence information. Two steps are involved in this method. (1) Generation of the array: Restriction-generated fragments representing the diversity of a gene pool are cloned. The outcome is called a representation (typically 0.1–10%) of the genome. Polymorphic clones in the library are identified by arraying inserts from a random set of clones and hybridizing the array to different samples. The inserts from polymorphic clones are immobilized on a chip. (2) Genotyping a sample: The representation (DNA) of the sample is labeled with fluorescence and hybridized against the array. The array is scanned, and the amount of hybridization signal is measured for each spot. By using multiple labels, a representation from one sample can be contrasted with that of another sample or with a control probe.

Advantages: Does not require sequence information; high throughput; fast data acquisition and analysis; detects single base changes as well as insertions and/or deletions; detects differences in DNA methylation, depending on the enzyme used to generate the fragments; a small DNA sample is enough; good transferability of markers among breeding populations; full automation possible. Disadvantages: Dominance of markers; technically demanding; low polymorphism in the genomic library.
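
The genotyping step amounts to turning spot intensities into presence/absence scores. The sketch below is a deliberately simple illustration under stated assumptions: the clone names, the signal values, and the fixed log-ratio cutoff are all hypothetical, and actual DArT analysis pipelines may score spots differently (for example, by clustering). Each sample signal is contrasted with a control-probe signal measured on the same spot.

```python
import math


def score_clones(sample: dict[str, float],
                 control: dict[str, float],
                 min_log2_ratio: float = -1.0) -> dict[str, int]:
    """Score each arrayed clone as 1 (fragment present) or 0 (absent).

    The score is based on the log2 ratio of sample signal to control signal;
    `min_log2_ratio` is an arbitrary, illustrative cutoff.
    """
    scores = {}
    for clone, signal in sample.items():
        log_ratio = math.log2(signal / control[clone])
        scores[clone] = 1 if log_ratio >= min_log2_ratio else 0
    return scores


# Hypothetical fluorescence readings (arbitrary units) for three spots.
control = {"clone_01": 1000.0, "clone_02": 1200.0, "clone_03": 900.0}
sample_a = {"clone_01": 950.0, "clone_02": 150.0, "clone_03": 870.0}
sample_b = {"clone_01": 1010.0, "clone_02": 1100.0, "clone_03": 100.0}

scores_a = score_clones(sample_a, control)
scores_b = score_clones(sample_b, control)

# Clones whose score differs between the two samples are polymorphic markers.
polymorphic = [clone for clone in control if scores_a[clone] != scores_b[clone]]
print(scores_a, scores_b, polymorphic)
```

Because a spot is scored only as present or absent, a heterozygote cannot be told apart from a homozygote carrying the fragment, which is the dominance listed among the disadvantages.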

19.1.2.8 LCN-DNA (Low Copy Number DNA)

The journal Nature summarizes this technique as “Initial tests showed that they could readily obtain correct genetic profiles from swabs taken directly from the palm of a hand (13 of 13)” (Findlay et al. 1997; Ronald et al. 1997; van Oorschot et al. 2005). DNA yields varied from 2 to 150 ng (average 48.6 ng). Dry hands and those that had been washed recently tended to provide the least DNA. This is similar to ISSR, but a small trace of DNA is sufficient.

Advantages: Works with <100 pg genomic DNA (~15–17 diploid copies of nuclear DNA markers such as STRs), that is, below the stochastic threshold level at which PCR amplification is no longer reliable because too few copies of DNA template are present (the threshold is determined by each laboratory; typically 150–250 pg). Sensitivity of detection is enhanced by running 34 cycles instead of 28 cycles. Disadvantages: Allele dropout; allele drop-in (contamination); increased stutter; heterozygote imbalance; no thresholds; tissue source cannot be determined; DNA may not be relevant (casual contact or transfer); rarely useful for database searches.
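
The relation between template mass and copy number, and the reason allele dropout appears at low input, can be checked with a short calculation. The sketch assumes about 6.6 pg of DNA per human diploid genome (an approximate value that is not given in the text) and uses a simple Poisson sampling model, which is an illustrative assumption rather than a forensic standard. The function names are hypothetical.

```python
import math

# Assumed approximate mass of one human diploid genome, in picograms.
PG_PER_DIPLOID_GENOME = 6.6


def diploid_copies(template_pg: float) -> float:
    """Expected number of diploid genome copies in the template."""
    return template_pg / PG_PER_DIPLOID_GENOME


def p_allele_absent(template_pg: float) -> float:
    """Poisson probability that no copy of one allele of a heterozygous
    locus is present in the reaction. Each allele occurs once per diploid
    genome, so its expected copy number equals the diploid copy number.
    """
    return math.exp(-diploid_copies(template_pg))


for pg in (250, 100, 25, 10):
    print(f"{pg:>4} pg: {diploid_copies(pg):5.1f} diploid copies, "
          f"P(allele absent) = {p_allele_absent(pg):.3g}")
```

Under these assumptions 100 pg corresponds to roughly 15 diploid copies, in line with the figure quoted above, and the chance of missing an allele entirely rises steeply once the input falls to a few tens of picograms.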

20 Markers in Silkworm, Bombyx mori

Silkworm genome analysis programs for the identification of DNA markers for QTLs have been taken up by many researchers (Nagaraju and Goldsmith 2002; Rao and Chandrashekharaiah 2003; Reddy et al. 1999). Many workers have analyzed the silkworm genome using ESTs (Kadono-Okuda et al. 2002) and SNPs (Yamamoto et al. 2006; Chatterjee and Mohandas 2003) and have identified eight molecular RFLP markers, of which six are linked to high cocoon shell ratio and two to low cocoon shell ratio. Prudhome and Couble (2002) succeeded in incorporating a DNA sequence into the silkworm genome using piggyBac technology.

A cDNA linkage map of the silkworm, Bombyx mori, based on RFLP was constructed by Nguu et al. (2005) with RF02 female and F1 (RF02 × RF50) male populations. More than 194 randomly isolated cDNA clones, whose linkage groups had been determined previously, were analyzed, and 189 of them were unambiguously placed on the linkage map by repeated three-point analysis. The majority of the mapped clones corresponded to single loci dispersed over all 28 chromosomes. The map covers about 66% of the silkworm genome.

A RAPD linkage map of Bombyx mori was constructed by Li et al. (2000) using the strains Dazao and C108 and their F2 generation. The map consists of 182 RAPD loci: 103 loci come from Dazao and fall into a first set of 23 linkage groups, and the other 79 loci come from C108 and fall into a second set of 16 linkage groups. The map covers a total genetic distance of over 1148.3 cM.

Mita et al. (2004) established a draft sequence of the silkworm Bombyx mori genome by threefold whole-genome shotgun (WGS) sequencing, assembled into 49,345 scaffolds that span a total length of 514 Mb including gaps and 387 Mb without gaps. Because the genome size of the silkworm is estimated to be 530 Mb, almost 97% of the genome has been organized in scaffolds, of which 75% has been sequenced.
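
The two percentages follow directly from the three sizes quoted above, as this short check shows. The variable names are chosen here for illustration only.

```python
# Sizes in megabases, as reported in the text for the draft assembly.
GENOME_SIZE_MB = 530
SCAFFOLDS_WITH_GAPS_MB = 514
SCAFFOLDS_WITHOUT_GAPS_MB = 387

# Share of the estimated genome that is organized in scaffolds.
fraction_in_scaffolds = SCAFFOLDS_WITH_GAPS_MB / GENOME_SIZE_MB

# Share of the scaffold length that is actual sequence rather than gaps.
fraction_sequenced = SCAFFOLDS_WITHOUT_GAPS_MB / SCAFFOLDS_WITH_GAPS_MB

print(f"in scaffolds: {fraction_in_scaffolds:.1%}")
print(f"sequenced within scaffolds: {fraction_sequenced:.1%}")
```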

Yamamoto et al. (2006) developed a linkage map for the silkworm Bombyx mori based on single nucleotide polymorphisms (SNPs) between strains p50T and C108T, initially found in regions corresponding to the end sequences of bacterial artificial chromosome (BAC) clones. Using 190 segregants from a backcross of a p50T female × F1 (p50T × C108T) male, they analyzed segregation patterns of 534 SNPs, detected among 3840 PCR amplicons, each associated with a p50T BAC end sequence. The resulting linkage map is composed of 534 SNP markers spanning 1305 cM in total length, distributed over the expected 28 linkage groups.
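
How backcross segregation data translate into map distances can be sketched as follows. The recombinant count is hypothetical, and the Kosambi mapping function is used here only as a common choice; the text does not state which function the authors applied. The mean spacing is computed from the reported totals.

```python
import math


def recombination_fraction(recombinants: int, total: int) -> float:
    """Fraction of backcross segregants that are recombinant between two markers."""
    return recombinants / total


def kosambi_cm(r: float) -> float:
    """Map distance in centimorgans from a recombination fraction (Kosambi)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


# Hypothetical example: 19 recombinants among the 190 backcross segregants.
r = recombination_fraction(19, 190)
distance_cm = kosambi_cm(r)

# Reported totals: 534 SNP markers spanning 1305 cM.
mean_cm_per_marker = 1305 / 534

print(f"r = {r:.3f}, distance = {distance_cm:.2f} cM")
print(f"mean map length per marker = {mean_cm_per_marker:.2f} cM")
```

With 190 segregants a single recombinant corresponds to a recombination fraction of about 0.005, which sets the finest interval this population can resolve.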

Nagaraja and Nagaraju (1995) studied DNA profiling of 13 silkworm genotypes using the RAPD technique. Two hundred sixteen amplified products were generated using 40 random primers. Amplification products specific to diapausing genotypes were identified.

Nagaraja et al. (2005) also constructed a genetic map of RAPD, SSR, and FISSR markers for the Z chromosome using a backcross mapping population. Sixteen Z-linked markers were identified, characterized, and mapped using od, a recessive trait for translucent skin, as an anchor marker, yielding a total recombination map of 334.5 cM distributed throughout the Z chromosome. Four RAPD and four SSR markers linked to the W chromosome were also identified.

Nagaraju et al. (2002) showed that FISSR-PCR markers are inherited and segregate in a Mendelian fashion, as demonstrated on a panel of 99 F2 offspring derived from a cross of two divergent silkworm strains.
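
Whether a marker segregates in a Mendelian fashion is commonly checked with a chi-square goodness-of-fit test. The sketch below assumes a dominant marker, for which a 3:1 ratio of band presence to absence is expected in an F2 panel. The counts are invented for illustration and are not data from Nagaraju et al. (2002); only the panel size of 99 is taken from the text.

```python
# Standard chi-square critical value for 1 degree of freedom at alpha = 0.05.
CHI2_CRITICAL_DF1_P05 = 3.841


def chi_square_3_to_1(present: int, absent: int) -> float:
    """Chi-square statistic for a 3:1 presence:absence expectation."""
    total = present + absent
    expected_present = 0.75 * total
    expected_absent = 0.25 * total
    return ((present - expected_present) ** 2 / expected_present
            + (absent - expected_absent) ** 2 / expected_absent)


# Hypothetical scores for one band across 99 F2 offspring.
band_present, band_absent = 76, 23

chi2 = chi_square_3_to_1(band_present, band_absent)
fits_mendelian = chi2 < CHI2_CRITICAL_DF1_P05

print(f"chi-square = {chi2:.3f}, consistent with 3:1: {fits_mendelian}")
```

A statistic below the critical value means the observed counts do not depart significantly from the 3:1 expectation.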

SilkDB (SilkDB 2017) (Silkworm Knowledgebase from China) (http://silkworm.genomics.org.cn) (Xia et al. 2004), SilkSatDB (SilkSatDB 2017) (a microsatellite database of silkworm from CDFD, India) (www.cdfd.org.in/silksatdb), Silkbase (Silkbase 2017) (EST database and BAC library from Japan) (www.ab.a.u-tokyo.ac.jp/silkbase) (Mita et al. 2003; Mita et al. 2004), and many other websites provide updated information on genome sequence assembly, cDNAs, ESTs, SNPs, and functional annotations of silkworm genes.