Human Genome Project, The
The Human Genome Project was an international research project that mapped and stored all 3.2 billion base pairs in the human genome.
The Human Genome Project (HGP), completed in 2003, was an international collaboration that sequenced and mapped the entire euchromatic portion of the human genome. Two separate drafts of the human genome sequence were published in 2001 as a consequence of the race between the privately funded effort of Celera Genomics and the publicly funded International Human Genome Sequencing Consortium. The project's goals were to sequence all the nucleotides of the human genome, create databases to store this information, and develop tools to analyze this massive amount of data. Completing the project required collaborators from the UK, France, Japan, Italy, Canada, and Germany and cost approximately $3 billion (International Human Genome Sequencing Consortium 2004).
History and Significance
In 1986, over 30 years after James Watson and Francis Crick first described the structure of DNA, Italian Nobel laureate Renato Dulbecco called for a project aimed at sequencing the complete human genome. While he was not the first to propose such an undertaking, his commentary in Science propelled the concept of an HGP into the scientific mainstream. The idea arose independently from Dulbecco, Robert Sinsheimer of UC Santa Cruz, and Charles DeLisi of the Department of Energy (DOE) (Berg 2006). Though each had his own reasons, ranging from understanding cancer to detecting chemically induced mutations, they all understood that this colossal feat was needed to fully understand the genetic basis of human illness and disease. The project was officially launched in 1990 by the DOE and the National Institutes of Health.
Before the HGP, stretches of genetic material had only been examined through individual small-scale experiments. DNA carries the code for the thousands of genes that keep a cell living. This code is made of four chemical bases, called nucleotides, ordered along the sugar-phosphate backbone of the DNA double helix. An organism's genome is the entire DNA sequence of one set of chromosomes. Humans have 22 pairs of autosomal chromosomes, which are the same in males and females, and one pair of sex chromosomes, which differs between males and females. Each chromosome consists of around 50–300 million nucleotides, but because the two strands of the double helix are complementary, the makeup of one strand provides exact information about the other strand. The complete human genome consists of 3.2 billion nucleotides. However, knowing just the arrangement of the bases (called a sequence) does not reveal where the genes are located on the chromosome (called a map), the function of each gene, or which genes correspond to which proteins. The difficulty of the HGP was to both sequence and map every single nucleotide (International Human Genome Sequencing Consortium 2001).
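The complementarity described above can be illustrated with a minimal Python sketch. The function name and the example sequence are hypothetical and chosen only for illustration. Given one strand, the code derives the opposite strand by pairing A with T and C with G and reading the result in the reverse direction.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


# Hypothetical six-base example, not taken from any real genome.
example = "ATGGCA"
print(reverse_complement(example))  # prints TGCCAT
```

Applying the function twice returns the original strand, which is why sequencing one strand is enough to know both.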
The laboratory techniques of the 1990s could not read the three-billion-base sequence from end to end. Sequencing at this scale was laborious, and the HGP Consortium began by using clone-by-clone sequencing to create a crude physical map of the genome before sequencing the DNA. In this method, long strands of DNA are cut into pieces about 150,000 base pairs long, and each piece is inserted into a bacterial artificial chromosome (BAC) vector. A BAC is a piece of circular DNA that can replicate inside a bacterial cell, so each time the bacterium divides, the inserted human DNA is copied. Each cloned insert is then broken into small overlapping fragments, sequenced, and pieced back together using areas of overlap to rebuild the large piece originally inserted into the BAC (International Human Genome Sequencing Consortium 2004). Although this approach ensures that each piece comes from a known region of the chromosome, making clones and creating genome maps is expensive and time-consuming. Furthermore, this type of sequencing cannot be used on centromeres or telomeres, the tightly packed middle and ends of a chromosome, because they contain long repetitive sequences (Venter et al. 2001).
Craig Venter and his company Celera instead championed a method that bypassed BAC clones entirely. In this method, called whole-genome shotgun sequencing, the whole genome is broken into small pieces that are easy to sequence. A computer program then assembles the pieces, using massive computing power to look for small overlaps between them (Venter et al. 2001). Because this eliminates the physical mapping stages needed for BAC clones, sequencing is faster and less expensive.
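The overlap-based assembly described above can be sketched in a few lines of Python. This is a toy greedy strategy written for illustration under simplifying assumptions (error-free reads, a single strand, no repeats). It is not the assembler Celera or the Consortium actually used, and the function names, the `min_len` parameter, and the example reads are all hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that matches a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments that share the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_len:
                        best_len, best_i, best_j = size, i, j
        if best_len == 0:
            break  # nothing overlaps any more; leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical reads covering the made-up sequence ATGGCGTGCAATCC.
reads = ["TGCAATCC", "ATGGCGTG", "GCGTGCAA"]
print(greedy_assemble(reads))  # prints ['ATGGCGTGCAATCC']
```

The sketch also hints at why repetitive regions are hard: if the same stretch of bases occurs in several places, many fragments overlap equally well and the merge order becomes ambiguous.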
The HGP used genetic material from five anonymous donors to create the reference map we have today. Scientists collected blood and sperm samples from individuals of European, African, American, and Asian ancestry, so the completed sequence, called a reference genome, is a composite rather than any one individual's genetic information. A major finding of the HGP was that all humans share the same large set of genes and genomic regulatory regions that direct development (International Human Genome Sequencing Consortium 2004). Any two people differ at only about 1 in 1000 bases, so human genomes are 99.9% identical, regardless of race (Venter et al. 2001). While each person's genome does contain individual variations, the project's findings undercut many claims of genetically based ethnic differences.
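To put the 1-in-1000 figure in perspective, the short Python calculation below applies it to the 3.2 billion base pairs quoted in this entry. The function names are hypothetical, and the result is a rough estimate that assumes the differences are spread evenly across the genome.

```python
GENOME_SIZE = 3_200_000_000  # base pairs, as quoted in this entry
ONE_IN = 1000                # two people differ at about 1 base in 1000


def expected_differences(genome_size: int, one_in: int) -> int:
    """Approximate number of positions at which two genomes differ."""
    return genome_size // one_in


def percent_identity(one_in: int) -> float:
    """Share of positions that match, as a percentage."""
    return 100 * (one_in - 1) / one_in


print(expected_differences(GENOME_SIZE, ONE_IN))  # prints 3200000
print(percent_identity(ONE_IN))                   # prints 99.9
```

A 99.9% match therefore still leaves roughly 3.2 million positions at which two people can differ.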
Scientists were also surprised to discover that humans have only about 20,000 genes, a number far lower than pre-HGP predictions of 100,000 genes (International Human Genome Sequencing Consortium 2004). As humans are arguably the most complex organisms on earth, researchers had assumed that we would have the largest number of genes. While 20,000 is more than a rodent has, it is not much larger than the number found in other apes. Even more astonishing was the discovery that genes make up only 1% of the genome. The rest, once considered junk DNA, is vital in regulating when and where genes are turned on and off.
The project had an immense impact on all of science, chiefly because it accelerated the advancement of DNA sequencing technologies. The first genome took around 13 years to complete, but since then scientists have been able to sequence the genomes of other organisms within a year. Sequencing an entire human genome now costs only around $1000, which allows for detailed comparisons across thousands of individuals. Moreover, since the completion of the HGP, scientists have identified the genetic basis of more diseases, such as autoimmune disorders and specific types of blindness. Research studies have also linked over 200 genes to types of cancer, triple the number known before the HGP. The completion of the project ushered in a new age of personalized medicine and revolutionized genomic technology.
The HGP was an international collaborative project that identified and mapped the 3.2 billion base pairs that make up the human genome. Initiated in 1990, the project took 13 years to complete and led to many significant advances in sequencing technology. Using the whole-genome random shotgun method and the hierarchical shotgun method, researchers built a genetic and physical map of the genome. Most strikingly, they discovered that the genome contains only around 20,000 genes and that 99.9% of nucleotides are identical between any two people.