Germ-line DNA copy number variation frequencies in a large North American population


Genomic copy number variation (CNV) is a recently identified form of global genetic variation in the human genome. We used the Affymetrix GeneChip 100K and 500K SNP genotyping platforms to perform a large-scale, population-based study of CNV frequency. We constructed a genomic map of 578 CNV regions covering approximately 220 Mb (7.3%) of the human genome and identified 183 previously unknown intervals. Copy number changes occurred infrequently (<1%) in the majority (>93%) of these genomic regions, yet the regions encompass hundreds of genes and disease loci. This North American population-based map will be a useful resource for future genetic studies.



This work was supported by the National Cancer Institute, National Institutes of Health under RFA # CA-96-011 (to SG & MC) and through cooperative agreements with members of the Colon Cancer Family Registry (Colon CFR) and its principal investigators. The content of this manuscript does not necessarily reflect the views or policies of the National Cancer Institute or any of the collaborating institutions or investigators in the Colon CFR, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government or the Colon CFR. SG is the recipient of a grant from the Lustgarten Foundation for Pancreas Cancer Research, which also supported this work. Cancer Care Ontario, as the host organization to the ARCTIC Genome Project, acknowledges that this Project was funded by Genome Canada through the Ontario Genomics Institute, by Génome Québec, the Ministère du Développement Économique et Régional et de la Recherche du Québec and the Ontario Institute for Cancer Research. GZ is a Scholar of the Society of University Surgeons and a recipient of a Terry Fox Foundation Research Fellowship from the National Cancer Institute of Canada. The authors thank Dr. S. Ogawa for providing early access to CNAG version 2; Drs. C. Marshall and L. Feuk for their advice; and Dr. D. Daftary, Ms. T. Selander, Dr. Ling Liu and the Mount Sinai Hospital Biospecimen Repository for technical assistance.

Author information

Correspondence to Steven Gallinger.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Table 1. CNVR Coordinates, Population Frequency and Ancestry-Associations. (XLS 120 kb)

Supplementary Table 2. Enriched gene ontology categories in CNVRs. Known genes that overlapped with CNVRs were tested for over- or under-representation of specific Gene Ontology (GO) gene function annotation terms (The Gene Ontology Consortium 2000) using the BiNGO software (Maere et al. 2005) and updated human GO annotation downloaded on 3 December 2006. A total of 1,351 unique genes were tested against the entire GO reference set, consisting of 16,123 annotated genes. Significance was assessed using the hypergeometric test with Benjamini & Hochberg False Discovery Rate multiple testing correction. UniProt (Wu et al. 2006) identifiers (IDs) for each known gene were converted to Entrez Gene IDs (Wheeler et al. 2006) using Expasy (Gasteiger et al. 2003) and Ensembl (Clamp et al. 2003). The following GO terms were enriched in CNVRs; no significant under-representation of any GO category was observed. (DOC 34 kb)
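
The enrichment test described for Supplementary Table 2 can be illustrated with a minimal sketch. This is not the BiNGO implementation; it only shows the general idea of a one-sided hypergeometric over-representation test followed by Benjamini & Hochberg adjustment. The gene-set sizes (1,351 tested genes, 16,123 annotated genes) come from the table description, while the term names and per-term counts are hypothetical values chosen for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the reference set, K: reference genes carrying the term,
    n: genes in the tested set, k: tested genes carrying the term.
    """
    upper = min(K, n)
    # Exact integer arithmetic; converted to float only at the final division.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini & Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Set sizes taken from the table description.
REFERENCE_GENES = 16123
TESTED_GENES = 1351

# Hypothetical terms: (tested genes with term, reference genes with term).
hypothetical_terms = {
    "hypothetical term A": (40, 200),
    "hypothetical term B": (42, 500),
}

raw = [
    hypergeom_upper_tail(k, K, TESTED_GENES, REFERENCE_GENES)
    for k, K in hypothetical_terms.values()
]
for name, p, q in zip(hypothetical_terms, raw, benjamini_hochberg(raw)):
    print(f"{name}: p={p:.3g}, BH-adjusted p={q:.3g}")
```

Testing for under-representation would use the lower tail, P(X <= k), in the same way.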

Supplementary Table 3. CNVRs associated with 55 cancer genes, out of 406 genes known to be involved in cancer (downloaded from the Cancer Genes resource; Higgins et al. 2006) and 189 genes reported by Sjoblom et al. (2006). (XLS 27 kb)

Supplementary Table 4. CNVRs associated with OMIM Morbid Map genes (Hamosh et al. 2002, downloaded in November 2006). (XLS 47 kb)

Supplementary Table 5. Novel CNVRs. (XLS 64 kb)

Supplementary Table 6. CNVRs not identified in the HapMap sample collection (Redon et al. 2006). (XLS 45 kb)

