Combinatorial pooled sequencing: experiment design and decoding

Abstract

Owing to rapid advances in next-generation sequencing technology, the cost of DNA sequencing has fallen by several orders of magnitude. However, genomic sequencing of individuals at the population scale is still restricted to a few model species, because constructing libraries for thousands of samples remains a major challenge. Pooled sequencing provides a cost-effective alternative to sequencing individuals separately and can greatly reduce the time and cost of DNA library preparation. Technological improvements, together with the broad range of biological research questions that require large sample sizes, mean that pooled sequencing will continue to complement the sequencing of individual genomes and will become increasingly important in the foreseeable future. However, simply mixing samples together for sequencing makes it impossible to identify the reads that belong to each sample. Barcoding can solve this problem, but barcoding every sample is currently costly, especially for large sample sets. An alternative to barcoding is combinatorial pooled sequencing, which uses the pooling pattern rather than short DNA barcodes to encode each sample. In combinatorial pooled sequencing, samples are mixed into a small number of pools according to a carefully designed pooling strategy, which allows the sequencing data to be decoded to identify the reads belonging to samples that are unique or rare in the population. In this review, we mainly survey the experiment design and decoding procedures for combinatorial pooled sequencing as applied to screening for carriers of rare variants and rare haplotypes, complex genome assembly, and single-individual haplotyping.
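To make the idea of encoding samples by pooling pattern concrete, the following is a minimal toy sketch and not a method taken from the review. It assumes a simple binary design in which each sample is placed in the pools given by the binary digits of its index, exactly one carrier of a rare variant, and error-free pool outcomes. The function names (binary_design, pool_outcomes, decode) are hypothetical and chosen only for illustration.

```python
from math import ceil, log2

def binary_design(n_samples):
    """Return a pools x samples 0/1 matrix; sample i uses the binary code of i + 1.

    Codes start at 1 so that every sample appears in at least one pool.
    """
    n_pools = ceil(log2(n_samples + 1))
    return [[((i + 1) >> p) & 1 for i in range(n_samples)] for p in range(n_pools)]

def pool_outcomes(design, carriers):
    """Simulate noise-free sequencing: a pool is positive if it holds any carrier."""
    return [int(any(row[c] for c in carriers)) for row in design]

def decode(design, outcomes):
    """Return samples whose pooling pattern equals the observed pool outcomes.

    This is only valid under the single-carrier, error-free assumption.
    """
    n_samples = len(design[0])
    return [i for i in range(n_samples)
            if all(design[p][i] == outcomes[p] for p in range(len(design)))]

design = binary_design(12)                      # 12 samples need only 4 pools
outcomes = pool_outcomes(design, carriers=[6])  # sample 6 carries the variant
print(len(design), decode(design, outcomes))    # prints: 4 [6]
```

With several carriers or with sequencing errors, this exact-match decoding can fail, because the observed pattern becomes the union of several sample patterns. Practical designs therefore use more pools and more robust decoders than this sketch.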

Author information

Corresponding author

Correspondence to Xiao Sun.

Additional information

This article is dedicated to the Special Collection of Recent Advances in Next-Generation Bioinformatics (Ed. Xuegong Zhang).

Cite this article

Cao, Cc., Sun, X. Combinatorial pooled sequencing: experiment design and decoding. Quant Biol 4, 36–46 (2016). https://doi.org/10.1007/s40484-016-0064-3

  • DOI: https://doi.org/10.1007/s40484-016-0064-3

Keywords

  • combinatorial pooled sequencing
  • experiment design
  • decoding