TNSim: A Tumor Sequencing Data Simulator for Incorporating Clonality Information

  • Yu Geng
  • Zhongmeng Zhao
  • Mingzhe Xu
  • Xuanping Zhang
  • Xiao Xiao
  • Jiayin Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10955)


In recent years, next-generation sequencing has made it possible to obtain high-resolution landscapes of genetic changes at the single-nucleotide level. A growing number of methods have been proposed for efficient and effective analysis of cancer sequencing data. To facilitate such development, a data simulator is a crucial tool: it not only tests and evaluates proposed approaches, but also provides feedback for further improvement. Several simulators have been released to generate next-generation sequencing data. However, to the best of our knowledge, none of them considers clonality information. It has been suggested that clonal heterogeneity exists widely in tumor samples. Somatic mutational events usually exhibit a wide spectrum of variant allelic frequencies, and some of them are detectable only in one or several clonal lineages. In this article, we introduce a Tumor-Normal sequencing Simulator, TNSim, which generates next-generation sequencing data that incorporate clonality information. The simulator mimics a tumor sample and its paired normal sample, for which germline variants and somatic mutations can be set separately. Tumor purity is adjustable. The clonal architecture is preassigned as one or more clonal lineages, where each lineage consists of a set of somatic mutations with similar variant allelic frequencies. A group of experiments is conducted to evaluate its performance. The statistical features of the simulated sequencing reads are comparable to those of real tumor sequencing data from samples consisting of multiple sub-clones. The source codes are available at and for academic use only.
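The relationship between tumor purity, clonal lineages, and variant allelic frequency (VAF) described above can be illustrated with a small sketch. This is not TNSim's implementation. It assumes heterozygous somatic mutations in copy-neutral diploid regions, so that the expected VAF is purity × clonal fraction / 2, and it models read sampling as a simple binomial draw. All function names and parameter values here are hypothetical and chosen only for illustration.

```python
import random


def expected_vaf(purity, clone_fraction):
    """Expected VAF of a heterozygous somatic mutation.

    Assumption (not from the paper): copy-neutral diploid region, so only
    one of two alleles carries the mutation in the tumor cells that have it.
    """
    return purity * clone_fraction / 2.0


def simulate_alt_reads(depth, vaf, rng):
    """Draw the number of variant-supporting reads as Binomial(depth, vaf)."""
    return sum(1 for _ in range(depth) if rng.random() < vaf)


def simulate_lineages(purity, clone_fractions, n_mutations, depth, seed=0):
    """Return observed VAFs per lineage for a preassigned clonal architecture.

    clone_fractions maps a (hypothetical) lineage name to the fraction of
    tumor cells that carry that lineage's mutations.
    """
    rng = random.Random(seed)
    observed = {}
    for name, fraction in clone_fractions.items():
        vaf = expected_vaf(purity, fraction)
        observed[name] = [
            simulate_alt_reads(depth, vaf, rng) / depth
            for _ in range(n_mutations)
        ]
    return observed


# Illustrative values only: 80% purity, a founding clone and one sub-clone.
observed = simulate_lineages(
    purity=0.8,
    clone_fractions={"founding": 1.0, "subclone": 0.4},
    n_mutations=50,
    depth=100,
)
for name, vafs in observed.items():
    print(name, round(sum(vafs) / len(vafs), 3))
```

Under these assumptions, mutations from the same lineage cluster around a common VAF, and lowering the purity shifts every cluster toward zero, which is the pattern that clonality-aware tools look for.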


Cancer genomics, Cancer sequencing data, Data simulator, Clonal structure



This work is supported by the National Science Foundation of China (Grant No: 31701150) and the Fundamental Research Funds for the Central Universities (CXTD2017003).



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Yu Geng (1, 3, 4)
  • Zhongmeng Zhao (1, 3)
  • Mingzhe Xu (1, 3)
  • Xuanping Zhang (1, 3)
  • Xiao Xiao (2, 3)
  • Jiayin Wang (1, 3), corresponding author

  1. Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
  2. School of Public Policy and Administration, Xi’an Jiaotong University, Xi’an, China
  3. Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
  4. Jinzhou Medical University, Jinzhou, China
