Reproducible, Scalable Fusion Gene Detection from RNA-Seq

  • Vladan Arsenijevic
  • Brandi N. Davis-Dusenbery
Part of the Methods in Molecular Biology book series (MIMB, volume 1381)


Chromosomal rearrangements resulting in the creation of novel gene products, termed fusion genes, have been identified as driving events in the development of multiple types of cancer. As these gene products typically do not exist in normal cells, they represent valuable prognostic and therapeutic targets. Advances in next-generation sequencing and computational approaches have greatly improved our ability to detect and identify fusion genes. Nevertheless, these approaches require significant computational resources. Here we describe an approach that leverages cloud computing technologies to perform fusion gene detection from RNA sequencing data at any scale. We additionally highlight methods for enhancing the reproducibility of bioinformatics analyses; these methods may be applied to any next-generation sequencing experiment.

Key words

RNA-Seq, Cloud, Fusion, Cancer, Reproducible, Genomics, Next-generation sequencing



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Vladan Arsenijevic (1)
  • Brandi N. Davis-Dusenbery (1), corresponding author
  1. Department of Bioinformatics, Seven Bridges Genomics, Cambridge, USA
