Clustering and Assembling Large Transcriptome Datasets by EasyCluster2

  • Vitoantonio Bevilacqua
  • Nicola Pietroleonardo
  • Ely Ignazio Giannino
  • Fabio Stroppa
  • Graziano Pesole
  • Ernesto Picardi
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 375)

Abstract

EasyCluster is a well-established Python program developed to produce reliable clusters of expressed sequence tags (ESTs) in order to infer and improve gene structures and to discover potential alternative splicing events. Here we present EasyCluster2, a reimplementation of EasyCluster in the Java programming language that can manage genome-scale transcriptome data produced by Roche 454 sequencers. EasyCluster2 has been developed to speed up the creation of gene-oriented clusters and to facilitate downstream analyses such as the assembly of full-length transcripts. In addition, EasyCluster2 can employ known annotations to refine the overall clustering procedure, embeds the AStalavista software to predict the impact of alternative splicing per cluster, and provides output files in formats that can be uploaded to the UCSC Genome Browser for easy browsing of results. Thanks to its user-friendly interface, EasyCluster2 simplifies the interpretation of findings for researchers with no specific skills in bioinformatics. The EasyCluster2 executable is freely available at https://code.google.com/p/easycluster2/.

Keywords

EasyCluster2, expressed sequence tags, 454 reads, alternative splicing

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Vitoantonio Bevilacqua (1)
  • Nicola Pietroleonardo (1)
  • Ely Ignazio Giannino (1)
  • Fabio Stroppa (1)
  • Graziano Pesole (2, 3)
  • Ernesto Picardi (2, 3)

  1. DEI, Politecnico di Bari, Bari, Italy
  2. DBBB, University of Bari, Bari, Italy
  3. Istituto di Biomembrane e Bioenergetica del Consiglio Nazionale delle Ricerche, Bari, Italy
