
International Journal of Parallel Programming, Volume 36, Issue 2, pp. 226–249

The ParTriCluster Algorithm for Gene Expression Analysis

  • Renata Braga Araújo
  • Guilherme Henrique Trielli Ferreira
  • Gustavo Henrique Orair
  • Wagner Meira Jr.
  • Renato Antônio Celso Ferreira
  • Dorgival Olavo Guedes Neto
  • Mohammed Javeed Zaki
Article

Abstract

Analyzing gene expression patterns is becoming a highly relevant task in Bioinformatics. This analysis makes it possible to determine how genes behave under various conditions, information that is fundamental for treating diseases, among other applications. A recent advance in this area is the Tricluster algorithm, the first algorithm capable of finding 3D clusters (genes × samples × timestamps), that is, groups of genes that behave similarly across samples and timestamps. However, as biological experiments collect ever larger amounts of data to be analyzed and correlated, triclustering remains a bottleneck because the problem is NP-complete, so parallelization seems to be an essential step towards obtaining feasible solutions. In this work we propose and evaluate a parallel implementation of the Tricluster algorithm using the filter-labeled-stream paradigm supported by the Anthill parallel programming environment. The results show that our parallelization scales well with the data size and is able to handle the severe load imbalances inherent to the problem. Furthermore, the parallelization strategy is applicable to any depth-first search.
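The abstract combines two ideas: a tricluster as a group of genes that behaves coherently across samples and timestamps, and a parallel depth-first search that enumerates such groups while tolerating load imbalance. The Python sketch below is a toy illustration under loudly stated assumptions, not the authors' Anthill implementation or the Tricluster algorithm itself. It fixes the sample and time subsets and searches only over genes; it uses a simplified scaling-coherence test (each gene must be a near-constant multiple of the set's first gene, assuming strictly positive expression values); and it replaces Anthill's filters and labeled streams with a shared work queue drained by threads. The names `scales_with` and `parallel_dfs` are hypothetical. Every search node becomes an independent task, so a deep subtree is spread over whichever workers are free instead of staying with the worker that started it, which is one common way to absorb load imbalance.

```python
import queue
import threading
from itertools import product


def scales_with(data, ref, g, samples, times, eps):
    """Simplified coherence test (an assumption, not the Tricluster definition):
    gene g must be a near-constant multiple of gene ref over every
    (sample, time) cell. Assumes strictly positive expression values."""
    ratios = [data[g][s][t] / data[ref][s][t] for s, t in product(samples, times)]
    return max(ratios) - min(ratios) <= eps * min(ratios)


def parallel_dfs(data, samples, times, eps, min_genes, n_workers=4):
    """Enumerate coherent gene sets by depth-first extension, where each search
    node is an independent task on a shared queue (a stand-in for Anthill's
    filters and labeled streams, not a model of them)."""
    n_genes = len(data)
    tasks = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def expand(genes):
        # Record the node if it is large enough to report.
        if len(genes) >= min_genes:
            with results_lock:
                results.append(genes)
        # Extend only with higher-numbered genes so each set is visited once.
        # The first gene stays fixed along a branch, so a gene rejected here
        # could never join any deeper set either: skipping it loses no results.
        ref = genes[0]
        for j in range(genes[-1] + 1, n_genes):
            if scales_with(data, ref, j, samples, times, eps):
                tasks.put(genes + (j,))  # enqueued before the parent is marked done

    def worker():
        while True:
            genes = tasks.get()
            try:
                if genes is None:  # sentinel: no more work
                    return
                expand(genes)
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for g in range(n_genes):
        tasks.put((g,))  # one root task per starting gene
    tasks.join()  # returns once every task, including all children, is processed
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    return sorted(results)


if __name__ == "__main__":
    # Toy data indexed as data[gene][sample][time]. Genes 1 and 2 are 2x and 3x
    # gene 0; gene 3 is flat and does not scale with the others.
    data = [
        [[1, 2], [3, 4]],
        [[2, 4], [6, 8]],
        [[3, 6], [9, 12]],
        [[1, 1], [1, 1]],
    ]
    print(parallel_dfs(data, samples=[0, 1], times=[0, 1], eps=0.01, min_genes=2))
    # prints [(0, 1), (0, 1, 2), (0, 2), (1, 2)]
```

Because CPython threads share the global interpreter lock, this sketch shows the task structure rather than real speedup; a distributed runtime would place the workers on separate processes or nodes.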

Keywords

Parallel programming; Clustering; Bioinformatics; Depth-first search

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Renata Braga Araújo (1)
  • Guilherme Henrique Trielli Ferreira (1)
  • Gustavo Henrique Orair (1)
  • Wagner Meira Jr. (1)
  • Renato Antônio Celso Ferreira (1)
  • Dorgival Olavo Guedes Neto (1)
  • Mohammed Javeed Zaki (2)
  1. Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
  2. Department of Computer Science, Rensselaer Polytechnic Institute, Troy, USA
