Parallel Partitioning and Mining Gene Expression Data with Butterfly Network

  • Tao Jiang
  • Zhanhuai Li
  • Qun Chen
  • Zhong Wang
  • Wei Pan
  • Zhuo Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8055)

Abstract

In massive gene expression analysis, Order-Preserving Sub-Matrices (OPSMs) have been used to find biological associations between genes and experimental conditions across a large number of gene expression datasets. Although many techniques have been developed, few of them are parallel; they either cannot handle large-scale datasets or are very time-consuming. To help fill this critical void, we propose a Butterfly Network based parallel partitioning and mining method (BNPP), which formalizes the communication and data transfer among nodes. In this paper, we first describe OPSM in detail, along with its implementations on MapReduce and Hama BSP and their shortcomings. Then, we extend the Hama BSP framework with a Butterfly Network to reduce the communication time, the bandwidth load, and the percentage of duplicate results, and call the new framework BNHB. Finally, we implement a state-of-the-art OPSM mining method (OPSM) and our BNPP method on top of both the naïve Hama BSP framework and our BNHB. The experimental results show that our methods run nearly an order of magnitude faster than the single-machine implementation, and that the proposed framework has better effectiveness and scalability.
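The abstract does not spell out how BNPP or BNHB work, so the following is only a minimal sketch of the two general ideas it names, not the authors' method. It assumes the textbook reading of an order-preserving sub-matrix (every selected row is strictly increasing along one shared column order) and the standard butterfly pairing in which node i exchanges data with node i XOR 2^s at stage s. All function names are hypothetical.

```python
from typing import List, Sequence


def is_order_preserving(matrix: Sequence[Sequence[float]],
                        rows: Sequence[int],
                        cols: Sequence[int]) -> bool:
    """Return True if every selected row is strictly increasing along `cols`.

    `cols` is read as an ordered list of columns (conditions); the sub-matrix
    is order-preserving when all selected rows (genes) agree with that order.
    """
    return all(
        all(matrix[r][a] < matrix[r][b] for a, b in zip(cols, cols[1:]))
        for r in rows
    )


def butterfly_schedule(n_nodes: int) -> List[List[int]]:
    """For each of the log2(n_nodes) stages, list every node's partner.

    Assumed pairing rule: at stage s, node i is paired with node i XOR 2**s.
    """
    if n_nodes < 1 or n_nodes & (n_nodes - 1):
        raise ValueError("n_nodes must be a power of two")
    stages = n_nodes.bit_length() - 1
    return [[i ^ (1 << s) for i in range(n_nodes)] for s in range(stages)]


def butterfly_all_gather(local: List[set]) -> List[set]:
    """Simulate the pairwise exchanges; each node ends up with the union."""
    held = [set(items) for items in local]
    for partners in butterfly_schedule(len(held)):
        # Every node merges its own data with its partner's data for this stage.
        held = [held[i] | held[p] for i, p in enumerate(partners)]
    return held


if __name__ == "__main__":
    expr = [[1, 5, 3],
            [2, 9, 4],
            [7, 1, 8]]
    print(is_order_preserving(expr, rows=[0, 1], cols=[0, 2, 1]))
    print(butterfly_schedule(4))
    print(butterfly_all_gather([{"a"}, {"b"}, {"c"}, {"d"}]))
```

With this pairing, N nodes need only log2(N) exchange stages for every node to see all partial results, instead of each node messaging every other node directly. Merging with a set union also drops identical results reported by different nodes, which is one simple way to think about the duplicate-result problem the abstract mentions.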

Keywords

Gene Expression Data · Data Partitioning · Butterfly Network · BSP model · MapReduce · Parallel Processing · OPSM · Hadoop · Hama

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Tao Jiang (1)
  • Zhanhuai Li (1)
  • Qun Chen (1)
  • Zhong Wang (1)
  • Wei Pan (1)
  • Zhuo Wang (1)
  1. School of Computer Science and Technology, Northwestern Polytechnical University, Xi’an, China