High Performance Subgraph Mining in Molecular Compounds

  • Conference paper
High Performance Computing and Communications (HPCC 2005)

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 3726)

Abstract

Structured data represented as graphs arises in several fields of science, and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining problem for discovering interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm: a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load-balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset, where the approach attains close-to-linear speedup in a network of workstations.
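The general idea of receiver-initiated load balancing on an irregular search tree can be illustrated with a toy, single-process simulation. The sketch below is not the algorithm proposed in the paper: the tree generator `children`, the random choice of donor, and the policy of donating the oldest pending node are all assumptions made for illustration, and no real peer-to-peer communication takes place.

```python
import random
from collections import deque


def children(node, max_depth=6):
    """Hypothetical irregular search tree: the fan-out depends on the node."""
    depth, key = node
    if depth >= max_depth:
        return []
    fanout = (key * 7 + depth) % 4  # 0 to 3 children, hard to predict up front
    return [(depth + 1, key * 4 + i + 1) for i in range(fanout)]


def count_sequential(root):
    """Reference depth-first traversal on a single worker."""
    stack, visited = [root], 0
    while stack:
        node = stack.pop()
        visited += 1
        stack.extend(children(node))
    return visited


def simulate(root, n_workers=4, seed=0):
    """Round-based simulation: idle workers (receivers) ask peers for work."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(n_workers)]
    queues[0].append(root)  # all work starts on worker 0
    visited = [0] * n_workers
    transfers = 0
    while any(queues):
        for w in range(n_workers):
            if queues[w]:
                # Busy worker: expand its newest node (depth-first order).
                node = queues[w].pop()
                visited[w] += 1
                queues[w].extend(children(node))
            else:
                # Idle worker initiates the request to a randomly chosen peer.
                donor = rng.randrange(n_workers)
                if donor != w and len(queues[donor]) > 1:
                    # Assumed policy: the donor hands over its oldest node,
                    # which sits closest to the root of the search tree.
                    queues[w].append(queues[donor].popleft())
                    transfers += 1
    return visited, transfers


if __name__ == "__main__":
    root = (0, 1)
    per_worker, moved = simulate(root)
    print("nodes per worker:", per_worker)
    print("work transfers:", moved)
    print("sequential total:", count_sequential(root))
```

Because nodes are moved between queues rather than copied, the per-worker counts always sum to the sequential total. How evenly the work spreads depends on the tree shape and on the donation policy, which is the kind of behaviour a load-balancing scheme has to handle when no workload prediction is available.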

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Di Fatta, G., Berthold, M.R. (2005). High Performance Subgraph Mining in Molecular Compounds. In: Yang, L.T., Rana, O.F., Di Martino, B., Dongarra, J. (eds) High Performance Computing and Communications. HPCC 2005. Lecture Notes in Computer Science, vol 3726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557654_97

  • DOI: https://doi.org/10.1007/11557654_97

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29031-5

  • Online ISBN: 978-3-540-32079-1

  • eBook Packages: Computer Science, Computer Science (R0)
