Recent Advances in Improving the Memory Efficiency of the TRIBE MCL Algorithm

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9490)

Abstract

This paper proposes a fast and highly memory-efficient implementation of the TRIBE-MCL clustering algorithm that can classify very large protein sequence data sets on an ordinary PC. The improvements over previous versions come from carefully chosen data structures that handle symmetric sparse matrices efficiently. The proposed algorithm was tested on very large synthetic protein sequence data sets. Validation showed that the proposed method extends the data size that can be processed on a regular PC from the previously reported 250 thousand items to one million items. For the same data sizes, the algorithm needs 10–20 % less time than previous efficient Markov clustering algorithms, with no loss in partition quality. The proposed solution leaves room for further improvement through parallel data processing.
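The abstract credits the memory savings to data structures for symmetric sparse matrices but does not describe them. The sketch below is one common way to exploit symmetry and is an assumption, not the authors' structure. It stores each unordered pair {i, j} only once, with i <= j, in per-row dictionaries, so roughly half the off-diagonal entries are kept. It also shows how a matrix-vector product can recover the mirrored entries on the fly. The class and method names (`SymmetricSparseMatrix`, `stored_entries`, `matvec`) are hypothetical.

```python
from collections import defaultdict


class SymmetricSparseMatrix:
    """Hypothetical sketch: a symmetric sparse matrix that stores only entries with i <= j."""

    def __init__(self, n):
        self.n = n
        # upper[i] maps column j (with j >= i) to the value of entry (i, j)
        self.upper = defaultdict(dict)

    def set(self, i, j, value):
        if i > j:
            i, j = j, i  # canonical order: smaller index first, so (j, i) is never stored
        if value == 0:
            row = self.upper.get(i)
            if row is not None:
                row.pop(j, None)  # drop explicit zeros to keep the structure sparse
        else:
            self.upper[i][j] = value

    def get(self, i, j):
        if i > j:
            i, j = j, i
        # dict.get on the defaultdict does not create empty rows
        return self.upper.get(i, {}).get(j, 0)

    def stored_entries(self):
        # number of values actually kept in memory
        return sum(len(row) for row in self.upper.values())

    def matvec(self, x):
        """Compute y = A x, using each stored off-diagonal entry for both (i, j) and (j, i)."""
        y = [0.0] * self.n
        for i, row in self.upper.items():
            for j, a in row.items():
                y[i] += a * x[j]
                if i != j:
                    y[j] += a * x[i]  # mirrored entry (j, i), not stored explicitly
        return y


# Usage example with exactly representable values
A = SymmetricSparseMatrix(4)
A.set(0, 1, 0.75)
A.set(2, 1, 0.5)   # stored as (1, 2)
A.set(3, 3, 1.0)
print(A.get(1, 0), A.get(1, 2))   # 0.75 0.5
print(A.stored_entries())          # 3
print(A.matvec([1, 1, 1, 1]))      # [0.75, 1.25, 0.5, 1.0]
```

Storing only one triangle trades a little index bookkeeping on every access for about half the memory on off-diagonal entries. That trade is attractive when memory, rather than time, limits the processable data size.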

Research supported by the Hungarian National Research Funds (OTKA), Project no. PD103921. S. M. Szilágyi is a Bolyai Fellow of the Hungarian Academy of Sciences.

Author information

Corresponding author

Correspondence to László Szilágyi.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Szilágyi, L., Nagy, L.L., Szilágyi, S.M. (2015). Recent Advances in Improving the Memory Efficiency of the TRIBE MCL Algorithm. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science, vol 9490. Springer, Cham. https://doi.org/10.1007/978-3-319-26535-3_4

  • DOI: https://doi.org/10.1007/978-3-319-26535-3_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26534-6

  • Online ISBN: 978-3-319-26535-3

  • eBook Packages: Computer Science, Computer Science (R0)