Recent Advances in Improving the Memory Efficiency of the TRIBE MCL Algorithm

Szilágyi, László; Nagy, Lajos Loránd; Szilágyi, Sándor Miklós

doi:10.1007/978-3-319-26535-3_4

Conference paper
First Online: 10 November 2015

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9490)

Abstract

A fast and highly memory-efficient implementation of the TRIBE-MCL clustering algorithm is proposed for classifying very large protein sequence data sets on an ordinary PC. The improvements over previous versions come from carefully chosen data structures that support efficient handling of symmetric sparse matrices. The proposed algorithm was tested on large synthetic protein sequence data sets. Validation showed that the proposed method extends the data size processable on a regular PC from the previously reported 250 thousand items to one million. For the same data sizes, the algorithm needs 10–20% less time than previous efficient Markov clustering algorithms, with no loss of partition quality. The proposed solution is open to further improvement via parallel data processing.
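
As a rough illustration of the kind of computation the abstract refers to, the following is a minimal sketch of generic Markov clustering (alternating expansion and inflation) over sparse columns. It is not the implementation proposed in the paper. The function names, the dict-of-dicts column layout, the edge list that stores each symmetric pair once, and the parameter defaults (`r`, `eps`, `max_iter`, `tol`) are all assumptions made for this example.

```python
from collections import defaultdict


def normalize(cols):
    """Scale every sparse column so that its entries sum to 1."""
    out = {}
    for j, col in cols.items():
        total = sum(col.values())
        out[j] = {i: v / total for i, v in col.items()}
    return out


def build_columns(n, edges):
    """Build column-stochastic sparse columns from a symmetric edge list.

    Each (i, j, w) triple is stored once and stands for both (i, j) and
    (j, i). Self-loops of weight 1.0 are an assumption of this sketch.
    """
    cols = {j: {j: 1.0} for j in range(n)}
    for i, j, w in edges:
        cols[j][i] = w
        cols[i][j] = w
    return normalize(cols)


def expand(cols):
    """Sparse matrix square: column j of M*M is sum_k M[k, j] * column k."""
    out = {}
    for j, col in cols.items():
        acc = defaultdict(float)
        for k, w in col.items():
            for i, v in cols[k].items():
                acc[i] += v * w
        out[j] = dict(acc)
    return out


def inflate(cols, r, eps):
    """Raise entries to the power r, prune small ones, renormalize."""
    out = {}
    for j, col in cols.items():
        powered = {i: v ** r for i, v in col.items()}
        total = sum(powered.values())
        kept = {i: v for i, v in powered.items() if v / total >= eps}
        if not kept:  # never leave a column empty: keep its largest entry
            top = max(powered, key=powered.get)
            kept = {top: powered[top]}
        out[j] = kept
    return normalize(out)


def clusters(n, cols):
    """Group nodes linked by a nonzero entry of the final matrix (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for j, col in cols.items():
        for i in col:
            parent[find(i)] = find(j)
    groups = defaultdict(list)
    for x in range(n):
        groups[find(x)].append(x)
    return sorted(groups.values())


def mcl(n, edges, r=2.0, eps=1e-4, max_iter=100, tol=1e-9):
    """Iterate expansion and inflation until the matrix stops changing."""
    cols = build_columns(n, edges)
    for _ in range(max_iter):
        new = inflate(expand(cols), r, eps)
        delta = max(
            abs(new[j].get(i, 0.0) - cols[j].get(i, 0.0))
            for j in range(n)
            for i in set(new[j]) | set(cols[j])
        )
        cols = new
        if delta < tol:
            break
    return clusters(n, cols)
```

Calling `mcl(n, edges)` with `n` items and a list of `(i, j, similarity)` triples returns a sorted list of clusters, each a list of item indices. Pruning entries below `eps` after inflation is what keeps the columns sparse in this sketch; how the paper's data structures exploit symmetry to save memory is not reproduced here.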

Research supported by the Hungarian National Research Funds (OTKA), Project no. PD103921. S. M. Szilágyi is a Bolyai Fellow of the Hungarian Academy of Sciences.

Author information

Authors and Affiliations

Department of Control Engineering and Information Technology, Budapest University of Technology and Economics, Budapest, Hungary
László Szilágyi & Sándor Miklós Szilágyi
Faculty of Technical and Human Sciences, Sapientia University of Transylvania, Tîrgu-Mureş, Romania
László Szilágyi & Lajos Loránd Nagy
University of Canterbury, Christchurch, New Zealand
László Szilágyi
Department of Informatics, Petru Maior University of Tîrgu-Mureş, Tîrgu-Mureş, Romania
Sándor Miklós Szilágyi

Corresponding author

Correspondence to László Szilágyi.

Editor information

Editors and Affiliations

University of Istanbul, Istanbul, Turkey
Sabri Arik
Texas A&M University at Qatar, Doha, Qatar
Tingwen Huang
Tunku Abdul Rahman University College, Kuala Lumpur, Malaysia
Weng Kin Lai
Huazhong University of Science and Technology, Wuhan, China
Qingshan Liu

About this paper

Cite this paper

Szilágyi, L., Nagy, L.L., Szilágyi, S.M. (2015). Recent Advances in Improving the Memory Efficiency of the TRIBE MCL Algorithm. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science, vol 9490. Springer, Cham. https://doi.org/10.1007/978-3-319-26535-3_4

DOI: https://doi.org/10.1007/978-3-319-26535-3_4
Published: 10 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26534-6
Online ISBN: 978-3-319-26535-3
eBook Packages: Computer Science, Computer Science (R0)
