CCMS: A Greedy Approach to Motif Extraction

  • Giacomo Drago
  • Marco Ferretti
  • Mirto Musci
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8158)

Abstract

Efficient and precise motif extraction is a central problem in the study of protein functions and structures. This paper presents a new geometric approach to the problem, based on the Generalized Hough Transform. The approach is both an extension and a variation of the Secondary Structure Co-Occurrences (SSC) algorithm by Cantoni et al. [1-2]. The goal is to provide an effective and efficient implementation, suitable for high-performance computing (HPC). The most significant contribution of this paper is a heuristic greedy variant of the algorithm, which reduces computational time by two orders of magnitude. As a side effect, the new version can also cope with uncertainty in the geometric description of the secondary structures.

Keywords

Motif Extraction, Secondary Structures, SSC, Hough Transform, Algorithm Optimization, Greedy Algorithm

References

  1. Cantoni, V., Ferone, A., Ozbudak, O., Petrosino, A.: Structural analysis of protein secondary structure by GHT. In: 21st International Conference on Pattern Recognition, ICPR 2012, Tsukuba, Japan, November 11-15, pp. 1767–1770. IEEE Computer Society Press (2012)
  2. Cantoni, V., Ferone, A., Ozbudak, O., Petrosino, A.: Motif Retrieval by Exhaustive Matching and Couple Co-occurrences. In: 9th International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics, CIBB 2012, Texas, July 12-14 (2012)
  3. Ferretti, M., Musci, M.: Entire Motifs Search of Secondary Structures in Proteins: A Parallelization Study. In: International Workshop on Parallelism in Bioinformatics EUROMPI 2013, Madrid, Spain, September 17 (in printing, 2013)
  4. Protein Data Bank, http://www.rcsb.org/pdb
  5. Ballard, D.: Generalizing the Hough Transform to Detect Arbitrary Shapes. Pattern Recognition 13(2), 111–122 (1981)
  6. Kabsch, W., Sander, C.: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637 (1983)
  7. Knuth, D.E.: Generating All Combinations and Partitions. In: The Art of Computer Programming, vol. 4, Fascicle 3, pp. 5–6. Addison-Wesley (2005)
  8. Konc, J., Janežič, D.: An improved branch and bound algorithm for the maximum clique problem. MATCH Communications in Mathematical and in Computer Chemistry 58(3), 569–590 (2007)
  9. Cantoni, V., Ferone, A., Ozbudak, O., Petrosino, A.: Protein motifs retrieval by SS terns occurrences. Pattern Recognition Letters 34, 559–563 (2012)
  10. Structural Classification of Proteins and ASTRAL (January 2013), scop.berkeley.edu
  11. CINECA supercomputing center (9th in top500.org as of May 2013), http://www.cineca.it

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Giacomo Drago (1)
  • Marco Ferretti (1)
  • Mirto Musci (1)
  1. University of Pavia, Pavia, Italy
