A Novel Greedy Algorithm for the Minimum Common String Partition Problem

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4463)

Abstract

The Minimum Common String Partition problem (MCSP) asks for a partition of two given input strings into the same collection of substrings such that the number of substrings in the partition is minimized. It is a key problem in genome rearrangement and is closely related to the problem of sorting by reversals with duplicates. MCSP is NP-hard even in its simplest case, 2-MCSP, where each letter occurs at most twice in each input string. Several approximation algorithms achieve very good approximation ratios but are complicated to implement, for example a 1.5-approximation algorithm for 2-MCSP, a 1.1037-approximation algorithm for 2-MCSP, and a 4-approximation algorithm for 3-MCSP. There is also a simple greedy algorithm for MCSP, which extracts the longest common substring from the given strings at each step. In this paper, we propose a novel greedy algorithm for MCSP which, at each step, extracts the longest common substring containing a symbol that occurs only once, whenever such a symbol exists. We show that our algorithm is more “worst case” greedy at each step than the simple greedy algorithm and that its expected performance is better. Our experiments show that our method achieves a better partition on average than the simple greedy algorithm does. Another advantage of our algorithm is that it is much faster than the simple greedy algorithm.
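To make the problem statement concrete, the following is a minimal Python sketch of the simple greedy baseline mentioned in the abstract: repeatedly take a longest substring common to the still-unmarked parts of both strings and mark it as a block. It is an illustration under stated assumptions, not the paper's implementation. The function names (`longest_common_substring`, `greedy_mcsp`, `_mark`), the block representation `(start_in_a, start_in_b, length)`, and the tie-breaking rule (first longest match found) are all hypothetical choices made here. The paper's own variant, which prefers substrings containing a symbol that occurs only once, is not reproduced because the abstract does not give its details.

```python
from collections import Counter


def longest_common_substring(s, t):
    """Return (length, i, j) such that s[i:i+length] == t[j:j+length] is a longest match."""
    best = (0, 0, 0)
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                # Extend the common suffix ending at s[i-1] and t[j-1].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def _mark(free, seg, start, length):
    """Cut [start, start+length) out of the unmarked segment seg."""
    s, e = seg
    free.remove(seg)
    if s < start:
        free.append((s, start))
    if start + length < e:
        free.append((start + length, e))


def greedy_mcsp(a, b):
    """Baseline greedy sketch: returns blocks as (start_in_a, start_in_b, length)."""
    if Counter(a) != Counter(b):
        raise ValueError("inputs must contain the same multiset of letters")
    # Unmarked regions of each string, as half-open index intervals.
    free_a = [(0, len(a))] if a else []
    free_b = [(0, len(b))] if b else []
    blocks = []
    while free_a:
        best = None
        for sa, ea in free_a:
            for sb, eb in free_b:
                length, i, j = longest_common_substring(a[sa:ea], b[sb:eb])
                if best is None or length > best[0]:
                    best = (length, sa + i, sb + j, (sa, ea), (sb, eb))
        length, pa, pb, seg_a, seg_b = best
        blocks.append((pa, pb, length))
        _mark(free_a, seg_a, pa, length)
        _mark(free_b, seg_b, pb, length)
    return sorted(blocks)


print(greedy_mcsp("abcab", "ababc"))  # [(0, 2, 3), (3, 0, 2)]
```

In this example the strings are split into the two blocks "abc" and "ab". Because each extracted block removes the same letters from both strings, the unmarked parts always share at least one letter, so the loop terminates. The sketch recomputes all pairwise matches at every step and is meant for readability rather than speed.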

Editor information

Ion Măndoiu, Alexander Zelikovsky

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

He, D. (2007). A Novel Greedy Algorithm for the Minimum Common String Partition Problem. In: Măndoiu, I., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2007. Lecture Notes in Computer Science, vol 4463. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72031-7_40

  • DOI: https://doi.org/10.1007/978-3-540-72031-7_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72030-0

  • Online ISBN: 978-3-540-72031-7

  • eBook Packages: Computer Science, Computer Science (R0)
