Abstract
We compare different statistical characterizations of a set of strings for three different histogram-based distances. Given a distance, a set of strings may be characterized either by its generalized median, i.e., the string (over the set of all possible strings) that minimizes the sum of distances to every string of the set, or by its set median, i.e., the string of the set that minimizes the sum of distances to every other string of the set. For the first two histogram-based distances, we show that the generalized median string can be computed efficiently; for the third one, which biases histograms with individual substitution costs, we conjecture that computing the generalized median is an NP-hard problem, and we introduce two different heuristic algorithms for approximating it. We experimentally compare the relevance of the three histogram-based distances, and of the different statistical characterizations of sets of strings, for classifying images represented by strings.
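To make the difference between the two characterizations concrete, here is a minimal sketch. The abstract does not define the paper's three histogram-based distances, so this example assumes a simple stand-in: the L1 distance between the symbol-occurrence histograms of two strings. Under that assumption the cost splits per symbol, so a generalized median is obtained by giving each symbol a median of its counts across the set (the median minimizes a sum of absolute deviations), while the set median is found by exhaustively scoring the members of the set. The function names (`hist_distance`, `set_median`, `generalized_median`, `total_cost`) are hypothetical and this is not the authors' algorithm, nor does it cover the third, substitution-cost distance.

```python
from collections import Counter
from statistics import median_low


def hist_distance(a, b):
    """Assumed histogram-based distance: L1 distance between symbol counts."""
    ha, hb = Counter(a), Counter(b)
    # Counter returns 0 for absent symbols, so iterate over the union.
    return sum(abs(ha[c] - hb[c]) for c in set(ha) | set(hb))


def total_cost(candidate, strings):
    """Sum of distances from a candidate string to every string of the set."""
    return sum(hist_distance(candidate, t) for t in strings)


def set_median(strings):
    """Member of the set minimizing the sum of distances (ties: first one)."""
    return min(strings, key=lambda s: total_cost(s, strings))


def generalized_median(strings):
    """A string, over all possible strings, minimizing the sum of distances.

    With the assumed L1 histogram distance the cost decomposes per symbol,
    so each symbol count is set to a median of its counts over the set.
    Any string with those counts is optimal; symbols are emitted sorted.
    """
    hists = [Counter(s) for s in strings]
    alphabet = sorted(set().union(*hists))
    # median_low picks an actual observed count, so counts stay integers.
    counts = {c: median_low([h[c] for h in hists]) for c in alphabet}
    return "".join(c * counts[c] for c in alphabet)


if __name__ == "__main__":
    S = ["aab", "abb", "abc"]
    gm, sm = generalized_median(S), set_median(S)
    print(gm, total_cost(gm, S))  # ab 3
    print(sm, total_cost(sm, S))  # aab 4
```

In this toy set every member has a total cost of 4, whereas the generalized median "ab", which is not in the set, reaches 3. This illustrates why the generalized median can characterize a set more tightly than the set median, which is restricted to the set's own members.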
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Solnon, C., Jolion, JM. (2007). Generalized vs Set Median Strings for Histogram-Based Distances: Algorithms and Classification Results in the Image Domain. In: Escolano, F., Vento, M. (eds) Graph-Based Representations in Pattern Recognition. GbRPR 2007. Lecture Notes in Computer Science, vol 4538. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72903-7_37
DOI: https://doi.org/10.1007/978-3-540-72903-7_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72902-0
Online ISBN: 978-3-540-72903-7
eBook Packages: Computer Science, Computer Science (R0)