Information Distance and Its Extensions

Li, Ming

doi:10.1007/978-3-642-24477-3_3

Ming Li

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6926)

Included in the following conference series:

International Conference on Discovery Science

Abstract

Consider, in the most general sense, the space of all information carrying objects: a book, an article, a name, a definition, a genome, a letter, an image, an email, a webpage, a Google query, an answer, a movie, a music score, a Facebook blog, a short message, or even an abstract concept. Over the past 20 years, we have been developing a general theory of information distance in this space and applications of this theory. The theory is object-independent and application-independent. The theory is also unique, in the sense that no other theory is “better”. During the past 10 years, such a theory has found many applications. Recently we have introduced two extensions to this theory concerning multiple objects and irrelevant information. This expository article will focus on explaining the main ideas behind this theory, especially these recent extensions, and their applications. We will also discuss some very preliminary applications.

Author information

Authors and Affiliations

School of Computer Science, University of Waterloo, Waterloo, Ont., N2L 3G1, Canada
Ming Li

Editor information

Editors and Affiliations

Department of Software Systems, Tampere University of Technology, P. O. Box 553, 33101, Tampere, Finland
Tapio Elomaa
Department of Information and Computer Science, Aalto University School of Science, P.O. Box 15400, 00076, Aalto, Finland
Jaakko Hollmén
Helsinki Institute for Information Technology (HIIT), Finland
Heikki Mannila

About this paper

Cite this paper

Li, M. (2011). Information Distance and Its Extensions. In: Elomaa, T., Hollmén, J., Mannila, H. (eds) Discovery Science. DS 2011. Lecture Notes in Computer Science, vol 6926. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24477-3_3

DOI: https://doi.org/10.1007/978-3-642-24477-3_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24476-6
Online ISBN: 978-3-642-24477-3
eBook Packages: Computer Science, Computer Science (R0)
