Skip to main content

Information Distance and Its Extensions

  • Conference paper
Discovery Science (DS 2011)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 6926))

Included in the following conference series:

Abstract

Consider, in the most general sense, the space of all information carrying objects: a book, an article, a name, a definition, a genome, a letter, an image, an email, a webpage, a Google query, an answer, a movie, a music score, a Facebook blog, a short message, or even an abstract concept. Over the past 20 years, we have been developing a general theory of information distance in this space and applications of this theory. The theory is object-independent and application-independent. The theory is also unique, in the sense that no other theory is “better”. During the past 10 years, such a theory has found many applications. Recently we have introduced two extensions to this theory concerning multiple objects and irrelevant information. This expository article will focus on explaining the main ideas behind this theory, especially these recent extensions, and their applications. We will also discuss some very preliminary applications.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Ané, C., Sanderson, M.J.: Missing the forest for the trees: Phylogenetic compression and its implications for inferring complex evolutionary histories. Systematic Biology 54(1), 146–157 (2005)

    Article  Google Scholar 

  2. Arbuckle, T., Balaban, A., Peters, D.K., Lawford, M.: Software documents: comparison and measurement. In: Proc. 18 Int’l Conf. on Software Engineering and Knowledge Engineering 2007 (SEKE 2007), pp. 740–745 (2007)

    Google Scholar 

  3. Arbuckle, T.: Studying software evolution using artefacts’ shared information content. Sci. of Comput. Programming 76(2), 1078–1097 (2011)

    Article  Google Scholar 

  4. Bennett, C.H., Gács, P., Li, M., Vitányi, P., Zurek, W.: Information Distance. IEEE Trans. Inform. Theory 44(4), 1407–1423 (1993) (STOC 1993)

    Article  MathSciNet  MATH  Google Scholar 

  5. Bennett, C.H., Li, M., Ma, B.: Chain letters and evolutionary histories. Scientific American 288(6), 76–81 (2003) (feature article)

    Article  Google Scholar 

  6. Benedetto, D., Caglioti, E., Loreto, V.: Language trees and zipping. Phys. Rev. Lett. 88(4), 048702 (2002)

    Article  Google Scholar 

  7. Bu, F., Zhu, X., Li, M.: A new multiword expression metric and its applications. J. Comput. Sci. Tech. 26(1), 3–13 (2011); also in COLING 2010

    Article  MATH  Google Scholar 

  8. Chen, X., Francia, B., Li, M., Mckinnon, B., Seker, A.: Shared information and program plagiarism detection. IEEE Trans. Information Theory 50(7), 1545–1550 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  9. Cilibrasi, R., Vitányi, P., de Wolf Algorithmic, R.: clustring of music based on string compression. Comput. Music J. 28(4), 49–67 (2004)

    Article  Google Scholar 

  10. Cilibrasi, R., Vitányi, P.: Automatic semantics using Google (2005) (manuscript), http://arxiv.org/abs/cs.CL/0412098 (2004)

  11. Cilibrasi, R., Vitányi, P.: Clustering by compression. IEEE Trans. Inform. Theory 51(4), 1523–1545 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  12. Cuturi, M., Vert, J.P.: The context-tree kernel for strings. Neural Networks 18(4), 1111–1123 (2005)

    Article  Google Scholar 

  13. Emanuel, K., Ravela, S., Vivant, E., Risi, C.: A combined statistical-deterministic approach of hurricane risk assessment. In: Program in Atmospheres, Oceans, and Climate. MIT, Cambridge (2005) (manuscript)

    Google Scholar 

  14. Fagin, R., Stockmeyer, L.: Relaxing the triangle inequality in pattern matching. Int’l J. Comput. Vision 28(3), 219–231 (1998)

    Article  Google Scholar 

  15. Kirk, S.R., Jenkins, S.: Information theory-baed software metrics and obfuscation. J. Systems and Software 72, 179–186 (2004)

    Article  Google Scholar 

  16. Keogh, E., Lonardi, S., Ratanamahatana, C.A.: Towards parameter-free data mining. In: KDD 2004, pp. 206–215 (2004)

    Google Scholar 

  17. Kocsor, A., Kertesz-Farkas, A., Kajan, L., Pongor, S.: Application of compression-based distance measures to protein sequence classification: a methodology study. Bioinformatics 22(4), 407–412 (2006)

    Article  Google Scholar 

  18. Krasnogor, N., Pelta, D.A.: Measuring the similarity of protein structures by means of the universal similarity metric. Bioinformatics 20(7), 1015–1021 (2004)

    Article  Google Scholar 

  19. Li, M., Badger, J., Chen, X., Kwong, S., Kearney, P., Zhang, H.: An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics 17(2), 149–154 (2001)

    Article  Google Scholar 

  20. Li, M., Chen, X., Li, X., Ma, B., Vitányi, P.: The similarity metric. IEEE Trans. Information Theory 50(12), 3250–3264 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  21. Li, M.: Information distance and its applications. Int’l J. Found. Comput. Sci. 18(4), 669–681 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  22. Li, M., Ma, B.: Notes on information distance among many entities, March 23 (2008) (unpublished notes)

    Google Scholar 

  23. Li, M., Tang, Y., Wang, D.: Information distance between what I said and what it heard (manuscript, 2011)

    Google Scholar 

  24. Li, M., Vitányi, P.: An introduction to Kolmogorov complexity and its applications, 3rd edn. Springer, Heidelberg (2008)

    Book  MATH  Google Scholar 

  25. Long, C., Zhu, X.Y., Li, M., Ma, B.: Information shared by many objects. In: ACM 17th Conf. Info. and Knowledge Management (CIKM 2008), Napa Valley, California, October 26-30 (2008)

    Google Scholar 

  26. Long, C., Huang, M., Zhu, X., Li, M.: Multi-document summarization by information distance. In: IEEE Int’l Conf. Data Mining, 2009 (ICDM 2009), Miami, Florida, December 6-9 (2009)

    Google Scholar 

  27. Nikvand, N., Wang, Z.: Generic image similarity based on Kolmogorov complexity. In: IEEE Int’l Conf. Image Processing, Hong Kong, China, September 26-29 (2010)

    Google Scholar 

  28. Nykter, M., Price, N.D., Larjo, A., Aho, T., Kauffman, S.A., Yli-Harja, O., Shmulevich, I.: Critical networks exhibit maximal information diversity in structure-dynamics relationships. Phy. Rev. Lett. 100, 058702(4) (2008)

    Google Scholar 

  29. Nykter, M., Price, N.D., Aldana, M., Ramsey, S.A., Kauffman, S.A., Hood, L.E., Yli-Harja, O., Shmulevich, I.: Gene expression dynamics in the macrophage exhibit criticality. Proc. Nat. Acad. Sci. USA 105(6), 1897–1900 (2008)

    Article  Google Scholar 

  30. Otu, H.H., Sayood, K.: A new sequence distance measure for phylogenetic tree construction. Bioinformatics 19(6), 2122–2130 (2003)

    Article  Google Scholar 

  31. Pao, H.K., Case, J.: Computing entropy for ortholog detection. In: Int’l Conf. Comput. Intell., Istanbul, Turkey, December 17-19 (2004)

    Google Scholar 

  32. Parry, D.: Use of Kolmogorov distance identification of web page authorship, topic and domain. In: Workshop on Open Source Web Inf. Retrieval (2005), http://www.emse.fr/OSWIR05/

  33. Costa Santos, C., Bernardes, J., Vitányi, P., Antunes, L.: Clustering fetal heart rate tracings by compression. In: Proc. 19th IEEE Intn’l Symp. Computer-Based Medical Systems, Salt Lake City, Utah, June 22-23 (2006)

    Google Scholar 

  34. Taha, W., Crosby, S., Swadi, K.: A new approach to data mining for software design, Rice Univ. (2006) (manuscript)

    Google Scholar 

  35. Varre, J.S., Delahaye, J.P., Rivals, E.: Transformation distances: a family of dissimilarity measures based on movements of segments. Bioinformatics 15(3), 194–202 (1999)

    Article  Google Scholar 

  36. Veltkamp, R.C.: Shape Matching: Similarity Measures and Algorithms. In: Proc. Int ’l Conf. Shape Modeling Applications, Italy, pp. 188–197 (2001) (invited talk)

    Google Scholar 

  37. Vitanyi, P.M.B.: Information distance in multiples. IEEE Trans. Inform. Theory 57(4), 2451–2456 (2011)

    Article  MathSciNet  Google Scholar 

  38. Wehner, S.: Analyzing worms and network traffice using compression. J. Comput. Security 15(3), 303–320 (2007)

    Article  Google Scholar 

  39. Zhang, X., Hao, Y., Zhu, X., Li, M.: Information distance from a question to an answer. In: 13th ACM SIGKDD Int’l Conf. Knowledge Discovery Data Mining, San Jose, CA, August 12-15 (2007)

    Google Scholar 

  40. Zhang, X., Hao, Y., Zhu, X.Y., Li, M.: New information measure and its application in question answering system. J. Comput. Sci. Tech. 23(4), 557–572 (2008); This is the final version of [39]

    Article  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, M. (2011). Information Distance and Its Extensions. In: Elomaa, T., Hollmén, J., Mannila, H. (eds) Discovery Science. DS 2011. Lecture Notes in Computer Science(), vol 6926. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24477-3_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-24477-3_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24476-6

  • Online ISBN: 978-3-642-24477-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics