Semantic Distillation: A Method for Clustering Objects by their Contextual Specificity

Chapter in: Nature Inspired Cooperative Strategies for Optimization (NICSO 2007)

Part of the book series: Studies in Computational Intelligence (SCI, volume 129)

Abstract

Techniques for data mining, latent semantic analysis, contextual search of databases and related tasks were developed long ago by computer scientists working on information retrieval (IR). Experimental scientists from all disciplines who must analyse large collections of raw experimental data (astronomical, physical, biological, etc.) have developed powerful methods for the statistical analysis of such data and for clustering, categorising and classifying objects. Finally, physicists have developed a theory of quantum measurement that unifies the logical, algebraic and probabilistic aspects of queries in a single formalism.
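The analogy drawn above between IR queries and quantum measurement can be illustrated with a minimal sketch. This page does not reproduce the chapter's own formalism. The sketch therefore assumes a generic geometric model in the style of van Rijsbergen's geometry of information retrieval:

- An object (a document, or a gene described by its expression across experiments) is a unit vector.
- A query is the orthogonal projector onto a subspace.
- The probability that the object answers "yes" is the squared length of its projection, as in the Born rule.

The function names, the toy profile and the query subspaces are all hypothetical.

```python
import math


def normalize(v):
    """Return v scaled to unit Euclidean length (the 'state' of an object)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def gram_schmidt(vectors):
    """Orthonormal basis for the span of `vectors` (classical Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = sum(x * y for x, y in zip(w, b))
            w = [x - c * y for x, y in zip(w, b)]
        if math.sqrt(sum(x * x for x in w)) > 1e-12:  # skip dependent vectors
            basis.append(normalize(w))
    return basis


def query_probability(state, query_vectors):
    """Probability of a 'yes' answer: <psi|P|psi> for the projector P onto span(query_vectors).

    For a one-dimensional query this reduces to cos^2 of the angle between the
    object and the query, the familiar cosine-based relevance score of vector-space IR.
    """
    psi = normalize(state)
    basis = gram_schmidt(query_vectors)
    return sum(sum(x * y for x, y in zip(psi, b)) ** 2 for b in basis)


# Hypothetical gene profile over three hypothetical experiments.
gene = [1, 1, 0]
p_yes = query_probability(gene, [[1, 0, 0]])            # "active in experiment 1?"
p_no = query_probability(gene, [[0, 1, 0], [0, 0, 1]])  # complementary query
print(round(p_yes, 3), round(p_no, 3))  # prints 0.5 0.5
```

The two queries are complementary: their projectors sum to the identity, so their probabilities sum to 1. This is a small instance of the joint logical, algebraic and probabilistic structure mentioned above. It is not a claim about how the chapter encodes genes or experiments.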

The purpose of this paper is twofold. First, we show that, when formulated at an abstract level, problems from IR, from statistical data analysis and from physical measurement theory are very similar and can therefore be profitably cross-fertilised. Second, we propose a novel method of fuzzy hierarchical clustering, termed semantic distillation. The method is strongly inspired by the theory of quantum measurement, and we developed it to analyse raw data from various types of DNA array experiments. We illustrate the method by analysing DNA array experiments and clustering the genes of the array according to their specificity.
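The abstract does not give the semantic distillation algorithm itself. The following is therefore only a stand-in that illustrates what "fuzzy" clustering means. It uses standard fuzzy c-means (Bezdek), in which every object receives a degree of membership in every cluster, and each object's memberships sum to 1.

This is not the authors' method, and it is not hierarchical. The function names, the fuzzifier m = 2, the fixed iteration count and the toy points standing in for gene profiles are illustrative assumptions.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Fuzzy c-means memberships: u[i][k] is the degree to which point k belongs to cluster i."""
    c = len(centers)
    u = [[0.0] * len(points) for _ in range(c)]
    for k, x in enumerate(points):
        d = [math.dist(x, ctr) for ctr in centers]
        if 0.0 in d:
            # The point sits exactly on a center: give it crisp membership there
            # (this avoids dividing by zero in the formula below).
            i0 = d.index(0.0)
            for i in range(c):
                u[i][k] = 1.0 if i == i0 else 0.0
            continue
        for i in range(c):
            # Standard FCM update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
            u[i][k] = 1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
    return u


def fcm_centers(points, u, m=2.0):
    """Each center is the mean of all points weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for row in u:
        w = [uk ** m for uk in row]
        total = sum(w)
        centers.append([sum(wk * x[a] for wk, x in zip(w, points)) / total for a in range(dim)])
    return centers


def fuzzy_c_means(points, init_centers, m=2.0, iters=100):
    """Alternate center and membership updates for a fixed number of iterations."""
    centers = [list(ctr) for ctr in init_centers]
    u = fcm_memberships(points, centers, m)
    for _ in range(iters):
        centers = fcm_centers(points, u, m)
        u = fcm_memberships(points, centers, m)
    return centers, u


# Two well-separated groups plus one point midway between them (toy data).
points = [[0, 0], [0, 1], [10, 10], [10, 11], [5, 5.5]]
centers, u = fuzzy_c_means(points, init_centers=[[0, 0], [10, 10]])
print([round(u[0][k], 2) for k in range(len(points))])
```

Points inside each group end up with membership close to 1 in their own cluster. The point midway between the groups ends up close to 0.5 in each, whereas a crisp method would be forced to assign it to exactly one cluster.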

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Sierocinski, T., Le Béchec, A., Théret, N., Petritis, D. (2008). Semantic Distillation: A Method for Clustering Objects by their Contextual Specificity. In: Krasnogor, N., Nicosia, G., Pavone, M., Pelta, D. (eds) Nature Inspired Cooperative Strategies for Optimization (NICSO 2007). Studies in Computational Intelligence, vol 129. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78987-1_39

  • DOI: https://doi.org/10.1007/978-3-540-78987-1_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78986-4

  • Online ISBN: 978-3-540-78987-1

  • eBook Packages: Engineering, Engineering (R0)