Abstract
Techniques for data mining, latent semantic analysis, contextual search of databases, etc. were developed long ago by computer scientists working on information retrieval (IR). Experimental scientists from all disciplines who must analyse large collections of raw experimental data (astronomical, physical, biological, etc.) have developed powerful methods for statistical analysis and for clustering, categorising, and classifying objects. Finally, physicists have developed a theory of quantum measurement that unifies the logical, algebraic, and probabilistic aspects of queries in a single formalism.
The purpose of this paper is twofold. First, we show that, when formulated at an abstract level, problems from IR, statistical data analysis, and physical measurement theory are very similar and can therefore profitably be cross-fertilised. Second, we propose a novel method of fuzzy hierarchical clustering, termed semantic distillation, which is strongly inspired by the theory of quantum measurement and which we developed to analyse raw data from various types of experiments on DNA arrays. We illustrate the method by analysing DNA array experiments and clustering the genes of the array according to their specificity.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Sierocinski, T., Le Béchec, A., Théret, N., Petritis, D. (2008). Semantic Distillation: A Method for Clustering Objects by their Contextual Specificity. In: Krasnogor, N., Nicosia, G., Pavone, M., Pelta, D. (eds) Nature Inspired Cooperative Strategies for Optimization (NICSO 2007). Studies in Computational Intelligence, vol 129. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78987-1_39
DOI: https://doi.org/10.1007/978-3-540-78987-1_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78986-4
Online ISBN: 978-3-540-78987-1
eBook Packages: Engineering, Engineering (R0)