Suitably Representing Data

Apolloni, Bruno; Pedrycz, Witold; Bassis, Simone; Malchiodi, Dario

doi:10.1007/978-3-540-79864-4_7

Part of the book series: Studies in Computational Intelligence (SCI, volume 138)

Abstract

We may organize the search for the metric d underlying the cost function ℓ into two main steps: finding a sound representation of the data, and using a metric appropriate to that representation. Here "sound" denotes a representation that makes the data easier to understand, for instance by decoupling the original signals, removing noise, or discarding meaningless details. The metric obtained through this split may prove less efficient than one searched for directly, but it is more manageable in most cases. Essentially, we seek to rewrite the metric instances $d(\boldsymbol y_i,\boldsymbol y_j)$ as a composition $d'(g(\boldsymbol y_i),g(\boldsymbol y_j))$, with g optimizing the cost function C, namely:

$$g=\arg\max_{\widetilde g}\textsf{C}[\widetilde g]=\arg\max_{\widetilde g}\left\{\sum_{\boldsymbol y\in \mathfrak Y}\sum_{j=1}^k \ell\left[ D(\widetilde g(\boldsymbol y)),d_j\right]\right\} \tag{7.1}$$
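To make the composition concrete, here is a minimal Python sketch under explicit assumptions. It evaluates $d'(g(\boldsymbol y_i),g(\boldsymbol y_j))$ with a Euclidean $d'$ and approximates the arg max in (7.1) by scoring a small, finite family of candidate representations. The loss ℓ, the map D and the targets $d_j$ are not defined in this excerpt, so the cost used here (`separation`, a ratio of between-group to within-group distances on labeled toy data) is a hypothetical stand-in for C, not the authors' construction; the candidate transforms, their names and the data are likewise illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Transform = Callable[[Vector], Vector]
Metric = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean metric, used here as the metric d' on the transformed data."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def composed_distance(g: Transform, d_prime: Metric, yi: Vector, yj: Vector) -> float:
    """Evaluate d(y_i, y_j) as the composition d'(g(y_i), g(y_j))."""
    return d_prime(g(yi), g(yj))


def separation(g: Transform, points: List[Vector], labels: List[int],
               d_prime: Metric = euclidean) -> float:
    """Hypothetical stand-in for C[g]: mean between-group distance divided by
    mean within-group distance, both measured with d'(g(.), g(.))."""
    within, between = [], []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist = composed_distance(g, d_prime, points[i], points[j])
            (within if labels[i] == labels[j] else between).append(dist)
    return (sum(between) / len(between)) / (sum(within) / len(within))


def select_representation(candidates: Dict[str, Transform],
                          cost: Callable[[Transform], float]) -> Tuple[str, float]:
    """Approximate g = argmax C[g~] by scoring a finite family of candidates."""
    scores = {name: cost(g) for name, g in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy data (hypothetical): the two groups differ in coordinate 0, while
# coordinate 1 is large-scale noise unrelated to the grouping.
points = [(0.0, 10.0), (0.1, -12.0), (1.0, 11.0), (1.1, -9.0)]
labels = [0, 0, 1, 1]

# Hypothetical candidate representations g.
candidates: Dict[str, Transform] = {
    "identity": lambda y: tuple(y),
    "drop_noise": lambda y: (y[0],),  # linear projection onto coordinate 0
}

best, score = select_representation(
    candidates, lambda g: separation(g, points, labels))
print(best, round(score, 3))
```

The `drop_noise` candidate is a linear projection; more generally, for a linear $g(\boldsymbol y)=W\boldsymbol y$ and Euclidean $d'$, the composition equals $\sqrt{(\boldsymbol y_i-\boldsymbol y_j)^\top W^\top W(\boldsymbol y_i-\boldsymbol y_j)}$, so choosing g amounts to choosing a Mahalanobis-type metric on the original data. On this toy data the projection that discards the noisy coordinate scores higher than the identity, in line with the "removing noise" motivation of a sound representation.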

About this chapter

Cite this chapter

Apolloni, B., Pedrycz, W., Bassis, S., Malchiodi, D. (2008). Suitably Representing Data. In: The Puzzle of Granular Computing. Studies in Computational Intelligence, vol 138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-79864-4_7

DOI: https://doi.org/10.1007/978-3-540-79864-4_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-79863-7
Online ISBN: 978-3-540-79864-4
eBook Packages: Engineering, Engineering (R0)