
Suitably Representing Data

Chapter in: The Puzzle of Granular Computing

Part of the book series: Studies in Computational Intelligence ((SCI,volume 138))

Abstract

We may organize the search for the metric d underlying the cost function ℓ into two main steps: the search for a sound representation of the data, and the use of a metric appropriate to that representation. The term sound denotes a representation that allows a better understanding of the data, for instance by decoupling the original signals, removing noise, discarding meaningless details, and the like. The result of this splitting may prove less efficient than a direct metric, but is more manageable in most cases. Essentially, we look to rewrite the metric instances \(d(\boldsymbol y_i,\boldsymbol y_j)\) as a composition \(d'(g(\boldsymbol y_i),g(\boldsymbol y_j))\), with g optimizing the cost function C, namely:

$$g=\arg\max_{\widetilde g}\textsf C[\widetilde g]=\arg\max_{\widetilde g}\left\{\sum_{\boldsymbol y\in \mathfrak Y}\sum_{j=1}^k \ell\left[ D(\widetilde g(\boldsymbol y)),d_j\right]\right\}. ~~(7.1)$$
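To make the composition concrete, the following is a minimal sketch (in Python, not taken from the chapter) of the two-step scheme: a representation map g, illustrated here by a PCA-style projection that decouples the signals and discards low-variance detail, followed by a metric d' applied in the transformed space. The names fit_g and d_prime are illustrative assumptions, and the cost-driven selection of g expressed in Eq. (7.1) is not reproduced here.

```python
import numpy as np

# Sketch of the decomposition d(y_i, y_j) ~ d'(g(y_i), g(y_j)).
# g is a PCA-style linear projection (an example of a "sound" representation);
# d' is the Euclidean metric in the representation space.
# Function names are illustrative, not the chapter's notation.

def fit_g(Y: np.ndarray, n_components: int):
    """Learn a linear representation map g from the data matrix Y (rows = items)."""
    mean = Y.mean(axis=0)
    # Principal directions of the centered data via SVD.
    _, _, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    components = Vt[:n_components]
    return lambda y: components @ (y - mean)   # g: original space -> representation

def d_prime(u: np.ndarray, v: np.ndarray) -> float:
    """Metric d' applied to the transformed items."""
    return float(np.linalg.norm(u - v))

# Usage: compare two items through the composed metric d'(g(y_i), g(y_j)).
rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 10))        # toy data set
g = fit_g(Y, n_components=3)
dist = d_prime(g(Y[0]), g(Y[1]))
print(f"d'(g(y_0), g(y_1)) = {dist:.3f}")
```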





Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Apolloni, B., Pedrycz, W., Bassis, S., Malchiodi, D. (2008). Suitably Representing Data. In: The Puzzle of Granular Computing. Studies in Computational Intelligence, vol 138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-79864-4_7


  • DOI: https://doi.org/10.1007/978-3-540-79864-4_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-79863-7

  • Online ISBN: 978-3-540-79864-4

  • eBook Packages: Engineering (R0)
