Cluster Analysis

Abstract

Many applications require the partitioning of data points into intuitively similar groups. The partitioning of a large number of data points into a smaller number of groups helps greatly in summarizing the data and understanding it for a variety of data mining applications.

Notes

  1. For a fixed cluster assignment \({\cal C}_1 \ldots {\cal C}_k\), the gradient of the clustering objective function \(\sum _{j=1}^k \sum _{ \overline {X_i} \in {\cal C}_j} || \overline {X_i} - \overline {Y_j}||^2\) with respect to \(\overline {Y_j}\) is \(-2 \sum _{ \overline {X_i} \in {\cal C}_j} (\overline {X_i} - \overline {Y_j})\). Setting the gradient to 0 yields the mean of cluster \({\cal C}_j\) as the optimum value of \(\overline {Y_j}\). Note that the other clusters do not contribute to the gradient, and, therefore, the approach effectively optimizes the local clustering objective function for \({\cal C}_j\).

  2. http://www.dmoz.org

  3. This is achieved by setting the partial derivative of \({\cal L}({\cal D}|{\cal M})\) (see Eq. 6.12) with respect to each parameter in \(\overline {\mu _i}\) and \(\sigma \) to 0.

  4. The parameter \(MinPts\) is used in the original DBSCAN description. However, the notation τ is used here to retain consistency with the grid-clustering description.

  5. See [257], which is a graph-based alternative to the LOF algorithm for locality-sensitive outlier analysis.
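
The claim in Note 1, that for a fixed assignment the cluster mean zeroes the gradient of the sum-of-squares objective, can be checked numerically. The following is a minimal sketch, not code from the chapter: the function names and the toy cluster are hypothetical, and the gradient is computed as \(-2 \sum (\overline {X_i} - \overline {Y_j})\) for a single cluster.

```python
# Illustrative check of Note 1 (names and data are assumptions, not from the chapter).

def cluster_mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def sse(points, y):
    """Sum of squared Euclidean distances from each point to representative y."""
    return sum(sum((x_d - y_d) ** 2 for x_d, y_d in zip(x, y)) for x in points)

def sse_gradient(points, y):
    """Gradient of sse with respect to y, i.e. -2 * sum_i (x_i - y)."""
    return [-2.0 * sum(x[d] - y[d] for x in points) for d in range(len(y))]

# Hypothetical toy cluster with three 2-dimensional points.
cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 0.0)]

mean = cluster_mean(cluster)
grad_at_mean = sse_gradient(cluster, mean)

# Any other representative, here the mean shifted by 0.5 per coordinate,
# should give a strictly larger objective value.
shifted = [m + 0.5 for m in mean]

print("mean:", mean)
print("gradient at mean:", grad_at_mean)
print("objective at mean vs. shifted:", sse(cluster, mean), sse(cluster, shifted))
```

Because only the points of one cluster enter `sse_gradient`, the sketch also reflects the remark that other clusters do not contribute to the gradient with respect to \(\overline {Y_j}\).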

Author information

Corresponding author

Correspondence to Charu C. Aggarwal.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Aggarwal, C. (2015). Cluster Analysis. In: Data Mining. Springer, Cham. https://doi.org/10.1007/978-3-319-14142-8_6

  • DOI: https://doi.org/10.1007/978-3-319-14142-8_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14141-1

  • Online ISBN: 978-3-319-14142-8

  • eBook Packages: Computer Science (R0)
