Clustering with Diversity

  • Conference paper
Automata, Languages and Programming (ICALP 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6198)

Abstract

We consider the clustering with diversity problem: given a set of colored points in a metric space, partition them into clusters such that each cluster has at least ℓ points, all of which have distinct colors. We give a 2-approximation to this problem for any ℓ when the objective is to minimize the maximum radius of any cluster. We show that the approximation ratio is optimal unless P = NP, by providing a matching lower bound. Several extensions to our algorithm have also been developed for handling outliers. This problem is mainly motivated by applications in privacy-preserving data publication.
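
To make the problem statement concrete, the following Python sketch checks whether a given partition satisfies the diversity constraint and evaluates the maximum-radius objective. It is not the 2-approximation algorithm from the paper. The function names `is_valid_clustering` and `max_radius` are hypothetical, the points are assumed to lie in Euclidean space, and the radius of a cluster is assumed to be measured from the best center chosen among the cluster's own points; the paper's exact definition may differ.

```python
from itertools import chain
from math import dist


def is_valid_clustering(colors, clusters, ell):
    """Check the diversity constraint for a candidate clustering.

    colors   -- colors[i] is the color of point i
    clusters -- list of clusters, each a list of point indices
    ell      -- minimum cluster size (the parameter ℓ in the abstract)
    """
    # The clusters must form a partition: every point appears exactly once.
    assigned = sorted(chain.from_iterable(clusters))
    if assigned != list(range(len(colors))):
        return False
    for cluster in clusters:
        cluster_colors = [colors[i] for i in cluster]
        # At least ell points, and no color repeated inside a cluster.
        if len(cluster) < ell or len(set(cluster_colors)) != len(cluster):
            return False
    return True


def max_radius(points, clusters):
    """Objective value: the largest cluster radius (clusters must be non-empty).

    Assumption: a cluster's radius is the smallest value r such that some
    point of the cluster is within Euclidean distance r of all its members.
    """
    def radius(cluster):
        return min(
            max(dist(points[c], points[i]) for i in cluster)
            for c in cluster
        )
    return max(radius(cluster) for cluster in clusters)


# Six points in the plane with three colors, and ℓ = 3.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
colors = ["red", "green", "blue", "red", "green", "blue"]

good = [[0, 1, 2], [3, 4, 5]]   # each cluster has 3 distinct colors
bad = [[0, 3, 1], [2, 4, 5]]    # first cluster contains "red" twice

print(is_valid_clustering(colors, good, 3))  # True
print(max_radius(points, good))              # 1.0
print(is_valid_clustering(colors, bad, 3))   # False
```

Because no color may repeat within a cluster, every cluster has at most as many points as there are colors, so an instance with fewer than ℓ colors admits no valid clustering under this definition.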

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, J., Yi, K., Zhang, Q. (2010). Clustering with Diversity. In: Abramsky, S., Gavoille, C., Kirchner, C., Meyer auf der Heide, F., Spirakis, P.G. (eds) Automata, Languages and Programming. ICALP 2010. Lecture Notes in Computer Science, vol 6198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14165-2_17

  • DOI: https://doi.org/10.1007/978-3-642-14165-2_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14164-5

  • Online ISBN: 978-3-642-14165-2

  • eBook Packages: Computer Science, Computer Science (R0)
