
Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 327)

Abstract

Clustering high-dimensional data remains a major challenge because of the inherent sparsity of the data points. Our model uses attribute selection to handle this sparse structure effectively: the most informative attributes, those that preserve cluster structure, are selected with LASSO (Least Absolute Shrinkage and Selection Operator). Although other attribute-selection methods exist, LASSO has the distinctive property of selecting the most correlated subset of attributes. The model also identifies the dominant attributes of each cluster, which retain their predictive power, and the use of LASSO helps assure the quality of the projected clusters that are formed.
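The general pipeline the abstract describes, LASSO-based attribute selection followed by clustering in the selected (projected) subspace, can be sketched with off-the-shelf tools. The following is a minimal illustration, not the authors' implementation: it assumes scikit-learn's LassoCV and KMeans, a small stand-in data set (iris), and a supervised response for the LASSO step, whereas the paper's formulation of the selection and quality-assurance steps may differ.

```python
# Hypothetical sketch: LASSO attribute selection, then k-means on the
# selected attributes (a projected clustering). Illustrative only.
import numpy as np
from sklearn.datasets import load_iris          # stand-in data set
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LassoCV
from sklearn.cluster import KMeans

# Load a small data set; in the paper's setting X would be high dimensional.
X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Step 1: LASSO selects a sparse set of informative attributes.
# A known response y is used here purely for illustration; an unsupervised
# variant could regress each attribute on the others or use labels from an
# initial clustering pass.
lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(np.abs(lasso.coef_) > 1e-6)
if selected.size == 0:                          # fall back to all attributes
    selected = np.arange(X.shape[1])
print("Selected attribute indices:", selected)

# Step 2: cluster only in the selected (projected) subspace.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X[:, selected])

# Step 3: dominant attributes of each cluster, e.g. those whose cluster mean
# deviates most from the (zero) global mean in the projected subspace.
for k, centre in enumerate(km.cluster_centers_):
    dominant = selected[np.argsort(-np.abs(centre))[:2]]
    print(f"Cluster {k}: dominant attributes {dominant.tolist()}")
```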



Author information

Correspondence to Lidiya Narayanan.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Narayanan, L., Babu, A.S., Kaimal, M.R. (2015). Projected Clustering with LASSO for High Dimensional Data Analysis. In: Satapathy, S., Biswal, B., Udgata, S., Mandal, J. (eds) Proceedings of the 3rd International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA) 2014. Advances in Intelligent Systems and Computing, vol 327. Springer, Cham. https://doi.org/10.1007/978-3-319-11933-5_23


  • DOI: https://doi.org/10.1007/978-3-319-11933-5_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11932-8

  • Online ISBN: 978-3-319-11933-5

  • eBook Packages: Engineering (R0)
