Abstract
Clustering high-dimensional data has always been a major challenge because data points become inherently sparse as dimensionality grows. Our model handles this sparse structure effectively through attribute selection: it uses the LASSO (Least Absolute Shrinkage and Selection Operator) to select the most informative attributes, namely those that preserve the cluster structure. Although other attribute-selection methods exist, LASSO has the distinctive property of selecting the set of attributes most correlated with the underlying structure of the data. The model also identifies the dominant attributes of each cluster, which retain their predictive power, and the use of LASSO assures the quality of the resulting projected clusters.
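The abstract's pipeline — select structure-preserving attributes with the LASSO, then cluster in the projected subspace — can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes scikit-learn's `Lasso` and `KMeans`, uses synthetic data, and uses a provisional cluster indicator as the LASSO response, which is one plausible (hypothetical) way to couple regression-based selection to clustering.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic high-dimensional data: the cluster structure lives only in
# the first 3 of 50 attributes; the remaining 47 are pure noise.
n, d_inf, d_noise = 200, 3, 47
centers = rng.normal(0, 5, size=(2, d_inf))
labels = rng.integers(0, 2, size=n)
X_inf = centers[labels] + rng.normal(0, 1, size=(n, d_inf))
X = np.hstack([X_inf, rng.normal(0, 1, size=(n, d_noise))])

# Step 1 (hypothetical formulation): fit a LASSO with a provisional
# cluster indicator as the response. The L1 penalty drives the
# coefficients of uninformative attributes to exactly zero, so the
# nonzero coefficients flag the attributes that carry cluster structure.
provisional = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
lasso = Lasso(alpha=0.1).fit(X, provisional.astype(float))
selected = np.flatnonzero(np.abs(lasso.coef_) > 1e-6)

# Step 2: cluster in the projected subspace of selected attributes only.
proj_labels = KMeans(n_clusters=2, n_init=10,
                     random_state=0).fit_predict(X[:, selected])
print("selected attributes:", sorted(selected.tolist()))
```

The key property exploited here is that the L1 penalty yields a sparse coefficient vector, so attribute selection falls out of the fit itself rather than requiring a separate search over subspaces.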
© 2015 Springer International Publishing Switzerland
Narayanan, L., Babu, A.S., Kaimal, M.R. (2015). Projected Clustering with LASSO for High Dimensional Data Analysis. In: Satapathy, S., Biswal, B., Udgata, S., Mandal, J. (eds) Proceedings of the 3rd International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA) 2014. Advances in Intelligent Systems and Computing, vol 327. Springer, Cham. https://doi.org/10.1007/978-3-319-11933-5_23
Print ISBN: 978-3-319-11932-8
Online ISBN: 978-3-319-11933-5