Abstract
Document indexing using dimension reduction has been widely studied in recent years. Application of these methods in large distributed systems may be inefficient due to the required computational, storage, and communication costs. In this paper, we propose DLPR, a distributed locality preserving dimension reduction algorithm, to project a large distributed data set into a lower dimensional space. Partitioning methods are applied to divide the data set into several clusters. The system nodes communicate through virtual groups to project the clusters to the target space, independently or in conjunction with each other.
The actual computation of reduction transforms is performed using Locality Preserving Indexing, which is a less studied method in distributed environments. Experimental results demonstrate the efficiency of DLPR in terms of preserving the local structure of the data set, and reducing the computing and storage costs.
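The pipeline described above (partition the data set, then compute a locality preserving transform per cluster) can be illustrated with a minimal single-machine sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: k-means stands in for the unspecified partitioning method, each cluster is projected independently (the virtual-group communication and the joint projection of clusters are not modelled), and the transform is a plain Locality Preserving Projections solve with a small ridge term `reg` rather than the full Locality Preserving Indexing procedure. The function names `kmeans_labels`, `knn_weights`, `lpp_transform`, and `project_clusters` are hypothetical.

```python
import numpy as np


def kmeans_labels(X, n_clusters, n_iter=20, seed=0):
    """Assumed partitioning step: plain Lloyd's k-means on the rows of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(X.shape[0], n_clusters, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels


def knn_weights(X, n_neighbors):
    """Symmetric 0/1 adjacency matrix of the k-nearest-neighbour graph."""
    n = X.shape[0]
    n_neighbors = min(n_neighbors, n - 1)  # small clusters have fewer neighbours
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    idx = np.argsort(dist, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_neighbors)
    W[rows, idx.ravel()] = 1.0
    return np.maximum(W, W.T)


def lpp_transform(X, n_components, n_neighbors=5, reg=1e-6):
    """Solve X^T L X a = lambda X^T D X a and keep the smallest eigenvalues."""
    W = knn_weights(X, n_neighbors)
    D = np.diag(W.sum(axis=1))
    L = D - W  # graph Laplacian
    A = X.T @ L @ X
    # Ridge term (an assumption of this sketch) keeps B positive definite.
    B = X.T @ D @ X + reg * np.eye(X.shape[1])
    # Reduce the generalized problem to a standard symmetric one via Cholesky.
    C = np.linalg.cholesky(B)
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T
    M = (M + M.T) / 2.0  # remove round-off asymmetry
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return C_inv.T @ vecs[:, :n_components]


def project_clusters(X, n_clusters, n_components):
    """Partition X, then project each cluster with its own transform."""
    labels = kmeans_labels(X, n_clusters)
    Y = np.zeros((X.shape[0], n_components))
    for c in np.unique(labels):
        mask = labels == c
        Y[mask] = X[mask] @ lpp_transform(X[mask], n_components)
    return labels, Y
```

Calling `project_clusters(X, n_clusters=3, n_components=2)` on an `n x d` array returns the cluster label of each row and its 2-dimensional coordinates. Because every cluster gets its own transform, coordinates from different clusters are not directly comparable in this sketch.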
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ghashami, M., Mashayekhi, H., Habibi, J. (2012). DLPR: A Distributed Locality Preserving Dimension Reduction Algorithm. In: Xiang, Y., Pathan, M., Tao, X., Wang, H. (eds) Internet and Distributed Computing Systems. IDCS 2012. Lecture Notes in Computer Science, vol 7646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34883-9_8
DOI: https://doi.org/10.1007/978-3-642-34883-9_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34882-2
Online ISBN: 978-3-642-34883-9
eBook Packages: Computer Science, Computer Science (R0)