DLPR: A Distributed Locality Preserving Dimension Reduction Algorithm

Ghashami, Mina; Mashayekhi, Hoda; Habibi, Jafar

doi:10.1007/978-3-642-34883-9_8

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7646)

Included in the following conference series:

International Conference on Internet and Distributed Computing Systems

Abstract

Document indexing using dimension reduction has been widely studied in recent years. Applying these methods in large distributed systems can be inefficient because of their computational, storage, and communication costs. In this paper, we propose DLPR, a distributed locality preserving dimension reduction algorithm, to project a large distributed data set into a lower dimensional space. Partitioning methods are applied to divide the data set into several clusters. The system nodes communicate through virtual groups to project the clusters to the target space, either independently or in conjunction with each other.

The reduction transforms themselves are computed using Locality Preserving Indexing, a method that has received little study in distributed environments. Experimental results demonstrate the efficiency of DLPR in terms of preserving the local structure of the data set and reducing the computing and storage costs.
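As a rough illustration of the idea in the abstract, the following single-process sketch partitions a data set by cluster label and computes a locality preserving projection for each cluster independently. It is not the authors' implementation: the function names (`knn_weights`, `lpp`, `project_clusters`), the 0/1 k-nearest-neighbor weights, the ridge term, and the synthetic data are all assumptions made for this example, and no node communication or virtual groups are modeled. The projection solves the generalized eigenproblem X^T L X a = λ X^T D X a used in locality preserving projections and keeps the eigenvectors with the smallest eigenvalues.

```python
import numpy as np

def knn_weights(X, k):
    """Symmetric 0/1 k-nearest-neighbor adjacency matrix for the rows of X."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, np.inf)                      # exclude self-neighbors
    idx = np.argsort(d2, axis=1)[:, :k]               # k nearest per row
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(W, W.T)  # link i and j if either lists the other

def lpp(X, n_components, k=5, ridge=1e-6):
    """Return a (d, n_components) locality preserving projection matrix."""
    W = knn_weights(X, k)
    D = np.diag(W.sum(axis=1))          # degree matrix
    L = D - W                           # graph Laplacian
    A = X.T @ L @ X
    # Assumption of this sketch: a small ridge keeps B positive definite.
    B = X.T @ D @ X + ridge * np.eye(X.shape[1])
    # Reduce A a = lambda B a to a standard symmetric problem with B = C C^T.
    C = np.linalg.cholesky(B)
    M = np.linalg.solve(C, np.linalg.solve(C, A).T)   # C^{-1} A C^{-T}
    M = (M + M.T) / 2.0                               # remove round-off asymmetry
    _, U = np.linalg.eigh(M)                          # eigenvalues ascending
    return np.linalg.solve(C.T, U[:, :n_components])  # map back: a = C^{-T} u

def project_clusters(X, labels, n_components, k=5):
    """Project each cluster on its own; returns {label: (row_indices, embedding)}."""
    out = {}
    for c in np.unique(labels):
        rows = np.flatnonzero(labels == c)
        P = lpp(X[rows], n_components, k)
        out[int(c)] = (rows, X[rows] @ P)
    return out

# Synthetic data: two well-separated groups of 30 points in 6 dimensions.
# The labels stand in for the output of any partitioning method.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (30, 6)), rng.normal(5.0, 1.0, (30, 6))])
labels = np.repeat([0, 1], 30)
result = project_clusters(X, labels, n_components=2)
```

Because each cluster's eigenproblem involves only that cluster's points, the per-cluster calls could in principle run on different nodes, which is the property a distributed scheme would exploit.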

Author information

Authors and Affiliations

Computer Engineering Department, Sharif University of Technology, Tehran, Iran
Mina Ghashami, Hoda Mashayekhi & Jafar Habibi

Editor information

Editors and Affiliations

School of Information Technology, Deakin University, Melbourne Burwood Campus, 221 Burwood Highway, 3125, Burwood, VIC, Australia
Yang Xiang
Media Distribution, Telstra Corporation Limited, 21/35 Collins St, 3000, Melbourne, VIC, Australia
Mukaddim Pathan
Department of Mathematics and Computing, The University of Southern Queensland, Toowoomba, QLD, Australia
Xiaohui Tao & Hua Wang

About this paper

Cite this paper

Ghashami, M., Mashayekhi, H., Habibi, J. (2012). DLPR: A Distributed Locality Preserving Dimension Reduction Algorithm. In: Xiang, Y., Pathan, M., Tao, X., Wang, H. (eds) Internet and Distributed Computing Systems. IDCS 2012. Lecture Notes in Computer Science, vol 7646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34883-9_8

DOI: https://doi.org/10.1007/978-3-642-34883-9_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34882-2
Online ISBN: 978-3-642-34883-9
eBook Packages: Computer Science, Computer Science (R0)
