DLPR: A Distributed Locality Preserving Dimension Reduction Algorithm

  • Conference paper
Internet and Distributed Computing Systems (IDCS 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7646)

Abstract

Document indexing using dimension reduction has been widely studied in recent years. Application of these methods in large distributed systems may be inefficient due to the required computational, storage, and communication costs. In this paper, we propose DLPR, a distributed locality preserving dimension reduction algorithm, to project a large distributed data set into a lower dimensional space. Partitioning methods are applied to divide the data set into several clusters. The system nodes communicate through virtual groups to project the clusters to the target space, independently or in conjunction with each other.

The actual computation of reduction transforms is performed using Locality Preserving Indexing, which is a less studied method in distributed environments. Experimental results demonstrate the efficiency of DLPR in terms of preserving the local structure of the data set, and reducing the computing and storage costs.
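As a rough illustration of the per-cluster projection step described above, the following is a minimal single-machine sketch in Python. It is not the authors' DLPR implementation: it assumes the cluster labels are already available, omits the distributed communication through virtual groups entirely, and uses plain Locality Preserving Projections (reference 7) as a stand-in for Locality Preserving Indexing. The function names `knn_affinity`, `lpp`, and `project_clusters`, the binary k-nearest-neighbor weights, and the regularization constant `reg` are all assumptions made for this example.

```python
import numpy as np


def knn_affinity(X, k):
    """Symmetric binary k-nearest-neighbor affinity matrix (assumed weighting)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, np.inf)                      # exclude self-neighbors
    idx = np.argsort(d2, axis=1)[:, :k]               # k nearest per row
    n = X.shape[0]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(W, W.T)                         # symmetrize


def lpp(X, n_components, k=5, reg=1e-6):
    """Locality Preserving Projections for rows of X; returns a (d, m) matrix.

    Solves X^T L X a = lambda (X^T D X + reg*I) a and keeps the eigenvectors
    with the smallest eigenvalues. The reg term is an assumption added here
    to keep the right-hand matrix positive definite.
    """
    W = knn_affinity(X, k)
    D = np.diag(W.sum(axis=1))
    L = D - W                                         # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(X.shape[1])
    C = np.linalg.cholesky(B)                         # B = C C^T
    # Reduce to a standard symmetric problem: M = C^{-1} A C^{-T}
    M = np.linalg.solve(C, np.linalg.solve(C, A).T)
    M = (M + M.T) / 2.0
    _, Y = np.linalg.eigh(M)                          # ascending eigenvalues
    return np.linalg.solve(C.T, Y[:, :n_components])  # a = C^{-T} y


def project_clusters(X, labels, n_components, k=5):
    """Project each cluster independently; returns {label: (P, Z)}."""
    result = {}
    for c in np.unique(labels):
        Xc = X[labels == c]
        Xc = Xc - Xc.mean(axis=0)                     # center within cluster
        kc = min(k, Xc.shape[0] - 1)                  # k must be below cluster size
        P = lpp(Xc, n_components, k=kc)
        result[int(c)] = (P, Xc @ P)
    return result


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (30, 6)), rng.normal(5, 1, (30, 6))])
    labels = np.repeat([0, 1], 30)
    for c, (P, Z) in project_clusters(X, labels, n_components=2).items():
        print(c, P.shape, Z.shape)
```

Each cluster gets its own projection matrix here, which corresponds only to the "independently" case mentioned in the abstract; projecting clusters in conjunction with each other would need additional logic that this sketch does not attempt.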

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ghashami, M., Mashayekhi, H., Habibi, J. (2012). DLPR: A Distributed Locality Preserving Dimension Reduction Algorithm. In: Xiang, Y., Pathan, M., Tao, X., Wang, H. (eds) Internet and Distributed Computing Systems. IDCS 2012. Lecture Notes in Computer Science, vol 7646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34883-9_8

  • DOI: https://doi.org/10.1007/978-3-642-34883-9_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34882-2

  • Online ISBN: 978-3-642-34883-9

  • eBook Packages: Computer Science, Computer Science (R0)
