Abstract
In recent years, the field of data storage and processing has changed radically because of the large volume of data generated every minute. As a result, traditional tools and algorithms can no longer keep pace with this exponential growth or deliver results within a reasonable time. One way to address this problem is to use distributed data storage and parallel processing. In this work, we use the distributed platform Apache Spark and a massive type of data set, the hyperspectral image. Hyperspectral image processing tasks, such as visualization and feature extraction, must cope with the high dimensionality of the image. Several dimensionality reduction techniques exist in the literature. In this paper, we propose a distributed and parallel version of Principal Component Analysis (PCA).
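The abstract does not describe the algorithm itself, so the following is only a minimal sketch of one common way to parallelize PCA, not the authors' method. It assumes each pixel is a row of band values and that the pixels are split into partitions. Each partition computes partial sums independently (the role `mapPartitions` would play in Spark), the partial sums are merged (the role of `reduce`), and the leading principal component is then found by power iteration on the small covariance matrix. All function and variable names are hypothetical.

```python
import math
from functools import reduce

def partition_stats(rows):
    """Map step: count, per-band sums, and sum of outer products for one partition."""
    d = len(rows[0])
    sums = [0.0] * d
    gram = [[0.0] * d for _ in range(d)]
    for x in rows:
        for i in range(d):
            sums[i] += x[i]
            for j in range(d):
                gram[i][j] += x[i] * x[j]
    return len(rows), sums, gram

def merge_stats(a, b):
    """Reduce step: partial statistics from two partitions add element-wise."""
    n1, s1, g1 = a
    n2, s2, g2 = b
    sums = [u + v for u, v in zip(s1, s2)]
    gram = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(g1, g2)]
    return n1 + n2, sums, gram

def covariance(stats):
    """Sample covariance from the merged statistics: (G - n * m m^T) / (n - 1)."""
    n, sums, gram = stats
    mean = [v / n for v in sums]
    d = len(sums)
    return [[(gram[i][j] - n * mean[i] * mean[j]) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, iters=200):
    """Power iteration: approximates the eigenvector with the largest eigenvalue."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d  # assumed start; must not be orthogonal to the answer
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

# Toy data (made up for illustration): two partitions of two-band "pixels" on the line y = 2x.
pixels_by_partition = [
    [[1.0, 2.0], [2.0, 4.0]],
    [[3.0, 6.0], [4.0, 8.0]],
]
merged = reduce(merge_stats, (partition_stats(p) for p in pixels_by_partition))
pc1 = top_component(covariance(merged))
print(pc1)
```

The design point is that only a count, a vector of length d, and a d-by-d matrix leave each partition, so the data exchanged between workers depends on the number of bands rather than the number of pixels.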
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Zbakh, A., Alaoui Mdaghri, Z., El Yadari, M., Benyoussef, A., El Kenz, A. (2018). Proposition of a Parallel and Distributed Algorithm for the Dimensionality Reduction with Apache Spark. In: Ben Ahmed, M., Boudhir, A. (eds) Innovations in Smart Cities and Applications. SCAMS 2017. Lecture Notes in Networks and Systems, vol 37. Springer, Cham. https://doi.org/10.1007/978-3-319-74500-8_46
DOI: https://doi.org/10.1007/978-3-319-74500-8_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-74499-5
Online ISBN: 978-3-319-74500-8
eBook Packages: Engineering, Engineering (R0)