Exploring Apache Spark Data APIs for Water Big Data Management

El Hassane, Nassif; Hajji, Hicham

doi:10.1007/978-3-030-11881-5_10

Nassif El Hassane & Hicham Hajji

Conference paper
First Online: 14 February 2019

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 913)

Abstract

Managing data complexity is a recurring problem in many domains related to water resources management, such as utilities and hydrological and meteorological modelling. Since the advent of intelligent sensors, the volume of collected data has grown systematically. These sensors also generate data in near real time and in various formats. Extracting value from such water datasets requires new solutions that are efficient enough to manage massive, heterogeneous data arriving from intelligent sensors in near real time. In this paper, we present a reference architecture for managing massive data collected from smart meters. We also show how recent advances in big data technologies, mainly the Apache Spark project, can be used effectively to obtain insights from massive datasets. Finally, we present the advantages of Spark's distributed execution model by exploring three Apache Spark APIs: RDD, DataFrame, and SparkR.
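
As a rough illustration of the kind of computation the RDD API distributes, the sketch below aggregates smart-meter readings into daily totals per meter using a map step followed by a reduce-by-key step. It is written in plain Python with no Spark dependency, so it is a single-machine stand-in for the pattern and not the authors' implementation. The record layout, the sample readings, and the helper names (`to_key_value`, `reduce_by_key`, `daily_consumption`) are all assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical smart-meter readings: (meter_id, ISO timestamp, litres consumed).
# The layout and the values are invented for illustration only.
Reading = Tuple[str, str, float]
Key = Tuple[str, str]

READINGS = [
    ("meter-A", "2018-07-01T00:00", 12.0),
    ("meter-A", "2018-07-01T01:00", 8.5),
    ("meter-B", "2018-07-01T00:00", 20.0),
    ("meter-B", "2018-07-02T00:00", 5.0),
    ("meter-A", "2018-07-02T00:00", 3.5),
]


def to_key_value(reading: Reading) -> Tuple[Key, float]:
    """Map step: key each reading by (meter_id, day)."""
    meter_id, timestamp, litres = reading
    day = timestamp[:10]  # keep only the YYYY-MM-DD part
    return (meter_id, day), litres


def reduce_by_key(
    pairs: Iterable[Tuple[Key, float]],
    combine: Callable[[float, float], float],
) -> Dict[Key, float]:
    """Reduce step: merge all values that share a key with `combine`.

    This is a local imitation of what a reduce-by-key operation does;
    a cluster engine would run it per partition and then merge results.
    """
    totals: Dict[Key, float] = {}
    for key, value in pairs:
        totals[key] = combine(totals[key], value) if key in totals else value
    return totals


def daily_consumption(readings: Iterable[Reading]) -> Dict[Key, float]:
    """Total litres per meter per day."""
    return reduce_by_key(map(to_key_value, readings), lambda a, b: a + b)


if __name__ == "__main__":
    for (meter, day), litres in sorted(daily_consumption(READINGS).items()):
        print(meter, day, litres)
```

With PySpark, the same pipeline would typically be expressed as `rdd.map(to_key_value).reduceByKey(lambda a, b: a + b)`, with Spark handling partitioning and the shuffle between the two steps.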

Notes

1. Such advanced queries will be developed separately in further works.
2. https://code.google.com/p/smart-meter-information-portal/

Author information

Authors and Affiliations

School of Geomatic Sciences and Surveying Engineering, SGIT, IAV Institute, Rabat, Morocco
Nassif El Hassane & Hicham Hajji

Corresponding author

Correspondence to Nassif El Hassane.

Editor information

Editors and Affiliations

Computer Sciences Department, Faculty of Sciences and Techniques of Tangier, Abdelmalek Essaâdi University, Souani Tangier, Morocco
Mostafa Ezziyyani

About this paper

Cite this paper

El Hassane, N., Hajji, H. (2019). Exploring Apache Spark Data APIs for Water Big Data Management. In: Ezziyyani, M. (eds) Advanced Intelligent Systems for Sustainable Development (AI2SD’2018). AI2SD 2018. Advances in Intelligent Systems and Computing, vol 913. Springer, Cham. https://doi.org/10.1007/978-3-030-11881-5_10

DOI: https://doi.org/10.1007/978-3-030-11881-5_10
Published: 14 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11880-8
Online ISBN: 978-3-030-11881-5
eBook Packages: Intelligent Technologies and Robotics (R0)
