Streaming Massive Electric Power Data Analysis Based on Spark Streaming

Zhang, Xudong; Qian, Zhongwen; Shen, Siqi; Shi, Jia; Wang, Shujun

doi:10.1007/978-3-030-18590-9_14

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11448)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Classifying electric power users is one of the most important methods for achieving the optimal allocation of power resources. By analyzing users’ needs, behavior, and habits, countries and enterprises can offer different incentives to different users, which makes people more willing to use green and clean electric power resources. User clustering analysis requires real-time processing of massive, high-speed data. In this paper, we propose “DStreamEPK”, a novel distributed clustering method for user data streams that is based on Spark Streaming, an improved CluStream algorithm, and an improved K-means algorithm. In the experimental evaluation, we first test the clustering effectiveness of DStreamEPK on UCI datasets; the results show that DStreamEPK outperforms the traditional K-means clustering algorithm. Tests on real user datasets further show that DStreamEPK can cluster users’ electricity data quickly and efficiently.
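
The abstract does not give the details of DStreamEPK, so the following is only a minimal sketch of the general two-phase pattern that CluStream-style methods follow: an online phase that folds each mini-batch into additive micro-cluster summaries, and an offline phase that runs K-means over those summaries. It is not the authors' algorithm. The names `MicroCluster`, `update_micro_clusters`, `weighted_kmeans`, and the fixed `max_radius` threshold are assumptions made for illustration; timestamps, micro-cluster merging and deletion, and the distribution of work across Spark executors are all omitted.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Additive summary (count, linear sum, squared sum) of absorbed points."""
    n: int
    linear_sum: list
    square_sum: list

    @classmethod
    def from_point(cls, p):
        return cls(1, list(p), [x * x for x in p])

    def absorb(self, p):
        self.n += 1
        for i, x in enumerate(p):
            self.linear_sum[i] += x
            self.square_sum[i] += x * x

    def center(self):
        return [s / self.n for s in self.linear_sum]

    def radius(self):
        # Root-mean-square distance of the absorbed points from the center.
        var = sum(sq / self.n - (ls / self.n) ** 2
                  for ls, sq in zip(self.linear_sum, self.square_sum))
        return math.sqrt(max(var, 0.0))


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def update_micro_clusters(clusters, batch, max_radius):
    """Online phase: fold one mini-batch into the micro-cluster summaries.

    Assumption: a point joins the nearest micro-cluster if it lies within
    a fixed max_radius of its center; otherwise it starts a new one.
    """
    for p in batch:
        if clusters:
            nearest = min(clusters, key=lambda c: dist(c.center(), p))
            if dist(nearest.center(), p) <= max_radius:
                nearest.absorb(p)
                continue
        clusters.append(MicroCluster.from_point(p))
    return clusters


def weighted_kmeans(clusters, k, iterations=20, seed=0):
    """Offline phase: K-means over micro-cluster centers, weighted by count."""
    rng = random.Random(seed)
    points = [(c.center(), c.n) for c in clusters]
    centroids = [p for p, _ in rng.sample(points, k)]
    for _ in range(iterations):
        sums = [[0.0] * len(centroids[0]) for _ in range(k)]
        weights = [0] * k
        for p, w in points:
            j = min(range(k), key=lambda i: dist(centroids[i], p))
            weights[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # Keep the previous centroid if no micro-cluster was assigned to it.
        centroids = [
            [s / weights[j] for s in sums[j]] if weights[j] else centroids[j]
            for j in range(k)
        ]
    return centroids
```

In a streaming job, `update_micro_clusters` would be called once per arriving mini-batch, and `weighted_kmeans` only when a clustering result is requested. Because the summaries are additive, the offline step works on a small number of micro-clusters rather than on every raw reading.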

Author information

Authors and Affiliations

Zhejiang Electric Power Company, Ltd., Hangzhou, 310007, China
Xudong Zhang, Zhongwen Qian & Siqi Shen
Zhejiang Huayun Information Technology Company, Ltd., Hangzhou, 310012, China
Jia Shi
College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China
Shujun Wang

Corresponding author

Correspondence to Shujun Wang.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Guoliang Li
Duke University, Durham, NC, USA
Jun Yang
University of Porto, Porto, Portugal
Joao Gama
Chiang Mai University, Chiang Mai, Thailand
Juggapong Natwichai
Beihang University, Beijing, China
Yongxin Tong

About this paper

Cite this paper

Zhang, X., Qian, Z., Shen, S., Shi, J., Wang, S. (2019). Streaming Massive Electric Power Data Analysis Based on Spark Streaming. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science, vol 11448. Springer, Cham. https://doi.org/10.1007/978-3-030-18590-9_14

DOI: https://doi.org/10.1007/978-3-030-18590-9_14
Published: 24 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18589-3
Online ISBN: 978-3-030-18590-9
eBook Packages: Computer Science, Computer Science (R0)
