Abstract
To take full account of the characteristics of farm environmental data sets, the Gaussian mixture model (GMM) is combined with the Dirichlet process (DP), which removes the need to specify the initial number of clusters. Gibbs sampling is used in place of the Expectation-Maximization algorithm to estimate the parameters of the DP-based model. The clustering is implemented on the Spark framework so that it can handle farm environmental data sets stored on a distributed computer cluster. Anomaly detection on farm environmental data is then carried out with the improved clustering method. Experimental results evaluated with an external criterion show that the improved method detects data anomalies better than other common clustering methods.
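The idea of letting a Dirichlet process choose the number of mixture components, with Gibbs sampling in place of EM, can be illustrated with a minimal single-machine sketch. This is not the authors' Spark implementation: it assumes one-dimensional readings, a known within-cluster variance `sigma2`, and a conjugate normal prior `N(mu0, tau2)` on each cluster mean. The function name `dp_gmm_gibbs_1d`, all parameter names and defaults, and the sample readings are hypothetical and chosen only for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def dp_gmm_gibbs_1d(data, alpha=1.0, sigma2=1.0, mu0=0.0, tau2=100.0,
                    sweeps=50, seed=0):
    """Collapsed Gibbs sampling for a 1-D DP mixture of Gaussians.

    Assumes a known within-cluster variance sigma2 and a conjugate
    N(mu0, tau2) prior on each cluster mean (illustrative choices).
    Returns one cluster label per data point.
    """
    rng = random.Random(seed)
    labels = list(range(len(data)))               # start: every point alone
    counts = {k: 1 for k in labels}               # points per cluster
    sums = {k: x for k, x in zip(labels, data)}   # sum of values per cluster
    next_label = len(data)

    for _ in range(sweeps):
        for i, x in enumerate(data):
            # Remove point i from its current cluster.
            k = labels[i]
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k], sums[k]

            # Weight of each existing cluster: size * posterior predictive.
            candidates, weights = [], []
            for c, n in counts.items():
                post_var = 1.0 / (1.0 / tau2 + n / sigma2)
                post_mean = post_var * (mu0 / tau2 + sums[c] / sigma2)
                candidates.append(c)
                weights.append(n * normal_pdf(x, post_mean, post_var + sigma2))

            # Weight of opening a new cluster: alpha * prior predictive.
            candidates.append(next_label)
            weights.append(alpha * normal_pdf(x, mu0, tau2 + sigma2))

            # Sample the new assignment in proportion to the weights.
            k = rng.choices(candidates, weights=weights)[0]
            if k == next_label:
                counts[k], sums[k] = 0, 0.0
                next_label += 1
            labels[i] = k
            counts[k] += 1
            sums[k] += x
    return labels


# Hypothetical readings: two well-separated groups, no cluster count given.
readings = [-1.0, -0.5, 0.0, 0.5, 1.0, 19.0, 19.5, 20.0, 20.5, 21.0]
labels = dp_gmm_gibbs_1d(readings, mu0=10.0)
print(len(set(labels)), "clusters found")
```

The number of clusters is never passed in; the concentration parameter `alpha` only controls how readily a new cluster is opened. A distributed version would additionally need to partition the data and coordinate cluster statistics across workers, which this sketch does not attempt.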
Acknowledgments
This work is supported by the Key Project of Science and Technology Commission of Shanghai Municipality under Grant No. 14DZ1206302, the National Natural Science Foundation of China (Grant No. 61304031), and the Innovation Program of Shanghai Municipal Education Commission (14YZ007). This work was also supported by the Shanghai College Young Teachers’ Training Plan (No. B37010913003). The authors would like to thank the editors and anonymous reviewers for their valuable comments and suggestions for improving this paper.
Copyright information
© 2016 Springer Science+Business Media Singapore
About this paper
Cite this paper
Pang, H., Deng, L., Wang, L., Fei, M. (2016). The Application of Spark-Based Gaussian Mixture Model for Farm Environmental Data Analysis. In: Zhang, L., Song, X., Wu, Y. (eds) Theory, Methodology, Tools and Applications for Modeling and Simulation of Complex Systems. AsiaSim 2016, SCS AutumnSim 2016. Communications in Computer and Information Science, vol 645. Springer, Singapore. https://doi.org/10.1007/978-981-10-2669-0_18
DOI: https://doi.org/10.1007/978-981-10-2669-0_18
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2668-3
Online ISBN: 978-981-10-2669-0
eBook Packages: Computer Science, Computer Science (R0)