Data optimisation and partitioning in private cloud using dynamic clusters for agricultural datasets

Abstract

Contemporary techniques for processing soil analytical databases do not optimize databases and tables on storage grids, and they are limited to single-instance database transactions when handling large volumes of soil analytical data. These limitations increase data processing overheads in private agricultural cloud services. In this paper, we propose a Predictive Scalability Generator (PS-Gen) technique that optimizes and partitions large data sets by dynamically creating indexed databases and tables on clustered storage grids. The approach is conceptualized by studying the k-means clustering algorithm, which is widely used in cloud database processing systems. It creates databases and tables dynamically within a cluster by monitoring the cluster balancer defined in the system, so that large datasets can be handled. In addition, the proposed approach enables quick, dynamic movement of databases and tables within clusters to manage load actively through row-specific tuple management, which is achieved by integrating horizontal sharding into the proposed method. The experimental results show effective management of large agricultural datasets in private cloud systems through load balancing across clusters. Further, the proposed approach can flexibly incorporate network subsystems and support the development of efficient cloud-based application systems.
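The abstract does not specify PS-Gen's algorithm, so the following is only an illustrative sketch of the general idea it describes, not the authors' method. It assumes that rows are partitioned horizontally by running plain k-means on two numeric soil attributes (the field names `sample_id`, `ph` and `oc` are hypothetical), and that the "cluster balancer" can be approximated by a fixed per-shard row capacity. Rows in an over-capacity shard are moved one at a time to the nearest shard that still has room, which is one simple reading of row-specific tuple management.

```python
import random
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical record and feature types; not taken from the paper.
Row = dict    # e.g. {"sample_id": 1, "ph": 6.2, "oc": 0.5}
Point = tuple  # numeric feature vector used for clustering


@dataclass
class Shard:
    """One horizontal partition: a centroid plus the rows routed to it."""
    centroid: Point
    rows: list = field(default_factory=list)


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: list, k: int, iters: int = 20, seed: int = 0) -> list:
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        # New centroid = mean of its group; keep the old one if the group is empty.
        new = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
               for j, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids


def build_shards(rows: list, key: Callable[[Row], Point], k: int, seed: int = 0) -> list:
    """Horizontal sharding: route every row to the shard with the nearest centroid."""
    centroids = kmeans([key(r) for r in rows], k, seed=seed)
    shards = [Shard(c) for c in centroids]
    for r in rows:
        p = key(r)
        min(shards, key=lambda s: sq_dist(p, s.centroid)).rows.append(r)
    return shards


def rebalance(shards: list, key: Callable[[Row], Point], capacity: int) -> int:
    """Assumed balancer: move rows out of shards above `capacity`; return rows moved."""
    moved = 0
    for s in shards:
        # Sort so that pop() evicts the row farthest from this shard's centroid first.
        s.rows.sort(key=lambda r: sq_dist(key(r), s.centroid))
        while len(s.rows) > capacity:
            row = s.rows.pop()
            targets = [t for t in shards if t is not s and len(t.rows) < capacity]
            if not targets:
                s.rows.append(row)  # no free capacity anywhere: give up
                break
            p = key(row)
            min(targets, key=lambda t: sq_dist(p, t.centroid)).rows.append(row)
            moved += 1
    return moved


if __name__ == "__main__":
    # Synthetic rows for illustration only; not data from the paper.
    demo = [{"sample_id": i, "ph": 5 + (i % 10) * 0.3, "oc": 0.2 + (i % 7) * 0.1}
            for i in range(60)]
    feat = lambda r: (r["ph"], r["oc"])
    shards = build_shards(demo, feat, k=3)
    print("before:", [len(s.rows) for s in shards])
    print("moved:", rebalance(shards, feat, capacity=22))
    print("after:", [len(s.rows) for s in shards])
```

Evicting the rows farthest from their centroid first keeps each shard as cohesive as possible while bringing it under capacity. A real deployment would also have to copy the rows' physical storage and update the routing metadata, which this in-memory sketch leaves out.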


Figs. 1–11


Acknowledgements

This work is supported and funded by the AICRP on STCR (Soil Test Crop Response) team, University of Agricultural Sciences (UAS), Gandhi Krishi Vignana Kendra (GKVK), Bangalore, Karnataka, India. It is also financially supported by the New Age Incubation Network (NAIN) ICT Skill Development Society, Department of IT, BT and S & T, Govt. of Karnataka, Ref No: ICTSDS/CEO/17/2014-15, and by the Vision Group on Science and Technology (VGST) RFTT scheme, Govt. of Karnataka, Ref No: KSTePS/VGST-RFTT/2016-17/279/6.

Author information

Corresponding author

Correspondence to H. U. Leena.


About this article


Cite this article

Leena, H.U., Premasudha, B.G. & Basavaraja, P.K. Data optimisation and partitioning in private cloud using dynamic clusters for agricultural datasets. Int. J. Dynam. Control 8, 1027–1039 (2020). https://doi.org/10.1007/s40435-019-00596-9


Keywords

  • Data optimization
  • Private cloud
  • Sharding
  • Table partitioning