Data optimisation and partitioning in private cloud using dynamic clusters for agricultural datasets

  • H. U. Leena
  • B. G. Premasudha
  • P. K. Basavaraja


Contemporary techniques for processing soil analytical databases lack optimization of databases and tables on storage grids, and they are limited to a single instance of database transactions when handling large volumes of soil analytical data sets. These limitations increase data processing overheads in private agricultural cloud services. In this paper, we propose a Predictive Scalability Generator (PS-Gen) technique that optimizes and partitions large data sets by dynamically creating indexed databases and tables on clustered storage grids. The approach is conceptualized by studying the k-means clustering algorithm, which is widely used in cloud database processing systems. It creates databases and tables dynamically within a cluster by monitoring the cluster balancer defined in the system, so that large data sets can be handled. In addition, it enables quick, dynamic movement of databases and tables within clusters to manage load actively through row-specific tuple management, which is achieved by integrating the horizontal sharding technique into the proposed method. The experimental results demonstrate effective management of large agricultural data sets in private cloud systems through load balancing across clusters. Further, the proposed approach is flexible enough to adopt network subsystems and to support the development of efficient cloud-based application systems.
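The abstract combines three ideas: k-means clustering of records, horizontal sharding (each whole row is placed on exactly one shard), and a balancer that moves rows when a shard becomes overloaded. The sketch below illustrates how these pieces could fit together; it is not the authors' PS-Gen implementation. The record layout (tuples of numeric soil-test attributes), the per-shard `capacity` threshold, the "move to the least-loaded shard" policy, and the names `kmeans`, `shard_rows`, and `rebalance` are all hypothetical assumptions chosen for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical record: one soil-test row as a tuple of numeric attributes.
Row = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows: List[Row], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns one cluster label per row (requires k <= len(rows))."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: each row goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(r, centroids[c])) for r in rows]
        # Update step: each centroid becomes the mean of its member rows.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rebalance(shards: Dict[int, List[Row]], capacity: int) -> None:
    """Hypothetical balancer: move rows from any shard above capacity to the least-loaded shard."""
    for sid in shards:
        while len(shards[sid]) > capacity:
            target = min(shards, key=lambda s: len(shards[s]))
            if len(shards[target]) >= capacity:
                raise ValueError("total rows exceed k * capacity; add a shard")
            shards[target].append(shards[sid].pop())  # row-level (tuple) move


def shard_rows(rows: List[Row], k: int, capacity: int, seed: int = 0) -> Dict[int, List[Row]]:
    """Horizontal sharding: each whole row is stored on the shard of its k-means cluster."""
    labels = kmeans(rows, k, seed=seed)
    shards: Dict[int, List[Row]] = {s: [] for s in range(k)}
    for row, lab in zip(rows, labels):
        shards[lab].append(row)
    rebalance(shards, capacity)
    return shards


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    for sid, part in shard_rows(data, k=2, capacity=3).items():
        print(sid, part)
```

With two well-separated groups of three records, `k = 2`, and `capacity = 3`, each group should land on its own shard and the balancer moves nothing; lowering `capacity` forces row moves, or an error when the shards cannot hold all rows. In a real deployment the lists would stand in for database nodes, and the balancer would trigger schema creation or table migration rather than list appends.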


Keywords: Data optimization; Private cloud; Sharding; Table partitioning



This work is supported and funded by the AICRP on STCR (Soil Test Crop Response) team, University of Agricultural Sciences (UAS), Gandhi Krishi Vignana Kendra (GKVK), Bangalore, Karnataka, India. It is also financially supported by the New Age Incubation Network (NAIN), ICT Skill Development Society, Department of IT, BT and S&T, Govt. of Karnataka (Ref No: ICTSDS/CEO/17/2014-15), and by the Vision Group on Science and Technology (VGST) RFTT scheme, Govt. of Karnataka (Ref No: KSTePS/VGST-RFTT/2016-17/279/6).


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Master of Computer Applications, Siddaganga Institute of Technology (Affiliated to Visvesvaraya Technological University, Belagavi, Karnataka, India), Tumakuru, India
  2. Department of Master of Computer Applications, Siddaganga Institute of Technology, Tumakuru, India
  3. AICRP on STCR, Department of Soil Science and Agricultural Chemistry, UAS, GKVK, Bengaluru, India
