Cluster-By: An Efficient Clustering Operator in Emergency Management Database Systems

  • Peng Sun
  • Yan Huang
  • Chengyang Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7901)


Database management systems (DBMSs) are widely used to efficiently store, manage, and analyze large volumes of emergency management data. Despite the popularity of clustering as a general data mining method, current emergency management database systems lack a unified and convenient way to support in-database clustering. In this paper, we argue for the advantages of integrating clustering into databases and propose a new Cluster-by SQL extension. We formally define the syntax and semantics of the Cluster-by clause, illustrate its query plan node in the database engine, and present two data preprocessing rules. We then explore query optimization opportunities, present a novel framework for multi-query optimization, and define a cost model for multi-query scheduling. We also introduce DBSCAN-based Shrink and Expand algorithms that reuse historical clustering results, and present a heuristic cost model. To demonstrate that the extension integrates with existing DBMSs, we implemented Cluster-by in PostgreSQL and ran experiments on real data sets. The results show that the Cluster-by extension is useful and that the proposed multi-query optimization techniques are efficient.
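
The abstract names DBSCAN as the basis of the Shrink and Expand algorithms but does not give the Cluster-by syntax or those algorithms themselves, so neither is reproduced here. As a rough illustration only, the following plain-Python sketch shows the standard density-based grouping that such an operator would apply to the rows of a query result. The function name `dbscan`, the parameters `eps` and `min_pts`, and the sample points are assumptions made for this example; they are not taken from the paper or from its PostgreSQL implementation.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not yet visited"

    def neighbors(i):
        # Brute-force range query (includes i itself).
        # A database engine would typically answer this with a spatial index.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is also a core point
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels


# Hypothetical incident locations: two dense groups and one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
result = dbscan(sample, eps=1.5, min_pts=3)
```

With these sample values the two dense groups receive distinct cluster ids and the isolated point is labeled as noise. Rerunning with a different `eps` or `min_pts` recomputes everything from scratch, which is the cost that reusing historical clustering results is meant to avoid.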


Keywords: Cluster-by operator, SQL, in-database clustering





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Peng Sun (1)
  • Yan Huang (2)
  • Chengyang Zhang (3)
  1. Institute of Software, Chinese Academy of Sciences, Beijing, China
  2. University of North Texas, Denton, U.S.A.
  3. Teradata Inc., El Segundo, U.S.A.
