Value Range Queries on Earth Science Data via Histogram Clustering

  • Conference paper
Temporal, Spatial, and Spatio-Temporal Data Mining (TSDM 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2007)

Abstract

Remote sensing data, as well as ground-based and model output data about the Earth system, can be very large in volume. To use these data efficiently, scientists need to search them based not only on metadata but also on actual data values. Answering value range queries by scanning very large volumes of data is unrealistic. This article studies a technique that clusters histograms of data values on predefined cells in order to index the cells. Through this index, so-called statistical range queries can be answered quickly and approximately, together with an accuracy assessment. Examples of applying this technique to Earth science data sets are given.
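The abstract gives no algorithmic detail, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes a plain k-means with a deterministic farthest-point initialization, query ranges aligned to bin edges, and a conservative error bound taken as the largest L1 distance between any member histogram and its cluster centroid. All function names, the bin edges, and the sample values are hypothetical.

```python
def histogram(values, edges):
    """Normalized histogram: fraction of values in each bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break  # values outside all bins are simply ignored
    total = sum(counts) or 1  # avoid division by zero for empty cells
    return [c / total for c in counts]


def sq_dist(a, b):
    """Squared Euclidean distance between two histograms."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_histograms(hists, k, iters=20):
    """Assumed clustering step: k-means with farthest-point initialization."""
    centroids = [list(hists[0])]
    while len(centroids) < k:
        far = max(hists, key=lambda h: min(sq_dist(h, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(hists)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(h, centroids[j]))
                  for h in hists]
        for j in range(k):
            members = [h for h, lab in zip(hists, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Per-cluster radius: largest L1 distance from a member to its centroid.
    # It bounds the error of any bin-aligned range fraction read off the centroid.
    radii = [0.0] * k
    for h, lab in zip(hists, labels):
        d = sum(abs(x - y) for x, y in zip(h, centroids[lab]))
        radii[lab] = max(radii[lab], d)
    return centroids, labels, radii


def range_query(centroids, labels, radii, first_bin, last_bin, min_fraction):
    """Return (cell, estimated fraction, error bound) for cells whose cluster
    centroid puts at least min_fraction of its mass in bins first_bin..last_bin."""
    est = [sum(c[first_bin:last_bin + 1]) for c in centroids]
    return [(cell, est[lab], radii[lab])
            for cell, lab in enumerate(labels) if est[lab] >= min_fraction]


# Hypothetical data: four cells, values binned into [0,10), [10,20), [20,30).
edges = [0, 10, 20, 30]
cells = [
    [2, 5, 7, 8],
    [1, 3, 9, 12],
    [22, 25, 27, 29],
    [18, 21, 24, 28],
]
hists = [histogram(c, edges) for c in cells]
centroids, labels, radii = cluster_histograms(hists, k=2)

# Which cells have at least half of their values in [20, 30)?
hits = range_query(centroids, labels, radii, first_bin=2, last_bin=2,
                   min_fraction=0.5)
for cell, estimate, bound in hits:
    print(f"cell {cell}: estimated fraction {estimate:.3f} (error <= {bound:.3f})")
```

In this sketch the query touches only the k centroids and the per-cell cluster labels rather than the raw values, which is where the speed-up would come from. The returned radius plays the role of the accuracy assessment mentioned in the abstract, although the paper may define that assessment differently.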

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yang, R., Yang, KS., Kafatos, M., Sean Wang, X. (2001). Value Range Queries on Earth Science Data via Histogram Clustering. In: Roddick, J.F., Hornsby, K. (eds) Temporal, Spatial, and Spatio-Temporal Data Mining. TSDM 2000. Lecture Notes in Computer Science, vol 2007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45244-3_6

  • DOI: https://doi.org/10.1007/3-540-45244-3_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41773-6

  • Online ISBN: 978-3-540-45244-7

  • eBook Packages: Springer Book Archive
