Abstract
Remote sensing, ground-based, and model output data about the Earth system can be very large in volume. To use these data efficiently, scientists need to search them based not only on metadata but also on actual data values. Answering value range queries by scanning very large volumes of data is unrealistic. This article studies a technique that clusters histograms of data values on predefined cells in order to index the cells. Through this index, so-called statistical range queries can be answered quickly and approximately, together with an accuracy assessment. Examples of using this technique on Earth science data sets are given.
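The idea of indexing cells by clustered histograms can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify the clustering method or the exact query form. Plain k-means is swapped in here. A "statistical range query" is assumed to mean "find the cells in which at least a given fraction of values fall within a value range". All function names are hypothetical, and query bounds are assumed to coincide with bin edges. The accuracy assessment relies on a standard fact: for two normalized histograms, the difference in mass over any set of bins is at most half their L1 distance.

```python
import random
from bisect import bisect_right

def histogram(values, edges):
    """Normalized histogram; assumes edges[0] <= v <= edges[-1] for every value."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = min(bisect_right(edges, v) - 1, len(counts) - 1)  # last bin closed on the right
        counts[i] += 1
    return [c / len(values) for c in counts]

def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def kmeans(hists, k, iters=20, seed=0):
    """Plain k-means on histogram vectors (an assumed stand-in for the clustering step)."""
    centroids = [list(h) for h in random.Random(seed).sample(hists, k)]
    def nearest(h):
        return min(range(k), key=lambda j: sum((x - y) ** 2 for x, y in zip(h, centroids[j])))
    for _ in range(iters):
        labels = [nearest(h) for h in hists]
        for j in range(k):
            members = [h for h, lab in zip(hists, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(h) for h in hists]

def build_index(hists, k):
    """Index = cluster centroids, per-cell labels, per-cluster error bound."""
    centroids, labels = kmeans(hists, k)
    err = [0.0] * k
    for h, lab in zip(hists, labels):
        # Half the L1 distance bounds the mass difference over any set of bins.
        err[lab] = max(err[lab], 0.5 * l1(h, centroids[lab]))
    return centroids, labels, err

def range_fraction(hist, edges, lo, hi):
    """Fraction of values in bins lying entirely inside [lo, hi]."""
    return sum(f for f, a, b in zip(hist, edges, edges[1:]) if a >= lo and b <= hi)

def approx_query(index, edges, lo, hi, min_frac):
    """Cells that surely qualify, and cells that may qualify, using centroids only."""
    centroids, labels, err = index
    est = [range_fraction(c, edges, lo, hi) for c in centroids]
    sure = [i for i, lab in enumerate(labels) if est[lab] - err[lab] >= min_frac]
    maybe = [i for i, lab in enumerate(labels)
             if est[lab] - err[lab] < min_frac <= est[lab] + err[lab]]
    return sure, maybe

# Synthetic demo: 10 "low-valued" cells and 10 "high-valued" cells on a 0..100 scale.
rng = random.Random(42)
edges = list(range(0, 101, 10))
cells = ([[100 * rng.betavariate(2, 8) for _ in range(300)] for _ in range(10)]
         + [[100 * rng.betavariate(8, 2) for _ in range(300)] for _ in range(10)])
index = build_index([histogram(c, edges) for c in cells], k=2)
sure, maybe = approx_query(index, edges, lo=0, hi=30, min_frac=0.5)
print("surely qualifying cells:", sure)
print("possibly qualifying cells:", maybe)
```

In this sketch a query touches only the k centroid histograms rather than the underlying data. Cells in the "maybe" list are those whose cluster error bound straddles the threshold, so only they would need a closer look at the original values.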
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, R., Yang, KS., Kafatos, M., Sean Wang, X. (2001). Value Range Queries on Earth Science Data via Histogram Clustering. In: Roddick, J.F., Hornsby, K. (eds) Temporal, Spatial, and Spatio-Temporal Data Mining. TSDM 2000. Lecture Notes in Computer Science, vol 2007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45244-3_6
DOI: https://doi.org/10.1007/3-540-45244-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41773-6
Online ISBN: 978-3-540-45244-7
eBook Packages: Springer Book Archive