Abstract
XML is a new standard for exchanging and representing information on the Internet. Documents can be represented hierarchically by XML elements. In this paper, we propose representing and indexing an XML document collection with a bitmap indexing technique. We define similarity and popularity operations suited to bitmap indexes, and two statistical measurements in the BitCube: center and radius. Based on these measurements, we describe a new bitmap-index-based technique for clustering XML documents. The clustering techniques are motivated by the fact that the bitmap indexes are expected to be very sparse.
Furthermore, the 2-dimensional bitmap index is extended to a 3-dimensional bitmap index, called the BitCube. Sophisticated queries over XML document collections can be performed using primitive operations such as slice, project, and dice. Experiments show that the BitCube can be created efficiently and that these primitive operations run more efficiently with the BitCube than with alternative approaches.
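The abstract names the primitive operations but does not define them, so the following is only a minimal sketch of how slice, project, and dice might look on a small 3-dimensional bitmap. It assumes the three axes are document, element path, and word, which the abstract does not state. The function names, the OR-based projection, and the Hamming-style similarity are illustrative assumptions, not the authors' definitions.

```python
# Hypothetical sketch: a dense 3-D bitmap with assumed axes
# (document, element path, word). Not the paper's implementation.

def build_cube(docs):
    """docs maps doc_id -> list of (element_path, word) pairs."""
    doc_ids = sorted(docs)
    paths = sorted({p for pairs in docs.values() for p, _ in pairs})
    words = sorted({w for pairs in docs.values() for _, w in pairs})
    p_idx = {p: i for i, p in enumerate(paths)}
    w_idx = {w: i for i, w in enumerate(words)}
    cube = [[[0] * len(words) for _ in paths] for _ in doc_ids]
    for d, doc_id in enumerate(doc_ids):
        for p, w in docs[doc_id]:
            cube[d][p_idx[p]][w_idx[w]] = 1  # bit set: word w occurs under path p
    return cube, doc_ids, paths, words

def slice_path(cube, p):
    """Fix one path index; returns a 2-D (document x word) bitmap."""
    return [doc[p] for doc in cube]

def project_paths(cube):
    """Collapse the word axis with OR; returns a 2-D (document x path) bitmap."""
    return [[int(any(row)) for row in doc] for doc in cube]

def dice(cube, doc_sel, path_sel, word_sel):
    """Keep only the selected indices on each axis; returns a smaller cube."""
    return [[[cube[d][p][w] for w in word_sel] for p in path_sel]
            for d in doc_sel]

def similarity(a, b):
    """Assumed measure: 1 minus the normalized Hamming distance of two bit rows."""
    diff = sum(x ^ y for x, y in zip(a, b))
    return 1 - diff / len(a)

# Made-up example collection, for illustration only.
docs = {
    "d1": [("/book/title", "xml"), ("/book/author", "yoon")],
    "d2": [("/book/title", "xml"), ("/book/title", "index")],
    "d3": [("/article/title", "bitmap")],
}
cube, doc_ids, paths, words = build_cube(docs)
by_path = project_paths(cube)                       # which paths each document uses
titles = slice_path(cube, paths.index("/book/title"))
print(similarity(by_path[0], by_path[1]))           # d1 vs d2 on shared paths
```

A dense nested list keeps the sketch short. Since the abstract expects the bitmaps to be very sparse, a practical version would more likely use a compressed or sparse representation.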
Cite this article
Yoon, J.P., Raghavan, V., Chakilam, V. et al. BitCube: A Three-Dimensional Bitmap Indexing for XML Documents. Journal of Intelligent Information Systems 17, 241–254 (2001). https://doi.org/10.1023/A:1012861931139
DOI: https://doi.org/10.1023/A:1012861931139