Inverted Lists Compression Using Contextual Information

  • Dariusz Czerski
  • Krzysztof Ciesielski
  • Michał Dramiński
  • Mieczysław A. Kłopotek
  • Sławomir T. Wierzchoń


In this paper we present a new approach to compressing inverted lists in the indexes of information retrieval systems. The technique exploits contextual information obtained from an unsupervised clustering process run on the document collection. It achieves a substantial improvement in the compression factor.
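To make the idea concrete, the following is a minimal sketch of why cluster information can help inverted list compression. It is not the method of the paper, whose coding scheme is not described in this abstract. As a stand-in it assumes that lists are stored as gaps between sorted document ids, that gaps are coded with an Elias gamma code, and that cluster membership is used only to renumber documents so that members of one cluster get adjacent ids. All function names and the toy collection are hypothetical.

```python
def gamma_bits(n):
    """Length in bits of the Elias gamma code of a positive integer n."""
    return 2 * n.bit_length() - 1


def gaps(postings):
    """First document id followed by differences between successive sorted ids."""
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]


def compressed_bits(postings):
    """Total gamma-coded size of one inverted list, in bits."""
    return sum(gamma_bits(g) for g in gaps(postings))


def renumber_by_cluster(cluster_of):
    """Map old ids to new consecutive ids so that each cluster occupies one id range.

    Assumption for illustration only: clusters are used purely for renumbering.
    """
    order = sorted(cluster_of, key=lambda d: (cluster_of[d], d))
    return {old: new for new, old in enumerate(order, start=1)}


# Toy collection: 1000 documents whose (hypothetical) cluster labels are interleaved.
cluster_of = {d: d % 4 for d in range(1, 1001)}

# A term that occurs in exactly the documents of cluster 2.
postings = [d for d in range(1, 1001) if cluster_of[d] == 2]

new_id = renumber_by_cluster(cluster_of)
original = compressed_bits(postings)
clustered = compressed_bits(sorted(new_id[d] for d in postings))

print(original, clustered)
```

In this toy setting the list shrinks from 1248 bits to 266 bits, because the gaps of 4 become gaps of 1 once documents from the same cluster are numbered consecutively. Real collections are far less regular, so the figures only illustrate the direction of the effect.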


Keywords: Information Retrieval System, Document Cluster, Huffman Code, Compression Factor, Inverted List





Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Dariusz Czerski (1)
  • Krzysztof Ciesielski (1)
  • Michał Dramiński (1)
  • Mieczysław A. Kłopotek (1)
  • Sławomir T. Wierzchoń (1)

  1. Institute of Computer Sci., Polish Acad. of Sciences, Ordona 21, Poland
