Bit-Vector Search Filtering with Application to a Kanji Dictionary

  • Matthew Skala
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9939)


Database query problems can be categorized by the expressiveness of their query languages, and data structure bounds are better for less expressive languages. Highly expressive languages, such as those permitting Boolean operations, lead to difficult query problems with poor bounds, and high dimensionality in geometric problems also causes their query languages to become expressive and inefficient. The IDSgrep kanji dictionary software approaches a highly expressive tree-matching query problem with a filtering technique set in 128-bit Hamming space. This approach can serve as a model for other highly expressive query languages. We suggest improvements to bit-vector filtering of general applicability, and evaluate them in the context of IDSgrep.
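
The general idea of bit-vector filtering can be illustrated with a minimal Bloom-style sketch. This is not IDSgrep's actual filter: the hash function, the number of bits per feature (`K`), the flat feature sets standing in for tree-structured entries, and all function names are assumptions chosen for illustration. Only the 128-bit width comes from the abstract. Each dictionary entry gets a 128-bit signature, and a cheap bitwise test discards entries that cannot match before the expensive exact matcher runs.

```python
import hashlib

BITS = 128  # vector width, as in the 128-bit Hamming space of the abstract
K = 3       # bits set per feature (hypothetical choice)


def feature_bits(feature: str) -> int:
    """Map one feature string to a 128-bit mask with at most K bits set."""
    digest = hashlib.sha256(feature.encode("utf-8")).digest()
    mask = 0
    for i in range(K):
        # Take two digest bytes per bit position and reduce modulo the width.
        pos = int.from_bytes(digest[2 * i:2 * i + 2], "big") % BITS
        mask |= 1 << pos
    return mask


def signature(features) -> int:
    """OR together the masks of all features of an entry or a query."""
    sig = 0
    for f in features:
        sig |= feature_bits(f)
    return sig


def may_match(entry_sig: int, query_sig: int) -> bool:
    """Return False only when the entry certainly lacks a required feature.

    True may be a false positive, so an exact check must follow.
    """
    return (entry_sig & query_sig) == query_sig


def build_index(dictionary):
    """Precompute (name, features, signature) for every entry."""
    return [(name, feats, signature(feats)) for name, feats in dictionary.items()]


def search(index, required):
    """Filter by signature first, then confirm with the exact subset test."""
    q = signature(required)
    return [name for name, feats, sig in index
            if may_match(sig, q) and required <= feats]


# Toy data: flat component sets stand in for real tree-structured entries.
toy = {"明": {"日", "月"}, "林": {"木"}, "休": {"亻", "木"}}
index = build_index(toy)
results = search(index, {"木"})
```

Because the filter only ever produces false positives, never false negatives, the final result is the same as running the exact matcher on every entry. The saving comes from skipping most non-matching entries with a single AND and compare.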


Keywords: Query Language, Query Time, Dictionary Entry, Bloom Filter, Binary Decision Diagram



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. IT University of Copenhagen, Copenhagen, Denmark
