Abstract
Database query problems can be categorized by the expressiveness of their query languages, and data structure bounds are better for less expressive languages. Highly expressive languages, such as those permitting Boolean operations, lead to difficult query problems with poor bounds, and high dimensionality in geometric problems also causes their query languages to become expressive and inefficient. The IDSgrep kanji dictionary software approaches a highly expressive tree-matching query problem with a filtering technique set in 128-bit Hamming space. It can be a model for other highly expressive query languages. We suggest improvements to bit vector filtering of general applicability, and evaluate them in the context of IDSgrep.
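As a rough illustration of the general idea of bit vector filtering, the sketch below assigns each dictionary entry a 128-bit signature and uses a bitwise test to discard entries that cannot match before running an exact check. It is a generic signature-style filter written under stated assumptions, not the scheme IDSgrep actually uses: the feature sets, the hash choice, and the names `feature_bit`, `signature`, `may_match`, and `search` are all hypothetical.

```python
import hashlib

BITS = 128  # vector width, matching the 128-bit space named in the abstract


def feature_bit(feature: str) -> int:
    """Map a feature string to one of BITS positions (hash choice is an assumption)."""
    digest = hashlib.sha256(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % BITS


def signature(features) -> int:
    """OR together one bit per feature, giving a 128-bit signature stored as an int."""
    sig = 0
    for f in features:
        sig |= 1 << feature_bit(f)
    return sig


def may_match(entry_sig: int, query_sig: int) -> bool:
    """Filter test: every bit the query requires must be set in the entry."""
    return (entry_sig & query_sig) == query_sig


def search(entries, required):
    """Return names of entries that contain all required features.

    The filter never rejects a true match, but hash collisions can let
    false positives through, so an exact check follows the bitwise test.
    """
    query_sig = signature(required)
    # In a real index the signatures would be precomputed and stored.
    sigs = {name: signature(feats) for name, feats in entries.items()}
    candidates = [name for name in entries if may_match(sigs[name], query_sig)]
    return [name for name in candidates if set(required) <= entries[name]]


# Toy data: characters described by flat component sets (an assumption;
# the paper's query problem concerns trees, not flat sets).
toy_entries = {
    "語": {"言", "五", "口"},
    "明": {"日", "月"},
    "唱": {"口", "日"},
}
```

For example, `search(toy_entries, ["口"])` returns the entries whose component sets include 口. The filter is useful only because the bitwise test is far cheaper than the exact match it guards; its effectiveness depends on how often non-matching entries are rejected by the bits alone.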
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Skala, M. (2016). Bit-Vector Search Filtering with Application to a Kanji Dictionary. In: Amsaleg, L., Houle, M., Schubert, E. (eds) Similarity Search and Applications. SISAP 2016. Lecture Notes in Computer Science, vol 9939. Springer, Cham. https://doi.org/10.1007/978-3-319-46759-7_11
DOI: https://doi.org/10.1007/978-3-319-46759-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46758-0
Online ISBN: 978-3-319-46759-7
eBook Packages: Computer Science, Computer Science (R0)