Bit-Vector Search Filtering with Application to a Kanji Dictionary

Skala, Matthew

doi:10.1007/978-3-319-46759-7_11

Conference paper
First Online: 27 September 2016

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9939)

Abstract

Database query problems can be categorized by the expressiveness of their query languages, and data structure bounds are better for less expressive languages. Highly expressive languages, such as those permitting Boolean operations, lead to difficult query problems with poor bounds, and high dimensionality in geometric problems also causes their query languages to become expressive and inefficient. The IDSgrep kanji dictionary software approaches a highly expressive tree-matching query problem with a filtering technique set in 128-bit Hamming space. It can be a model for other highly expressive query languages. We suggest improvements to bit vector filtering of general applicability, and evaluate them in the context of IDSgrep.
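The general filter-then-verify idea behind bit-vector filtering can be sketched as follows. This is a minimal illustration under stated assumptions, not IDSgrep's actual scheme: it uses a Bloom-style signature with one hash function, a plain subset test standing in for the expensive tree match, and hypothetical names (`signature`, `may_match`, `build_index`, `search`) and toy feature strings that do not come from the paper. Only the 128-bit vector width is taken from the abstract.

```python
import hashlib

BITS = 128  # vector width, matching the 128-bit Hamming space in the abstract


def signature(features):
    """Build a Bloom-style bit vector: one hashed bit per feature (assumed scheme)."""
    vec = 0
    for feature in features:
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        vec |= 1 << (int.from_bytes(digest[:4], "big") % BITS)
    return vec


def may_match(query_vec, entry_vec):
    """Cheap necessary condition: every query bit must be set in the entry."""
    return (query_vec & entry_vec) == query_vec


def build_index(entries):
    """Precompute one signature per (name, feature set) entry."""
    return [(name, feats, signature(feats)) for name, feats in entries]


def search(query_features, index):
    """Filter with bit vectors, then confirm survivors with the exact test."""
    query_vec = signature(query_features)
    hits = []
    for name, feats, vec in index:
        if not may_match(query_vec, vec):
            continue  # rejected by the filter; the exact test is skipped
        if query_features <= feats:  # stand-in for the expensive exact match
            hits.append(name)
    return hits


# Toy data: hypothetical feature names, not real dictionary entries.
index = build_index([
    ("a", {"x", "y"}),
    ("b", {"y", "z"}),
    ("c", {"x", "y", "z"}),
])
print(search({"x", "y"}, index))
```

Because a query's features are a subset of a matching entry's features, the query's bits are always a subset of that entry's bits, so this filter can produce false positives (removed by the exact test) but never false negatives.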

Author information

Authors and Affiliations

IT University of Copenhagen, Copenhagen, Denmark
Matthew Skala

Corresponding author

Correspondence to Matthew Skala.

Editor information

Editors and Affiliations

CNRS–IRISA, Rennes, France
Laurent Amsaleg
National Institute of Informatics, Tokyo, Japan
Michael E. Houle
Ludwig-Maximilians-Universität München, München, Germany
Erich Schubert

Cite this paper

Skala, M. (2016). Bit-Vector Search Filtering with Application to a Kanji Dictionary. In: Amsaleg, L., Houle, M., Schubert, E. (eds) Similarity Search and Applications. SISAP 2016. Lecture Notes in Computer Science, vol 9939. Springer, Cham. https://doi.org/10.1007/978-3-319-46759-7_11

DOI: https://doi.org/10.1007/978-3-319-46759-7_11
Published: 27 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46758-0
Online ISBN: 978-3-319-46759-7
eBook Packages: Computer Science, Computer Science (R0)
