Structural Similarity Search for Formulas Using Leaf-Root Paths in Operator Subtrees

  • Wei Zhong
  • Richard Zanibbi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11437)

Abstract

We present a new search method for mathematical formulas based on Operator Trees (OPTs) representing the application of operators to operands. Our method provides (1) a simple indexing scheme using OPT leaf-root paths, (2) practical matching of the K largest common subexpressions, and (3) scoring matched OPT subtrees by counting nodes corresponding to visible symbols, weighting operators lower than operands. Using the largest common subexpression (K = 1), we outperform existing formula search engines for non-wildcard queries on the NTCIR-12 Wikipedia Formula Browsing Task. Stronger results are obtained when using additional subexpressions for scoring. Without parallelization or pruning, our system has practical execution times with low variance when compared to other state-of-the-art formula search engines.
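The indexing idea in (1) can be illustrated with a minimal sketch. It is not the authors' implementation: the tuple-based tree encoding, the function names `leaf_root_paths` and `build_index`, and the two example formulas are all assumptions made for illustration, and the sketch leaves out subexpression matching and scoring entirely.

```python
from collections import defaultdict

# Hypothetical encoding: an OPT node is (label, [children]); a leaf has no children.

def leaf_root_paths(node, ancestors=()):
    """Yield one tuple per leaf: the leaf label followed by its ancestors up to the root."""
    label, children = node
    if not children:
        yield (label,) + ancestors
        return
    for child in children:
        # Prepend the current label so the path always reads leaf -> root.
        yield from leaf_root_paths(child, (label,) + ancestors)

def build_index(formulas):
    """Map each leaf-root path to the set of formula ids that contain it."""
    index = defaultdict(set)
    for formula_id, tree in formulas.items():
        for path in leaf_root_paths(tree):
            index[path].add(formula_id)
    return index

# Two illustrative formulas (assumed, not taken from the paper).
f1 = ("add", [("times", [("a", []), ("b", [])]), ("c", [])])  # a*b + c
f2 = ("add", [("a", []), ("c", [])])                          # a + c

index = build_index({"f1": f1, "f2": f2})
```

Under these assumptions, the path `("c", "add")` is shared by both formulas, so looking it up returns both ids as candidates, while `("a", "times", "add")` occurs only in `f1`.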

Keywords

Mathematical Information Retrieval; Formula search; Similarity search; Subexpression matching

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Rochester Institute of Technology, Rochester, USA
