Abstract
The mathematical expression (ME) retrieval problem has recently received much attention due to the widespread availability of MEs on the World Wide Web. As MEs are two-dimensional in nature, traditional text retrieval techniques used in natural language processing are not sufficient for their retrieval. In this paper, we propose a novel structure-based approach to the ME retrieval problem. In our approach, a query given in \(\mbox{\LaTeX}\) format is preprocessed to eliminate extraneous keywords (such as \displaystyle and \begin{array}) while retaining structure information such as superscript and subscript relationships. MEs in the database are preprocessed and stored in the same manner. We have created a database of 829 MEs in \(\mbox{\LaTeX}\) form that covers various branches of mathematics, such as algebra, trigonometry, and calculus. The preprocessed query is matched against the database of preprocessed MEs using the Longest Common Subsequence (LCS) algorithm. LCS is used because it preserves the order of keywords in the preprocessed MEs, unlike the bag-of-words approach of traditional text retrieval techniques. We incorporate structure information into the LCS algorithm and propose a measure, based on the modified algorithm, for ranking MEs in the database. Because the proposed approach exploits structure information, it is closer to human intuition. Retrieval performance has been evaluated using the standard precision measure.
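The pipeline described in the abstract (preprocess, match with LCS, rank) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the structure-aware modification of LCS or the exact ranking measure, so the sketch uses the textbook LCS and an assumed normalization (LCS length divided by the longer token sequence). The stop list `EXTRANEOUS`, the tokenizer, and the function names `preprocess`, `lcs_length`, `score`, and `rank` are all hypothetical.

```python
import re

# Assumption: a small illustrative stop list. The abstract names only
# \displaystyle and \begin{array} as examples of extraneous keywords.
EXTRANEOUS = {r"\displaystyle", r"\begin{array}", r"\end{array}"}

# Matches \begin{...} / \end{...}, other control words such as \frac,
# or any single non-space character (including ^ and _).
TOKEN_RE = re.compile(r"\\(?:begin|end)\{[^}]*\}|\\[A-Za-z]+|\S")


def preprocess(latex):
    """Tokenize a LaTeX string and drop extraneous keywords.

    Superscript (^) and subscript (_) tokens are kept, so part of the
    structure information survives in the token sequence.
    """
    return [t for t in TOKEN_RE.findall(latex) if t not in EXTRANEOUS]


def lcs_length(a, b):
    """Textbook dynamic-programming LCS length over two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def score(query_tokens, me_tokens):
    """Assumed similarity: LCS length over the longer sequence length."""
    if not query_tokens or not me_tokens:
        return 0.0
    longest = max(len(query_tokens), len(me_tokens))
    return lcs_length(query_tokens, me_tokens) / longest


def rank(query, database):
    """Return (score, expression) pairs, best match first."""
    q = preprocess(query)
    scored = [(score(q, preprocess(me)), me) for me in database]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    db = [r"a+b", r"\displaystyle x^{2}+y^{2}", r"x_{2}+y_{2}"]
    for s, me in rank(r"x^{2}+y^{2}", db):
        print(f"{s:.3f}  {me}")
```

Because `^` and `_` remain in the token sequence, an expression that differs from the query only in script position (`x_{2}+y_{2}` against `x^{2}+y^{2}`) scores lower than an exact structural match, while a bag-of-words comparison on the remaining symbols would not see the difference in ordering.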
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pavan Kumar, P., Agarwal, A., Bhagvati, C. (2012). A Structure Based Approach for Mathematical Expression Retrieval. In: Sombattheera, C., Loi, N.K., Wankar, R., Quan, T. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2012. Lecture Notes in Computer Science, vol 7694. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35455-7_3
DOI: https://doi.org/10.1007/978-3-642-35455-7_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35454-0
Online ISBN: 978-3-642-35455-7
eBook Packages: Computer Science (R0)