Indexing and Searching Mathematics in Digital Libraries

Architecture, Design and Scalability Issues

  • Conference paper
Intelligent Computer Mathematics (CICM 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6824)

Abstract

This paper surveys approaches and systems for searching mathematical formulae in mathematical corpora and on the web. We present the design and architecture of our MIaS (Math Indexer and Searcher) system and discuss our design decisions in detail. We propose an approach based on Presentation MathML that uses the similarity of mathematical subformulae, and verify it by implementing a math-aware search engine on top of the state-of-the-art system Apache Lucene.
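To make the idea of subformula-based indexing concrete, here is a minimal sketch in Python. It is not the MIaS implementation: the canonical string format, the `decay` weighting by nesting depth, and the function names `canonical`, `subformulae` and `index_terms` are all assumptions chosen for illustration. It only shows how a Presentation MathML tree might be split into subformulae, each of which could then be handed to a full-text index as a weighted term.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace from a tag such as '{ns}mi'."""
    return tag.split("}", 1)[-1]


def canonical(node):
    """Serialize a MathML node as a compact string (hypothetical format).

    Leaves become 'name:text', inner nodes 'name(child,child,...)'.
    """
    children = list(node)
    name = local_name(node.tag)
    if not children:
        return f"{name}:{(node.text or '').strip()}"
    return f"{name}({','.join(canonical(c) for c in children)})"


def subformulae(node, depth=0, decay=0.5):
    """Yield (canonical string, weight) for a node and all its descendants.

    Assumption: deeper subformulae get a lower weight (decay ** depth).
    """
    yield canonical(node), decay ** depth
    for child in node:
        yield from subformulae(child, depth + 1, decay)


def index_terms(mathml, decay=0.5):
    """Map each distinct subformula of a MathML string to its best weight."""
    root = ET.fromstring(mathml)
    terms = {}
    for key, weight in subformulae(root, 0, decay):
        terms[key] = max(terms.get(key, 0.0), weight)
    return terms


if __name__ == "__main__":
    formula = (
        '<math xmlns="http://www.w3.org/1998/Math/MathML">'
        "<msup><mi>x</mi><mn>2</mn></msup></math>"
    )
    for term, weight in index_terms(formula).items():
        print(f"{weight:.2f}  {term}")
```

Under these assumptions, a query formula that matches only an inner subformula would still retrieve the document, but with a lower weight than a match on the whole formula.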

Scalability issues were checked on 324,000 real scientific documents from the arXiv archive, containing 112 million mathematical formulae. More than two billion MathML subformulae were indexed using our Solr-compatible Lucene extension.
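As a rough sense of scale, the figures quoted above imply the following averages. This is back-of-the-envelope arithmetic on the rounded numbers in the abstract, not a result reported by the authors, and since the subformula count is stated only as a lower bound, the second ratio is a lower bound as well.

```latex
\[
\frac{112 \times 10^{6}\ \text{formulae}}{324 \times 10^{3}\ \text{documents}}
  \approx 346\ \text{formulae per document},
\qquad
\frac{2 \times 10^{9}\ \text{subformulae}}{112 \times 10^{6}\ \text{formulae}}
  \approx 17.9\ \text{subformulae per formula}.
\]
```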

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sojka, P., Líška, M. (2011). Indexing and Searching Mathematics in Digital Libraries. In: Davenport, J.H., Farmer, W.M., Urban, J., Rabe, F. (eds) Intelligent Computer Mathematics. CICM 2011. Lecture Notes in Computer Science, vol 6824. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22673-1_16

  • DOI: https://doi.org/10.1007/978-3-642-22673-1_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22672-4

  • Online ISBN: 978-3-642-22673-1

  • eBook Packages: Computer Science, Computer Science (R0)
