Skip to main content

Interactive Document Indexing Method Based on Explicit Semantic Analysis

  • Conference paper
Book cover Rough Sets and Current Trends in Computing (RSCTC 2012)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 7413))

Included in the following conference series:

Abstract

In this article we propose a general framework incorporating semantic indexing and search of texts within scientific document repositories. In our approach, a semantic interpreter, which can be seen as a tool for automatic tagging of textual data, is interactively updated based on feedback from the users, in order to improve quality of the tags that it produces. In our experiments, we index our document corpus using the Explicit Semantic Analysis (ESA) method. In this algorithm, an external knowledge base is used to measure relatedness between words and concepts, and those assessments are utilized to assign meaningful concepts to given texts. In the paper, we explain how the weights expressing relations between particular words and concepts can be improved by interaction with users or by employment of expert knowledge. We also present some results of experiments on a document corpus acquired from the PubMed Central repository to show feasibility of our approach.

This work is partially supported by the National Centre for Research and Development (NCBiR) under Grant No. SP/I/1/77065/10 by the Strategic scientific research and experimental development program: “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information” and grants from Ministry of Science and Higher Education of the Republic of Poland N N516 077837.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Fazzinga, B., Gianforme, G., Gottlob, G., Lukasiewicz, T.: Semantic web search based on ontological conjunctive queries. Web Semantics: Science, Services and Agents on the World Wide Web (2011)

    Google Scholar 

  2. Nguyen, L.A., Nguyen, H.S.: On Designing the SONCA System. In: Bembenik, R., Skonieczny, Ł., Rybiński, H., Niezgodka, M. (eds.) Intelligent Tools for Building a Scient. Info. Plat. SCI, vol. 390, pp. 9–35. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  3. Ślęzak, D., Janusz, A., Świeboda, W., Nguyen, H.S., Bazan, J.G., Skowron, A.: Semantic Analytics of PubMed Content. In: Holzinger, A., Simonic, K.-M. (eds.) USAB 2011. LNCS, vol. 7058, pp. 63–74. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  4. Szczuka, M., Janusz, A., Herba, K.: Clustering of Rough Set Related Documents with Use of Knowledge from DBpedia. In: Yao, J., Ramanna, S., Wang, G., Suraj, Z. (eds.) RSKT 2011. LNCS (LNAI), vol. 6954, pp. 394–403. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  5. Roberts, R.J.: PubMed Central: The GenBank of the published literature. Proceedings of the National Academy of Sciences of the United States of America 98(2), 381–382 (2001)

    Article  Google Scholar 

  6. Gabrilovich, E., Markovitch, S.: Computing semantic relatedness using wikipedia-based explicit semantic analysis. In: Proc. of the 20th Int. Joint Conf. on Artificial Intelligence, Hyderabad, India, pp. 1606–1611 (2007)

    Google Scholar 

  7. Manning, C., Raghavan, P., Schütze, H.: Introduction to information retrieval, 2008. Online edition (2007)

    Google Scholar 

  8. Hliaoutakis, A., Varelas, G., Voutsakis, E., Petrakis, E.G.M., Milios, E.: Information retrieval by semantic similarity. Int. Journal on Semantic Web and Information Systems (IJSWIS). Special Issue of Multimedia Semantics 3(3), 55–73 (2006)

    Google Scholar 

  9. Rinaldi, A.M.: An ontology-driven approach for semantic information retrieval on the web. ACM Trans. Internet Technol. 9, 1–24 (2009)

    Article  Google Scholar 

  10. Mitchell, T.M.: Machine Learning. McGraw Hill series in computer science. McGraw-Hill (1997)

    Google Scholar 

  11. United States National Library of Medicine: Introduction to MeSH - 2011 (2011), http://www.nlm.nih.gov/mesh/introduction.html

  12. Feldman, R., Sanger, J. (eds.): The Text Mining Handbook. Cambridge University Press (2007)

    Google Scholar 

  13. R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2008)

    Google Scholar 

  14. Janusz, A., Nguyen, H.S., Ślęzak, D., Stawicki, S., Krasuski, A.: JRS’2012 Data Mining Competition: Topical Classification of Biomedical Research Papers. In: Yan, J.T., et al. (eds.) RSCTC 2012. LNCS (LNAI), vol. 7413, pp. 422–431. Springer, Heidelberg (2012)

    Google Scholar 

  15. Janusz, A., Ślęzak, D., Nguyen, H.S.: Unsupervised similarity learning from textual data. Fundamenta Informaticae (2012)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Janusz, A., Świeboda, W., Krasuski, A., Nguyen, H.S. (2012). Interactive Document Indexing Method Based on Explicit Semantic Analysis. In: Yao, J., et al. Rough Sets and Current Trends in Computing. RSCTC 2012. Lecture Notes in Computer Science(), vol 7413. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32115-3_18

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-32115-3_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32114-6

  • Online ISBN: 978-3-642-32115-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics