Interactive Document Indexing Method Based on Explicit Semantic Analysis

Janusz, Andrzej; Świeboda, Wojciech; Krasuski, Adam; Nguyen, Hung Son

doi:10.1007/978-3-642-32115-3_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7413)

Included in the following conference series:

International Conference on Rough Sets and Current Trends in Computing

Abstract

In this article we propose a general framework for semantic indexing and search of texts in scientific document repositories. In our approach, a semantic interpreter, which can be seen as a tool for automatic tagging of textual data, is interactively updated based on user feedback in order to improve the quality of the tags it produces. In our experiments, we index a document corpus using the Explicit Semantic Analysis (ESA) method. This algorithm uses an external knowledge base to measure relatedness between words and concepts, and those assessments are then used to assign meaningful concepts to given texts. We explain how the weights expressing relations between particular words and concepts can be improved through interaction with users or by employing expert knowledge. We also present results of experiments on a document corpus acquired from the PubMed Central repository to show the feasibility of our approach.
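
The ESA-style tagging and feedback loop described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy knowledge base, the TF-IDF weighting, the function names (build_interpreter, tag, apply_feedback) and, in particular, the multiplicative feedback rule, which only stands in for whatever update the authors actually use.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base: concept name -> short textual description.
KNOWLEDGE_BASE = {
    "Neoplasms": "tumor cancer cell growth tumor",
    "Anti-Bacterial Agents": "antibiotic bacteria infection resistance",
    "Genomics": "gene genome sequence cell",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_interpreter(kb):
    """Return {word: {concept: weight}} with TF-IDF weights over the concept descriptions."""
    term_counts = {concept: Counter(tokenize(desc)) for concept, desc in kb.items()}
    # Number of concept descriptions in which each word occurs.
    doc_freq = Counter(word for counts in term_counts.values() for word in counts)
    weights = {}
    for concept, counts in term_counts.items():
        for word, tf in counts.items():
            idf = math.log(len(kb) / doc_freq[word])
            weights.setdefault(word, {})[concept] = tf * idf
    return weights


def tag(text, weights, top_k=2):
    """Score concepts by summing word-concept weights over the text; return the top_k concepts."""
    scores = Counter()
    for word, tf in Counter(tokenize(text)).items():
        for concept, weight in weights.get(word, {}).items():
            scores[concept] += tf * weight
    return [concept for concept, score in scores.most_common(top_k) if score > 0]


def apply_feedback(text, concept, accepted, weights, rate=0.5):
    """Assumed update rule: scale the weights linking the text's words to a concept
    up (tag accepted) or down (tag rejected). Not the rule from the paper."""
    factor = 1 + rate if accepted else 1 - rate
    for word in set(tokenize(text)):
        if concept in weights.get(word, {}):
            weights[word][concept] *= factor


weights = build_interpreter(KNOWLEDGE_BASE)
document = "cancer cell growth in tumor tissue"
tags_before = tag(document, weights)
# A user rejects the "Genomics" tag for this document; the interpreter is updated in place.
apply_feedback(document, "Genomics", accepted=False, weights=weights, rate=1.0)
tags_after = tag(document, weights)
```

In this sketch the interpreter is just a nested dictionary, so a feedback step only touches the weights of words that occur in the judged document. A real system would more plausibly start from a large knowledge base and a sparse weight matrix.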

This work is partially supported by the National Centre for Research and Development (NCBiR) under Grant No. SP/I/1/77065/10 by the Strategic scientific research and experimental development program: “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”, and by grants from the Ministry of Science and Higher Education of the Republic of Poland N N516 077837.

Author information

Authors and Affiliations

Faculty of Mathematics, Informatics and Mechanics, The University of Warsaw, Banacha 2, 02-097, Warsaw, Poland
Andrzej Janusz, Wojciech Świeboda, Adam Krasuski & Hung Son Nguyen
Chair of Computer Science, The Main School of Fire Service, Słowackiego 52/54, 01-629, Warsaw, Poland
Adam Krasuski

Editor information

Editors and Affiliations

Department of Computer Science, University of Regina, S4S 0A2, Regina, SK, Canada
JingTao Yao
School of Information Science and Technology, Southwest Jiaotong University, 610031, Chengdu, P.R. China
Yan Yang
Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965, Poznan, Poland
Roman Słowiński
Faculty of Economics, University of Catania, Corso Italia, 55, 95129, Catania, Italy
Salvatore Greco
School of Management and Engineering, Nanjing University, 210093, Nanjing, Jiangsu, P.R. China
Huaxiong Li
Machine Intelligence Unit, Indian Statistical Institute (ISI), 700108, Kolkata, India
Sushmita Mitra
Polish-Japanese Institute of Information Technology, Koszykowa 86, 02-008, Warsaw, Poland
Lech Polkowski

Cite this paper

Janusz, A., Świeboda, W., Krasuski, A., Nguyen, H.S. (2012). Interactive Document Indexing Method Based on Explicit Semantic Analysis. In: Yao, J., et al. (eds.) Rough Sets and Current Trends in Computing. RSCTC 2012. Lecture Notes in Computer Science, vol 7413. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32115-3_18

DOI: https://doi.org/10.1007/978-3-642-32115-3_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32114-6
Online ISBN: 978-3-642-32115-3
eBook Packages: Computer Science, Computer Science (R0)
