Abstract
In this article we propose a general framework incorporating semantic indexing and search of texts within scientific document repositories. In our approach, a semantic interpreter, which can be seen as a tool for automatic tagging of textual data, is interactively updated based on feedback from the users, in order to improve quality of the tags that it produces. In our experiments, we index our document corpus using the Explicit Semantic Analysis (ESA) method. In this algorithm, an external knowledge base is used to measure relatedness between words and concepts, and those assessments are utilized to assign meaningful concepts to given texts. In the paper, we explain how the weights expressing relations between particular words and concepts can be improved by interaction with users or by employment of expert knowledge. We also present some results of experiments on a document corpus acquired from the PubMed Central repository to show feasibility of our approach.
This work is partially supported by the National Centre for Research and Development (NCBiR) under Grant No. SP/I/1/77065/10 by the Strategic scientific research and experimental development program: “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information” and grants from Ministry of Science and Higher Education of the Republic of Poland N N516 077837.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Fazzinga, B., Gianforme, G., Gottlob, G., Lukasiewicz, T.: Semantic web search based on ontological conjunctive queries. Web Semantics: Science, Services and Agents on the World Wide Web (2011)
Nguyen, L.A., Nguyen, H.S.: On Designing the SONCA System. In: Bembenik, R., Skonieczny, Ł., Rybiński, H., Niezgodka, M. (eds.) Intelligent Tools for Building a Scient. Info. Plat. SCI, vol. 390, pp. 9–35. Springer, Heidelberg (2012)
Ślęzak, D., Janusz, A., Świeboda, W., Nguyen, H.S., Bazan, J.G., Skowron, A.: Semantic Analytics of PubMed Content. In: Holzinger, A., Simonic, K.-M. (eds.) USAB 2011. LNCS, vol. 7058, pp. 63–74. Springer, Heidelberg (2011)
Szczuka, M., Janusz, A., Herba, K.: Clustering of Rough Set Related Documents with Use of Knowledge from DBpedia. In: Yao, J., Ramanna, S., Wang, G., Suraj, Z. (eds.) RSKT 2011. LNCS (LNAI), vol. 6954, pp. 394–403. Springer, Heidelberg (2011)
Roberts, R.J.: PubMed Central: The GenBank of the published literature. Proceedings of the National Academy of Sciences of the United States of America 98(2), 381–382 (2001)
Gabrilovich, E., Markovitch, S.: Computing semantic relatedness using wikipedia-based explicit semantic analysis. In: Proc. of the 20th Int. Joint Conf. on Artificial Intelligence, Hyderabad, India, pp. 1606–1611 (2007)
Manning, C., Raghavan, P., Schütze, H.: Introduction to information retrieval, 2008. Online edition (2007)
Hliaoutakis, A., Varelas, G., Voutsakis, E., Petrakis, E.G.M., Milios, E.: Information retrieval by semantic similarity. Int. Journal on Semantic Web and Information Systems (IJSWIS). Special Issue of Multimedia Semantics 3(3), 55–73 (2006)
Rinaldi, A.M.: An ontology-driven approach for semantic information retrieval on the web. ACM Trans. Internet Technol. 9, 1–24 (2009)
Mitchell, T.M.: Machine Learning. McGraw Hill series in computer science. McGraw-Hill (1997)
United States National Library of Medicine: Introduction to MeSH - 2011 (2011), http://www.nlm.nih.gov/mesh/introduction.html
Feldman, R., Sanger, J. (eds.): The Text Mining Handbook. Cambridge University Press (2007)
R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2008)
Janusz, A., Nguyen, H.S., Ślęzak, D., Stawicki, S., Krasuski, A.: JRS’2012 Data Mining Competition: Topical Classification of Biomedical Research Papers. In: Yan, J.T., et al. (eds.) RSCTC 2012. LNCS (LNAI), vol. 7413, pp. 422–431. Springer, Heidelberg (2012)
Janusz, A., Ślęzak, D., Nguyen, H.S.: Unsupervised similarity learning from textual data. Fundamenta Informaticae (2012)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Janusz, A., Świeboda, W., Krasuski, A., Nguyen, H.S. (2012). Interactive Document Indexing Method Based on Explicit Semantic Analysis. In: Yao, J., et al. Rough Sets and Current Trends in Computing. RSCTC 2012. Lecture Notes in Computer Science(), vol 7413. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32115-3_18
Download citation
DOI: https://doi.org/10.1007/978-3-642-32115-3_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32114-6
Online ISBN: 978-3-642-32115-3
eBook Packages: Computer ScienceComputer Science (R0)