Abstract
Semantic annotation is an enabling technology that links documents to concepts that unambiguously describe their content. Annotation improves access to document content for both humans and software agents. However, annotation is challenging because annotators often have to choose among thousands of potentially relevant concepts from controlled vocabularies. The most effective approaches to assisting with this task rely on reusing the annotations of an already annotated corpus. Without a pre-annotated corpus, alternative approaches perform poorly because most vocabularies provide too little descriptive text for their concepts. In this paper, we propose an unsupervised method for recommending document annotations based on generating node descriptors from an external corpus. We exploit knowledge of the taxonomic structure of a thesaurus to ensure that effective descriptors (concept summaries) are generated for concepts. Our evaluation on annotation recommendation shows that the content we generate effectively represents the concepts. Our approach also outperforms approaches that rely on information from a thesaurus alone and is comparable with supervised approaches.
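The abstract does not spell out how concept summaries are built or how concepts are ranked, so the following is only a minimal Python sketch of the general idea under stated assumptions. It is not the authors' method. The toy thesaurus (`BROADER`), the toy external corpus, the stop-word list, the substring label matching, the `neighbour_weight` of 0.5 and the plain term-frequency cosine similarity are all hypothetical choices made for illustration. Each concept's summary is assembled from corpus sentences that name the concept itself, plus down-weighted sentences that only name a broader or narrower concept. A document is then compared with every summary.

```python
import math
import re
from collections import Counter

# Hypothetical toy thesaurus: each concept maps to its broader concept (None = top concept).
BROADER = {
    "rock": None,
    "igneous rock": "rock",
    "granite": "igneous rock",
    "basalt": "igneous rock",
    "sedimentary rock": "rock",
    "sandstone": "sedimentary rock",
}

# Hypothetical external corpus: texts written independently of the thesaurus.
CORPUS = [
    "Granite is a coarse grained igneous rock rich in quartz and feldspar.",
    "Basalt is a fine grained volcanic igneous rock formed from lava.",
    "Igneous rock forms when magma or lava cools and crystallises.",
    "Sandstone is a sedimentary rock composed of cemented sand grains.",
    "Sedimentary rock forms by deposition and cementation of sediment.",
]

STOPWORDS = {"a", "an", "and", "by", "from", "in", "is", "of", "or", "the", "when", "with"}


def tokenize(text):
    """Lower-case word tokens with a tiny (assumed) stop-word list removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def neighbour_labels(concept):
    """Labels of the broader concept and the narrower concepts of `concept`."""
    parent = BROADER.get(concept)
    children = {c for c, p in BROADER.items() if p == concept}
    return ({parent} if parent else set()) | children


def concept_summary(concept, corpus, neighbour_weight=0.5):
    """Weighted term vector for a concept, built from corpus sentences.

    Sentences naming the concept itself get weight 1.0; sentences that only name
    a taxonomic neighbour get `neighbour_weight` (an assumed value).
    """
    summary = Counter()
    neighbours = neighbour_labels(concept)
    for sentence in corpus:
        text = sentence.lower()
        if concept in text:  # plain substring match on the concept label
            weight = 1.0
        elif any(label in text for label in neighbours):
            weight = neighbour_weight
        else:
            continue
        for token in tokenize(sentence):
            summary[token] += weight
    return summary


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(document, corpus, top_k=3):
    """Rank thesaurus concepts by similarity between the document and each summary."""
    doc_vec = Counter(tokenize(document))
    scores = {c: cosine(doc_vec, concept_summary(c, corpus)) for c in BROADER}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    doc = "The survey mapped coarse grained granite intrusions with abundant quartz and feldspar."
    print(recommend(doc, CORPUS))  # ['granite', 'igneous rock', 'basalt']
```

In this toy run, "granite" ranks first because its own sentence shares most terms with the document. The neighbour sentences still give "basalt" and "igneous rock" non-zero scores. Down-weighting neighbour sentences, rather than discarding them, is one simple way to give sparsely described concepts some descriptive text without letting their relatives dominate.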
Notes
1. An example of the documents used: http://pubs.bgs.ac.uk/publications.html?pubID=B01745
5. Elasticsearch Java API: http://www.elastic.co/guide/en/elasticsearch/client/java-api/5.2
Acknowledgement
This work is partly funded by the British Geological Survey (BGS) through the BGS University Funding Initiative (BUFI). We are grateful for the valuable comments of our reviewers.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Nkisi-Orji, I., Wiratunga, N., Hui, KY., Heaven, R., Massie, S. (2017). Taxonomic Corpus-Based Concept Summary Generation for Document Annotation. In: Kamps, J., Tsakonas, G., Manolopoulos, Y., Iliadis, L., Karydis, I. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science, vol 10450. Springer, Cham. https://doi.org/10.1007/978-3-319-67008-9_5
DOI: https://doi.org/10.1007/978-3-319-67008-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67007-2
Online ISBN: 978-3-319-67008-9
eBook Packages: Computer Science, Computer Science (R0)