Taxonomic Corpus-Based Concept Summary Generation for Document Annotation

Nkisi-Orji, Ikechukwu; Wiratunga, Nirmalie; Hui, Kit-Ying; Heaven, Rachel; Massie, Stewart

doi:10.1007/978-3-319-67008-9_5

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10450)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries

Abstract

Semantic annotation is an enabling technology that links documents to concepts that unambiguously describe their content. Annotation improves access to document content for both humans and software agents. However, annotation is a challenging task because annotators often have to choose among thousands of potentially relevant concepts from controlled vocabularies. The most effective approaches to assisting with this task rely on reusing the annotations of an already annotated corpus. In the absence of a pre-annotated corpus, alternative approaches suffer because most vocabularies provide insufficient descriptive text for their concepts. In this paper, we propose an unsupervised method for recommending document annotations based on generating node descriptors from an external corpus. We exploit knowledge of the taxonomic structure of a thesaurus to ensure that effective descriptors (concept summaries) are generated for concepts. Our evaluation on recommending annotations shows that the content we generate effectively represents the concepts. Our approach also outperforms approaches that rely on information from a thesaurus alone and is comparable with supervised approaches.
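The abstract describes the method only at a high level. As a rough illustration, and not the authors' implementation, the following sketch builds a concept summary for each thesaurus concept by collecting external-corpus sentences that mention the concept or any of its narrower concepts, so that broader concepts inherit evidence from the taxonomy beneath them. It then recommends annotations by TF-IDF cosine similarity between a document and each summary. The toy taxonomy, the corpus sentences, and the helper names (`descendants`, `concept_summary`, `recommend`) are hypothetical assumptions; the paper's own summary generation and ranking may differ.

```python
import math
import re
from collections import Counter

# Hypothetical toy thesaurus: concept -> broader concept (None for the top concept).
TAXONOMY = {
    "rock": None,
    "sedimentary rock": "rock",
    "sandstone": "sedimentary rock",
    "limestone": "sedimentary rock",
    "igneous rock": "rock",
    "granite": "igneous rock",
}

# Hypothetical external corpus standing in for real domain text.
CORPUS = [
    "Sandstone is a clastic rock composed mainly of sand-sized quartz grains.",
    "Limestone forms from calcium carbonate deposited by marine organisms.",
    "Granite is a coarse grained intrusive rock rich in quartz and feldspar.",
    "Sedimentary rock forms by deposition and cementation of sediment.",
    "Igneous rock crystallises from cooling magma or lava.",
]


def tokenize(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def descendants(concept):
    """All narrower concepts of `concept`, found by walking the taxonomy downwards."""
    found, frontier = set(), [concept]
    while frontier:
        parent = frontier.pop()
        for child, broader in TAXONOMY.items():
            if broader == parent and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def concept_summary(concept):
    """Join corpus sentences mentioning the concept or any of its narrower concepts."""
    labels = {concept} | descendants(concept)
    patterns = [re.compile(r"\b" + re.escape(label) + r"\b") for label in labels]
    return " ".join(s for s in CORPUS if any(p.search(s.lower()) for p in patterns))


def weight(counts, idf):
    """TF-IDF weights; terms that never occur in any summary are dropped."""
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(document, k=3):
    """Return the k concepts whose summaries are most similar to the document."""
    concepts = list(TAXONOMY)
    counts = [Counter(tokenize(concept_summary(c))) for c in concepts]
    n = len(concepts)
    df = Counter(term for cnt in counts for term in cnt)
    idf = {term: math.log(n / d) + 1.0 for term, d in df.items()}
    doc_vec = weight(Counter(tokenize(document)), idf)
    scores = {c: cosine(doc_vec, weight(cnt, idf)) for c, cnt in zip(concepts, counts)}
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(recommend("A quartz rich sandstone with rounded sand grains."))
```

Propagating narrower-concept evidence upwards is one simple way to use the taxonomy: a broad concept such as "rock" rarely has much text of its own, but its summary accumulates the text of everything beneath it.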

Acknowledgement

This work is partly funded by the British Geological Survey (BGS) through the BGS University Funding Initiative (BUFI). We are grateful for the valuable comments of our reviewers.

Author information

Authors and Affiliations

Robert Gordon University, Aberdeen, UK
Ikechukwu Nkisi-Orji, Nirmalie Wiratunga, Kit-Ying Hui & Stewart Massie
British Geological Survey, Nottingham, UK
Rachel Heaven

Corresponding author

Correspondence to Ikechukwu Nkisi-Orji.

Editor information

Editors and Affiliations

Faculteit der Geesteswetenschappen, Universiteit van Amsterdam, Amsterdam, The Netherlands
Jaap Kamps
Library & Information Center, University of Patras, Patras, Greece
Giannis Tsakonas
Aristotle University of Thessaloniki, Thessaloniki, Greece
Yannis Manolopoulos
Civil Engineering, University of Thrace, Kimmeria, Greece
Lazaros Iliadis
Informatics, Ionian University, Kerkyra, Greece
Ioannis Karydis

About this paper

Cite this paper

Nkisi-Orji, I., Wiratunga, N., Hui, KY., Heaven, R., Massie, S. (2017). Taxonomic Corpus-Based Concept Summary Generation for Document Annotation. In: Kamps, J., Tsakonas, G., Manolopoulos, Y., Iliadis, L., Karydis, I. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science, vol 10450. Springer, Cham. https://doi.org/10.1007/978-3-319-67008-9_5

DOI: https://doi.org/10.1007/978-3-319-67008-9_5
Published: 02 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67007-2
Online ISBN: 978-3-319-67008-9
eBook Packages: Computer Science, Computer Science (R0)
