Automatic Extraction of Explicit and Implicit Keywords to Build Document Descriptors

Ventura, João; Silva, Joaquim

doi:10.1007/978-3-642-40669-0_42

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8154)

Included in the following conference series:

Portuguese Conference on Artificial Intelligence

Abstract

Keywords are single-word and multiword terms that describe the semantic content of documents. They are useful in many applications, such as document search and indexing, and for direct reading by humans. Keywords can be explicit, occurring in the document itself, or implicit: not written in the document, yet semantically related to its content. This paper presents a statistical approach to building document descriptors from explicit and implicit keywords extracted automatically from the documents. The approach is language-independent, and we show comparative results for three European languages.
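
The distinction between explicit and implicit keywords can be illustrated with a toy sketch. This is not the method of the paper, whose details are not given in the abstract. It assumes Tf-Idf for ranking explicit keywords and a simple co-occurrence vote for implicit ones; the function names and the three-document corpus are hypothetical.

```python
import math
from collections import Counter


def explicit_keywords(corpus, i, k=3):
    """Top-k terms of corpus[i] ranked by Tf-Idf (an assumed metric, not the paper's)."""
    doc = corpus[i]
    scores = {}
    for term, freq in Counter(doc).items():
        # Document frequency is at least 1 because corpus[i] contains the term.
        df = sum(1 for d in corpus if term in d)
        scores[term] = (freq / len(doc)) * math.log(len(corpus) / df)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def implicit_keywords(corpus, i, explicit, k=3):
    """Terms absent from corpus[i] that co-occur elsewhere with its explicit keywords."""
    present = set(corpus[i])
    votes = Counter()
    for j, other in enumerate(corpus):
        if j == i:
            continue
        other_terms = set(other)
        shared = sum(1 for kw in explicit if kw in other_terms)
        if shared == 0:
            continue
        # Each term missing from the target document gets one vote
        # per explicit keyword it shares a document with.
        for term in other_terms - present:
            votes[term] += shared
    return sorted(votes, key=lambda t: (-votes[t], t))[:k]


# Hypothetical pre-tokenized corpus.
corpus = [
    ["solar", "panel", "energy", "solar", "grid"],
    ["solar", "energy", "photovoltaic", "cell"],
    ["football", "match", "goal", "energy"],
]

explicit = explicit_keywords(corpus, 0)
implicit = implicit_keywords(corpus, 0, explicit)
print("explicit:", explicit)
print("implicit:", implicit)
```

In this sketch every explicit keyword occurs in the target document, while every implicit keyword is drawn only from other documents that share an explicit keyword with it. A realistic system would also need multiword term extraction and a stronger measure of semantic relatedness.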

Author information

Authors and Affiliations

CITI/DI/FCT, Universidade Nova de Lisboa, Campus de Caparica, 2829-516, Caparica, Portugal
João Ventura & Joaquim Silva

Editor information

Editors and Affiliations

Informatics Department, University of Lisbon, Campo Grande, 174-016, Lisbon, Portugal
Luís Correia
Information Systems Department, University of Minho, Campus de Azurém, 4800-058, Guimarães, Portugal
Luís Paulo Reis
Department of Education, University of the Azores, Campus de Angra do Heroísmo, Angra do Heroísmo, 9700-042, Azores, Portugal
José Cascalho

About this paper

Cite this paper

Ventura, J., Silva, J. (2013). Automatic Extraction of Explicit and Implicit Keywords to Build Document Descriptors. In: Correia, L., Reis, L.P., Cascalho, J. (eds) Progress in Artificial Intelligence. EPIA 2013. Lecture Notes in Computer Science, vol 8154. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40669-0_42

DOI: https://doi.org/10.1007/978-3-642-40669-0_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40668-3
Online ISBN: 978-3-642-40669-0
eBook Packages: Computer Science, Computer Science (R0)
