Abstract
Keywords are single-word and multiword terms that describe the semantic content of documents. They are useful in many applications, such as document searching and indexing, and as content summaries for human readers. Keywords can be explicit, occurring in the document itself, or implicit: not written in the document but semantically related to its contents. This paper presents a statistical approach for building document descriptors from explicit and implicit keywords extracted automatically from the documents. The approach is language-independent, and we show comparative results for three European languages.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ventura, J., Silva, J. (2013). Automatic Extraction of Explicit and Implicit Keywords to Build Document Descriptors. In: Correia, L., Reis, L.P., Cascalho, J. (eds) Progress in Artificial Intelligence. EPIA 2013. Lecture Notes in Computer Science, vol 8154. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40669-0_42
DOI: https://doi.org/10.1007/978-3-642-40669-0_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40668-3
Online ISBN: 978-3-642-40669-0
eBook Packages: Computer Science, Computer Science (R0)