Entity Recognition in Information Extraction

  • Novita Hanafiah
  • Christoph Quix
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8397)

Abstract

Detecting and resolving entities is an important step in information retrieval applications. Humans recognize entities from context, but information extraction systems (IES) need sophisticated algorithms to do so. This paper describes the development and implementation of an entity recognition algorithm. The implemented system is integrated with an IES that derives triples from unstructured text, which makes the triples more valuable for query answering because they refer to identified entities. A dictionary of entities and their contexts is built by extracting information from Wikipedia. The entity recognition step computes a score that combines context similarity, based on cosine similarity with a tf-idf weighting scheme, with string similarity. The implemented system achieves good accuracy on Wikipedia articles, is domain independent, and recognizes entities of arbitrary types.
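
The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which string metric is used or how the two similarities are combined, so the `difflib.SequenceMatcher` ratio, the weight `alpha`, the smoothed idf formula, and the two dictionary entries below are all assumptions made for illustration.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase and split on whitespace, stripping surrounding punctuation."""
    tokens = (t.strip(".,;:!?()\"'") for t in text.lower().split())
    return [t for t in tokens if t]


def tfidf_vector(tokens, doc_freq, n_docs):
    """Sparse tf-idf vector as a dict; the idf is smoothed so it is never zero."""
    counts = Counter(tokens)
    return {
        term: tf * (math.log((1 + n_docs) / (1 + doc_freq.get(term, 0))) + 1.0)
        for term, tf in counts.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(mention, context, dictionary, alpha=0.7):
    """Rank dictionary entities by a weighted sum of context and string similarity.

    `alpha` is a hypothetical mixing weight, not a value taken from the paper.
    """
    entity_tokens = {name: tokenize(text) for name, text in dictionary.items()}
    doc_freq = Counter()
    for tokens in entity_tokens.values():
        doc_freq.update(set(tokens))  # document frequency over entity contexts
    n_docs = len(dictionary)

    query = tfidf_vector(tokenize(context), doc_freq, n_docs)
    scored = []
    for name, tokens in entity_tokens.items():
        context_sim = cosine(query, tfidf_vector(tokens, doc_freq, n_docs))
        # Assumed string metric: difflib's similarity ratio on lowercased names.
        string_sim = SequenceMatcher(None, mention.lower(), name.lower()).ratio()
        scored.append((alpha * context_sim + (1 - alpha) * string_sim, name))
    return sorted(scored, reverse=True)


# Hypothetical two-entry dictionary standing in for the Wikipedia-derived one.
dictionary = {
    "Java (programming language)": "object-oriented programming language compiled to bytecode for a virtual machine",
    "Java (island)": "island of Indonesia with the capital Jakarta and many volcanoes",
}
ranking = rank_candidates(
    "Java", "She wrote the compiler in Java, a programming language.", dictionary
)
print(ranking[0][1])
```

In this toy example the island has the closer name match, but the shared context terms outweigh it, so the programming language is ranked first. With a lower `alpha` the string similarity would dominate instead.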

Keywords

Noun Phrase, Ranking Function, Cosine Similarity, Entity Recognition, Actual Entity
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Novita Hanafiah (1, 2)
  • Christoph Quix (3, 4)
  1. Thai-German Graduate School of Engineering, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
  2. School of Computer Science, Bina Nusantara University, Jakarta, Indonesia
  3. Fraunhofer Institute for Applied Information Technology FIT, St. Augustin, Germany
  4. Information Systems and Databases, RWTH Aachen University, Aachen, Germany
