An Integrated Approach for Large-Scale Relation Extraction from the Web

  • Naimdjon Takhirov
  • Fabien Duchateau
  • Trond Aalberg
  • Ingeborg Sølvberg
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7808)


Deriving knowledge from information stored in unstructured documents is a major challenge. More specifically, binary relationships representing facts between entities can be extracted to populate semantic triple stores or large knowledge bases. The main constraint on all knowledge extraction approaches is the trade-off between quality and scalability. In this paper, we therefore propose SPIDER, a novel integrated system for extracting binary relationships at large scale. Through a series of experiments, we show the benefit of our approach, which generally outperforms existing systems in terms of both quality (precision and the number of discovered facts) and scalability.


Relation Extraction, Knowledge Bases, Web Mining




Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Naimdjon Takhirov (1)
  • Fabien Duchateau (2)
  • Trond Aalberg (1)
  • Ingeborg Sølvberg (1)
  1. Norwegian University of Science and Technology, Trondheim, Norway
  2. LIRIS, UMR5205, Université Lyon 1, Lyon, France