Unsupervised Relation Extraction in Specialized Corpora Using Sequence Mining

  • Kata Gábor
  • Haïfa Zargayouna
  • Isabelle Tellier
  • Davide Buscaldi
  • Thierry Charnois
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9897)


This paper deals with the extraction of semantic relations from scientific texts. Pattern-based representations are compared to word embeddings in unsupervised clustering experiments, according to their potential to discover new types of semantic relations and recognize their instances. The results indicate that sequential pattern mining can significantly improve pattern-based representations, even in a completely unsupervised setting.


Keywords: Sequential pattern; Semantic relation; Parse tree; Relation extraction; Sequential pattern mining
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This work is part of the program “Investissements d’Avenir” overseen by the French National Research Agency, ANR-10-LABX-0083 (Labex EFL). The authors would like to thank the anonymous reviewers for their valuable comments.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Kata Gábor (1, corresponding author)
  • Haïfa Zargayouna (1)
  • Isabelle Tellier (2)
  • Davide Buscaldi (1)
  • Thierry Charnois (1)

  1. LIPN, CNRS (UMR 7030), Université Paris 13, Villetaneuse, France
  2. LaTTiCe, CNRS (UMR 8094), ENS Paris, Université Sorbonne Nouvelle - Paris 3, PSL Research University, Université Sorbonne Paris Cité, Paris, France
