Extracting Temporal Patterns from Large-Scale Text Corpus

  • Yu Liu
  • Wen Hua
  • Xiaofang Zhou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11393)

Abstract

Knowledge, in practice, is time-variant, and many relations are valid only for a certain period of time. This phenomenon highlights the importance of designing temporal patterns, i.e., indicating phrases and their temporal meanings, for temporal knowledge harvesting. However, pattern design is extremely laborious and time-consuming even for a single relation. Therefore, in this work, we study the problem of temporal pattern extraction by automatically analysing a large-scale text corpus with a small number of seed temporal facts. The problem is challenging given the ambiguous nature of natural language and the large number of documents that must be analysed to obtain highly representative temporal patterns. To this end, we introduce various techniques, including corpus annotation, pattern generation, scoring and clustering, to reduce ambiguity in the text corpus and improve both the accuracy and the coverage of the extracted patterns. We conduct extensive experiments on real-world datasets, and the experimental results verify the effectiveness of our proposals.
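
As a rough illustration of what generating temporal patterns from seed temporal facts can look like, the sketch below replaces the subject, object and time expression of each seed fact with placeholders in every sentence that mentions all three, and counts how often each resulting phrase occurs. This is a generic bootstrapping-style toy and not the method of the paper: the seed facts, the corpus, the names, the function generate_patterns and the use of raw frequency as a score are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical seed temporal facts: (subject, object, year, temporal meaning).
# All names and dates are invented for illustration.
SEEDS = [
    ("Alice Smith", "Bob Jones", "1992", "start"),
    ("Carol White", "Dan Brown", "1994", "start"),
]

# Toy corpus, also invented.
CORPUS = [
    "Alice Smith married Bob Jones in 1992 in Brisbane.",
    "Carol White married Dan Brown in 1994 on a beach.",
    "Alice Smith was born in 1961.",
]

PLACEHOLDER_SPAN = re.compile(r"<(SUBJ|OBJ|TIME)>.*<(SUBJ|OBJ|TIME)>")


def generate_patterns(seeds, corpus):
    """Count generalized phrases in which a complete seed fact occurs."""
    counts = Counter()
    for subj, obj, year, meaning in seeds:
        for sentence in corpus:
            # Only sentences mentioning all three parts of the fact qualify.
            if subj in sentence and obj in sentence and year in sentence:
                generalized = (
                    sentence.replace(subj, "<SUBJ>")
                    .replace(obj, "<OBJ>")
                    .replace(year, "<TIME>")
                )
                # Keep the span from the first to the last placeholder.
                match = PLACEHOLDER_SPAN.search(generalized)
                if match:
                    counts[(match.group(0), meaning)] += 1
    return counts


patterns = generate_patterns(SEEDS, CORPUS)
for (phrase, meaning), freq in patterns.most_common():
    print(f"{phrase!r} -> {meaning} (seen {freq} times)")
```

Exact string matching stands in here for the corpus annotation step, and frequency counting for scoring; a real system would need entity linking, temporal expression normalization and a more robust scoring and clustering scheme to handle the ambiguity the abstract describes.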

Keywords

Temporal knowledge harvesting · Temporal patterns · Text mining

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Australia