Advertisement

Temporal Feature Space for Text Classification

  • Stefano Giovanni RizzoEmail author
  • Danilo Montesi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10415)

Abstract

In supervised learning algorithms for text classification the text content is usually represented using the frequencies of the words it contains, ignoring their semantic and their relationships. Words within temporal expressions such as “ today ” or “ last February ” are particularly affected by this simplification: the same expression can have a different semantic in documents with different timestamps, while different expressions could refer to the same time. After extracting temporal expressions in documents, we model a set of temporal features derived from the time mentioned in the document, showing the relation between these features and the belonging category. We test our temporal approach on a subset of the New York Times corpus showing a significant improvement over the text-only baseline.

Keywords

Temporal features Automatic text classification Semantic annotation 

References

  1. 1.
    Berberich, K., Bedathur, S., Alonso, O., Weikum, G.: A language modeling approach for temporal information needs. In: Gurrin, C., He, Y., Kazai, G., Kruschwitz, U., Little, S., Roelleke, T., Rüger, S., Rijsbergen, K. (eds.) ECIR 2010. LNCS, vol. 5993, pp. 13–25. Springer, Heidelberg (2010). doi: 10.1007/978-3-642-12275-0_5 CrossRefGoogle Scholar
  2. 2.
    Bloehdorn, S., Hotho, A.: Boosting for text classification with semantic features. In: Mobasher, B., Nasraoui, O., Liu, B., Masand, B. (eds.) WebKDD 2004. LNCS, vol. 3932, pp. 149–166. Springer, Heidelberg (2006). doi: 10.1007/11899402_10 CrossRefGoogle Scholar
  3. 3.
    Brucato, M., Montesi, D.: Metric spaces for temporal information retrieval. In: Rijke, M., Kenter, T., Vries, A.P., Zhai, C.X., Jong, F., Radinsky, K., Hofmann, K. (eds.) ECIR 2014. LNCS, vol. 8416, pp. 385–397. Springer, Cham (2014). doi: 10.1007/978-3-319-06028-6_32 CrossRefGoogle Scholar
  4. 4.
    Campos, R., Dias, G., Jorge, A.M., Jatowt, A.: Survey of temporal information retrieval and related applications. ACM Comput. Surv. (CSUR) 47(2), 15 (2015)Google Scholar
  5. 5.
    Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794. ACM (2016)Google Scholar
  6. 6.
    Fernández, M., Cantador, I., López, V., Vallet, D., Castells, P., Motta, E.: Semantically enhanced information retrieval: an ontology-based approach. Web Semant. Sci. Serv. Agents World Wide Web 9(4), 434–452 (2011). JWS special issue on Semantic SearchGoogle Scholar
  7. 7.
    Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)zbMATHGoogle Scholar
  8. 8.
    Jatowt, A., Au Yeung, C.M., Tanaka, K.: Estimating document focus time. In: Proceedings of the 22nd ACM International Conference on Conference on Information and Knowledge Management, pp. 2273–2278. ACM (2013)Google Scholar
  9. 9.
    Kara, S., Alan, Ö., Sabuncu, O., Akpinar, S., Cicekli, N.K., Alpaslan, F.N.: An ontology-based retrieval system using semantic indexing. Inf. Syst. 37(4), 294–305 (2012). Semantic Web Data ManagementGoogle Scholar
  10. 10.
    Moschitti, A., Basili, R.: Complex linguistic features for text classification: a comprehensive study. In: McDonald, S., Tait, J. (eds.) ECIR 2004. LNCS, vol. 2997, pp. 181–196. Springer, Heidelberg (2004). doi: 10.1007/978-3-540-24752-4_14 CrossRefGoogle Scholar
  11. 11.
    Radinsky, K., Agichtein, E., Gabrilovich, E., Markovitch, S.: A word at a time: computing word relatedness using temporal semantic analysis. In: Proceedings of the 20th International Conference on World Wide Web, WWW 2011, pp. 337–346. ACM, New York (2011)Google Scholar
  12. 12.
    Strötgen, J., Alonso, O., Gertz, M.: Identification of top relevant temporal expressions in documents. In: Proceedings of the 2nd Temporal Web Analytics Workshop, pp. 33–40. ACM (2012)Google Scholar
  13. 13.
    Strötgen, J., Gertz, M.: Heideltime: High quality rule-based extraction and normalization of temporal expressions. In: Proceedings of the 5th International Workshop on Semantic Evaluation, pp. 321–324. Association for Computational Linguistics (2010)Google Scholar
  14. 14.
    Vallet, D., Fernández, M., Castells, P.: An ontology-based information retrieval model. In: Gómez-Pérez, A., Euzenat, J. (eds.) ESWC 2005. LNCS, vol. 3532, pp. 455–470. Springer, Heidelberg (2005). doi: 10.1007/11431053_31 CrossRefGoogle Scholar
  15. 15.
    Wu, S., Crestani, F.: Data fusion with estimated weights. In: Proceedings of the Eleventh International Conference on Information and Knowledge Management, pp. 648–651. ACM (2002)Google Scholar
  16. 16.
    Yang, Y., Liu, X.: A re-examination of text categorization methods. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 42–49. ACM (1999)Google Scholar

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. 1.Department of Computer Science and EngineeringUniversity of BolognaBolognaItaly

Personalised recommendations