Sqn2Vec: Learning Sequence Representation via Sequential Patterns with a Gap Constraint

  • Dang Nguyen (corresponding author)
  • Wei Luo
  • Tu Dinh Nguyen
  • Svetha Venkatesh
  • Dinh Phung
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11052)

Abstract

When learning sequence representations, traditional pattern-based methods often suffer from the data sparsity and high-dimensionality problems while recent neural embedding methods often fail on sequential datasets with a small vocabulary. To address these disadvantages, we propose an unsupervised method (named Sqn2Vec) which first leverages sequential patterns (SPs) to increase the vocabulary size and then learns low-dimensional continuous vectors for sequences via a neural embedding model. Moreover, our method enforces a gap constraint among symbols in sequences to obtain meaningful and discriminative SPs. Consequently, Sqn2Vec produces significantly better sequence representations than a comprehensive list of state-of-the-art baselines, particularly on sequential datasets with a relatively small vocabulary. We demonstrate the superior performance of Sqn2Vec in several machine learning tasks including sequence classification, clustering, and visualization.
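To make the two-stage idea concrete, the following is a minimal, illustrative sketch (not the authors' implementation, which mines arbitrary-length sequential patterns with a constraint-based miner): first mine frequent symbol pairs that co-occur within a maximum gap, then augment each sequence with the patterns it contains, so an embedding model sees an enlarged vocabulary. The function names, the toy data, and the restriction to length-2 patterns are assumptions made for brevity.

```python
from collections import Counter

def mine_gap_patterns(sequences, max_gap=2, min_support=2):
    """Mine frequent symbol pairs (a, b) where b follows a within
    max_gap positions; each pattern is counted once per sequence."""
    counts = Counter()
    for seq in sequences:
        seen = set()
        for i, a in enumerate(seq):
            for j in range(i + 1, min(i + 1 + max_gap, len(seq))):
                seen.add((a, seq[j]))
        counts.update(seen)
    return {p for p, c in counts.items() if c >= min_support}

def augment(seq, patterns, max_gap=2):
    """Represent a sequence by its symbols plus the mined patterns it
    contains, enlarging the vocabulary seen by an embedding model."""
    present = set()
    for i, a in enumerate(seq):
        for j in range(i + 1, min(i + 1 + max_gap, len(seq))):
            present.add((a, seq[j]))
    return seq + sorted("_".join(p) for p in patterns & present)

sequences = [list("abcab"), list("acbd"), list("abd")]
patterns = mine_gap_patterns(sequences, max_gap=2, min_support=2)
# e.g. list("abd") becomes ['a', 'b', 'd', 'a_b', 'b_d']
tokens = augment(list("abd"), patterns)
```

In the full method, each augmented token list would then be fed to a document-embedding model (in the Doc2Vec style of Le and Mikolov) to produce a low-dimensional vector per sequence; the gap constraint keeps the mined patterns local enough to remain discriminative.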

Notes

Acknowledgment

Dinh Phung and Tu Dinh Nguyen gratefully acknowledge the partial support from the Australian Research Council (ARC).

Supplementary material

Supplementary material 1: 478890_1_En_34_MOESM1_ESM.pdf (PDF, 214 KB)


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Dang Nguyen (1) (corresponding author)
  • Wei Luo (1)
  • Tu Dinh Nguyen (1)
  • Svetha Venkatesh (1)
  • Dinh Phung (2)
  1. Center for Pattern Recognition and Data Analytics, School of Information Technology, Deakin University, Geelong, Australia
  2. Faculty of Information Technology, Monash University, Melbourne, Australia