Skip to main content

CPT+: Decreasing the Time/Space Complexity of the Compact Prediction Tree

  • Conference paper
  • First Online:
Advances in Knowledge Discovery and Data Mining (PAKDD 2015)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9078))

Included in the following conference series:

Abstract

Predicting next items of sequences of symbols has many applications in a wide range of domains. Several sequence prediction models have been proposed such as DG, All-k-order markov and PPM. Recently, a model named Compact Prediction Tree (CPT) has been proposed. It relies on a tree structure and a more complex prediction algorithm to offer considerably more accurate predictions than many state-of-the-art prediction models. However, an important limitation of CPT is its high time and space complexity. In this article, we address this issue by proposing three novel strategies to reduce CPT’s size and prediction time, and increase its accuracy. Experimental results on seven real life datasets show that the resulting model (CPT+) is up to 98 times more compact and 4.5 times faster than CPT, and has the best overall accuracy when compared to six state-of-the-art models from the literature: All-K-order Markov, CPT, DG, Lz78, PPM and TDAG.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Begleiter, R., El-yaniv, R., Yona, G.: On prediction using variable order markov models. Journal of Artificial Intelligence Research 22, 385–421 (2004)

    MATH  MathSciNet  Google Scholar 

  2. Cleary, J., Witten, I.: Data compression using adaptive coding and partial string matching. IEEE Trans. on Inform. Theory 24(4), 413–421 (1984)

    MathSciNet  Google Scholar 

  3. Deshpande, M., Karypis, G.: Selective markov models for predicting web page accesses. ACM Transactions on Internet Technology 4(2), 163–184 (2004). https://www.developers.google.com/prediction, Accessed: 2014–02-15

    Article  Google Scholar 

  4. Gopalratnam, K., Cook, D.J.: Online sequential prediction via incremental parsing: The active lezi algorithm. IEEE Intelligent Systems 22(1), 52–58 (2007)

    Article  Google Scholar 

  5. Gueniche, T., Fournier-Viger, P., Tseng, V.S.: Compact prediction tree: a lossless model for accurate sequence prediction. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds.) ADMA 2013, Part II. LNCS, vol. 8347, pp. 177–188. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  6. Fournier-Viger, P., Gueniche, T., Tseng, V.S.: Using partially-ordered sequential rules to generate more accurate sequence prediction. In: Zhou, S., Zhang, S., Karypis, G. (eds.) ADMA 2012. LNCS, vol. 7713, pp. 431–442. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  7. Laird, P., Saul, R.: Discrete sequence prediction and its applications. Machine Learning 15(1), 43–68 (1994)

    MATH  Google Scholar 

  8. Padmanabhan, V.N., Mogul, J.C.: Using Prefetching to Improve World Wide Web Latency. Computer Communications 16, 358–368 (1998)

    Google Scholar 

  9. Pei, J., Han, J., Mortazavi-Asl, B., Wang, J., Pinto, H., Chen, Q., Dayal, U., Hsu, M.: Mining sequential patterns by pattern-growth: the PrefixSpan approach. IEEE Trans. Known. Data Engin. 16(11), 1424–1440 (2004)

    Article  Google Scholar 

  10. Pitkow, J., Pirolli, P.: Mining longest repeating subsequence to predict world wide web surng. In: Proc. 2nd USENIX Symposium on Internet Technologies and Systems, Boulder, CO, pp. 13–25 (1999)

    Google Scholar 

  11. Sun, R., Giles, C.L.: Sequence Learning: From Recognition and Prediction to Sequential Decision Making. IEEE Intelligent Systems 16(4), 67–70 (2001)

    Article  Google Scholar 

  12. Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory 24(5), 530–536 (1978)

    Article  MATH  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Philippe Fournier-Viger .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Gueniche, T., Fournier-Viger, P., Raman, R., Tseng, V.S. (2015). CPT+: Decreasing the Time/Space Complexity of the Compact Prediction Tree. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science(), vol 9078. Springer, Cham. https://doi.org/10.1007/978-3-319-18032-8_49

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-18032-8_49

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18031-1

  • Online ISBN: 978-3-319-18032-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics