
Abstract

E-commerce analytics has attracted growing interest because of its importance in marketing, product evaluation and review, and supply-chain and logistics optimization, among other areas. This interest calls for specialized, scalable methods for clickstream analytics, because the combined number of online users, retail stores, and available products is massive, driven by expanding internet usage and e-commerce across billions of mobile devices. In this chapter, the Sequential All Frequent Itemsets Detection (SAFID) methodology is substantially modified and extended to address the clickstream analytics problem. It is applied to a composite dataset of more than ten billion records that simulates one month of holiday-season traffic across the top one hundred U.S. online retail stores. The results show that the methodology performs this analysis very efficiently on a standard desktop computer. It can detect all frequent and infrequent clickstream patterns, which can provide valuable knowledge to marketers of online retail stores.
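The abstract does not describe how SAFID finds every repeated clickstream pattern. As a rough illustration only, the Python sketch below shows the general idea behind suffix-array-based repeated-pattern detection, the family of techniques SAFID builds on. It sorts all within-session suffixes, computes the longest common prefix (LCP) between neighbors, and reads off each contiguous pattern shared by at least `min_support` suffixes. This is not the authors' SAFID implementation. The function names, the page labels, and the choice to count occurrences (start positions) rather than distinct sessions are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# A click pattern is a tuple of page labels (hypothetical labels in the example).
Pattern = Tuple[str, ...]


def _lcp(a: Pattern, b: Pattern) -> int:
    """Length of the longest common prefix of two suffixes."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def repeated_patterns(
    sessions: Sequence[Sequence[str]], min_support: int = 2
) -> Dict[Pattern, int]:
    """Return every contiguous click pattern occurring at least min_support times.

    Support counts occurrences (start positions), not distinct sessions.
    Suffixes are taken within each session, so no pattern spans two sessions.
    """
    if min_support < 2:
        raise ValueError("min_support must be at least 2 (repeated patterns only)")
    # Naive suffix array: every within-session suffix, sorted lexicographically.
    sa: List[Pattern] = sorted(tuple(s[i:]) for s in sessions for i in range(len(s)))
    # lcps[i] is the LCP of sa[i - 1] and sa[i]; lcps[0] is 0 by convention.
    lcps = [0] + [_lcp(sa[i - 1], sa[i]) for i in range(1, len(sa))]
    result: Dict[Pattern, int] = {}
    for k in range(1, max(lcps, default=0) + 1):
        # Suffixes sharing the same length-k prefix form one contiguous run
        # in sorted order, and every neighboring LCP inside that run is >= k.
        run_start = 0
        for i in range(1, len(sa) + 1):
            if i < len(sa) and lcps[i] >= k:
                continue  # still inside the current run
            run_len = i - run_start
            if run_len >= min_support:
                result[sa[run_start][:k]] = run_len
            run_start = i
    return result


if __name__ == "__main__":
    sessions = [
        ["home", "search", "product", "cart"],
        ["home", "search", "product"],
        ["home", "product", "cart"],
    ]
    print(repeated_patterns(sessions, min_support=3))
    # {('home',): 3, ('product',): 3}
```

Sorting whole suffix tuples is far slower and more memory-hungry than dedicated suffix-array construction, so this sketch would not scale to billions of records. It is only meant to make the LCP-run idea easy to follow. Patterns below the threshold (such as `("search", "product", "cart")`, which occurs once in the example) are the infrequent ones.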



Author information

Correspondence to Konstantinos F. Xylogiannopoulos.


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R. (2020). Simplifying E-Commerce Analytics by Discovering Hidden Knowledge in Big Data Clickstreams. In: Kaya, M., Birinci, Ş., Kawash, J., Alhajj, R. (eds) Putting Social Media and Networking Data in Practice for Education, Planning, Prediction and Recommendation. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-030-33698-1_4
