Datasets for Business and Consumer Analytics


Abstract

This extended appendix provides information, summaries and methodological details for publicly available datasets, in particular those used by various authors throughout this volume. Readers are encouraged to investigate these datasets for themselves as part of the continuing challenge of finding new ideas in business and consumer analytics.

Notes

  1. http://jmcauley.ucsd.edu/data/amazon/links.html.

  2. http://www.json.org.

  3. https://docs.mongodb.com.

  4. Amazon Standard Identification Numbers.

  5. http://www.versace.com.

  6. https://en.wikipedia.org/wiki/Versace.

  7. Check for the file ‘churn.txt’ found inside the compressed file at: http://dataminingconsultant.com/DKD2e_data_sets.zip.

  8. https://archive.ics.uci.edu/ml/datasets.html.

  9. http://marvel.com/comics.

  10. http://www-personal.umich.edu/~mejn/netdata/lesmis.zip.

  11. https://doi.org/10.6084/m9.figshare.1573032.v1.

  12. https://www.macalester.edu/~abeverid/thrones.html.

  13. https://networkofthrones.wordpress.com.

  14. https://grouplens.org/datasets/movielens/.

  15. https://webscope.sandbox.yahoo.com.

  16. Note: most researchers call them ‘vectors’ but ‘array’ is probably more correct.

  17. https://www.kaggle.com.

  18. https://gist.github.com/entaroadun/1653794.

Acknowledgements

P.M. acknowledges support for this research from The University of Newcastle, and previous funding from the Australian Research Council grants Future Fellowship FT120100060 and Discovery Project DP140104183. P.M. also thanks M.N. Haque and A. Gabardo for discussions and, in particular, for verifying that all links cited in this chapter were operational at the time of completion. The authors would also like to thank all the publishers of the publicly available datasets, and the authors and editors, for supporting the open sharing of data and information.

Author information

Corresponding author

Correspondence to Natalie Jane de Vries.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

de Vries, N.J., Moscato, P. (2019). Datasets for Business and Consumer Analytics. In: Moscato, P., de Vries, N. (eds) Business and Consumer Analytics: New Ideas. Springer, Cham. https://doi.org/10.1007/978-3-030-06222-4_26

  • DOI: https://doi.org/10.1007/978-3-030-06222-4_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-06221-7

  • Online ISBN: 978-3-030-06222-4

  • eBook Packages: Computer Science (R0)
