
Big Consumer Behavior Data and their Analytics: Some Challenges and Solutions

  • Conference paper in: Finding New Ways to Engage and Satisfy Global Customers (AMSWMC 2018)

Abstract

This chapter contributes to the still very limited marketing literature on big consumer behavior data and cloud analytics by summarizing some of the main extant academic research and by introducing new applications, datasets, and technologies to complete the picture. Both internal “purchase history” data and external Web-based customer reviews and social media data are discussed, organized, and analyzed. They cover the volume and variety aspects that define big data and reveal analytic complexities that need to be dealt with.


Notes

  1. Each review consists of the following labels: (1) reviewerID: the ID of the reviewer; (2) asin: the product ID of the item being reviewed; (3) reviewerName: the name of the reviewer; (4) Helpful: the first number is the number of people who voted the review as helpful and the second number is the number of people who voted on the review; (5) reviewText: the entire review in text form; (6) overall: the rating out of 5 that the reviewer gave the product; (7) summary: a shortened version of the review; (8) unixReviewTime: time of the review; (9) reviewTime: time of the review in dd/mm/yyyy.

  2. Selecting relevant tweets demands the use of four identifiers: (1) the name of the show (e.g., Breaking Bad); (2) the official Twitter account of the show (e.g., @TwoHalfMen_CBS); (3) a list of hashtags associated with the show (e.g., #AskGreys); and (4) the characters’ names on the show (e.g., Sheldon Cooper).

  3. Google Trends provides total search volume for a particular search item. For the TV series data, one can use the name of the show (e.g., Two and a Half Men) and character names on the show (e.g., Walden Schmidt) as the keywords.

  4. Many Wikipedia editors are committed TV followers and edit the related articles before a show’s release date. Wikipedia edits or views may therefore be good predictors of TV ratings.

  5. Consumers also post reviews on discussion forums such as IMDb, chosen here because it has the highest Web traffic ranking (according to Alexa) among all TV show-related sites.

  6. Consumers may also be driven to watch TV series by news articles. Huffington Post is a site that offers news and blogs covering entertainment, politics, etc. It ranked 26th on Alexa as of January 29, 2015.

  7. http://alias-i.com/lingpipe/demos/tutorial/sentiment/read-me.html

  8. For the example illustrated in Figure 1, the brand smartcar had 11,052 followers, out of which 953 (8.6%) were also followers of environmental friendliness exemplars.

  9. This is analogous to the “inverse document frequency” adjustment used in information retrieval to encourage documents containing rare query terms to be ranked higher than documents containing common query terms (Manning et al. 2008).

  10. Hadoop is an open-source software framework that allows the distributed processing of large datasets across clusters of computers. It contains (1) the Hadoop Common package, which provides file system and OS-level abstractions; (2) YARN, the resource manager and scheduler on which the MapReduce engine runs; and (3) the Hadoop Distributed File System (HDFS). These mechanisms automatically break down jobs into distributed tasks, schedule jobs and tasks efficiently at participating cluster nodes, and tolerate data and task failures.

  11. The same applies to that is also needed when estimating linear regression

  12. For quantitative methods and model builders, this privilege is, in our opinion, reserved only for pure “creators of mathematics.”
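
The arithmetic behind notes 8 and 9 can be sketched in a few lines of Scala. The first function recomputes the follower overlap share reported in note 8. The second evaluates the textbook inverse document frequency, idf = log(N / df), which note 9 refers to only by analogy. The corpus size and document frequencies in `main` are made-up values for illustration, and the weighting actually used in the chapter may differ.

```scala
object NotesSketch {
  // Share of a brand's followers who also follow the exemplar accounts (note 8).
  def overlapShare(sharedFollowers: Long, brandFollowers: Long): Double =
    sharedFollowers.toDouble / brandFollowers

  // Textbook inverse document frequency: log(N / df).
  // Rare terms (small df) receive a larger weight than common ones.
  def idf(numDocs: Long, docFreq: Long): Double =
    math.log(numDocs.toDouble / docFreq)

  def main(args: Array[String]): Unit = {
    // Figures taken from note 8: 953 of 11,052 smartcar followers.
    val share = overlapShare(953L, 11052L)
    println(s"smartcar overlap share: ${math.round(share * 1000) / 10.0}%")

    // Hypothetical corpus of 1,000 documents (illustrative values only).
    println(s"idf of a rare term (df = 10):    ${idf(1000L, 10L)}")
    println(s"idf of a common term (df = 500): ${idf(1000L, 500L)}")
  }
}
```

The rare term receives the larger weight, which is the property the analogy in note 9 relies on.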

References

  • Albescu, F., & Pugna, I. B. (2014). Marketing intelligence—The last frontier of business information technologies. Romanian Journal of Marketing, 3, 55–68.


  • Bello-Orgaz, G., Jung, J. J., & Camacho, D. (2016). Social big data: Recent achievements and new challenges. Information Fusion, 28, 45–59.

  • Benson, A. R., Gleich, D. F., & Demmel, J. (2013). Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures. 2013 IEEE International Conference on Big Data, October 6–9, TBD Silicon Valley.

  • Beyer, M. A., & Laney, D. (2012). The importance of ‘big data’: A definition. Stamford, CT: Gartner.


  • Bradley, J. (2016). Apache® Spark™ MLlib: From Quick Start to Scikit-Learn. Retrieved October, 2017, from http://go.databricks.com/spark-mllib-from-quick-start-to-scikit-learn.

  • Culotta, A., & Cutler, J. (2016). Mining brand perceptions from Twitter social networks. Marketing Science, 35(3), 343–362.

  • Davenport, T., & Patil, D. (2012). Data scientist: The sexiest job of the 21st century. Harvard Business Review, 90(10), 70–76.


  • Dean, J., & Ghemawat, S. (2004, December). MapReduce: Simplified data processing on large clusters. OSDI '04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA.

  • Forrester. (2011). Expand your digital horizon with big data. Retrieved May 27 from http://www.asterdata.com/newsletter-images/30-04-2012/resources/Forrester_Expand_Your_Digital_Horiz.pdf. Accessed July 7, 2017.

  • Goes, P. (2014). Big data and IS research. MIS Quarterly, 38(3), III–VIII.


  • Halko, N. P. (2012). Randomized methods for computing low-rank approximations of matrices. Unpublished doctoral dissertation, University of Colorado, Boulder.


  • IBM. (2011). From stretched to strengthened—Insights from a global CMO study. Retrieved September 17, 2015, from http://www.ibm.com/services/us/cmo/cmostudy2011/downloads.html.

  • Laney, D. (2001). 3D data management: Controlling data volume, velocity, and variety, technical report. Retrieved October, 2017, from https://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf.

  • Lilien, G. L., & Rangaswamy, A. (2000). Modeled to bits: Decision models for the digital, networked economy. International Journal of Research in Marketing, 17, 227–235.


  • Liu, X., Singh, P. V., & Srinivasan, K. (2016). A structured analysis of unstructured big data by leveraging cloud computing. Marketing Science, 35(3), 363–388.


  • Martin, L., & Pu, P. (2014). Prediction of helpful reviews using emotions extraction. AAAI Publications.

  • McAuley, J., Pandey, R., & Leskovec, J. (2015). Inferring networks of substitutable and complementary products. KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794.

  • Odersky, M., Spoon, L., & Venners, B. (2011). Programming in Scala: A comprehensive step-by-step guide (2nd ed.). Artima Inc.

  • Rust, R. T., & Huang, M. H. (2014). The service revolution and the transformation of marketing science. Marketing Science, 33(2), 206–221.


  • Sanders, N. R. (2016). How to use big data to drive your supply chain. California Management Review, 58(3), 26–48.


  • Wedel, M., & Kannan, P. K. (2016). Marketing analytics for data-rich environments. Journal of Marketing, 80(6), 97–121.


  • Wilkinson, D. (2013). Scala as a platform for statistical computing and data science. Retrieved October, 2017, from https://darrenjw.wordpress.com/2013/12/23/scala-as-a-platform-for-statistical-computing-and-data-science/

  • Xu, Z., Frankwick, G. L., & Ramirez, E. (2016). Effects of big data analytics and traditional marketing analytics on new product success: A knowledge fusion perspective. Journal of Business Research, 69(5), 1562–1566.


  • Zaharia, M. (2014). An architecture for fast and general data processing on large clusters. University of California at Berkeley, Technical Report No. UCB/EECS-2014-12.

  • Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M. J., Shenker, S., & Stoica, I. (2012, April). Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. NSDI 2012.


Corresponding author

Correspondence to Mihai Calciu.


Appendix


Listing 2 Measuring customer sentiment on the Amazon Reviews Dataset*

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.evaluation.RegressionEvaluator
    import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
    import org.apache.spark.ml.regression.LinearRegression
    import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

    // `spark` is the SparkSession (predefined in spark-shell).
    // Load the training dataset and cache it
    val data = spark.read.json("/media/storage1/reviews-train.json").cache()

    // Define a pipeline combining text feature extractors + linear regression
    val tokenizer = new Tokenizer().setInputCol("reviewText").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    // elasticNetParam = 1.0 gives a pure L1 (lasso) penalty
    val lasso = new LinearRegression().setLabelCol("overall").setElasticNetParam(1.0).setMaxIter(100)
    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lasso))

    // Candidate regularization strengths for cross-validation
    val paramGrid = new ParamGridBuilder().addGrid(lasso.regParam, Array(0.005, 0.01, 0.05)).build()

    // Define evaluation metric
    val evaluator = new RegressionEvaluator().setLabelCol("overall").setMetricName("r2")
    val cv = new CrossValidator().setEstimator(pipeline).setEvaluator(evaluator).setEstimatorParamMaps(paramGrid)

    // Run everything!
    val cvModel = cv.fit(data)

    // Evaluate on test data (predictions are computed once and reused)
    val test = spark.read.json("/media/storage1/reviews-test.json")
    val sparkPredictions = cvModel.transform(test)
    val r2 = evaluator.evaluate(sparkPredictions)
    println(s"Test data R^2 score: $r2")

    // Save the predictions
    sparkPredictions.write.format("json").mode("overwrite").save("/media/storage1/predictions.json")

*The listing is adapted by us to Scala from a Python version (Bradley 2016)
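
To make the feature extraction steps of Listing 2 more concrete, the following plain-Scala sketch mimics what the tokenizer and hashing stages do conceptually: the review text is lowercased and split on whitespace, and each term is then mapped to one of a fixed number of buckets whose counts form the feature vector. This is only an illustration under simplifying assumptions. The names `HashingSketch`, `tokenize`, and `hashingTF` are hypothetical, and `String.hashCode` stands in for the hash function (Spark's own `HashingTF` uses MurmurHash3), so bucket indices will not match those produced by Spark.

```scala
object HashingSketch {
  // Lowercase the text and split it on whitespace,
  // similar in spirit to the Tokenizer stage in Listing 2.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("\\s+").filter(_.nonEmpty).toSeq

  // Hashing trick: map every term to one of `numFeatures` buckets and
  // count how many terms fall into each bucket.
  // Assumption: String.hashCode replaces Spark's MurmurHash3.
  def hashingTF(terms: Seq[String], numFeatures: Int): Map[Int, Double] =
    terms
      .groupBy(term => java.lang.Math.floorMod(term.hashCode, numFeatures))
      .map { case (bucket, bucketTerms) => bucket -> bucketTerms.size.toDouble }

  def main(args: Array[String]): Unit = {
    val words = tokenize("Great product great price")
    val features = hashingTF(words, numFeatures = 16)
    println(s"tokens:   $words")
    println(s"features: $features") // sparse map: bucket index -> term count
  }
}
```

Because the number of buckets is fixed in advance, the feature vector has the same length for every review regardless of vocabulary size, at the cost of occasional collisions between unrelated terms.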


Copyright information

© 2019 Academy of Marketing Science


Cite this paper

Calciu, M., Moulins, JL., Salerno, F. (2019). Big Consumer Behavior Data and their Analytics: Some Challenges and Solutions. In: Rossi, P., Krey, N. (eds) Finding New Ways to Engage and Satisfy Global Customers. AMSWMC 2018. Developments in Marketing Science: Proceedings of the Academy of Marketing Science. Springer, Cham. https://doi.org/10.1007/978-3-030-02568-7_13
