Big Consumer Behavior Data and their Analytics: Some Challenges and Solutions

Calciu, Mihai; Moulins, Jean-Louis; Salerno, Francis

doi:10.1007/978-3-030-02568-7_13

Part of the book series: Developments in Marketing Science: Proceedings of the Academy of Marketing Science (DMSPAMS)

Included in the following conference series:

Academy of Marketing Science World Marketing Congress


Abstract

This chapter contributes to the still sparse marketing literature on analyzing big consumer behavior data with cloud analytics. It summarizes the main existing academic studies and introduces new applications, datasets, and technologies to complete the picture. Both internal “purchase history” data and external Web-based customer reviews and social media data are discussed, organized, and analyzed. Together they cover the volume and variety aspects that define big data and expose analytic complexities that need to be dealt with.

Notes

1.
Each review consists of the following fields: (1) reviewerID: the ID of the reviewer; (2) asin: the product ID of the item being reviewed; (3) reviewerName: the name of the reviewer; (4) helpful: the first number is the number of people who voted the review as helpful and the second is the total number of people who voted on the review; (5) reviewText: the entire review in text form; (6) overall: the rating out of 5 that the reviewer gave the product; (7) summary: a shortened version of the review; (8) unixReviewTime: time of the review as a Unix timestamp; (9) reviewTime: time of the review in dd/mm/yyyy.
2.
Selecting relevant tweets demands the use of four identifiers: (1) name of the show (e.g., Breaking Bad); (2) official Twitter account of the show (e.g., @TwoHalfMen_CBS); (3) a list of hashtags associated with the show (e.g., #AskGreys); and (4) the characters’ names on the show (e.g., Sheldon Cooper).
3.
Google Trends provides total search volume for a particular search item. For the TV series data, one can use the name of the show (e.g., Two and a Half Men) and character names on the show (e.g., Walden Schmidt) as the keywords.
4.
Many Wikipedia editors are committed TV followers who edit related articles before a show’s release date. Wikipedia edits or views may therefore be good predictors of TV ratings.
5.
Consumers also post reviews on discussion forums such as IMDb, chosen here because it has the highest Web traffic ranking (according to Alexa) among all TV show-related sites.
6.
Consumers may also be driven to watch TV series by news articles. Huffington Post is a site that offers news and blogs covering entertainment, politics, etc. It ranked 26th on Alexa as of January 29, 2015.
7.
http://alias-i.com/lingpipe/demos/tutorial/sentiment/read-me.html
8.
For the example illustrated in Figure 1, the brand smartcar had 11,052 followers, of whom 953 (8.6%) were also followers of environmental friendliness exemplars.
9.
This is analogous to the “inverse document frequency” adjustment used in information retrieval to encourage documents containing rare query terms to be ranked higher than documents containing common query terms (Manning et al. 2008).
10.
Hadoop is an open-source software framework that allows the distributed processing of large datasets across clusters of computers. It contains (1) the Hadoop Common package, which provides file system and OS-level abstraction; (2) YARN, a MapReduce engine; and (3) the Hadoop Distributed File System. These mechanisms automatically break down jobs into distributed tasks, schedule jobs and tasks efficiently at participating cluster nodes, and tolerate data and task failures.
11.
The same applies to that is also needed when estimating linear regression
12.
For quantitative methods and model builders, this privilege is, in our opinion, reserved only for pure “creators of mathematics.”
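
As a minimal illustration of Notes 8 and 9, the Scala sketch below recomputes the follower-overlap share quoted in Note 8 and shows the textbook inverse document frequency, log(N/df), that Note 9 uses as an analogy. The object and function names are hypothetical, and the idf function is the standard information-retrieval definition; it is an assumption for illustration, not necessarily the exact weighting applied in the chapter.

```scala
object FollowerOverlap {
  /** Share of a brand's followers who also follow at least one exemplar account. */
  def overlapShare(shared: Long, brandFollowers: Long): Double = {
    require(brandFollowers > 0, "brandFollowers must be positive")
    require(shared >= 0 && shared <= brandFollowers, "shared must lie in [0, brandFollowers]")
    shared.toDouble / brandFollowers
  }

  /** Textbook inverse document frequency: rarer terms receive larger weights. */
  def idf(totalDocs: Long, docsWithTerm: Long): Double = {
    require(docsWithTerm > 0 && docsWithTerm <= totalDocs, "docsWithTerm must lie in [1, totalDocs]")
    math.log(totalDocs.toDouble / docsWithTerm)
  }

  def main(args: Array[String]): Unit = {
    // Figures from Note 8: 953 shared followers out of 11,052 brand followers.
    val share = overlapShare(953, 11052)
    println(f"Overlap share: ${share * 100}%.1f%%")
    // Hypothetical counts, only to show that a rare term outweighs a common one.
    println(s"idf(rare) = ${idf(1000, 5)}, idf(common) = ${idf(1000, 500)}")
  }
}
```

The same down-weighting logic carries over to followers: an account followed by nearly everyone says little about a specific brand, just as a term found in nearly every document says little about a specific query.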

References

Albescu, F., & Pugna, I. B. (2014). Marketing intelligence—The last frontier of business information technologies. Romanian Journal of Marketing, 3, 55–68.
Bello-Orgaz, G., Jung, J. J., & Camacho, D. (2016). Social big data: Recent achievements and new challenges. Information Fusion, 28, 45–59.
Benson, A. R., Gleich, D. F., & Demmel, J. (2013). Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures. 2013 IEEE International Conference on Big Data, October 6–9, TBD Silicon Valley.
Beyer, M. A., & Laney, D. (2012). The importance of ‘big data’: A definition. Stamford, CT: Gartner.
Bradley, J. (2016). Apache® Spark™ MLlib: From Quick Start to Scikit-Learn. Retrieved October, 2017, from http://go.databricks.com/spark-mllib-from-quick-start-to-scikit-learn.
Culotta, A., & Cutler, J. (2016). Mining brand perceptions from twitter social networks. Marketing Science, 35(3), 343–362.
Davenport, T., & Patil, D. (2012). Data scientist: The sexiest job of the 21st century. Harvard Business Review, 90(10), 70–76.
Dean, J., & Ghemawat, S. (2004, December). MapReduce: Simplified data processing on large clusters. OSDI'04: Sixth symposium on operating system design and implementation, San Francisco, CA.
Forrester. (2011). Expand your digital horizon with big data. Forrester. Retrieved May 27 from http://www.asterdata.com/newsletter-images/30-04-2012/resources/Forrester_Expand_Your_Digital_Horiz.pdf Accessed July 7, 2017.
Goes, P. (2014). Big data and IS research. MIS Quarterly, 38(3), III–VIII.
Halko, N. P. (2012). Randomized methods for computing low-rank approximations of matrices. Unpublished doctoral dissertation, University of Colorado, Boulder.
IBM. (2011). From stretched to strengthened—Insights from a global CMO study. Retrieved September 17, 2015, from http://www.ibm.com/services/us/cmo/cmostudy2011/downloads.html.
Laney, D. (2001). 3D data management: Controlling data volume, velocity, and variety, technical report. Retrieved October, 2017, from https://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf.
Lilien, G. L., & Rangaswamy, A. (2000). Modeled to bits: Decision models for the digital, networked economy. International Journal of Research in Marketing, 17, 227–235.
Liu, X., Singh, P. V., & Srinivasan, K. (2016). A structured analysis of unstructured big data by leveraging cloud computing. Marketing Science, 35(3), 363–388.
Martin, L., & Pu, P. (2014). Prediction of helpful reviews using emotions extraction. AAAI Publications.
McAuley, J., Pandey, R., & Leskovec, J. (2015). Inferring networks of substitutable and complementary products. KDD '15 Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794.
Odersky, M., Spoon, L., & Venners, B. (2011). Programming in Scala: A comprehensive step-by-step guide (2nd ed.). Artima Inc.
Rust, R. T., & Huang, M. H. (2014). The service revolution and the transformation of marketing science. Marketing Science, 33(2), 206–221.
Sanders, N. R. (2016). How to use big data to drive your supply chain. California Management Review, 58(3), 26–48.
Wedel, M., & Kannan, P. K. (2016). Marketing analytics for data-rich environments. Journal of Marketing, 80(6), 97–121.
Wilkinson, D. (2013). Scala as a platform for statistical computing and data science. Retrieved October, 2017, from https://darrenjw.wordpress.com/2013/12/23/scala-as-a-platform-for-statistical-computing-and-data-science/
Xu, Z., Frankwick, G. L., & Ramirez, E. (2016). Effects of big data analytics and traditional marketing analytics on new product success: A knowledge fusion perspective. Journal of Business Research, 69(5), 1562–1566.
Zaharia, M. (2014). An architecture for fast and general data processing on large clusters. University of California at Berkeley, Technical Report No. UCB/EECS-2014-12.
Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M. J., Shenker, S., & Stoica, I. (2012, April). Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. NSDI 2012.

Author information

Authors and Affiliations

Université de Lille RIME-Lab, Lille, France
Mihai Calciu
Aix Marseille Université Cret-Log, Marseille, France
Jean-Louis Moulins
Université de Lille LEM, Lille, France
Francis Salerno

Corresponding author

Correspondence to Mihai Calciu.

Editor information

Editors and Affiliations

Marketing Department, IÉSEG School of Management, Lille, France
Patricia Rossi
Rohrer College of Business, Rowan University, Glassboro, NJ, USA
Nina Krey

Appendix

Listing 2 Measuring customer sentiment on the Amazon Reviews Dataset*

import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
import org.apache.spark.ml.regression._
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.tuning.{ParamGridBuilder, CrossValidator}
import org.apache.spark.ml.evaluation.RegressionEvaluator

// Assumes a Spark session named `spark` is already in scope (as in spark-shell)
// Load dataset and cache it
val data = spark.read.json("/media/storage1/reviews-train.json").cache()

// Define a pipeline combining text feature extractors + linear regression
val tokenizer = new Tokenizer().setInputCol("reviewText").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
// elasticNetParam = 1.0 selects a pure L1 (lasso) penalty
val lasso = new LinearRegression().setLabelCol("overall").setElasticNetParam(1.0).setMaxIter(100)
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lasso))
// Candidate regularization strengths explored by cross-validation
val paramGrid = new ParamGridBuilder().addGrid(lasso.regParam, Array(0.005, 0.01, 0.05)).build()

// Define evaluation metric
val evaluator = new RegressionEvaluator().setLabelCol("overall").setMetricName("r2")
val cv = new CrossValidator().setEstimator(pipeline).setEvaluator(evaluator).setEstimatorParamMaps(paramGrid)

// Run everything!
val cvModel = cv.fit(data)

// Evaluate on test data:
val test = spark.read.json("/media/storage1/reviews-test.json")
val r2 = evaluator.evaluate(cvModel.transform(test))
println(s"Test data R^2 score: $r2")

// Save the test-set predictions as JSON
val sparkPredictions = cvModel.transform(test)
sparkPredictions.write.format("json").mode("overwrite").save("/media/storage1/predictions.json")

*The listing was adapted by us to Scala from a Python version (Bradley 2016).
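
To make the feature-extraction stages of Listing 2 more concrete, the following plain-Scala sketch shows the idea behind term-frequency hashing: each token is mapped to one of a fixed number of buckets and the bucket counts become the feature vector. It is a simplified illustration under stated assumptions, not Spark's implementation: the names HashingSketch, tokenize, and hashingTF are hypothetical, and String.hashCode stands in for whatever hash function the library actually uses.

```scala
object HashingSketch {
  /** Lowercase the text and split it on whitespace, dropping empty tokens. */
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("\\s+").filter(_.nonEmpty).toSeq

  /** Sparse term-frequency vector: bucket index -> number of tokens hashed into it.
    * Assumption: String.hashCode is used purely for illustration.
    */
  def hashingTF(tokens: Seq[String], numFeatures: Int): Map[Int, Double] = {
    require(numFeatures > 0, "numFeatures must be positive")
    tokens
      .groupBy(t => java.lang.Math.floorMod(t.hashCode, numFeatures)) // non-negative bucket
      .map { case (bucket, ts) => bucket -> ts.size.toDouble }
  }

  def main(args: Array[String]): Unit = {
    val tokens = tokenize("Great phone great price")
    println(hashingTF(tokens, numFeatures = 16))
  }
}
```

Because the vector length is fixed in advance, no vocabulary has to be built or shared across cluster nodes; the price is that unrelated words can collide in the same bucket, which a larger numFeatures makes less likely.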

Cite this paper

Calciu, M., Moulins, JL., Salerno, F. (2019). Big Consumer Behavior Data and their Analytics: Some Challenges and Solutions. In: Rossi, P., Krey, N. (eds) Finding New Ways to Engage and Satisfy Global Customers. AMSWMC 2018. Developments in Marketing Science: Proceedings of the Academy of Marketing Science. Springer, Cham. https://doi.org/10.1007/978-3-030-02568-7_13

DOI: https://doi.org/10.1007/978-3-030-02568-7_13
Published: 02 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02567-0
Online ISBN: 978-3-030-02568-7
eBook Packages: Business and Management (R0)
