Analysis of book sales prediction at Amazon marketplace in India: a machine learning approach

  • Satyendra Kumar Sharma
  • Swapnajit Chakraborti
  • Tanaya Jha
Original Article


Predicting customer demand is an important part of supply chain management, as it helps avoid over- or under-production and reduces delivery time. In e-commerce, accurate prediction of customer demand, typically captured by sales volume, requires careful analysis of multiple factors, such as type of product, country of purchase, price, discount rate, free delivery option and online review sentiment, together with their interactions. For e-tailers such as Amazon, this prediction capability is also extremely important for managing the supply chain efficiently and ensuring customer satisfaction. This study investigates the efficacy of three modeling techniques, namely regression analysis, decision-tree analysis and artificial neural networks, for predicting the sales of books at the Amazon marketplace in India, using relevant factors and their interactions as predictor variables. Sentiment analysis is carried out to measure the polarity of online reviews, and the resulting sentiment measures are included as predictors in these models. The importance of each independent predictor variable, such as discount rate or review sentiment, is analyzed based on the outcome of each model to determine the most significant predictors that the marketer can control to influence sales. In terms of prediction accuracy, the artificial neural network model is found to perform better than the decision-tree based model. In addition, the regression analysis, with and without sentiment and interaction factors, generates comparable results.
The comparative analysis of these models reveals several significant findings. Firstly, all three models confirm that review volume is the most important and significant predictor of sales of books at the Amazon marketplace in India. Secondly, discount rate, discount amount and average ratings have minimal or insignificant effect on sales prediction. Thirdly, negative sentiment and positive sentiment of the reviews are each significant predictors according to the regression and decision-tree models, but neither is significant according to the neural network model. This observation from the neural network model is contrary to extant research, which claims that both negative and positive sentiment are significant, with the former having more influence in predicting sales. Finally, the interaction effects of review volume with negative and positive sentiment are also found to be significant predictors in all three models. Hence, overall, of the various factors used for predicting book sales, review volume, negative sentiment, positive sentiment and their interactions are found to be the most significant ones across all models. The results of this study can be used by online sellers to accurately predict sales volume by adjusting these significant factors, thereby managing the supply chain effectively.
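
To make the idea of interaction effects concrete, the following is a minimal sketch of a regression that includes review volume, positive sentiment, negative sentiment and the two volume-by-sentiment interaction terms as predictors. It is not the authors' pipeline: the data are synthetic and noise-free, the coefficient values in TRUE_BETA are arbitrary, and the helper names (add_interactions, solve, fit_ols) are hypothetical. It uses ordinary least squares in plain Python only to show how interaction columns enter the design matrix.

```python
import random


def add_interactions(row):
    """Build one design-matrix row: intercept, main effects, interactions."""
    volume, pos, neg = row
    return [1.0, volume, pos, neg, volume * pos, volume * neg]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    design = [add_interactions(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(x[i] * x[j] for x in design) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic, noise-free data (an assumption for illustration only):
# sales depend on volume, sentiment scores and their interactions.
TRUE_BETA = [2.0, 0.5, 1.0, -1.5, 0.3, -0.4]
rng = random.Random(0)
rows = [(rng.uniform(0, 10), rng.uniform(0, 1), rng.uniform(0, 1))
        for _ in range(200)]
y = [sum(b * f for b, f in zip(TRUE_BETA, add_interactions(r))) for r in rows]

beta = fit_ols(rows, y)
names = ["intercept", "volume", "pos", "neg", "volume*pos", "volume*neg"]
for name, value in zip(names, beta):
    print(f"{name:>12}: {value:.3f}")
```

Because the synthetic target is an exact linear function of the six columns, the fit recovers the chosen coefficients. With real review data the estimates would carry noise, and significance would have to be assessed separately.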


Keywords: E-commerce; Sentiment analysis; Neural network; Decision tree; Regression analysis; Predictive model




Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Satyendra Kumar Sharma (1)
  • Swapnajit Chakraborti (2, corresponding author)
  • Tanaya Jha (3)
  1. Department of Management, Birla Institute of Technology and Science (BITS Pilani), Pilani, India
  2. Information Management, S. P. Jain Institute of Management and Research (SPJIMR), Mumbai, India
  3. Department of Computer Science, Birla Institute of Technology and Science (BITS Pilani), Pilani, India
