Structuring Unstructured Data—Or: How Machine Learning Can Make You a Wine Sommelier

  • Oliver Müller


Textual data, for example in the form of e-mails, instant messages, or social media posts, is ubiquitous today. Because textual data typically comes in unstructured formats and is often ambiguous in meaning, it is difficult to analyze with computational tools. However, advances in machine learning and the increasing availability of training data now make it possible to extract useful knowledge from large amounts of unstructured textual data. In this chapter, we showcase the use of unsupervised machine learning algorithms and visualization techniques to bring structure to, and thereby learn from, more than 100,000 professional wine reviews. This could be useful, for example, when choosing suitable wines for the celebration of your 60th birthday.


Unstructured data, Machine learning, Topic modelling, Online reviews, Wine


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Paderborn University, Paderborn, Germany
