On user rationale in software engineering

RE 2017 Special Issue

Abstract

Rationale refers to the reasoning and justification behind human decisions, opinions, and beliefs. In software engineering, rationale management focuses on capturing design and requirements decisions and on organizing and reusing project knowledge. This paper takes a different view, examining rationale written by users in online reviews. We studied 32,414 reviews for 52 software applications in the Amazon Store. Through a grounded theory approach and peer content analysis, we investigated how users argue and justify their decisions, e.g., about upgrading, installing, or switching software applications. We also studied how frequently rationale concepts such as encountered issues or considered alternatives occur in the reviews and found that assessment criteria like performance, compatibility, and usability represent the most pervasive concept. We identified a moderate positive correlation between issues and criteria and further assessed the distribution of rationale concepts with respect to rating and verbosity. We found that issues tend to appear more often in lower-rated reviews, while criteria, alternatives, and justifications seem to appear more often in three-star reviews. Also, reviews reporting alternatives seem to be more verbose than reviews reporting criteria. A follow-up qualitative study of sub-concepts revealed that users also report other alternatives (e.g., an alternative software provider), criteria (e.g., cost), and decisions (e.g., on rating the software). We then used the truth set of manually labeled review sentences to explore how accurately rationale concepts can be mined from the reviews. We evaluated the classification algorithms Naive Bayes, Support Vector Machine, Logistic Regression, Decision Tree, Gaussian Process, Random Forest, and Multilayer Perceptron against a baseline and a random configuration. Support Vector Machine, Naive Bayes, and Logistic Regression, trained on the review metadata, the syntax tree of the review text, and influential terms, achieved a precision of around 80% for predicting sentences with alternatives and decisions, with top recall values of 98%. On the review level, precision was up to 13% higher, with recall values reaching 99%. Using only word features, Random Forest achieved the highest precision and Naive Bayes the highest recall in most cases. We discuss the findings and the importance of rationale for supporting deliberation in user communities and for synthesizing reviews for developers.
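The classification setup summarized above can be illustrated with a minimal, hypothetical sketch (not the authors' implementation): TF-IDF word features feeding Naive Bayes and Logistic Regression classifiers that assign rationale concept labels to review sentences. The example sentences and labels below are invented for illustration, and the sketch omits the richer metadata and syntax-tree features the study also used.

```python
# Minimal sketch, assuming a scikit-learn pipeline; the sentences and
# labels are hypothetical placeholders, not data from the study.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

sentences = [
    "It kept crashing after the last update.",          # issue
    "Much faster than the previous version.",           # criterion (performance)
    "I tried another office suite before buying this.", # alternative
    "So I decided to switch back to the old release.",  # decision
]
labels = ["issue", "criterion", "alternative", "decision"]

for name, clf in [("NaiveBayes", MultinomialNB()),
                  ("LogReg", LogisticRegression(max_iter=1000))]:
    # Bag-of-words features (unigrams and bigrams) followed by the classifier.
    pipe = Pipeline([("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
                     ("clf", clf)])
    pipe.fit(sentences, labels)
    # With a real labeled truth set, precision and recall would be
    # estimated via cross-validation rather than a single fit.
    print(name, pipe.predict(["The app is slow and drains the battery."]))
```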

Keywords

App analytics · Rationale · Review mining

Acknowledgements

The authors thank the coders, particularly A. Alizadeh, J. Hennings, E. Kurtanović, D. Martens, and M. Ziaei for their help with the coding. This work is partly funded by the H2020 EU research project OPENREQ (ID 732463).


Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

University of Hamburg, Hamburg, Germany
