Empirical Software Engineering, Volume 24, Issue 6, pp 3659–3695

Mining non-functional requirements from App store reviews

  • Nishant Jha
  • Anas Mahmoud


User reviews obtained from mobile application (app) stores contain technical feedback that can be useful for app developers. Recent research has focused on mining and categorizing such feedback into actionable software maintenance requests, such as bug reports and functional feature requests. However, little attention has been paid to extracting and synthesizing the Non-Functional Requirements (NFRs) expressed in these reviews. NFRs describe a set of high-level quality constraints that a software system should exhibit (e.g., security, performance, usability, and dependability). Meeting these requirements is a key factor for achieving user satisfaction and, ultimately, surviving in the app market. To bridge this gap, we present a two-phase study aimed at mining NFRs from user reviews available on mobile app stores. In the first phase, we conduct a qualitative analysis using a dataset of 6,000 user reviews, sampled from a broad range of iOS app categories. Our results show that 40% of the reviews in our dataset raise at least one type of NFR. The results also show that users in different app categories tend to raise different types of NFRs. In the second phase, we devise an optimized dictionary-based multi-label classification approach to automatically capture NFRs in user reviews. Evaluating the proposed approach over a dataset of 1,100 reviews, sampled from a set of iOS and Android apps, shows that it achieves an average precision of 70% (range 66%–80%) and an average recall of 86% (range 69%–98%).
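To make the idea of dictionary-based multi-label classification concrete, the following is a minimal sketch, not the authors' implementation. The term lists in `NFR_TERMS`, the function names, and the exact-word matching rule are all hypothetical and chosen only for illustration; the abstract does not specify the dictionaries or how they were optimized. A review receives every NFR label whose dictionary shares at least one word with it, and precision and recall are then computed per label.

```python
import re

# Hypothetical indicator terms per NFR type (illustrative only; the
# paper's actual, optimized dictionaries are not given in the abstract).
NFR_TERMS = {
    "security": {"password", "privacy", "hack", "secure"},
    "performance": {"slow", "lag", "fast", "battery"},
    "usability": {"confusing", "intuitive", "easy", "interface"},
    "dependability": {"crash", "freeze", "bug", "reliable"},
}


def classify(review):
    """Return every NFR label whose dictionary shares a word with the review."""
    # Lowercase and split into alphabetic tokens; no stemming is applied,
    # so "crashes" would not match the term "crash" in this sketch.
    tokens = set(re.findall(r"[a-z]+", review.lower()))
    return {label for label, terms in NFR_TERMS.items() if tokens & terms}


def precision_recall(predicted, expected, label):
    """Compute precision and recall for one label over paired label sets."""
    pairs = list(zip(predicted, expected))
    tp = sum(1 for p, e in pairs if label in p and label in e)
    fp = sum(1 for p, e in pairs if label in p and label not in e)
    fn = sum(1 for p, e in pairs if label not in p and label in e)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    reviews = [
        "The app is slow and keeps asking for my password",
        "Easy to use but it will crash on launch",
        "Love it!",
    ]
    for text in reviews:
        print(sorted(classify(text)), "<-", text)
```

Because a review can match several dictionaries at once, the output is a set of labels rather than a single class, which is what makes the scheme multi-label. A review that matches no dictionary gets the empty set, corresponding to a review that raises no NFR.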


Keywords: Requirements elicitation; Non-functional requirements; Application store; Classification



We would like to extend our gratitude to Dr. Daniel M. Berry from the University of Waterloo for his contribution to this work. This work was supported in part by the Louisiana Board of Regents Research Competitiveness Subprogram (LA BoR-RCS), contract number: LEQSF(2015-18)-RD-A-07 and by the LSU Economic Development Assistantships (EDA) program.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Division of Computer Science and Engineering, Louisiana State University, Baton Rouge, USA
