Abstract
[Context and motivation] Automatic extraction and analysis of app features from user reviews helps software developers better understand users' perceptions of delivered app features. Recently, a rule-based approach called SAFE was proposed to automatically extract app features from user reviews. SAFE was reported to achieve higher precision and recall than previously proposed techniques. However, the procedure used to evaluate SAFE was in part subjective and not repeatable, and thus the whole evaluation might not be reliable. [Question/problem] The goal of our study is to perform an external replication of the SAFE evaluation using an objective and repeatable approach. [Principal ideas/results] To this end, we first implemented SAFE and checked the correctness of our implementation on the set of app descriptions that were used and published by the authors of the original study. We then applied our SAFE implementation to eight review datasets (six app review datasets, one laptop review dataset, one restaurant review dataset) and evaluated its performance against manually annotated feature terms. Our results suggest that the precision of the SAFE approach is strongly influenced by the density of the annotated app features in a review dataset. Overall, we obtained an average precision and recall of 0.120 and 0.539, respectively, which is lower than the performance reported in the original SAFE study. [Contribution] We performed an unbiased and reproducible evaluation of the SAFE approach for user reviews. We make our implementation and all datasets used for the evaluation available for replication by others.
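To make the evaluation measure concrete, the following minimal sketch shows one way precision and recall could be computed when extracted feature terms are compared against manually annotated ones. It assumes exact, case-insensitive matching over sets of terms; the function name `precision_recall` and the toy data are hypothetical, and the matching rules used in the study itself may differ.

```python
def precision_recall(extracted, annotated):
    """Exact-match precision and recall of extracted vs. annotated feature terms.

    Assumption: terms are compared as lower-cased, whitespace-trimmed strings,
    and duplicates are ignored (set semantics).
    """
    extracted_set = {term.strip().lower() for term in extracted}
    annotated_set = {term.strip().lower() for term in annotated}
    true_positives = len(extracted_set & annotated_set)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(annotated_set) if annotated_set else 0.0
    return precision, recall


# Hypothetical toy data, not taken from any of the study's datasets.
extracted = ["send message", "upload photo", "great app", "share link"]
annotated = ["Send message", "share link", "video call"]

precision, recall = precision_recall(extracted, annotated)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Under this simple scheme, a dataset in which few reviews carry annotated features leaves fewer terms for the extracted candidates to match, so precision would tend to drop even if the extractor's output is unchanged.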
Notes
1. The dataset was obtained from the authors of study [3].
6. Review titles with their annotated app features were removed for our study.
7. Both coders were software engineering bachelor's students at the University of Tartu.
References
1. Groen, E.C., et al.: The crowd in requirements engineering: the landscape and challenges. IEEE Softw. 34(2), 44–52 (2017). https://doi.org/10.1109/MS.2017.33
2. Gu, X., Kim, S.: What parts of your apps are loved by users? In: 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 760–770, November 2015. https://doi.org/10.1109/ASE.2015.57
3. Guzman, E., Maalej, W.: How do users like this feature? A fine grained sentiment analysis of app reviews. In: 2014 IEEE 22nd International Requirements Engineering Conference (RE), pp. 153–162. IEEE (2014)
4. Harman, M., Jia, Y., Zhang, Y.: App store mining and analysis: MSR for app stores. In: Proceedings of the 9th IEEE Working Conference on Mining Software Repositories, MSR 2012, pp. 108–111. IEEE Press, Piscataway (2012). http://dl.acm.org/citation.cfm?id=2664446.2664461
5. Johann, T., Stanik, C., Maalej, W.: SAFE: a simple approach for feature extraction from app descriptions and app reviews. In: 2017 IEEE 25th International Requirements Engineering Conference (RE), pp. 21–30. IEEE, September 2017. https://doi.org/10.1109/RE.2017.71
6. Juristo, N., Gómez, O.S.: Replication of software engineering experiments. In: Meyer, B., Nordio, M. (eds.) LASER 2008-2010. LNCS, vol. 7007, pp. 60–88. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-25231-0_2
7. Keertipati, S., Savarimuthu, B.T.R., Licorish, S.A.: Approaches for prioritizing feature improvements extracted from app reviews. In: Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering, p. 33. ACM (2016)
8. Liu, P., Joty, S., Meng, H.: Fine-grained opinion mining with recurrent neural networks and word embeddings. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1433–1443 (2015)
9. Maalej, W., Nabil, H.: Bug report, feature request, or simply praise? On automatically classifying app reviews. In: Proceedings of RE 2015, pp. 116–125. IEEE, August 2015
10. Malik, H., Shakshuki, E.M., Yoo, W.S.: Comparing mobile apps by identifying ‘Hot’ features. Futur. Gener. Comput. Syst. (2018)
11. Poria, S., Cambria, E., Gelbukh, A.: Aspect extraction for opinion mining with a deep convolutional neural network. Knowl.-Based Syst. 108, 42–49 (2016)
12. Sänger, M., et al.: Scare–the sentiment corpus of app reviews with fine-grained annotations in German. In: LREC (2016)
13. Shah, F.A., Sabanin, Y., Pfahl, D.: Feature-based evaluation of competing apps. In: Proceedings of the International Workshop on App Market Analytics - WAMA 2016, pp. 15–21. ACM Press, New York (2016). https://doi.org/10.1145/2993259.2993267
14. Shah, F.A., Sirts, K., Pfahl, D.: The impact of annotation guidelines and annotated data on extracting app features from app reviews. arXiv preprint arXiv:1810.05187 (2018)
15. Vu, P.M., Nguyen, T.T., Pham, H.V., Nguyen, T.T.: Mining user opinions in mobile app reviews: a keyword-based approach. In: Proceedings of ASE 2015, pp. 749–759. IEEE (2015)
Acknowledgment
We are grateful to Emitza Guzman and Christoph Stanik for sharing the datasets. This research was supported by the institutional research grant IUT20-55 of the Estonian Research Council and the Estonian Center of Excellence in ICT research (EXCITE).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Shah, F.A., Sirts, K., Pfahl, D. (2019). Is the SAFE Approach Too Simple for App Feature Extraction? A Replication Study. In: Knauss, E., Goedicke, M. (eds) Requirements Engineering: Foundation for Software Quality. REFSQ 2019. Lecture Notes in Computer Science, vol 11412. Springer, Cham. https://doi.org/10.1007/978-3-030-15538-4_2
DOI: https://doi.org/10.1007/978-3-030-15538-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15537-7
Online ISBN: 978-3-030-15538-4
eBook Packages: Computer Science, Computer Science (R0)