Abstract
Pull-based development is widely used on globally collaborative platforms such as GitHub and BitBucket. A pull request is a set of changes to existing source code in a project. A developer submits a pull request with the intention of updating the source code. Because pull requests are developed in parallel, several developers may submit multiple pull requests that change the same lines of code. This results in conflicts between changes, which makes it difficult for the project manager to decide which pull request should be merged. In this paper, we conducted a preliminary study on measuring the similarity of pull requests that aim to change the same code in GitHub. We proposed two methods, based on cosine similarity and doc2vec, to quantify the structural similarity and the semantic similarity between pull requests, and evaluated the similarity on four widely studied open source Java projects. Our study shows that competing pull requests are indeed highly similar and that the degree of similarity varies across projects. This complicates the merging decision for project managers.
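As a rough illustration of the cosine measure mentioned in the abstract, the sketch below compares the changed code of two pull requests as term-frequency vectors. The tokenizer, the function names, and the sample changes are assumptions made for this example only; the abstract does not specify how the paper tokenizes or weights code changes, and the doc2vec measure is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a code change into lowercase identifier and number tokens.

    Assumption: a simple regex tokenizer, not the paper's preprocessing.
    """
    return re.findall(r"[A-Za-z_]\w*|\d+", text.lower())


def cosine_similarity(change_a, change_b):
    """Cosine of the angle between the term-frequency vectors of two changes."""
    freq_a = Counter(tokenize(change_a))
    freq_b = Counter(tokenize(change_b))
    # Only tokens present in both changes contribute to the dot product.
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty change is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical changed lines from pull requests (not taken from the paper).
pr_1 = "if (name == null) { return defaultName; }"
pr_2 = "if (name == null || name.isEmpty()) { return defaultName; }"
pr_3 = "logger.debug(message);"

print(cosine_similarity(pr_1, pr_2))  # shared tokens give a value close to 1
print(cosine_similarity(pr_1, pr_3))  # no shared tokens give 0.0
```

A value near 1 indicates that two changes use almost the same tokens with similar frequencies, which is the situation in which a project manager has little to distinguish competing pull requests by.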
Notes
1. Project spring-framework, http://github.com/spring-projects/spring-framework/.
2. Project spring-boot, http://github.com/spring-projects/spring-boot/.
3. Project incubator-dubbo, http://github.com/apache/incubator-dubbo/.
4. Project elasticsearch, http://github.com/elastic/elasticsearch/.
Acknowledgments
The work is supported by the National Key R&D Program of China under Grant No. 2018YFB1003901, the National Natural Science Foundation of China under Grant Nos. 61502345 and 61872273, the Young Elite Scientists Sponsorship Program by CAST under Grant No. 2015QNRC001, and the Technological Innovation Projects of Hubei Province under Grant No. 2017AAA125.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ma, P., Xu, D., Zhang, X., Xuan, J. (2019). Changes Are Similar: Measuring Similarity of Pull Requests That Change the Same Code in GitHub. In: Li, Z., Jiang, H., Li, G., Zhou, M., Li, M. (eds) Software Engineering and Methodology for Emerging Domains. NASAC NASAC 2017 2018. Communications in Computer and Information Science, vol 861. Springer, Singapore. https://doi.org/10.1007/978-981-15-0310-8_8
DOI: https://doi.org/10.1007/978-981-15-0310-8_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0309-2
Online ISBN: 978-981-15-0310-8
eBook Packages: Computer Science, Computer Science (R0)