Abstract
The volume of data archived in open source software project repositories makes automated, quantitative techniques attractive for extracting and analyzing information from these archives. However, many kinds of archival data include blocks of natural language text that are difficult to analyze automatically.
This paper introduces a qualitative analysis method that is transparent and repeatable, yields objective findings from qualitative data, and is efficient enough to be applied to large archives.
The method was applied in a case study of developer and user forum discussions of an open source electronic medical record project. The study demonstrates that the qualitative repository mining method can be employed to derive useful results quickly yet accurately. These results would not be possible using a strictly automated approach.
© 2012 IFIP International Federation for Information Processing
About this paper
Cite this paper
Noll, J., Seichter, D., Beecham, S. (2012). A Qualitative Method for Mining Open Source Software Repositories. In: Hammouda, I., Lundell, B., Mikkonen, T., Scacchi, W. (eds) Open Source Systems: Long-Term Sustainability. OSS 2012. IFIP Advances in Information and Communication Technology, vol 378. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33442-9_18
DOI: https://doi.org/10.1007/978-3-642-33442-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33441-2
Online ISBN: 978-3-642-33442-9
eBook Packages: Computer Science (R0)