Abstract
Coding is a process of assigning meaning to a given piece of evidence. Evidence may be found in a variety of data types, including documents, research interviews, posts from social media, conversations from learning platforms, or any source of data that may provide insights for the questions under qualitative study. In this study, we focus on text data and consider coding as a process of identifying words or phrases and categorizing them into codes to facilitate data analysis. There are a number of different approaches to generating qualitative codes, such as grounded coding, a priori coding, or using both in an iterative process. However, both qualitative and quantitative analysts face the same coding problem: when the data size is large, manual coding becomes impractical. nCoder is a tool that helps researchers discover and code key concepts in text data with minimal human judgment. Once reliability and validity are established, nCoder automatically applies the coding scheme to the dataset. However, for concepts that occur infrequently, even with acceptable reliability, the classifier may still produce too many false negatives. This paper explores these problems within the current nCoder and proposes adding a semantic component to nCoder. A tool called “nCoder+” is presented with real data to demonstrate the usefulness of the semantic component. The possible ways of integrating this component and other natural language processing techniques into nCoder are discussed.
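The tension between acceptable reliability and too many false negatives can be illustrated with a small calculation. The sketch below is not nCoder's implementation: the function names and the data are hypothetical, and Cohen's kappa is used only as a familiar agreement statistic. It assumes 1000 excerpts of which 20 contain the concept, and an automated coder that finds 14 of them with no false positives.

```python
from typing import Sequence


def cohens_kappa(human: Sequence[int], auto: Sequence[int]) -> float:
    """Cohen's kappa for two binary (0/1) codings of the same excerpts."""
    n = len(human)
    # Observed agreement: share of excerpts where both codings match.
    observed = sum(h == a for h, a in zip(human, auto)) / n
    # Chance agreement from each coder's base rate of positive codes.
    p_human = sum(human) / n
    p_auto = sum(auto) / n
    expected = p_human * p_auto + (1 - p_human) * (1 - p_auto)
    return (observed - expected) / (1 - expected)


def recall(human: Sequence[int], auto: Sequence[int]) -> float:
    """Share of human-coded positives that the automated coder also marked."""
    true_pos = sum(1 for h, a in zip(human, auto) if h == 1 and a == 1)
    false_neg = sum(1 for h, a in zip(human, auto) if h == 1 and a == 0)
    return true_pos / (true_pos + false_neg)


# Hypothetical data (an assumption for illustration, not from the paper):
# 20 positive excerpts, 14 found by the automated coder, 6 missed.
human_codes = [1] * 20 + [0] * 980
auto_codes = [1] * 14 + [0] * 6 + [0] * 980

kappa = cohens_kappa(human_codes, auto_codes)
rec = recall(human_codes, auto_codes)
print(f"kappa = {kappa:.2f}, recall = {rec:.2f}")
# prints: kappa = 0.82, recall = 0.70
```

Under these assumed numbers the agreement statistic looks strong, yet 30% of the excerpts that contain the concept are missed, which is the kind of recall gap a semantic component is meant to narrow.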
Notes
1. Not all researchers perform IRR tests. For example, researchers may use social moderation, where two or more raters code all of the data and resolve differences until they all agree on the code (Herrenkohl and Cornelius) [14].
References
Shaffer, D.W.: Quantitative Ethnography. Cathcart Press, Madison (2017)
Chi, M.T.H.: Quantifying qualitative analyses of verbal data: a practical guide. J. Learn. Sci. 6, 271–315 (1997)
Saldaña, J.: The Coding Manual for Qualitative Researchers (2014). https://doi.org/10.1007/s13398-014-0173-7.2
Glaser, B.G., Strauss, A.L.: The Discovery of Grounded Theory: Strategies for Qualitative Research. Aldine Transaction, New Brunswick (1967)
Charmaz, K.: Constructing Grounded Theory. SAGE, London (2006)
Eagan, B.R., Rogers, B., Serlin, R., Ruis, A.R., Irgens, G.A., Shaffer, D.W.: Can we rely on IRR? Testing the assumptions of inter-rater reliability. In: CSCL 2017 Proceedings, pp. 529–532 (2017)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003). https://doi.org/10.1162/jmlr.2003.3.4-5.993
Hu, Y., Boyd-Graber, J., Satinoff, B.: Interactive topic modeling. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 248–257 (2011)
Marquart, C.L., Swiecki, Z., Eagan, B., Shaffer, D.W.: ncodeR (Version 0.1.2) (2018)
Eagan, B.R., Rogers, B., Pozen, R., Marquart, C., Shaffer, D.W.: rhoR: Rho for inter rater reliability (Version 1.1.0) (2016). https://cran.r-project.org/web/packages/rhoR/index.html
Gašević, D., Joksimović, S., Eagan, B., Shaffer, D.W.: SENS: network analytics to combine social and cognitive perspectives of collaborative learning. Comput. Hum. Behav. 92, 562–577 (2019)
Cai, Z., Pennebaker, J.W., Eagan, B., Shaffer, D.W., Dowell, N.M., Graesser, A.C.: Epistemic network analysis and topic modeling for chat data from collaborative learning environment. In: Proceedings of the 10th International Conference on Educational Data Mining, pp. 104–111 (2017)
Sullivan, S., et al.: Using epistemic network analysis to identify targets for educational interventions in trauma team communication. Surg. (United States) 163, 938–943 (2018). https://doi.org/10.1016/j.surg.2017.11.009
Shaffer, D.W., Ruis, A.R.: Epistemic network analysis: a worked example of theory-based learning analytics. In: Handbook of Learning Analytics Data Mining, in press (2017)
Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20(1), 37–46 (1960). https://doi.org/10.1177/001316446002000104
Landauer, T., McNamara, D., Dennis, S., Kintsch, W.: Handbook of Latent Semantic Analysis (2007)
Acknowledgements
The research was supported by the National Science Foundation (SBR 9720314, REC 0106965, REC 0126265, ITR 0325428, REESE 0633918, ALT-0834847, DRK-12-0918409, 1108845; DRL-1661036, 1713110; ACI-1443068), the Institute of Education Sciences (R305H050169, R305B070349, R305A080589, R305A080594, R305G020018, R305C120001), the Army Research Lab (W911INF-12-2-0030), the Office of Naval Research (N00014-00-1-0600, N00014-12-C-0643; N00014-16-C-3027), the Wisconsin Alumni Research Foundation, and the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin-Madison. The opinions, findings, and conclusions do not reflect the views of the funding agencies, cooperating institutions, or other individuals.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Cai, Z., Siebert-Evenstone, A., Eagan, B., Shaffer, D.W., Hu, X., Graesser, A.C. (2019). nCoder+: A Semantic Tool for Improving Recall of nCoder Coding. In: Eagan, B., Misfeldt, M., Siebert-Evenstone, A. (eds) Advances in Quantitative Ethnography. ICQE 2019. Communications in Computer and Information Science, vol 1112. Springer, Cham. https://doi.org/10.1007/978-3-030-33232-7_4
DOI: https://doi.org/10.1007/978-3-030-33232-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33231-0
Online ISBN: 978-3-030-33232-7
eBook Packages: Computer Science, Computer Science (R0)