Abstract
We conducted a feasibility study of matching children (N = 2571, average age 12 years, 50.4% female) with their parents (N = 1931, average age 41 years, 83.3% female) by means of an anonymous self-generated identification code (SGIC) and assessed the code’s methodological properties. We used a nine-character SGIC with the children and a mirrored version of the same code with the parents. The average overall error rate in generating the SGIC was 9.7% (4.0% in the parents and 13.9% in the children). Using any four-character combination together with the “school” variable, we were able to link a total of 1765 parents’ and children’s codes uniquely (94.9% of all possible dyads). Linking with the SGIC only yielded a precision (positive predictive value) of 0.979, a recall (sensitivity, true positive rate) of 0.934, and an F-measure (harmonic mean of precision and recall) of 0.956. The analysis of the discrepant characters in the dyads identified the paternal grandmother’s name and eye color as the items that varied most often. This study is the first to look at SGIC match rates and error and omission rates in linking different subjects into dyads in prevention research. We identified a high number of unique child-parent matches while guaranteeing anonymity to the participants. We provided evidence that our SGIC is a suitable tool for between-group linking procedures and achieves a high matching rate while maintaining anonymity in school-based prevention study samples.
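The quantities reported in the abstract can be illustrated with a short sketch. It is not the authors' procedure: the record layout (`record_id`, `school`, `code`), the function names, and the reading of "any four-character combination" as "at least four of the nine positions agree" are all assumptions made for illustration. Only the F-measure formula (the harmonic mean of precision and recall) is taken directly from the text.

```python
from collections import defaultdict


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def agreeing_positions(code_a: str, code_b: str) -> int:
    """Count the positions at which two codes carry the same character."""
    return sum(a == b for a, b in zip(code_a, code_b))


def link_unique_dyads(children, parents, min_agree=4):
    """Link child and parent codes within the same school.

    children, parents: lists of (record_id, school, code) tuples
    (a hypothetical layout, not taken from the study).
    min_agree: assumed threshold of agreeing positions for a candidate link.

    Returns {child_id: parent_id} only for pairs in which the child has
    exactly one candidate parent and that parent has exactly one candidate child.
    """
    parents_by_school = defaultdict(list)
    for parent_id, school, code in parents:
        parents_by_school[school].append((parent_id, code))

    candidates = {}
    parent_hits = defaultdict(int)
    for child_id, school, code in children:
        # Blocking on school: only parents from the same school are compared.
        hits = [
            parent_id
            for parent_id, parent_code in parents_by_school.get(school, [])
            if agreeing_positions(code, parent_code) >= min_agree
        ]
        candidates[child_id] = hits
        for parent_id in hits:
            parent_hits[parent_id] += 1

    return {
        child_id: hits[0]
        for child_id, hits in candidates.items()
        if len(hits) == 1 and parent_hits[hits[0]] == 1
    }


if __name__ == "__main__":
    # Precision and recall as reported in the abstract.
    print(round(f_measure(0.979, 0.934), 3))

    # Made-up codes; "?" stands for a character that was omitted.
    children = [("c1", "A", "ABC123XYZ"), ("c2", "A", "QQQ999WWW")]
    parents = [("p1", "A", "ABC12?XY?"), ("p2", "B", "QQQ999WWW")]
    print(link_unique_dyads(children, parents))
```

With the reported precision of 0.979 and recall of 0.934, `f_measure` rounds to 0.956, the value given in the abstract. In the toy data, the child code `c2` and the parent code `p2` are identical but remain unlinked because they belong to different schools, which shows how the school variable restricts the candidate pairs.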
Acknowledgements
The study was supported by the Czech Science Foundation, Grant No. 16-15771S, and the Charles University (PROGRES Q 06).
Authors’ Contributions
RG and HV designed the study and wrote the protocol. RG performed the initial drafting of the manuscript. JV managed the literature searches and summaries of related work, conducted the statistical analyses, and participated in the data interpretation and manuscript preparation. RG, JV, and HV finalized the manuscript. All the authors contributed to and have approved the final manuscript.
Ethics declarations
The study was approved by the IRB of the General University Hospital in Prague. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
Informed Consent
Informed consent was obtained from all the individual participants included in the study or their legal representatives in the case of children.
Conflict of Interest
The authors declare that they have no conflict of interest.
Electronic supplementary material
ESM 1
(PDF 30 kb)
About this article
Cite this article
Vacek, J., Vonkova, H. & Gabrhelík, R. A Successful Strategy for Linking Anonymous Data from Students’ and Parents’ Questionnaires Using Self-Generated Identification Codes. Prev Sci 18, 450–458 (2017). https://doi.org/10.1007/s11121-017-0772-6
DOI: https://doi.org/10.1007/s11121-017-0772-6