A Successful Strategy for Linking Anonymous Data from Students’ and Parents’ Questionnaires Using Self-Generated Identification Codes

Prevention Science

Abstract

We conducted a feasibility study for matching children (N = 2571, average age 12 years, 50.4% female) and their parents (N = 1931, average age 41 years, 83.3% female) represented by an anonymous self-generated identification code (SGIC) and assessed its methodological properties. We used a nine-character SGIC with the children and a mirrored version of the same code with the parents. The average overall error rate in generating the SGIC was 9.7% (4.0% in the parents and 13.9% in the children). We were able to link a total of 1765 parents’ and children’s codes uniquely (94.9% of all possible dyads) with any four-character combination and the employment of the “school” variable. The overall matching quality of linking using the SGIC only is characterized by precision (positive predictive value) of 0.979, recall (sensitivity, true positive rate) of 0.934, and an F-measure (harmonic mean of precision and recall) of 0.956. The analysis of the discrepant characters in the dyads identified the paternal grandmother’s name and eye color as those varying most often. This study is the first to look at SGIC match rates and error and omission rates in linking different subjects into dyads in prevention research. We identified a high number of unique child-parent matches while guaranteeing anonymity to the participants. We provided evidence that our SGIC is a suitable tool for between-group linking procedures and has a highly successful matching rate, while maintaining anonymity in the school-based prevention study samples.
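The summary figures in the abstract can be checked with a few lines of arithmetic. The sketch below recomputes the F-measure from the reported precision and recall, using the harmonic-mean definition the abstract gives. It also tests one possible reading of the "average overall error rate": that it is a sample-size-weighted average of the parents' and children's rates. That reading is an assumption, not something the abstract states, and the function names are illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def pooled_rate(rates_and_sizes):
    """Sample-size-weighted average of per-group rates.

    Assumption: the overall rate pools the groups in proportion to
    their sizes; the abstract does not say how it was averaged.
    """
    total = sum(n for _, n in rates_and_sizes)
    return sum(rate * n for rate, n in rates_and_sizes) / total


# Precision and recall as reported in the abstract.
f = f_measure(0.979, 0.934)

# Error rates and sample sizes as reported: parents 4.0% (N = 1931),
# children 13.9% (N = 2571).
overall_error = pooled_rate([(0.040, 1931), (0.139, 2571)])

print(round(f, 3))              # 0.956
print(round(overall_error, 3))  # 0.097
```

Under these assumptions both values agree with the reported F-measure of 0.956 and overall error rate of 9.7%.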

Acknowledgements

The study was supported by the Czech Science Foundation, Grant No. 16-15771S, and Charles University (PROGRES Q 06).

Authors’ Contributions

RG and HV designed the study and wrote the protocol. RG performed the initial drafting of the manuscript. JV managed the literature searches and summaries of related work, conducted the statistical analyses, and participated in the data interpretation and manuscript preparation. RG, JV, and HV finalized the manuscript. All the authors contributed to and have approved the final manuscript.

Author information

Corresponding author

Correspondence to Roman Gabrhelík.

Ethics declarations

The study was approved by the IRB of the General University Hospital in Prague. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Informed Consent

Informed consent was obtained from all the individual participants included in the study or their legal representatives in the case of children.

Conflict of Interest

The authors declare that they have no conflict of interest.

Electronic supplementary material

ESM 1 (PDF 30 kb)

About this article

Cite this article

Vacek, J., Vonkova, H. & Gabrhelík, R. A Successful Strategy for Linking Anonymous Data from Students’ and Parents’ Questionnaires Using Self-Generated Identification Codes. Prev Sci 18, 450–458 (2017). https://doi.org/10.1007/s11121-017-0772-6

  • DOI: https://doi.org/10.1007/s11121-017-0772-6
