Anonymous data collection systems are often necessary when assessing sensitive behaviors but can pose challenges to researchers seeking to link participants over time. To assist researchers in anonymously linking participants, we outlined and tested a novel security question linking (SEEK) method. The SEEK method includes four steps: (1) data management and standardization, (2) many-to-many matching, (3) fuzzy matching, and (4) rematching and verification. The method is demonstrated in SAS with two samples from a longitudinal study of adolescent dating violence. After an initial assessment during a laboratory visit, participants were asked to complete an online assessment either (a) once, 3 months later (Sample 1, n = 60), or (b) three times at 1-month intervals (Sample 2, n = 140). Demographics, eye color, and responses to nine security questions were used as key variables to link responses from the laboratory and online follow-up assessments. The rate of matched cases was 100% in Sample 1 and ranged from 94.3% to 98.3% in Sample 2. To quantify confidence in the data quality of successfully matched pairs, we reported the means and standard deviations of the number of matched security questions. In addition, we reported the rank order and counts of the mismatched components in key variables. Results indicate that the SEEK method provides a feasible and reliable solution for linking responses in longitudinal studies with sensitive questions.
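To make the first three steps concrete, the following is a minimal Python sketch, not the authors' SAS implementation. The record layout, the field names (`id`, `gender`, `eye_color`, `q1` to `q3`), the similarity threshold of 0.8, and the `min_matches` cutoff are all hypothetical choices made for illustration, and Python's `difflib.SequenceMatcher` stands in for whatever string-comparison function the SAS code uses. Step 4 (rematching and verification) is not shown; under these assumptions it would amount to rerunning the linkage on unmatched records with relaxed criteria and reviewing the results.

```python
import re
from difflib import SequenceMatcher


def standardize(value):
    """Step 1: lowercase and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", str(value).lower())


def similar(a, b, threshold=0.8):
    """Step 3: fuzzy comparison of two standardized answers.

    The 0.8 threshold is an illustrative assumption. Empty answers
    never count as a match.
    """
    if not a or not b:
        return False
    return SequenceMatcher(None, a, b).ratio() >= threshold


def link(lab_records, online_records, block_keys, question_keys, min_matches):
    """Return {online_id: (lab_id, n_matched_questions)} for the best pairs."""
    links = {}
    for online in online_records:
        best = None
        for lab in lab_records:
            # Step 2: many-to-many candidates are all lab/online pairs
            # that agree on every blocking key after standardization.
            if any(standardize(lab[k]) != standardize(online[k]) for k in block_keys):
                continue
            # Step 3: count security questions whose answers roughly agree.
            n = sum(
                similar(standardize(lab[q]), standardize(online[q]))
                for q in question_keys
            )
            if n >= min_matches and (best is None or n > best[1]):
                best = (lab["id"], n)
        if best is not None:
            links[online["id"]] = best
    return links


# Hypothetical records: one online response and two laboratory candidates.
lab = [
    {"id": "L1", "gender": "F", "eye_color": "Brown",
     "q1": "Fluffy", "q2": "Maple Street", "q3": "Pizza"},
    {"id": "L2", "gender": "F", "eye_color": "brown",
     "q1": "Rex", "q2": "Oak Ave", "q3": "Sushi"},
]
online = [
    {"id": "O1", "gender": "f", "eye_color": "brown ",
     "q1": "flufy", "q2": "maple street", "q3": "pizza"},
]

links = link(lab, online,
             block_keys=["gender", "eye_color"],
             question_keys=["q1", "q2", "q3"],
             min_matches=2)
print(links)
```

Keeping the count of matched questions alongside each link mirrors the idea in the abstract of reporting how many security questions agreed for each matched pair, which gives a rough per-pair confidence measure.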
Support for the Dating Study data collection was provided by Grants 2014-VA-CX-0066 and 1R21HD077345. We thank Gabriella Damewood, Ashley Dills, Nicole Graziano, and Angela Marinakis for their assistance in data collection.
The third author received research grants from the National Institutes of Health (1R21HD077345) and the National Institute of Justice (2014-VA-CX-0066) to support this study.
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed consent was obtained from all individual participants included in the study.
Conflicts of Interest
The first, second, and fourth authors declare that they have no conflict of interest.
Cite this article
Xu, S., Chan, A., Lorber, M.F. et al. Using Security Questions to Link Participants in Longitudinal Data Collection. Prev Sci (2019) doi:10.1007/s11121-019-01080-8
- Security questions
- Longitudinal studies
- Online studies