Anonymity, Confidentiality and De-identified Data
Anonymity is defined in the Oxford English Dictionary as “of unknown name, unknown or unclear sources or authorship, without character, featureless, impersonal.” However, within the social sciences, the concept of anonymity can have a less absolute interpretation, allowing for a person’s identity to be disguised by pseudonyms. Some research council funding policies encourage this practice by mandating the storage of anonymized data for future unspecified research. Yet qualitative research data can only be stripped of identifiers; it cannot be anonymized. To understand the difference between an unknown anonymous source and a disguised confidential source, it is important to consider four nuances of data collection. First, the process of informed consent is enacted differently in quantitative and qualitative research. Second, while the concept of external confidentiality appears in ethical guidelines, its inverse, internal confidentiality, does not; for example, the failure to recognize the practicalities of confidentiality in focus group research makes participants vulnerable. Third, the use of pseudonyms in qualitative research is the source of many historic ethical breaches. Fourth, subpoena of data threatens confidential data only.
Informed Consent and Anonymity
In quantitative data collection, a participant information sheet may advise participants that completing a questionnaire will imply informed consent. As there is no written consent form to reveal the participant’s identity, data provided to the researcher comes from an unknown source, and so in this context, anonymity can be a watertight ethical assurance. However, this assurance has caveats: it assumes the data collection instrument acquires no unique identifiers such as the subject’s name, social security number, or driver’s license number. Also, if the survey sample is small or drawn from a single region or occupation, demographic questions can expose the identity of participants and disclose their data. For example, a questionnaire asking military personnel to provide their rank, gender, and theaters of war served may identify the very few women in the military’s upper echelons.
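The risk described above, that a combination of demographic answers singles out a respondent even when no name is collected, can be checked before a questionnaire is fielded or its data released. The following is a minimal sketch, not a standard procedure: the records, the function name `small_cells`, and the threshold `k=3` are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical questionnaire data: no names are collected, only demographics.
responses = (
    [{"rank": "Private", "gender": "M", "theater": "Iraq"}] * 3
    + [{"rank": "Sergeant", "gender": "M", "theater": "Afghanistan"}] * 3
    + [{"rank": "General", "gender": "F", "theater": "Iraq"}]
)


def small_cells(records, keys, k=3):
    """Return each combination of answers shared by fewer than k respondents."""
    counts = Counter(tuple(record[key] for key in keys) for record in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Combinations of rank, gender, and theater that could point to one person.
at_risk = small_cells(responses, ["rank", "gender", "theater"])
print(at_risk)  # {('General', 'F', 'Iraq'): 1}
```

In this invented sample, the single female general is the only respondent whose answers are unique, which mirrors the military example: the questionnaire is nominally anonymous, yet her responses are attributable to her.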
Qualitative researchers do not, for good reason, have the same assumed consent process. Not only may qualitative researchers provide an information sheet, but they also routinely ask participants to provide their identity by signing a consent form to demonstrate their willingness to accept the ethical provisions offered by the researcher. Anonymity is not a valid provision, as at least one other person, the researcher, knows the identity of the person and what the person said. This knowledge can never be unknown, and offering this surety is ethically flawed because it promises a participant something that cannot be achieved. Many researchers claim that, when reading a transcript decades after the interview, they can still hear the informant’s voice. The original ethical assurance to this person to disguise their identity and what they said endures.
The definition of confidentiality is simple: it refers both to the identity of the person and the information disclosed. The researcher knows the name of the person being quoted and promises not to reveal that identity to others when reporting this information. However, there is a major relational caveat: internal confidentiality can undermine this assurance when other participants share a relationship with the person interviewed.
External and Internal Confidentiality
The distinction between external confidentiality and internal confidentiality reveals a major threat to confidentiality in qualitative research ethics. External confidentiality is traditional confidentiality: the researcher knows the name of the person quoted and what they said but promises not to reveal that identity to others when reporting this information (Tolich 2004). This assurance can be undermined relationally. If a researcher interviews friends of the participant, family members, fellow workers, or a member of their small town, the threat to confidentiality comes not from strangers but from fellow residents, occupants, or workers. Each of these can identify themselves and, by default, others.
The problem of internal confidentiality is endemic in focus group research. Focus group researchers cannot offer participants internal confidentiality because it is outside of their control: the researcher can place few restrictions on focus group members. There is no ethical sanction on a participant should they reveal outside the focus group what was disclosed by another focus group member. Thus, promises of confidentiality must be limited to external confidentiality, that is, that the researcher will not identify any participant or what they said in any publication. Yet external confidentiality cannot stop focus group members from repeating other participants’ disclosures to a person outside the group. It does not matter whether the participants are known to each other or are strangers. Both situations are problematic and leave participants vulnerable to harm from breaches of internal confidentiality. It would make sense to inform participants of their responsibilities and of these limitations. At least each participant would then know his or her own role in the minimization of harm.
In comparison, focus groups pose more distinctive ethical problems than one-on-one interviews. A participant in a private interview has opportunities to withdraw a remark during the interview or, if the participant later reads the interview transcript, to delete the remark. In focus groups, verbal statements cannot be taken back, and harm may result from the limits of internal confidentiality. Thus, to use the word confidentiality without clarification may be taken as offering a layperson more than the concept can deliver.
Historical Ethical Breaches
Despite Whyte’s promises of external confidentiality, when the participants in Street Corner Society read the book, they saw themselves and those close to them.
Pecci (Doc) did everything he could to discourage local reading of the book for the possible embarrassment it might cause a number of individuals, including himself. (1981, 347)
Muchmore (2002) recounts a similar case:
When the [anthropology] book was published, many townspeople were highly disturbed to see some of the most intimate details of their lives recorded in print. Even though the author had attempted to protect his informants by using pseudonyms, their true identities were easily recognizable to anyone familiar with the area. Fifteen years later, another anthropologist who visited the town was surprised to discover that the local library’s copy of the book had the real names of all the individuals pencilled in next to their pseudonyms. Even after all those years, some of the community members were still visibly upset about the ways in which they had been portrayed.
When Scheper-Hughes returned to the site of her 1979 study of mental health in an isolated Irish village, she found that villagers had deciphered the pseudonyms she had offered as ethical assurances. She described her use of pseudonyms as ineffective:
I would be inclined to avoid the ‘cute’ and ‘conventional’ use of pseudonyms. Nor would I attempt to scramble certain identifying features of the individuals portrayed on the naive assumption that these masks and disguises could not be rather easily de-coded by villagers themselves. (2000, 128)
What Whyte, Muchmore, and Scheper-Hughes graphically expose is that pseudonyms cannot make the known unknown. They are especially weak ethical solutions for qualitative researchers when the sample interviewed is relational.
Are Pseudonyms a Genuine Ethical Assurance?
Saunders et al. (2015, 617) claim “anonymity” has commonly been used either interchangeably with or conflated with “confidentiality.” They muddle the known and the unknown, stating “anonymity is one form of confidentiality – that of keeping participants’ identities secret.” The essence of this confusion is mistakenly separating the identity of the person from the information they shared. Once collected, qualitative data is known, and by virtue of that fact, it cannot be unknown. This point is important because it is out of step with policies on data archiving that tie funding for qualitative data collection to its archiving for future unspecified research.
Yet once confidential data is collected, it cannot be made anonymous, not even nearly so. At best, the data can have identifiers removed. In fact, de-identification proves a more practical and a more ethical term for qualitative researchers, especially in the storage of data. De-identified data can be defined as data given to another researcher or stored in an archive after any identifiers have been redacted. At no time should a qualitative researcher promise participants anonymity. The term anonymity has to be used in its dictionary sense when discussing consent with participants to collect qualitative data, as that is how they will understand it. In other words, there are limits to confidentiality, and these are exposed within focus groups and by the threat posed by a subpoena.
Anonymisation of data is a traditional option used for removing identifying information or disguising real names. … The key issue here is that it is important to arrive at an appropriate level of anonymisation to ensure that data are not distorted to a degree which lessens their potential for reuse.
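De-identification in the sense used here, redacting identifiers before data is shared or archived, can be illustrated with a short sketch. Everything in it is an assumption made for illustration: the function name `deidentify`, the `[REDACTED]` marker, and the invented transcript sentence. It replaces only the identifiers it is given, so it shows what redaction does and also why the result is de-identified rather than anonymous.

```python
import re


def deidentify(transcript, identifiers, marker="[REDACTED]"):
    """Replace each listed identifier with a marker, matching whole words only."""
    # Longest identifiers first, so a full name is redacted before its parts.
    for term in sorted(identifiers, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        transcript = re.sub(pattern, marker, transcript, flags=re.IGNORECASE)
    return transcript


# Invented example text and identifier list.
text = "Maria said the Riverside clinic closed after Dr. Patel left."
redacted = deidentify(text, ["Maria", "Riverside", "Patel"])
print(redacted)
# [REDACTED] said the [REDACTED] clinic closed after Dr. [REDACTED] left.
```

The names are gone, but the remaining detail (a clinic that closed when a doctor left) may still be recognizable to anyone who knows the setting, and the original researcher still knows who spoke. Redaction of this kind removes identifiers; it does not make the known unknown.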
A subpoena served on a quantitative researcher would compel the researcher to provide raw data to law enforcement officials, but the researcher can do so knowing that every respondent is protected by anonymity. This is consistent with the statement that completing the questionnaire implies informed consent, a process that does not identify the respondent or what they said. The researcher and the respondent are protected from the subpoena because the researcher does not know how any individual respondent answered the questionnaire. The qualitative researcher, on the other hand, is vulnerable. Aware of the limits of confidentiality, including internal confidentiality, they know both the identity of each informant and what each said, so refusing to hand over subpoenaed data would expose them to a charge of contempt of court.
In sum, clarity of ethical concepts is essential. Researchers cannot offer participants ethical assurances of both confidentiality and anonymity interchangeably, as if the double assurance were better than one. It is not; the concepts of anonymity and confidentiality are mutually exclusive, especially in terms of their distinct informed consent processes. Once quantitative data is collected, the relationship between the researcher and the respondent is ephemeral. However, the relationship between the qualitative researcher and their informant endures. There are limits to confidentiality for qualitative researchers. Internal confidentiality makes relational informants vulnerable; so too does the use of pseudonyms to make data unknown. These limits are exposed in focus group research, where membership creates relational problems: the researcher may promise external confidentiality but has no control over what the other participants say after data collection. Finally, the threat posed by a subpoena for any qualitative researcher demonstrates a fundamental difference between ethical assurances of anonymity and confidentiality.
- Corti L, Day A, Backhouse G (2000) Confidentiality and informed consent: issues for consideration in the preservation of and provision of access to qualitative data archives. Forum Qual Soc Res 1(3). Retrieved from www.qualitative-research.net/index.php/fqs/article/view/1024/2207
- ESRC Research Data Policy (2015) Economic and Social Research Council. Retrieved from http://www.esrc.ac.uk/files/about-us/policies-and-standards/esrc-research-data-policy/
- Muchmore JA (2002) Methods and ethics in a life history study of teacher thinking. Qual Rep 7(4):1–17
- Whyte WF (1981) Street corner society: the social structure of an Italian slum. University of Chicago Press, Chicago