Why is anonymisation an important topic when using genomic data in scientific research?

Many of our life sciences clients use genomic or genetic data in their research projects, whether looking for the causes of, or treatments for, specific illnesses. These research arrangements often involve several parties collecting, sequencing and analysing the data, and we advise parties at every stage of the process.


First published in our Biotech Review of the year – issue 9.

Where this data meets the legal threshold for anonymisation, data protection laws don't apply to its processing, which reduces the parties' liability and their logistical and administrative burdens. Consent for research is also not required under the common law duty of confidentiality where the data is anonymised. In addition, some parties in the chain have business models that rely on the data being anonymised, particularly parties in third countries that specifically aim to avoid falling within the remit of the General Data Protection Regulation (GDPR).

The data is usually claimed to be “effectively” or “legally” anonymised rather than technically anonymised. In the research context, this tends to mean that the data remains identifiable to the party that collected it but, before it is shared with third parties, it is pseudonymised and made available only to a limited number of organisations within a trusted research environment (TRE) that has access controls in place, with no third party given access to the re-identification key. The aim is not to remove the risk of re-identification entirely, but to reduce it to a level low enough to argue that the data has been effectively anonymised.
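To make the pseudonymisation step more concrete, the Python sketch below shows one common technique: direct identifiers are replaced with a keyed hash (HMAC-SHA256), and the key stays with the collecting party. The use of HMAC, the record layout (`patient_id`, `name`, `genotypes`) and the helper names are illustrative assumptions, not a description of any particular TRE. It also shows why this is pseudonymisation rather than anonymisation: the key holder can always re-identify.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def pseudonymise(participant_id: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed pseudonym (HMAC-SHA256).

    Without `key`, a recipient cannot feasibly recompute the pseudonym from an ID;
    the key holder always can, which is why this is pseudonymisation only.
    """
    return hmac.new(key, participant_id.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_for_sharing(records: list[dict], key: bytes) -> list[dict]:
    """Drop direct identifiers, keeping only a pseudonym and the research fields.

    Hypothetical record layout: {"patient_id": ..., "name": ..., "genotypes": {...}}.
    """
    return [
        {"pseudonym": pseudonymise(r["patient_id"], key), "genotypes": r["genotypes"]}
        for r in records
    ]


def reidentify(pseudonym: str, known_ids: list[str], key: bytes) -> Optional[str]:
    """Only possible for the key holder: map a pseudonym back to a known ID."""
    for pid in known_ids:
        if hmac.compare_digest(pseudonymise(pid, key), pseudonym):
            return pid
    return None


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # generated and kept by the collecting party only
    records = [
        {"patient_id": "P-0001", "name": "A. Example", "genotypes": {"rs0001": "AG"}},
        {"patient_id": "P-0002", "name": "B. Example", "genotypes": {"rs0001": "GG"}},
    ]
    shared = prepare_for_sharing(records, key)  # what leaves the collecting party
    print(sorted(shared[0]))  # ['genotypes', 'pseudonym']
    print(reidentify(shared[1]["pseudonym"], ["P-0001", "P-0002"], key))  # P-0002
```

The recipient's copy contains no names, but whether it is anonymous in the recipient's hands still depends on who holds the key and what else the recipient can access.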

Clients therefore often need to assess whether the arrangements in place are sufficient for the genetic data they are using to be considered legally anonymised or whether effective anonymisation is even possible for such rich, inherently individual, data.

What’s the challenge?

Organisations face a number of challenges when assessing whether genetic data can be effectively anonymised. Firstly, the GDPR definition of personal data includes “factors specific to the genetic identity of that person”, and under Article 9(1), “genetic data” is itself listed as a type of special category data. Both suggest that genetic data is inherently personal data and cannot be anonymised.

Secondly, there has been no UK case law on the identifiability or anonymisation of genetic data specifically, and recent UK case law on the identifiability of other comparable types of data (biometric and browser-generated) has taken a very conservative direction that conflates individuation (i.e. singling someone out) with identifiability.

Thirdly, UK and EU guidance, both on anonymisation generally and on genetic data specifically, is out of date (pre-GDPR) and in a state of flux. The ICO is currently updating its guidance on anonymisation and is releasing draft chapters for consultation piecemeal. However, it is not clear when all the draft chapters will be released, whether genetic data will be covered, or how much the drafts will change in response to consultation.

Finally, there are diverging views, and some confusion within the industry itself, about what actually constitutes “effective” anonymisation, with differences arising particularly between EU, UK and US organisations. As a result, it can be difficult for parties to agree on the status of the data when drafting research agreements.

What is the position in UK law?


Personal data is defined at Article 4(1) GDPR: “any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier…” (emphasis added).

Anonymised data is described in Recital 26 GDPR as “information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable” (emphasis added).

The recital clarifies that to assess whether an individual is identifiable, “account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly. To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments” (emphasis added).

UK case law on identifiability

Two recent UK cases raise concerns about the possibility of anonymising genetic data. First, in Vidal-Hall v Google[1], it was held that browser-generated information was personal data because it singled out an individual and allowed them to be distinguished from all others. Similarly, in R (Bridges) v Chief Constable of South Wales Police[2], which followed Vidal-Hall, it was held that biometric facial image data was personal data because it allowed individuals to be singled out, or “individuated”, even though the data was immediately deleted and never connected to the individual’s name or to any other data about them.

Can these cases be distinguished when it comes to genetic data, which was not the type of data in issue in either of them? ICO guidance[3] sometimes groups genetic data and biometric data together as examples of “high risk processing”. That does not bode well for those hoping to distinguish the cases, as it points towards the conclusion that non-aggregated genetic data is inherently identifiable without needing to be linked to any other identifier, such as a name or patient ID.

The difficulty of interpreting case law on anonymisation is highlighted at paragraph 131 of the judgment in Vidal-Hall: “Mr White, Mr Tomlinson and Ms Proops each say that the case supports their argument. However, none of the parties is able to say with any confidence precisely what the case decides on points that are material here, nor that the various passages to which we were referred, form part of the ratio” (emphasis added). Take two QCs pre-eminent in their field and one soon-to-be, add one of the leading cases (Common Services Agency), and confusion still reigns!

ICO guidance on genetic data

The ICO’s guidance on genetic data under the UK GDPR is not entirely clear on whether genetic data is inherently identifiable. It starts with a general statement that quickly becomes circular: “A genetic sample itself is not personal data until you analyse it to produce some data. And genetic analysis data is only personal data (and so genetic data) if you can link it back to an identifiable individual”.

It then continues in a similar vein, offering examples of genetic data that fall either inside or outside the definition of personal data: “in practice, genetic analysis which includes enough genetic markers to be unique to an individual is personal data and special category genetic data, even if you have removed other names or identifiers” (emphasis added). On the other hand, “there are cases where genetic information is not identifiable personal data. For example, where you have anonymised or aggregated partial genetic sequences or genetic test results (e.g. for statistical or research purposes), and they can no longer be linked back to a specific genetic identity, sample or profile; a patient record; or to any other identifier”.

However, the guidance does not address data that falls within the critical middle ground between these two cases. On the face of it, the ICO does not appear to accept that genetic data is inherently identifiable; instead, it focuses on whether the data contains enough “unique” information to distinguish a particular individual.
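The idea of data containing “enough genetic markers to be unique to an individual” can be illustrated with a simple uniqueness count. The Python sketch below uses made-up genotype calls at hypothetical marker IDs (`rs1`, `rs2`, `rs3`) and reports which rows are singled out, i.e. have a combination of values shared with no other row, and the size of the smallest group of matching rows. It illustrates the singling-out concept only: uniqueness within one dataset is not itself the legal test, which also turns on the means reasonably likely to be used.

```python
from collections import Counter


def singled_out(records: list[dict], markers: list[str]) -> list[int]:
    """Return indices of rows whose genotype combination at `markers` is unique.

    A unique combination singles the row out within this dataset, even though
    no name or other direct identifier is present.
    """
    keys = [tuple(r[m] for m in markers) for r in records]
    counts = Counter(keys)
    return [i for i, k in enumerate(keys) if counts[k] == 1]


def smallest_group(records: list[dict], markers: list[str]) -> int:
    """Size of the smallest group of rows sharing one combination (the 'k' in k-anonymity)."""
    return min(Counter(tuple(r[m] for m in markers) for r in records).values())


# Illustrative, made-up genotype calls at hypothetical marker IDs.
records = [
    {"rs1": "AA", "rs2": "CT", "rs3": "GG"},
    {"rs1": "AA", "rs2": "CT", "rs3": "GA"},
    {"rs1": "AG", "rs2": "CT", "rs3": "GG"},
    {"rs1": "AG", "rs2": "CC", "rs3": "GG"},
]

for markers in (["rs1"], ["rs1", "rs2"], ["rs1", "rs2", "rs3"]):
    print(markers, singled_out(records, markers), smallest_group(records, markers))
# ['rs1'] [] 2
# ['rs1', 'rs2'] [2, 3] 1
# ['rs1', 'rs2', 'rs3'] [0, 1, 2, 3] 1
```

With one marker no row stands out; with three, every row is unique. Real genomic data carries vastly more markers, which is why the middle ground the guidance leaves unaddressed matters so much in practice.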

We should note that the EU guidance on genetic data[4] tends to be more conservative, although the EDPB acknowledges, in its recent Response to the request from the European Commission for clarifications on the consistent application of the GDPR, focusing on health research, that there is currently no agreed EU position on this issue: “the possibility to anonymise genetic data remains an unresolved issue. As yet, it remains open to be demonstrated whether any combination of technical and organisational means can be effectively employed to remove genetic information from the material scope of the GDPR”[5]. In addition, following Brexit, it is unclear how much influence the EU position will continue to have in the UK once the new ICO guidance is published, particularly given the UK government’s “Data: a new direction” consultation on data protection reform[6].

ICO guidance on anonymisation

At the time of writing, two draft chapters of the new ICO guidance on anonymisation have been published, and we are aware of a number of consultation responses submitted by the life sciences industry. Issues raised include:

  • Singling out: it is not clear whether singling out an individual is, on its own, sufficient for identifiability. If it is, any individual-level data presented as rows in a spreadsheet could never be anonymised, which would have a huge impact on the scientific research community.
  • Anonymous in whose hands? Can the data be anonymised in the hands of those only accessing the data through a TRE with sufficient controls in place? What if the individual gaining access is a research employee of the organisation that runs the TRE but has no access to the underlying identifiable data? What if the data is accessed by a processor that never has access to the controller’s identifiable version of the data or the pseudonymisation key?
  • Inferences: how certain must an inference be before the data becomes identifiable?
  • “Relates to”: is identifiability the only consideration for anonymisation, or should parties also consider the purpose of the processing and any impact on data subjects?
  • International transfers: if the data is identifiable in one party’s hands and is transferred to a third party outside the UK/EU in whose hands it is effectively anonymised, is this a restricted transfer, requiring the exporter to comply with Article 46 GDPR and to carry out a Schrems II transfer risk assessment? If so, the data importer would have to sign up to GDPR obligations via the Standard Contractual Clauses despite only ever handling anonymised data.

Until approved versions of these chapters and the rest of the guidance are available (which we understand will include a case study based on a scientific research scenario), the applicable guidance remains the ICO’s Code of Practice on Anonymisation (Code)[7], published in 2012, six years before the GDPR.

ICO’s Code of Practice on Anonymisation

The ICO’s Code notes that sharing pseudonymised data without the re-identification key is one of the higher-risk methods of anonymisation, given the greater risk that the data could be linked with other datasets and re-identified. However, the Code also takes a practical approach, distinguishing between a public disclosure to the world at large and a limited disclosure to a small, known, trusted group of researchers with adequate contractual, organisational and technical controls in place to reduce the risk of re-identification to a sufficiently low level.

When assessing “all the means reasonably likely to be used” to re-identify the data, as required under Recital 26, the ICO suggests applying a “motivated intruder” test: assessing how likely it is that a motivated individual, without special knowledge or tools, would succeed in re-identifying the data. For a limited disclosure, the test should consider the likely type of motivated intruder, such as an over-zealous researcher using the tools normally available to them as part of their job.
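The linkage risk the Code describes, and the kind of attempt a motivated intruder might make, can be sketched as a simple join: genotype values in a pseudonymised extract are matched against a second dataset in which people are named (for instance, a hypothetical genealogy or research database). All datasets, field names and marker IDs below are invented for illustration, and treating a pseudonym as re-identified only when exactly one named candidate matches is an assumption of this sketch, not the ICO's methodology.

```python
from typing import Optional


def marker_key(genotypes: dict, markers: list[str]) -> Optional[tuple]:
    """Genotype combination at `markers`, or None if any marker is missing."""
    if any(m not in genotypes for m in markers):
        return None
    return tuple(genotypes[m] for m in markers)


def link(shared: list[dict], reference: list[dict], markers: list[str]) -> dict:
    """Attempt to re-identify pseudonymised rows by matching on shared markers.

    Returns {pseudonym: name} for rows matching exactly one named person in
    the reference dataset; ambiguous or unmatched rows are left out.
    """
    index: dict = {}
    for ref in reference:
        key = marker_key(ref["genotypes"], markers)
        if key is not None:
            index.setdefault(key, []).append(ref["name"])

    matches = {}
    for row in shared:
        key = marker_key(row["genotypes"], markers)
        candidates = index.get(key, []) if key is not None else []
        if len(candidates) == 1:
            matches[row["pseudonym"]] = candidates[0]
    return matches


# Invented example data: a pseudonymised extract and a named reference set.
shared = [
    {"pseudonym": "p-a", "genotypes": {"rs1": "AG", "rs2": "CC"}},
    {"pseudonym": "p-b", "genotypes": {"rs1": "AA", "rs2": "CT"}},
]
reference = [
    {"name": "Person X", "genotypes": {"rs1": "AG", "rs2": "CC"}},
    {"name": "Person Y", "genotypes": {"rs1": "AA", "rs2": "CT"}},
    {"name": "Person Z", "genotypes": {"rs1": "AA", "rs2": "CT"}},
]

print(link(shared, reference, ["rs1", "rs2"]))  # {'p-a': 'Person X'}
```

Here `p-b` escapes re-identification only because two named people share its values; comparing more markers would tend to remove such ambiguity, which is why controls over who can access the data, and what they can link it with, carry so much weight in a limited-disclosure analysis.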


So what does the above mean from a practical point of view for Life Sciences organisations grappling with the question of whether the genetic data they are dealing with is effectively anonymised in the UK?

This is a notoriously difficult field, requiring a good deal of technical as well as legal problem-solving. Further, data protection supervisory authorities are not sector-focussed, and we are not optimistic that clear, readily implementable guidance will be forthcoming in the short to medium term.

We would argue that for the time being, whilst the guidance remains in such a state of flux (and with enforcement potentially diverging unhelpfully between the UK and EU), the best way forward is to build on the current ICO Code’s recommendations on limited disclosures, carefully setting out why a given TRE provides adequate risk mitigation based on the technical, organisational and contractual controls you have implemented.

If the parties conclude that in certain situations the data can be treated as effectively anonymised in the recipient’s hands, it would then be prudent to include contractual contingency provisions in their agreement requiring that, upon a trigger event (such as a change in law or guidance), the parties suspend or terminate the relevant data processing activities until a replacement data protection solution can be agreed and implemented.

In the longer term, the field seems ripe for industry to develop a Code of Conduct under Article 40 of the UK GDPR. This could have a narrow focus on genomic data or, more ambitiously, seek to cover anonymisation in research across the sector as a whole.

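[1] Vidal-Hall v Google Inc [2015] EWCA Civ 311
[2] R (Bridges) v Chief Constable of South Wales Police [2019] EWHC 2341 (Admin)
[3] E.g. ICO, Guidance on data protection impact assessments (DPIAs): https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/accountability-and-governance/data-protection-impact-assessments/
[4] Article 29 Working Party, Working Document on Genetic Data (WP 91, 2004): https://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/files/2004/wp91_en.pdf
[5] EDPB, Response to the request from the European Commission for clarifications on the consistent application of the GDPR, focusing on health research: https://edpb.europa.eu/sites/default/files/files/file1/edpb_replyec_questionnaireresearch_final.pdf
[6] UK government, Data: a new direction (consultation): https://www.gov.uk/government/consultations/data-a-new-direction
[7] ICO, Anonymisation: managing data protection risk code of practice (2012): https://ico.org.uk/media/for-organisations/documents/1061/anonymisation-code.pdf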