Medical Research Council’s guidance on identifiability/anonymisation


The Medical Research Council (MRC) has published new guidance on the anonymisation of data in the sphere of scientific research – Guidance Note 5 (GN5) – with participation from the ICO. The guidance reiterates the difference between a) data that is “identifiable” in the health research sense; b) data which fulfils the definition of “personal data” in the GDPR; and c) data which amounts to “confidential information” under the common law duty of confidentiality[1]. GN5’s main focus is on how to convert identifiable data into anonymous data (which, in turn, means that it is no longer personal data or confidential information, so the GDPR and the common law duty of confidentiality do not apply).

The guidance centres on the concept of “jigsaw” identification, which is where different pieces of information are joined together to identify an individual. When assessing whether data has been rendered legally anonymous, it is not enough for a controller to remove real world identifiers, such as name, address and date of birth (on its own, this would amount only to pseudonymisation). The controller must also consider the audience of the data and the knowledge that they might already have, or have access to, which could enable them to fit the pieces of the jigsaw together and identify individuals from the data.
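The jigsaw idea can be illustrated with a minimal Python sketch. It is not taken from GN5: the field names (`birth_year`, `postcode_district`, `sex`), the sample records and the function `jigsaw_matches` are all hypothetical, and exact matching on a handful of fields is a deliberate simplification. It shows only how a pseudonymised record can be re-identified when the audience already holds a second dataset containing the same attributes.

```python
# Hypothetical quasi-identifiers: fields that are not direct identifiers
# but may appear both in the released data and in data the audience holds.
QUASI_IDENTIFIERS = ("birth_year", "postcode_district", "sex")


def jigsaw_matches(released, auxiliary, keys=QUASI_IDENTIFIERS):
    """Return (study_id, name) pairs for released records whose combination
    of quasi-identifiers matches exactly one record in the auxiliary data."""
    # Index the audience's existing data by quasi-identifier combination.
    candidates = {}
    for person in auxiliary:
        combo = tuple(person[k] for k in keys)
        candidates.setdefault(combo, []).append(person["name"])

    matches = []
    for record in released:
        combo = tuple(record[k] for k in keys)
        names = candidates.get(combo, [])
        # A single candidate means the pieces of the jigsaw fit together.
        if len(names) == 1:
            matches.append((record["study_id"], names[0]))
    return matches


# Pseudonymised research data: names removed, study IDs substituted.
released = [
    {"study_id": "P001", "birth_year": 1961, "postcode_district": "CB2", "sex": "F"},
    {"study_id": "P002", "birth_year": 1984, "postcode_district": "CB2", "sex": "M"},
    {"study_id": "P003", "birth_year": 1984, "postcode_district": "CB2", "sex": "M"},
]

# Data the audience is assumed to hold already (invented for illustration).
auxiliary = [
    {"name": "Person X", "birth_year": 1961, "postcode_district": "CB2", "sex": "F"},
    {"name": "Person Y", "birth_year": 1984, "postcode_district": "CB2", "sex": "M"},
    {"name": "Person Z", "birth_year": 1984, "postcode_district": "CB2", "sex": "M"},
]

print(jigsaw_matches(released, auxiliary))
```

In this invented example only P001 is singled out, because two people in the auxiliary data share the attributes of P002 and P003. A check of this kind covers only the mechanical part of the assessment; it says nothing about the motivation of the audience or the impact of identification, which the guidance also requires you to consider.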

The final part of the test is to consider the likelihood and impact of identification. When considering how likely it is that someone would attempt to re-identify the data, the guidance recommends that you should have in mind a person who is more motivated than most to identify an individual (e.g. someone with an axe to grind or someone who could gain financial profit or notoriety from the re-identification) using all the means they have available to them.

Different audiences

The MRC lists a number of techniques for limiting the identifiability of data which has already been pseudonymised. Which techniques will work for each controller or processor will depend on the audience they wish to share the data with. Either way, to achieve anonymity successfully, the end result must be that it is not reasonably likely that individuals would be identified.

When sharing the data with another organisation, there will tend to be a greater separation between the “key” to the initially pseudonymised data and the new audience of the data. The receiving organisation will not be able to retrieve the key unless you provide it to them (other than by illegal means, which should be considered very unlikely with a professional organisation). You should consider what other datasets they might already have or have access to which would allow them to piece together the information to re-identify the data you propose to disclose to them. You can also obtain contractual protection by entering into a legal Data Sharing Agreement, which would include a prohibition on the receiving party attempting to re-identify the data.

Further recommended controls include: checking that the receiving organisations have an appropriate information governance and security policy in place; ensuring all individuals who will have access to the data have been trained with regard to data protection and confidentiality and will face sanctions for breach of any related policies; considering requirements to comply with codes of practice from professional bodies; and using a “Safe Environment” in high risk cases, meaning that the data is only shared in a highly controlled environment with physical access restrictions.

When reviewing the proposed sharing of data within an organisation (e.g. to ensure it is only accessed by individuals who have a duty of confidence with regard to the data), the guidance highlights that it is not possible to render the data anonymous for the purposes of the GDPR. This is because the organisation is seen as a single corporate entity with indivisible responsibility for data protection; whilst it holds both the pseudonymisation key and the data, internal controls are seen as insufficient to prevent re-identification.

However, according to the guidance, it is possible within an organisation to prevent breach of the common law duty of confidentiality, using the techniques recommended above for sharing data with another organisation. Of course, an agreement such as a Data Sharing Agreement with your own organisation is not legally possible, so a controller/processor should make sure to use internal technical and organisational controls to protect the pseudonymisation key from those within the organisation who should not have access.

Genetic data

GN5 clarifies that the same principles apply to genetic data as apply to other types of data; that is, the test for whether genetic information is considered identifiable is whether the specific audience you are sharing it with would be able to identify an individual from the data alone, or in combination with some other information it is reasonably likely to have or have access to.

One point to bear in mind in particular with genetic data is that identification is likely to have an increased impact on family members of the individual as well as on the individual themselves. In addition, the amount of information available in databases regarding genetic sequences is increasing and is likely to continue to increase over time, making identification more and more possible. It is therefore important to have particularly robust controls in place when sharing genetic data.

Some particularly challenging scenarios

The MRC points out two specific scenarios where anonymising data will not be possible. The first is where NHS Numbers are left in the data. NHS Numbers are considered identifiable information as most NHS employees would have access to the pseudonymisation key to work out who the information relates to. Therefore, if the NHS Number cannot be removed from the dataset then it should be treated as pseudonymised rather than anonymised data.
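As a practical aid, and not something GN5 itself prescribes, a dataset can be screened for values that look like NHS Numbers before it is shared. The sketch below assumes the standard 10-digit NHS Number format with a Modulus 11 check digit; the function names `is_valid_nhs_number` and `columns_with_nhs_numbers` and the sample rows are hypothetical.

```python
import re


def is_valid_nhs_number(value: str) -> bool:
    """Check a string against the 10-digit NHS Number format and its
    Modulus 11 check digit. Spaces and hyphens are ignored."""
    digits = re.sub(r"[\s-]", "", value)
    if not re.fullmatch(r"\d{10}", digits):
        return False
    # Weight the first nine digits by 10, 9, ..., 2 and sum the products.
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check digit of 10 means the number is invalid.
    return check != 10 and check == int(digits[9])


def columns_with_nhs_numbers(rows):
    """Return the sorted names of columns containing at least one string
    value that passes the NHS Number check."""
    flagged = set()
    for row in rows:
        for column, value in row.items():
            if isinstance(value, str) and is_valid_nhs_number(value):
                flagged.add(column)
    return sorted(flagged)


# Illustrative rows; the number is used only because it satisfies the
# check digit calculation.
rows = [
    {"study_id": "P001", "reference": "943 476 5919", "notes": "follow-up due"},
    {"study_id": "P002", "reference": "n/a", "notes": ""},
]

print(columns_with_nhs_numbers(rows))
```

A screen like this will produce false positives, since roughly one in eleven arbitrary 10-digit strings passes the check by chance, and it will miss NHS Numbers embedded in longer free text. It is a prompt for manual review rather than evidence that a dataset is anonymous.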

The second is where specific staff members, particularly clinicians, would be able to recognise an individual with a sufficiently rare illness from very little information, for example because the individual is their patient. In such a case, the data should not be treated as anonymised and the processing should be managed by compliance with the GDPR and through consent for the common law disclosure.

What if you can’t anonymise?

The MRC reminds organisations that if it is not possible to anonymise the data, it can still be used for research, provided that:

  1. Any disclosure of confidential information is managed by other means, for example by consent under the common law duty of confidentiality; and
  2. The processing of such information is conducted in line with the GDPR.

[1] The main differences being that personal data under the GDPR must relate to a living identified or identifiable individual, and confidential information under the common law duty of confidentiality must not be in the public domain, must relate to an identifiable individual (whether alive or deceased), must have a degree of sensitivity associated with it and must be shared with the expectation that it will be kept confidential (e.g. from a patient to a doctor). Identifiability, on the other hand, just requires that the data can be used to identify a specific individual, whether on its own or in combination with other data.