Avoiding bias and increasing diversity in AI and health research – Part 2


This article is part 2 of our bias in AI series, an update to the original article in our Biotech Review of the year – issue 8. Read part 1 here.

In part 1 of our bias in AI series, we looked at how AI systems can contain bias and the issues this creates when AI is used in the healthcare and medical research sectors. In this part 2, we look at the solutions available for these challenges and how they can be implemented.

What can be done to decrease bias and increase diversity in health research?

A multi-pronged approach is likely to be the best way to address the causes of non-diverse datasets and the potential for biased or discriminatory outcomes from diagnostic and therapeutic treatments, as the most recent BIA Bioscience[1] conference explored.

Building trust

Firstly, any systemic distrust between minority communities and the pharmaceutical/medical device industry can be addressed by building long-term links with community focus groups, in particular through partnerships with existing community groups. This works all the better if facilitated by someone who both works in the industry and is from that community.

Providing guidance and education about the GDPR should assist with building trust, as organisations can emphasise the robust legal framework in the UK and the EU regulating the use of personal data in such research, combined with increasingly active enforcement by UK and EU supervisory authorities. Alongside this, recruitment processes should be reviewed and adapted so that roles in the industry draw on a wider, more diverse pool of candidates. Wider representation within the industry should in turn increase trust among those communities.

Making use of existing diverse datasets

Secondly, the industry can make use of existing datasets from other parts of the world rather than relying solely on local datasets from the Western world (which, for the reasons explored in part 1, are likely to underrepresent some ethnic groups). The data is there, but needs to be utilised and advertised to further increase the range of organisations that can benefit from it.

Companies should assess possible imbalanced datasets and seek to diversify them by looking for new sources of datasets from other areas of the world and incorporating them into studies. EU & UK research organisations would, as data controllers, still need to comply with the GDPR when processing any personal data within those datasets. The lawful bases relied on would need scrutiny, namely whether they could still rely on legitimate interests under Article 6 and public health, scientific research or provision of healthcare under Article 9.

Using real-time data from use of health apps

In the health tech world, particularly with health apps, it is now easier than ever for medical device companies to receive real-time feedback from users and patients and to use this knowledge to improve their service offering. For example, if someone feels that a particular feature or question isn’t relevant to them or is not something they can relate to, they can flag this and explain why, giving developers a much faster insight into potentially discriminatory processes.

With increasing choice over the apps available, those most likely to win the greatest share of users will be the ones that make patients feel their needs are met and that the company offers personalised, relevant services. One innovative way to show care for each individual patient is to personalise diagnosis and treatment based on sex, race and other characteristics where relevant and necessary.

Thorough assessment of AI systems

The ICO has also issued guidance[2] on possible technical methods, including mathematical models sometimes referred to as ‘algorithmic fairness’, that can be used to reduce the risk of discriminatory outcomes in AI systems. Different solutions are needed for different causes of bias, so analysis will need to be carried out by both technical and compliance teams to assess which one applies best to any particular situation:

  1. For imbalanced training data, it might be possible to add or remove data about under- or overrepresented subsets of the population, to balance out the dataset (simply removing any protected characteristics from a model is unlikely to be enough, as there are often variables which act as proxies for those characteristics, e.g. certain jobs or postcodes);
  2. Where training data reflects previous discrimination, the data can be modified, the learning process can be amended, or the model can be changed after training has taken place;
  3. Collecting more data on minority groups can help to reduce the disproportionate number of statistical errors they face;
  4. For any of the above, mathematical “fairness” measures can be used to test the results.
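As a rough illustration of points 1 and 4, the sketch below rebalances a toy dataset by random oversampling and then computes one simple fairness measure, the gap in positive-prediction rates between groups (often called demographic parity). It is a minimal sketch only: the field names `group` and `referred`, the function names and the data are all hypothetical, and neither the ICO guidance nor this article prescribes this particular technique or measure.

```python
import random
from collections import Counter, defaultdict


def oversample_to_balance(records, group_key, seed=0):
    """Duplicate records from smaller groups at random until every group
    is as large as the largest one. Returns a new list."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for rec in records:
        by_group[rec[group_key]].append(rec)
    target = max(len(rows) for rows in by_group.values())
    balanced = []
    for rows in by_group.values():
        balanced.extend(rows)
        # Sample with replacement to fill the gap up to the target size.
        balanced.extend(rng.choices(rows, k=target - len(rows)))
    return balanced


def positive_rate_by_group(records, group_key, prediction_key):
    """Share of records in each group that received a positive prediction."""
    totals = Counter()
    positives = Counter()
    for rec in records:
        totals[rec[group_key]] += 1
        if rec[prediction_key]:
            positives[rec[group_key]] += 1
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(rates):
    """Largest difference in positive-prediction rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: group "A" is over-represented (8 records vs 2).
records = (
    [{"group": "A", "referred": True}] * 6
    + [{"group": "A", "referred": False}] * 2
    + [{"group": "B", "referred": True}] * 1
    + [{"group": "B", "referred": False}] * 1
)

balanced = oversample_to_balance(records, "group")
group_sizes = Counter(rec["group"] for rec in balanced)

rates = positive_rate_by_group(records, "group", "referred")
gap = demographic_parity_gap(rates)
```

Oversampling only repeats the records already held, so it cannot add information that was never collected about a group. That is one reason collecting more data (point 3) may still be needed.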

In addition, some of the methods conflict with one another, so it is a case of assessing which would work best in the particular circumstances, including whether they would affect the statistical accuracy of the system. It should be noted that ‘statistical accuracy’, or how often an AI system determines the correct answer when measured against correctly labelled test data, is not the same as ‘accuracy’ as one of the fundamental data protection principles, which holds that personal data must be accurate and, where necessary, kept up to date.
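To make ‘statistical accuracy’ concrete, the sketch below computes it as the fraction of predictions that match correctly labelled test data, first overall and then per group. The labels, predictions and group names are invented purely for illustration. The example is built so that a respectable overall figure hides a group for which every prediction is wrong.

```python
from collections import defaultdict


def statistical_accuracy(predictions, labels):
    """Fraction of predictions that match the correctly labelled test data."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("no test data supplied")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def accuracy_by_group(predictions, labels, groups):
    """Statistical accuracy computed separately for each group."""
    buckets = defaultdict(lambda: ([], []))
    for p, y, g in zip(predictions, labels, groups):
        buckets[g][0].append(p)
        buckets[g][1].append(y)
    return {g: statistical_accuracy(ps, ys) for g, (ps, ys) in buckets.items()}


# Hypothetical test set: eight records from group "A", two from group "B".
labels      = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
predictions = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
groups      = ["A"] * 8 + ["B"] * 2

overall = statistical_accuracy(predictions, labels)
per_group = accuracy_by_group(predictions, labels, groups)
```

Here `overall` is 0.8, while `per_group` is 1.0 for group "A" and 0.0 for group "B". This is why a single headline accuracy figure is rarely enough when assessing bias.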

Legal basis

In order to assess whether there is bias or the potential for discriminatory outcomes in AI systems or clinical trials, special category data may need to be processed for this purpose. Is this possible under the GDPR? As with all processing of personal data, you need to have an appropriate legal basis, and an additional condition under Article 9 for special category data, plus potentially meeting extra obligations under Schedule 1 of the DPA 2018. In the health world, data controllers are already processing special category data – i.e. health data – but that is separate from processing data about race and other characteristics for the purpose of ensuring the system isn’t biased. For this purpose, data controllers can rely on the research ground under Article 9(2)(j) if they can meet the extra requirements under Article 89, or the substantial public interest condition under Article 9(2)(g).

The future for AI in healthcare and medical research, post-COVID

In the post-COVID world, a focus on fair, balanced datasets is likely to become more commonplace as companies, public bodies and the general public have been awakened to the issues that arise when certain groups or communities are left out of research and trials. At the same time, new entrants into the market and some established ones are increasingly run by a new generation which is very aware of the need to break down discriminatory practices in society and is more open to asking new questions and putting new SOPs in place to tackle this issue.

An increase in targeted technological investment will allow developments such as federated technologies, which can allow forensic analysis of the data where it is stored, without having to move it. This allows data controllers to avoid the transfer of data, with the added expenses, storage space and regulatory requirements that such movement incurs.
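The idea of analysing data where it is stored can be sketched very simply: each site computes a summary locally, and only that summary is sent to a coordinator, which combines the summaries into a pooled result. The code below is a toy illustration under that assumption. `LocalSummary`, `summarise_locally`, `combine` and the site data are hypothetical names and values. Real federated platforms add much more, such as authentication, secure aggregation and safeguards against small counts that could identify individuals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalSummary:
    """Aggregate figures that leave a site; no record-level data is included."""
    count: int
    total: float


def summarise_locally(values):
    """Runs where the data is stored; only the summary is shared."""
    return LocalSummary(count=len(values), total=float(sum(values)))


def combine(summaries):
    """Runs at the coordinator, which only ever sees the summaries."""
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no records across sites")
    return sum(s.total for s in summaries) / n


# Hypothetical measurements held at two separate sites.
site_a = [120.0, 130.0, 125.0]
site_b = [140.0]

summaries = [summarise_locally(site_a), summarise_locally(site_b)]
pooled_mean = combine(summaries)
```

The pooled mean (128.75 here) is identical to the mean that would be obtained by merging both datasets, yet neither site's individual records are transferred.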

Developments in technology also allow companies to start interacting with patients in new ways which might be easier and more accessible for them, such as phone messaging or video calls, rather than more formal letters or face-to-face meetings. This in turn should open up communications with new communities. Companies can also now collect feedback much more easily online and can therefore quickly and efficiently understand what isn’t working for certain groups of patients. These findings can then be scaled up across wider populations. However, the full benefits of technology will only be realised if companies take the time and effort to invest in providing technological and security reassurance to patients, in particular by emphasising the rights of data subjects and the obligations that data privacy laws place on data controllers and processors.

It remains to be seen whether regulators will take the issue into their own hands. The FDA is arguably the most vocal regulator on this point and has published guidance about recruiting diverse populations for drug trials. The ICO has also issued specific guidance on avoiding bias in AI systems and this issue is likely to become more pressing over the coming years.

It could be that regulators now start to move from the ‘carrot’ of guidance to the ‘stick’ of regulation to combat bias in AI systems, and the more successful companies are likely to be those ahead of the curve on this issue. Success will come from being conscious of the possibility of bias, even in our machines.

[1] https://www.bioindustry.org/event-listing/uk-bioscience-forum-2020.html
[2] https://ico.org.uk/for-organisations/guide-to-data-protection/key-data-protection-themes/guidance-on-ai-and-data-protection/what-do-we-need-to-do-to-ensure-lawfulness-fairness-and-transparency-in-ai-systems/#howshouldweaddress
