AI and Life Sciences: Has protein folding been solved?


Amid the bad news that dominated the headlines last year, one story at the end of the year caused a lot of excitement in the life sciences sector. DeepMind’s artificial intelligence system, AlphaFold 2, appears to have solved the conundrum of protein folding[1].

The technology to read (and indeed edit) DNA sequences, and thus the amino acid sequences that they encode, has developed rapidly over the past few decades. However, predicting exactly how amino acid sequences then fold into the complex three-dimensional (3D) structures of proteins has, so far, not been possible. The 3D structure of a protein is critical to its biological activity. To identify a protein’s 3D structure, it has been necessary to use complex and expensive experimental methods (such as X-ray crystallography). This has resulted in a vast gap between the number of known DNA and amino acid sequences encoding proteins and the number of known 3D protein structures. Accurately predicting a protein’s structure from its sequence has been thought to be a problem too complex to solve with current technology. Enter artificial intelligence.

The stage for AlphaFold 2’s achievement was the biennial competition, Critical Assessment of Structure Prediction (CASP). Founded in 1994, CASP aims to improve computational methods for accurately predicting protein structures. In brief, CASP releases target protein sequences (about 100 in total) to participants, who are challenged to predict their structures. The targets are proteins whose structures have recently been determined experimentally but not yet made public (for example, the 2020 competition included certain proteins from the novel coronavirus). Predictions are ranked by how accurately they reproduce the experimental structures, assessed using the Global Distance Test (GDT) metric, which ranges from 0 to 100.
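For readers curious about how a GDT score comes about, the following is a minimal sketch rather than CASP's actual scoring code. It assumes the commonly reported GDT_TS variant, which averages the percentage of residues lying within 1, 2, 4 and 8 ångströms of their experimentally determined positions. It also assumes that the predicted and experimental structures have already been superimposed; the real calculation searches over many superpositions to find the best one. The function name and the toy coordinates are purely illustrative.

```python
import math


def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two pre-aligned lists of (x, y, z) coordinates.

    Assumption: the structures are already superimposed. The real GDT
    procedure searches for superpositions that maximise each count.
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("structures must be non-empty and of equal length")

    # Distance, in angstroms, between each predicted residue position
    # and the corresponding experimental position.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]

    # Fraction of residues falling within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]

    # Average over the cutoffs and scale to the 0 to 100 range.
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy example: four residues, displaced by 0.5, 1.5, 3.0 and 9.0 angstroms.
experimental = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (1.0, 1.5, 0.0), (2.0, 3.0, 0.0), (3.0, 9.0, 0.0)]
print(gdt_ts(predicted, experimental))  # 56.25
```

A perfect prediction scores 100 under this scheme, and a prediction in which every residue is more than 8 ångströms out of place scores 0.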

AlphaFold actually won the competition in 2018[2]. However, it was AlphaFold 2’s performance in 2020 that was really astonishing. AlphaFold 2 achieved a median score of 92.4 GDT across all protein targets. To put this into context, a score of around 90 is considered comparable with results obtained from experimental methods. On the “moderately difficult” protein targets, the best of the other challengers achieved a score of around 75, whilst AlphaFold 2 consistently scored around 90. The implication is that AlphaFold 2 can predict the structure of proteins about as accurately as experimental methods can measure them.

DeepMind’s achievement highlights the real-world impact that AI can have, beyond beating humans at chess or Go. It also shows the important role that AI will have in advancing the life sciences sector and, in particular, drug discovery. The ability to predict protein structures from their amino acid sequences with relative ease could massively accelerate the development of drugs against novel targets, as well as help us understand diseases caused by genetic variations. While there are undoubtedly further developments to be made in this space, IP lawyers should be braced for patent protection being sought for inventions arising from AI such as AlphaFold 2 in the life sciences sector as well as, potentially, a marked acceleration in drug development.

[1] High Accuracy Protein Structure Prediction Using Deep Learning; John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Kathryn Tunyasuvunakool, Olaf Ronneberger, Russ Bates, Augustin Žídek, Alex Bridgland, Clemens Meyer, Simon A A Kohl, Anna Potapenko, Andrew J Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Martin Steinegger, Michalina Pacholska, David Silver, Oriol Vinyals, Andrew W Senior, Koray Kavukcuoglu, Pushmeet Kohli, Demis Hassabis; In Fourteenth Critical Assessment of Techniques for Protein Structure Prediction (Abstract Book), 30 November – 4 December 2020

[2] Improved protein structure prediction using potentials from deep learning; Senior, A.W., Evans, R., Jumper, J. et al.; Nature 577, 706–710 (2020).

Nicholas Michelmore