Has Google revolutionised Biology?
Summary:
- Proteins are an integral part of life, since they carry out most functions of cells. Their function is closely related to their structure.
- Determining a protein’s structure from its building blocks is very difficult, because of the large number of interactions between those building blocks.
- Classically, intricate and time-consuming experiments, such as X-ray crystallography and cryo-electron microscopy, are used to find protein structures.
- Google’s sister company DeepMind has developed a machine learning algorithm that can predict protein structures from their building blocks with unprecedented speed and accuracy.
- This could revolutionise biology and pharmacology and is considered one of the most impactful advances in artificial intelligence in recent times.
It seems that one of biology’s oldest problems has recently been solved: determining the three-dimensional shape of a protein from the sequence of its building blocks [1]. To non-life-scientists this may not immediately seem like a ‘big thing’, but it could well be one of the most significant advances in biology in recent times. To understand just how impactful this is, we have to take a detailed look at how cells function.
On a molecular level, almost all functions of living organisms are executed by proteins. When a cell needs to perform a certain task, it generates the required proteins from its DNA: the corresponding DNA sequence is copied into RNA and translated into a sequence of amino acids, which is then folded into the correct shape, generating a functional protein. This last ‘folding’ step is crucial, since a protein’s function is mainly determined by its shape, and one amino acid sequence can in principle fold into many different configurations with vastly different properties.
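To make the ‘sequence of building blocks’ concrete, here is a minimal Python sketch of the translation step. It reads a DNA sequence three letters (one codon) at a time and looks up the matching amino acid. The codon table is deliberately partial, the example sequence is made up for illustration, and the sketch skips the intermediate RNA copy that a real cell produces. It also says nothing about folding, which is the hard part.

```python
# Partial codon table from the standard genetic code.
# Only the codons needed for the made-up example below are listed.
CODON_TABLE = {
    "ATG": "M",  # methionine, also the usual start signal
    "GCC": "A",  # alanine
    "AAA": "K",  # lysine
    "TTT": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "TAA": "*",  # stop signal
}


def translate(dna):
    """Translate coding DNA into one-letter amino acid codes.

    Reads the sequence codon by codon and stops at the first stop codon.
    Trailing letters that do not form a full codon are ignored.
    """
    protein = []
    usable_length = len(dna) - len(dna) % 3
    for start in range(0, usable_length, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical example sequence, not a real gene.
print(translate("ATGGCCAAATTTGGCTAA"))  # MAKFG
```

The output is just a string of letters. Everything that makes the protein functional, namely how this chain arranges itself in three dimensions, is missing from it.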
While it is rather simple for scientists to determine a protein’s amino acid sequence, it has been close to impossible to use this information to predict its form and hence its function. This is mainly due to the astronomical number of possible configurations a protein can adopt; for some proteins this number is in the realm of 10^100 (a one followed by 100 zeroes) [2]. It is fair to say that trying out all possible configurations to find the most likely one is not an option.
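A back-of-the-envelope calculation shows why brute force is hopeless. The Python sketch below takes the number of configurations from the text and combines it with an assumed sampling rate of ten trillion configurations per second. That rate is an illustrative guess, not a figure from the cited sources, and any other plausible rate leads to the same conclusion.

```python
# Rough estimate of how long an exhaustive search would take.
CONFIGURATIONS = 10**100         # from the text: a one followed by 100 zeroes
SAMPLES_PER_SECOND = 10**13      # ASSUMPTION: illustrative sampling rate
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # roughly 13.8 billion years


def exhaustive_search_years(configurations, samples_per_second):
    """Years needed to visit every configuration once at a fixed rate."""
    return configurations / samples_per_second / SECONDS_PER_YEAR


years = exhaustive_search_years(CONFIGURATIONS, SAMPLES_PER_SECOND)
universe_ages = years / AGE_OF_UNIVERSE_YEARS

print(f"Exhaustive search: about {years:.1e} years")
print(f"That is about {universe_ages:.1e} times the age of the universe")
```

Under these assumptions the search would take on the order of 10^79 years. Real proteins nevertheless settle into their shape very quickly, so they clearly do not try every configuration, and a prediction method cannot do so either.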
However, many areas of biology, pharmacology and medicine depend on knowing the exact structure of the proteins involved in a given process, for example to develop new drugs or investigate tumour growth. Two experimental strategies are mainly used for this: X-ray crystallography and cryo-electron microscopy [3]. Both have been very successful at finding the structures of individual proteins, but they require a lot of time and resources. As a result, only a small fraction of the known proteins, less than one percent [2], have been analysed with these techniques.
The recent rise of artificial intelligence has re-fuelled efforts to predict protein structures from their amino acid sequences. These efforts were mainly driven by small academic groups taking part in the biennial ‘Critical Assessment of Protein Structure Prediction’ (CASP) competition, in which the task is to predict previously unknown protein structures. The 2020 iteration of the competition was won by a huge margin by Google’s sister company DeepMind, known for its AI’s successes in chess and Go. The company trained its AI, called AlphaFold, on all known protein structures and used an ‘attention algorithm’ that breaks down complicated problems into manageable bits.
Strikingly, AlphaFold’s structure predictions were very close to experimental findings in general and almost indistinguishable in two-thirds of the cases. Even though the AI struggled with some more complicated protein structures, its success is considered a ‘game changer’ by many scientists [1,4].
The implications of this technology are vast. It may allow large-scale analyses of protein functions which could lead to more targeted drug development strategies and new ways to study the evolution and functions of cells. For the time being, it will be interesting to see how AlphaFold performs outside the CASP competition and how it will develop in the years ahead. However, one thing seems clear already: AI is on its way to solving a problem that could not be solved by people in over 50 years [1].
References:
[1] Noble K. Artificial intelligence solution to a 50-year-old science challenge could ‘revolutionise’ medical research. CASP Press Release. 2020.
[2] Service RF. ‘The game has changed.’ AI triumphs at protein folding. Science. 2020.
[3] Callaway E. The revolution will not be crystallized: A new method sweeps through structural biology. Nature. 2015.
[4] Callaway E. ‘It will change everything’: DeepMind’s AI makes gigantic leap in solving protein structures. Nature. 2020.