In a paper published in the Journal of Molecular Evolution this week, researchers from the University of Bath describe a new theory which they believe could solve a puzzle that has baffled scientists since they first deciphered the language of DNA almost 40 years ago.
In 1968, Marshall Nirenberg, Har Gobind Khorana and Robert Holley received a Nobel Prize for working out how proteins are produced from the genetic code. They discovered that three-letter 'words' - known as codons - are read from the DNA code and then translated into one of 20 amino acids. These amino acids are then strung together in the order dictated by the DNA code and folded into complex shapes to form a specific protein.
As the DNA 'alphabet' contains four letters - called bases - there are 64 three-letter words available in the DNA dictionary. This is because each of the three positions in a word can hold any of the four letters, giving 4 x 4 x 4 = 64 possible words.
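The count can be checked by enumerating the words directly. The following is a minimal sketch in Python; the variable names are illustrative and do not come from the paper.

```python
from itertools import product

BASES = "ACGT"  # the four DNA bases

# Every ordered choice of three bases, with repetition allowed.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))  # 64
print(codons[:4])   # ['AAA', 'AAC', 'AAG', 'AAT']
```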
But why there should be 64 words in the DNA dictionary which translate into just 20 amino acids, and why a process that is more complex than it needs to be should have evolved in the first place, has puzzled scientists for the last 40 years.
Dozens of scientists have suggested theories to solve the puzzle, but these have been quickly discounted or failed to explain some of the other quirks in protein synthesis.
"Why there are so many more codons than amino acids has puzzled scientists ever since it was discovered how the genetic code works," said Dr Jean van den Elsen from the Department of Biology and Biochemistry.
"It meant the genetic code did not have the mathematical brilliance you would expect from something so fundamental to life on earth."
One of the quirks of the genetic code is that there are groups of codons which all translate to the same amino acid. For example, the amino acid leucine can be translated from six different codons, whilst some amino acids, which have equally important functions and are translated just as often, have just one.
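This unevenness can be seen by counting codons per amino acid in the modern standard genetic code. The sketch below assumes the usual one-letter amino acid symbols listed in TCAG codon order, with '*' marking a stop codon; the names `codon_table` and `degeneracy` are illustrative only.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in the order
# TTT, TTC, TTA, TTG, TCT, ... ; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codon_table = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that translate to each amino acid (stops excluded).
degeneracy = Counter(aa for aa in codon_table.values() if aa != "*")

print(len(degeneracy))                   # 20 amino acids
print(degeneracy["L"])                   # 6 (leucine)
print(degeneracy["M"], degeneracy["W"])  # 1 1 (methionine, tryptophan)
```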
The new theory builds on an original idea suggested by Francis Crick - one of the discoverers of the structure of DNA - that the three-letter code evolved from a simpler two-letter code, although Crick thought the difference in number was simply an accident "frozen in time".
The University of Bath researchers suggest that the primordial 'doublet' code was read in threes, but with only two of the three bases actively read: either the first two (the 'prefix') or the last two (the 'suffix').
By combining arrangements of these doublet codes, the scientists can replicate the table of amino acids - explaining why some amino acids can be translated from groups of 2, 4 or 6 codons. They can also show how the groups of water-loving (hydrophilic) and water-hating (hydrophobic) amino acids emerge naturally in the table, evolving from overlapping 'prefix' and 'suffix' codons.
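A trace of 'prefix' reading is still visible in the modern code: for half of the 16 possible prefixes, the first two bases alone fix the amino acid. The sketch below illustrates this using the standard genetic code. It is not the authors' reconstruction of the primordial doublet code, and the names `prefix_groups` and `fixed_by_prefix` are illustrative only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG codon order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codon_table = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group the 64 codons by their first two bases (the 'prefix' doublet).
prefix_groups = {}
for codon, aa in codon_table.items():
    prefix_groups.setdefault(codon[:2], set()).add(aa)

# Prefixes whose four codons all translate to the same amino acid,
# so the third base makes no difference.
fixed_by_prefix = sorted(p for p, aas in prefix_groups.items() if len(aas) == 1)

print(len(prefix_groups))  # 16
print(fixed_by_prefix)     # ['AC', 'CC', 'CG', 'CT', 'GC', 'GG', 'GT', 'TC']
```

Amino acids with six codons, such as leucine, combine one of these four-codon prefix groups (CT) with a further pair of codons (TTA and TTG).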
"When you evolve our theory for a doublet system into a triplet system, you get an exact match up with the number and range of amino acids we see today," said Dr van den Elsen, who has worked with Dr Stefan Babgy and Huan-Lin Wu on the theory.
"This simple theory explains many unresolved features of the current genetic code. No one has ever been able to do this before, so we are very excited."
The theory also explains how the structure of the genetic code maximises error tolerance. For instance, 'slippage' in the translation process tends to produce another amino acid with the same characteristics, which helps explain why the DNA code is so good at maintaining its integrity.
"This is important because these kinds of mistakes can be fatal for an organism," said Dr van den Elsen. "None of the older theories can explain how this error tolerant structure might have arisen."
The new theory also highlights two amino acids that can be excluded from the doublet system and are likely to be relatively recent 'acquisitions' by the genetic code. As these amino acids - glutamine and asparagine - are unable to hold their shape at high temperatures, this suggests that heat prevented them from being acquired by the code at some point in the past.
One possible reason for this is that the Last Universal Common Ancestor (LUCA), which evolved into all life on earth, lived in a hot sulphurous pool or thermal vent. As it moved into cooler conditions, it was able to take up these two additional amino acids and evolve into more complex organisms. This provides further evidence for the debate on whether life emerged from a hot or cold primordial soup.
"There are still relics of a very old simple code hidden away in our DNA and in the structures of our cells," said Dr van den Elsen, who points to several aminoacyl-tRNA synthetases - molecules involved in protein synthesis - which only look at pairs of bases in triplet codons, as well as other physical evidence in support of the theory.
"As the code evolved it has been possible for it to adapt and take on new amino acids. Whether we could eventually reach a full complement of 64 amino acids I don't know, a compromise between amino acid vocabulary and its error minimising efficiency may have fixed the genetic code in its current format."