By Jonas Waldenström
Pale horse, pale rider – the Apocalypse foretold in the Book of Revelation. (Gustave Doré [Public domain], via Wikimedia Commons)
It is time for yet another tale about pestilence – the scourges of mankind. Once again it is about flu, but today I will not present my own research. Instead I would like to write about a new article that was published a little while ago in Nature. It is a story about the speed of virus evolution, and how new analytic methods have put a bright light on some of the most influential disease events in the last 200 years. It is time to rewrite influenza virus evolution, once more.
The paper in question is both a very simple and a very complicated paper. Or rather, the idea is simple, but the science to test the idea is complicated and novel.
If we start with the idea: does the rate of virus evolution differ depending on which host species the virus infects? Remember that the influenza A virus (the correct name for flu) has a remarkably broad host range – there are a lot of viruses in wild waterfowl (those that we study), a more limited set in domestic fowl (which we don’t study), and a handful of viruses in mammalian hosts, including viruses adapted to pigs, dogs, horses, and, of course, humans. It is reasonable to assume that all ducks are fairly similar from a flu virus perspective. But is a duck really similar to a pig, or a horse, for an influenza virus? If not, what consequences would among-host species differences have for influenza evolution?
As it turns out, a lot.
If we press fast forward and jump directly to the results, they come in the form of a tree. Or rather eight little mini-trees, one for each of the virus’ 8 RNA segments. Using fancy new phylogenetic methods that allow the evolutionary rates to vary between host species, the authors ended up with trees that were much better dated than those from earlier analyses. This means that we can now say with good precision when the forks in the trees occurred, and use those dates to put epidemiological data in a historical perspective. Firstly, look at the trees for hemagglutinin (HA) and neuraminidase (NA). As is evident from the long branch lengths, these gene variants go back a long, long time. The estimated time since the most recent common ancestor of all circulating HA subtypes and all NA subtypes is roughly a thousand years. This is a bit further back than previously believed, but not dramatically different.
The evolutionary trees of the segmented genome of the influenza A virus. Note that the H7N7 virus from horses is a sister group to all present-day internal virus genes. The original paper is referenced at the end of this post.
Now let us instead focus on the remaining six segments. Look at each of them in detail and you’ll soon see that they all look similar. Surprisingly similar, actually. It seems that all viruses circulating today (except those in bats) have genes for the internal proteins that stem from a limited set of variants that existed sometime around 1870.
Thus, around the mid-1800s something happened to the pool of influenza viruses. Backtracking through the historical record of disease events, the authors found that this coincides with a panzootic of flu in horses – a worldwide epidemic in our four-legged friends. This panzootic somehow purged the world of all other internal gene variants, like a big broom on a dusty floor. Left were only the variants whose descendants we see today. This is truly remarkable: a global sweep in the evolution of influenza! Why horses, you may ask. Aren’t they just decorative animals? Well, if we go back to the pre-automobile era, horses must have been everywhere. They were used in agriculture, in mines, in cities – wherever something heavy was to be transported you could bet there was a horse around (as well as a few oxen, mules and donkeys). I think it may be hard for our phlegmatic selves to fathom what the world looked like before the advent of fossil-fuelled engines. A big panzootic in horses would have had the potential to reach everywhere humans were and, as the data suggest, beyond, as even the viruses in wild ducks today have internal genes dating back to the 1870s.
Another important finding in this study is that the Spanish flu of 1918 actually seems to have originated in the Western hemisphere, and perhaps wasn’t overly Spanish after all. Reading this paper is just overwhelming for a flu biologist. Full of trinkets and goodies – I will return to it many times, for sure.
OK, let’s rewind the tape all the way to the beginning and look at the rationale in a bit more depth (for those of you who are interested).
What evidence did we have of different evolutionary rates in influenza viruses before this study? What would we expect? To start with, different species have different morphology and physiology: the receptors that the virus binds to vary in their molecular structures, as do the distributions of cells with different receptor types in different organs. A duck and a human are different both on the outside and the inside, so to speak. Also, the routes of virus transmission are different, being mainly fecal-oral in birds, and mainly respiratory in mammals. You release influenza virions when you sneeze, while the ducks poo them out with their feces.
These are well-known facts. We also knew that a host shift could make a big splash, both for the virus that is introduced, and for the viruses that were there before. For instance, when H5N1 crossed from wild birds into poultry a decade and a half ago it embarked on a rapid evolutionary journey. It quickly acquired a score of new mutations as it adapted to the poultry niche and, with time, geographically separated virus clades emerged – each with its own evolutionary trajectory. This initial evolution was fast and furious, and much faster than the normal pace seen in wild ducks.
Another example is pandemic flu in humans: when a new virus is introduced in the human population it will rapidly increase in frequency and is likely to replace previous strains, especially if they are of a subtype similar to those that dominate the existing human virus population. The new virus sweeps the old one out, just like the broom analogy above. Thus, the H1N1 Spanish flu from 1918 was replaced by the H2N2 Asian flu in 1957, which was in turn replaced by the H3N2 Hong Kong flu in 1968.
Thus, there were several good reasons to assume that the speed of virus evolution depends on which host the virus happens to be in. But how can we estimate these different rates of evolution, and how can we use that knowledge for predictions? This is really the big problem – a big technical problem.
Imagine you want to draw an evolutionary tree. The tips of the tree represent the taxa we have included in the analysis, and the branches represent the evolutionary pathways backwards in time all the way to the most recent common ancestor, represented by the trunk. Drawing this tree is very simple if the number of tips is small. Consider that we want to make a tree representing the evolutionary history of the large primates gorilla, chimpanzee, bonobo and us humans. There is just a limited set of possible trees that can be drawn, and with good input data we are likely to get a tree that encapsulates the true evolutionary history of these apes. But if we want to make a phylogeny of all mammals, or all plants of the Solanaceae family, or all contemporary influenza viruses, the number of possible trees quickly becomes massive, a big Amazon jungle of potential trees. Finding the ‘right tree’ in this forest of possible trees will rely on a multitude of factors.
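To get a feel for how fast that jungle grows, here is a small sketch. It relies on a standard counting result (not something taken from the paper): the number of distinct rooted, strictly bifurcating trees for n labelled tips is (2n-3)!!, the product of the odd numbers up to 2n-3. The function name is my own.

```python
def count_rooted_trees(n_tips: int) -> int:
    """Number of distinct rooted, bifurcating, labelled trees for n_tips taxa.

    Uses the standard double-factorial result (2n - 3)!! = 3 * 5 * ... * (2n - 3).
    """
    if n_tips < 1:
        raise ValueError("need at least one tip")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (empty product for n <= 2).
    for k in range(3, 2 * n_tips - 2, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 10, 20, 50):
        print(f"{n:>3} tips -> {count_rooted_trees(n):,} possible rooted trees")
```

For the four apes there are only 15 rooted trees to choose between. With ten tips there are already more than 34 million, and for datasets with thousands of virus sequences it is hopeless to evaluate every tree, which is why tree-building software has to search the forest in a clever way.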
If we then also want the tree nodes to correspond to real time events – dating the tree – it becomes very computer intensive, and very sensitive to model assumptions. To date the tree we need some estimate of time, a clock. There are a number of ways to set that clock. One is to use specimens that have been dated: for the phylogeny of apes above, that could be fossils from strata with known ages, and for viruses, specimens collected at different time points. But on most occasions we will have to make assumptions about the clock given the variation in genetic sequences in contemporary samples.
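The simplest way to set a clock from samples collected at different time points is a so-called root-to-tip regression: plot each sample's genetic distance from the root of the tree against its collection year and fit a straight line. The slope is the evolutionary rate and the point where the line crosses zero is the estimated date of the root. The sketch below assumes a single, constant rate (a strict clock), which is a much cruder model than the ones used in the paper, and all the numbers are made up for illustration.

```python
def root_to_tip_regression(years, distances):
    """Least-squares fit of distance = rate * (year - root_year).

    Returns (rate, root_year). Assumes one constant rate for the whole
    tree (a strict clock); this is an illustration, not the paper's method.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, distances))
    rate = sxy / sxx                    # slope: substitutions per site per year
    intercept = mean_y - rate * mean_x  # distance predicted at year 0
    root_year = -intercept / rate       # year at which distance is zero
    return rate, root_year


if __name__ == "__main__":
    # Hypothetical sampling years and root-to-tip distances (subs/site).
    years = [1950, 1970, 1990, 2010]
    distances = [0.10, 0.14, 0.18, 0.22]
    rate, root_year = root_to_tip_regression(years, distances)
    print(f"rate ~ {rate:.4f} subs/site/year, root ~ {root_year:.0f}")
```

With these invented numbers the line has a slope of 0.002 substitutions per site per year and crosses zero at 1900. Real data are of course noisier, and the whole point of the paper is that one rate for all hosts is not good enough.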
For flu, it has been conventional to use a model of time called a ‘relaxed clock’. Using that model it has been possible to estimate the divergence times of different virus subtypes, making a timetable of flu history. In the new paper, the authors first developed a novel evolutionary model that they termed a ‘host-specific local clock’. This model allows for different rates of evolution in different host species. They tested the performance of this model, together with other models, on a simulated dataset of influenza viruses. Prompted by the good fit of the model to the ‘true’ simulated data, they then applied it to a large collection of influenza sequences and estimated the timing of the different virus divergence events: the forks in the trees. And the end product is the little figure I shared above.
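Why does it matter so much which clock you use? A toy calculation may help. The sketch below is not the authors' model (theirs is a full statistical analysis of sequence data); it only illustrates the arithmetic. A lineage accumulates genetic distance as rate times time, so if the rate differs between hosts, converting distance back to time with one single rate gives the wrong date. The host names, rates and branch lengths are all hypothetical.

```python
def elapsed_years_host_specific(branches, rates):
    """Sum the time along a lineage, using a separate rate for each host.

    branches: list of (host, genetic_distance) pairs along one lineage.
    rates:    dict mapping host -> substitutions per site per year.
    """
    return sum(distance / rates[host] for host, distance in branches)


def elapsed_years_single_rate(branches, rate):
    """Same lineage, but dated with one rate applied to every branch."""
    return sum(distance for _, distance in branches) / rate


if __name__ == "__main__":
    # Hypothetical rates: slow evolution in one host, fast in another.
    rates = {"avian": 0.001, "equine": 0.004}
    # Hypothetical lineage: 100 years in birds, then 20 years in horses.
    branches = [("avian", 0.1), ("equine", 0.08)]

    print(elapsed_years_host_specific(branches, rates))           # about 120 years
    print(elapsed_years_single_rate(branches, rates["equine"]))   # about 45 years
    print(elapsed_years_single_rate(branches, rates["avian"]))    # about 180 years
```

With the host-specific rates we recover the 120 years that went into the example. Apply the fast rate everywhere and the lineage looks far too young; apply the slow rate and it looks far too old. The same thing happens, on a grander scale, when dating the forks in the real influenza trees.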
Science is about testing hypotheses. To challenge established truths over and over again to see if they hold. This is particularly true for the influenza A virus research field, where the whole narrative has been rewritten over and over again. When I started to become interested in this virus some 10-12 years ago the field virtually exploded with activity. Some of the old knowledge still holds, but it is inspiring to see how much progress the field has made in recent times. A big collective forward movement. But, fortunately for us scientists, it seems that this virus still has many secrets yet to reveal.
Link to the paper:
Michael Worobey, Guan-Zhu Han & Andrew Rambaut. 2014. A synchronized global sweep of the internal genes of modern avian influenza virus. Nature. doi:10.1038/nature13016