Ten reasons to love academia (and to forget looming application deadlines)

By Jonas Waldenström

It is application time. Again. That period of the year when researchers look even more tired and haggard than usual. That period when the lingering coffee aroma in the hallway seems stronger, even pungent. A time of frantic scribbles on papers, pulled hair, and the tick-tick-tick-tack of fingers tapping away on keyboards. And in the distance, the muffled sound of sobs from behind the bathroom door.

Application times are gruesome. Competition for grants is fierce – many apply, but few get them. A simple fact. But regardless of the odds, we pour our souls onto paper, weighing each word, trying to convey the message that this application is the best of the lot, the next Nobel Prize in the making. We wish it would reach perfection, but it seldom does. Because life cannot be paused. There is still teaching to be done, administration to administer, student manuscripts to read – and a family to feed and care for.

For more great stuff from Cyanide and Happiness, check out explosm.net/comics/


At times like this it is good to remember all the good things about life in academia. So for all you weary and tired, here is a list of remembrance:

1)   You are an expert. Of all the snowflakes in academia, you are one unique little snowflake. There is simply no one who is better than you at being you. Your training, all the hardships and struggles, has produced your academic self.

2)   Smell the flowers. Stop for a while, go to your pile of reprints and read one of them. Feel that glossy paper and remember the joyous feeling you had when it was published. You wrote that! It is easy to forget the little victories, but you shouldn’t. There is lots of stuff to be proud of!

3)   Fieldwork. Application time also heralds the onset of spring – soon it is time to roam the green hills and collect new data. As I have written before: fieldwork is one of the best perks of being a biologist! More ducks to catch!

4)   The grandeur of science. Our job is not just an ordinary job. We are a part of a human movement for enlightenment. We are getting paid to find things out – isn’t that awesome?

5)   Friends in many places. Contrary to popular belief, science is all about collaboration. We make friends all over the world, we exchange ideas, and sometimes we work together. And we get to learn about other cultures, meet new people, and see new places.

6)   ‘The pleasure of finding things out.’ There is a beauty in figuring things out, in solving a question. The quote comes from Richard Feynman, a multifaceted scientist and Nobel laureate – also a man who took part in the development of the atom bomb. There is an hour-long BBC documentary on YouTube that you ought to watch.

7)   You are your own boss. Freedom has always been viewed as a prerequisite for science. Freedom to explore ideas, freedom to participate in the discussion, freedom to think. Although this academic freedom isn’t as free as it used to be, life in academia is still very different from ordinary workplaces. Embrace that.

8)   Science is a lucid sea to swim in. If you feel stuck, open your browser and explore the world from your computer. There are so many awesome studies being conducted, on topics you couldn’t even imagine. Go on Twitter, follow some of the great science communicators – read and enjoy!

9)   Embrace your inner nerd/geek. Isn’t it great to know stuff? I can recite the names of viruses, I know where the Mallard flies, and how to find the best pubs in Oxford. All because of science!

10)   Play the ‘swap professor with X’ game. If you are really down, and think that everyone else is so great and you are just a little pile of shit on the academic floor, then you should play the ‘swap professor with X’ game. It is simple. Imagine the hotshot professor in a non-academic setting. How would professor X do in a lumberyard, in a marathon, in a grocery store? Refreshing, I promise you.

Now go back to your application. Damn it.


If you enjoyed this post, or other posts on this blog, why not follow the blog via email, Feedly or get updates via Twitter by following @DrSnygg?

Influenza A virus history rewritten, once more

By Jonas Waldenström

Pale horse, pale rider – the Apocalypse foretold in the Book of Revelation. (Gustave Doré [Public domain], via Wikimedia Commons)


It is time for yet another tale about pestilence – the scourges of mankind. Once again it is about flu, but today I will not present my own research. Instead I would like to write about a new article that was published a little while ago in Nature. It is a story about the speed of virus evolution, and how new analytic methods have shone a bright light on some of the most influential disease events of the last 200 years. It is time to rewrite influenza virus evolution, once more.

The paper in question is both a very simple and a very complicated paper. Or rather, the idea is simple, but the science to test the idea is complicated and novel.

If we start with the idea: does the rate of virus evolution differ depending on which host species the virus infects? Remember that the influenza A virus (the correct name for flu) has a remarkably broad host range – there are a lot of viruses in wild waterfowl (those that we study), a more limited set in domestic fowl (which we don’t study), and a handful of viruses in mammalian hosts, including viruses adapted to pigs, dogs, horses, and, of course, humans. It is reasonable to assume that all ducks are fairly similar from a flu virus perspective. But is a duck really similar to a pig, or a horse, for an influenza virus? If so, what consequences would among-host-species differences have for influenza evolution?

As it turns out, a lot.

If we press fast forward and jump directly to the results, they come in the form of a tree. Or rather eight little mini-trees, one for each of the virus’ 8 RNA segments. Using fancy new phylogenetic methods that allow the evolutionary rates to vary in the different host species, the authors ended up with trees that were much better dated than those from earlier analyses. This means that we can now say with good precision when the forks in the trees occurred, and use that information to put epidemiological data in a historical perspective. First, look at the trees for hemagglutinin (HA) and neuraminidase (NA). As evident from the long branch lengths, these gene variants go back a long, long time. The estimated time since the most recent common ancestor of all circulating HA subtypes and all NA subtypes is roughly a thousand years. This is a bit further back than previously believed, but not dramatically different.

The evolutionary trees of the influenza A virus segmented genome. Note that the H7N7 virus from horses is a sister group to all present day internal virus genes. Original paper can be found here.


If we now instead focus on the remaining six segments: look at each of them in detail and you’ll soon see that they all look similar. Surprisingly similar, actually. It seems that all viruses circulating today (except those in bats) have genes for the internal proteins that stem from a limited set of variants that existed sometime around 1870.

Thus, around the mid-1800s something happened to the pool of influenza viruses. Backtracking disease history events, the authors found congruence with a panzootic of flu in horses – a worldwide epidemic in our four-legged friends. This panzootic somehow purged the world of all other internal gene variants, like a big broom on a dusty floor. Left were only the variants whose descendants we see today. This is truly remarkable: a global sweep in the evolution of influenza! Why horses, you may ask. Aren’t they just decorative animals? Well, if we go back to the pre-automobile era, horses were everywhere. They were used in agriculture, in mines, in cities – everywhere something heavy was to be transported you could bet there was a horse around (as well as a few oxen, mules, and donkeys). I think it may be hard to fathom for our phlegmatic selves what the world looked like before the advent of fossil-fueled engines. A big panzootic in horses would have had the potential to reach everywhere humans were and, as the data suggest, beyond, as even the viruses in wild ducks today have internal genes dating back to the 1870s.

Horse pulling a tram, late 1800s. (From BBC - A history of trams)


Another important finding in this study is that the Spanish flu of 1918 actually seems to have originated in the Western hemisphere, and perhaps wasn’t overly Spanish after all. Reading this paper is just overwhelming for a flu biologist – full of trinkets and goodies. I will return to it many times, for sure.

OK, let’s rewind the tape all the way to the beginning and look at the rationales a bit deeper (for those of you who are interested).

What evidence did we have of different evolutionary rates in influenza viruses before this study? What would we expect? To start with, different species have different morphology and physiology; the receptors that the virus binds to vary in their molecular structure, as do the distributions of cells with different receptor types in different organs. A duck and a human are different both on the outside and the inside, so to speak. Also, the routes of virus transmission are different, being mainly fecal–oral in birds and mainly respiratory in mammals. You release influenza virions when you sneeze, while ducks poo them out with their feces.

These are well-known facts. We also knew that a host shift could make a big splash, both for the virus that is introduced, and for the viruses that were there before. For instance, when H5N1 crossed from wild birds into poultry a decade and a half ago it embarked on a rapid evolutionary journey. It quickly acquired a score of new mutations as it adapted to the poultry niche and, with time, geographically separated virus clades emerged – each with its own evolutionary trajectory. This initial evolution was fast and furious, and much faster than the normal pace seen in wild ducks.

Another example is pandemic flu in humans: when a new virus is introduced into the human population it will rapidly increase in frequency and is likely to replace previous strains, especially if they are of a subtype similar to those that dominate the existing human virus population. The new virus sweeps the old one out, just like the broom analogy above. Thus, the H1N1 Spanish flu from 1918 was replaced by the H2N2 Asian flu in 1957, which in turn was replaced by the H3N2 Hong Kong flu in 1968.

Thus, there were several good reasons to assume that the speed of virus evolution depends on where the virus happens to be. But how can we estimate these different rates of evolution, and how can we use that knowledge for predictions? This is really the big problem – a big technical problem.

Imagine you want to draw an evolutionary tree. The tips of the tree represent the taxa we have included in the analysis, and the branches represent the evolutionary pathways backwards in time, all the way to the most recent common ancestor, represented by the trunk. Drawing this tree is very simple if the number of tips is small. Say we want to make a tree representing the evolutionary history of the great apes: gorilla, chimpanzee, bonobo, and us humans. There is just a limited set of possible trees that can be drawn, and with good input data we are likely to get a tree that encapsulates the true evolutionary history of these apes. But if we want to make a phylogeny of all mammals, or all plants of the Solanaceae family, or all contemporary influenza viruses, the number of possible trees quickly becomes massive, a big Amazon jungle of potential trees. Finding the ‘right tree’ in this forest of possible trees will rely on a multitude of factors.
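Just how fast that jungle of trees grows can be seen from a standard result in phylogenetics: the number of rooted, bifurcating tree topologies for n taxa is the double factorial (2n − 3)!!. A small sketch (my own illustration, not from the paper) makes the explosion concrete:

```python
# Count rooted, bifurcating tree topologies for n taxa.
# The number grows as the double factorial (2n - 3)!!,
# i.e. 3 * 5 * 7 * ... * (2n - 3).

def n_rooted_trees(n):
    """Number of distinct rooted binary tree topologies for n taxa."""
    count = 1
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count

for n in (4, 10, 20):
    print(n, n_rooted_trees(n))
```

For the four great apes there are only 15 possible rooted trees, but already at 20 taxa the count exceeds 8 × 10^21 – hence the need for clever search methods rather than brute-force enumeration.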

If we then also want the tree nodes to correspond to real time events – dating the tree – it becomes very computer intensive. And very sensitive to model assumptions. To date the tree we need some estimate of time, a clock. There are a number of ways to set that clock. One is to use specimens that have been dated – for the phylogeny of apes above, that could be fossils from strata of known age; for viruses, specimens collected at different time points. But on most occasions we will have to make assumptions about the clock, given the variation in genetic sequences in contemporary samples.

For flu, it has been conventional to use a model of time called a ‘relaxed clock’. Using that model, it has been possible to estimate the divergence times of different virus subtypes, making a timetable of flu history. In the new paper, the authors first developed a novel evolutionary model that they termed the ‘host-specific local clock’. This model allows for different rates of evolution in different host species. They tested the performance of this model, together with other models, on a simulated dataset of influenza viruses. Prompted by the good fit of the model to the ‘true’ simulated data, they then applied it to a large collection of influenza sequences and mapped the timing of the different virus divergence events; the forks in the trees. The end product is the little figure I shared above.

Science is about testing hypotheses. To challenge established truths over and over again to see if they hold. This is particularly true for the influenza A virus research field, where the whole narrative has been rewritten over and over again. When I started to become interested in this virus some 10-12 years ago the field virtually exploded with activity. Some of the old knowledge still holds, but it is inspiring to see how much progress the field has made in recent times. A big collective forward movement. But, fortunately for us scientists, it seems that this virus still has many secrets yet to reveal.

Link to the paper:

Michael Worobey, Guan-Zhu Han & Andrew Rambaut. 2014. A synchronized global sweep of the internal genes of modern avian influenza virus. Nature. doi:10.1038/nature13016




To share, or not to share your data – some thoughts on the new data policy for the PLOS journals

By Jonas Waldenström

This post was intended as a comment on a post by Terry McGlynn at Small Pond Science, but once I started writing it soon swelled, and I turned it into a blog post instead. I suggest you start with reading the original post here. The short version is: the leading scientific publisher PLOS has taken the open access movement one step further, from not only making studies freely available, to also making it mandatory to include the original data used to draw the conclusions of a paper. It seems like a good thing – too many studies can’t be replicated, and much data is lost when people leave science, or is deposited in ways that don’t stand the test of time. However, it is also problematic, as good quality data is painstakingly hard to gather and could be viewed as a currency of its own.

I have published quite frequently with PLOS journals, and in particular with PLOS ONE. In fact, over the last couple of years I have authored/co-authored 15 papers in PLOS ONE and two in PLOS Pathogens. My experiences so far have been very positive: good review processes, beautiful final prints, and, because of the absence of paywalls, a very good spread among peers. I regularly check the altmetrics of the articles and it is exciting to see how many times they are viewed, downloaded, and cited. I have been very pro-PLOS, even in times when many ecologist peers didn’t consider PLOS ONE a venue for publication. However, all the good things with PLOS considered, the new policy launched a little while ago has made me a bit more reluctant to submit future work to the journals.

So what has changed? Is it a revolution, or ‘same old, same old’? The answer is that no one knows for sure (at least I don’t). The short version of the policy was published as an editorial in PLOS Biology, and although it states that the new data policy will make ‘more bang for the buck’ and ‘foster scientific progress’, it wasn’t overly clear what it means in practice for the researcher about to submit a paper:

 PLOS defines the “minimal dataset” to consist of the dataset used to reach the conclusions drawn in the manuscript with related metadata and methods, and any additional data required to replicate the reported study findings in their entirety. Core descriptive data, methods, and study results should be included within the main paper, regardless of data deposition. PLOS does not accept references to “data not shown”. Authors who have datasets too large for sharing via repositories or uploaded files should contact the relevant journal for advice.

In many cases it wouldn’t make a huge difference. There are already options to upload supporting data as appendices, and repositories like Figshare, GenBank and Dryad are already out there. As an example, in one paper published in PLOS Pathogens we had 20 supplementary files, including details on statistical analyses and plenty of extra tables and figures. And for another publication, in Molecular Ecology, the full alignments of the genes analyzed were uploaded (per the journal instructions) to Dryad so that others could replicate the analysis if need be. But for much of my current work on long-term pathogen dynamics in waterfowl it wouldn’t feel good to upload all the raw data. The question is really what a minimal dataset is. And importantly, what data you don’t include in the dataset.

A FAQ from PLOS has been published where this is addressed, but it remains to be seen how this works out in practice:

The policy applies to the dataset used to reach the conclusions drawn in the manuscript with related metadata and methods and any additional data required to replicate the reported study findings in their entirety. You need not submit your entire dataset, or all raw data collected during an investigation, but you must provide the portion that is relevant to the specific study.

So why am I a bit reluctant? Let me give you some background. The study system I run was started 12 years ago by professor Björn Olsen, and I took over the running of it 5-6 years ago. Over the years we have published quite a few papers on avian influenza in this migratory Mallard population, but it is only now that the time series is long enough that we can do more advanced studies on the effects of immunity on disease dynamics, long-term subtype dynamics, and influenza A virus evolution. Big stuff, based on the same large datasets but analyzed in different ways. If publishing one paper now means we have to submit 12 years of original data, i.e. the ringing data and disease data of 22,000 mallards, then it comes with a potential cost. I see the dataset as a work in progress, a living entity that accumulates new data as we go along, and where analyses are planned both for the near term and for the distant future. In the cow analogy of Terry McGlynn, the dataset is a herd with a balanced age structure: some cattle destined for the pot already today, some fattening for slaughter, and yet others growing into the breeding stock.

The unique longevity of the time series has brought our research much fruitful collaboration. For a few years now we have worked with capture-mark-recapture researchers in France on epidemiological models, just to name one example. I have also, although much more rarely, turned down invitations to collaborate. In those instances it has been because we had planned to do those analyses ourselves, or because the time wasn’t right. And just to make this clear: the cases of not sending data were not refusals to send background data for replicating a paper; rather, they were requests to do new things with it. If you post your raw data in close to its entirety, such situations could become cumbersome, and you run the risk of seeing your data analyzed by someone else.

The problem is much smaller for genetic data. After all, it is conventional in all fields to submit your sequences to GenBank along with your manuscript, and you know that it works. I have seen ‘our’ sequences in many phylogenetic trees without having been asked about the usage in advance. But it is one thing to have your data used as a brick in a new construction, and another to have someone take over your house because you had to give away the key. Many people say that the risk of getting scooped on your data is exaggerated, and this is likely true. Scientists are usually decent people, after all. But knowing there is such a risk, albeit small, can have an impact on publication choices, or lead to delaying publication of smaller papers until all the big papers from a dataset have been published (which may become a problem for graduate student theses).

It is essential that we very soon get to know what a minimal dataset is. For example, would it be OK to submit the raw data on Mallard infection histories without the unique ring numbers? Exchanging the individual identifier for an arbitrary number, for example. Or to exclude data such as the actual date, morphology, or indexes of condition? That way, an analysis of the prevalence of a disease in a population (which is a very simple exercise) doesn’t immediately give another researcher the possibility to investigate the effect of infection on a bird’s condition, the effect of immunity, or migration dynamics, to name a few options. Would PLOS allow that? I don’t know.
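The kind of pseudonymization sketched above is technically trivial, which is part of why a clear policy matters. A minimal illustration (field names and values are hypothetical, not from our actual dataset): replace each ring number with an arbitrary ID and drop the columns not needed to replicate a prevalence estimate.

```python
# Sketch of pseudonymizing capture records before data deposition:
# swap the unique ring number for an arbitrary id, and keep only the
# fields needed to replicate a prevalence estimate. All field names
# and values here are made up for illustration.

from itertools import count

def pseudonymize(records):
    """Map ring numbers to arbitrary ids; keep only infection status."""
    ids = {}
    counter = count(1)
    shared = []
    for rec in records:
        ring = rec["ring_number"]
        if ring not in ids:
            ids[ring] = next(counter)  # stable id per individual
        shared.append({"bird_id": ids[ring], "infected": rec["infected"]})
    return shared

records = [
    {"ring_number": "SVS-90A12345", "date": "2010-10-02", "infected": True},
    {"ring_number": "SVS-90A12345", "date": "2010-10-09", "infected": False},
    {"ring_number": "SVS-90A67890", "date": "2010-10-02", "infected": False},
]
shared = pseudonymize(records)
prevalence = sum(r["infected"] for r in shared) / len(shared)
print(shared)
print(prevalence)
```

Anyone can still compute prevalence from the shared table, but without dates, morphometrics, or real ring numbers, analyses of condition, immunity, or migration are no longer possible – which is exactly the trade-off the policy needs to rule on.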

An issue raised by Terry McGlynn was the difference between small labs and resource-intense research labs. In a small lab, the research takes longer due to limited resources and smaller staff, and a good dataset is extremely precious; it can also act as a currency, enabling co-authorships through collaborations. I would like to end with an additional story. A story of long-term data collected by volunteers at Ottenby Bird Observatory.

Ottenby Bird Observatory was founded in 1946 and has run without interruption since then. Each year volunteers help out with the trapping and banding of birds, mostly passerines, but also a fair share of waders, ducks and birds of prey. The number of birds caught each year is between 15,000 and 20,000, and in total more than 1 million birds have been banded. This dataset, together with all morphometrics collected in connection with the banding and all band recoveries, is a unique and extremely valuable data series. The problem is that few pay for it. The observatory receives a small grant from the Swedish Environmental Board, but not enough to cover the costs. Additional money comes from tourists and subsidies from the Swedish Ornithological Society. And, although not substantial sums, from researchers who pay for the service of collecting data or for getting data from the trapping series.

The observatory really provides a service. To date at least 278 peer-reviewed articles have been published with data emanating from Ottenby, including two papers in Science and >10 in PLOS journals. The unbroken trapping series has proven to be one of few datasets where the time scale is sufficiently long to investigate effects of climate change on biological phenology, measured as the timing of migration of common passerine birds. Researchers who want to use the data put in a request to the observatory, and a sort of contract is settled between the parties. Usually the sums are small, but even small sums are essential for a volunteer-based operation. What happens when all data becomes immediately available to everyone without restrictions? A question to ponder, really.

In many ways PLOS has revolutionized scholarly publishing, and the open access movement has made research results available fast for the masses. I sincerely hope that the new data policy does not inadvertently work the opposite way, by making researchers less inclined to submit their studies to PLOS journals. It is still too early to tell, but I think many like me really wonder what the ‘minimal dataset’ means in practice.

