Workflow for ReadCube and Papers3

October 31st, 2014

I am switching between ReadCube and Papers a lot. I like both tools (although I have had my share of problems with both as well), but they have different strengths and I want a combination.

ReadCube is really great for finding papers. Their enhanced PDFs make it very easy to get to cited papers, and they are really good at finding papers that have cited what you are reading. So for reading through the literature it is my favourite. It sucks when you have to cite papers, though. I have tried to use its citation tool, but it only works with Word and you are screwed if you want to sort the cited references alphabetically.

Papers is better for citing, especially when you want to use BibTeX. It works well with most editors and it is very easy to export your library to a BibTeX file. It isn't automated, which is a pity, but it is reasonably good.

So what I really want is to use ReadCube when collecting my papers and then use Papers when citing them.

Here's a little trick for automatically importing papers to Papers when you import them into ReadCube.

You can tell Papers to watch for PDFs in a folder and automatically import them. So if you go to preferences you can tell it to watch the ReadCube files. It should be in your Documents folder and be called something like ReadCube Media.

This won't import the papers that are already there, though. It just sets up a folder where any PDFs you add will be imported.

To automatically add new papers you go to Automator and make a Folder Action that simply copies new files into this folder.

Now, when you import a file in ReadCube you will also get it added to Papers.
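If you would rather not use Automator, the copy step can be sketched in a few lines of Python. This is only a stand-in for the Folder Action, not what Automator does internally, and the function name and both folder arguments are placeholders I made up; you would point them at your own ReadCube folder and the folder Papers watches.

```python
import shutil
from pathlib import Path


def copy_new_pdfs(source_dir, watch_dir):
    """Copy PDFs from source_dir into watch_dir unless a file with the
    same name is already there. Returns the names of the copied files."""
    source = Path(source_dir)
    target = Path(watch_dir)
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for pdf in sorted(source.glob("*.pdf")):
        destination = target / pdf.name
        if not destination.exists():
            # copy2 also keeps the file's timestamps
            shutil.copy2(pdf, destination)
            copied.append(pdf.name)
    return copied
```

Running it twice in a row copies nothing the second time, so it is safe to run on a schedule.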

Discussing papers during the review process...

March 3rd, 2014

For the last two or three years I've been signing my reviews. I find that I write better reviews when I'm not anonymous.

It shouldn't be like that, but it is. I'm less likely to get lazy if I know that people will see who wrote the review.

It creates a dilemma, though: should I discuss manuscripts with authors before they are published? I am likely to run into them at meetings, and it is hard not to talk about the manuscripts there. If they are authors I frequently have discussions with online, the dilemma is there as well.

As a reviewer, you have two tasks: making sure that the science is solid and improving the presentation of the results (the manuscript). The second part is probably easier to do with a back-and-forth discussion, but the thing is, as a reviewer you are not really working for the authors but for the editor.

It is the editor who ultimately has to make a decision on the manuscript, and it is him or her you are assisting. This is why you really shouldn't write your recommendations for acceptance or rejection in the review but only tell the editor. The editor needs to know what your concerns are and what the authors are doing to address them.

On the other hand, the paper moves forward faster when you don't have to wait weeks between each point and counter-point.

It gets even weirder when the manuscript is already out there on a preprint server, and you have already discussed it with the authors before you find yourself a reviewer of it, which has happened to me a couple of times recently.

How do you guys deal with these things?


February 26th, 2014
Ok, I just got invited to a Danish blog site where I just wrote my first post. Below is the Google translation of it, fixed for some rather bad translations. Like "Since nothing" should be "Science, nothing else".
Below is my first post there:
Hi there.
I'm new here on the blog so I'll just say hi and tell you who I am.
My name is Thomas Mailund. I am a computer scientist, but for the last decade or so I have worked with genetics and especially the evolution of humans and the great apes. That's what I will be blogging about in the future.
My first post will be about something else entirely. It is about communicating science. It was not always natural for scientists to also help disseminate their research and their knowledge outside academic circles.
One of the first who really did it was the cosmologist Carl Sagan. Since then many have followed, but I think it really started with him. He had shown up a bit on radio and television before, but what really got science communication started was a television program he did in 78 and 79 called Cosmos, broadcast in 1980.
I was just five years old when Cosmos was aired. I have no idea when, if at all, Cosmos came to Denmark. The only science program I remember from my childhood is Vitek. I'm not embarrassed to say that I downloaded Cosmos as an internet pirate. It was worth it. But you can see it on YouTube if you're interested.
There is no doubt, however, that Cosmos changed the relationship between the ivory tower of science and television viewers. A population that didn't care about science after the moon landing - or perhaps more a press that lost interest - became interested again in the wonders of the universe. For 10 years Cosmos held the record for the most popular PBS show, and as far as I know it is still the PBS program that most people around the world have seen.
Carl Sagan was never a member of the Academy of Sciences of the United States. Whether he should have been is a lengthy discussion; he was best known for explaining science to non-scientists and maybe not so much for his own science, but I will not go into that. It's a different story.
What he did was absolutely fantastic. He showed us the fascination of the cosmos and was science's voice to a whole generation. A lot of what he did was pure "pop". A gold plate on a probe that no civilisation will ever see is nothing but pop. But it's good pop. It creates dreams, and that we need, after all.
Too often we see headlines with the scary stuff science has now made possible. We forget too easily what science has done. Sure, we can easily live in harmony with nature, but if we do so we will not live much longer than 35 years, and there wouldn't be seven billion of us.
I will leave for another day the discussion of how much basic science contributes to our lives, because it is not measured in only how long we live and how many of us it can sustain, after all...
Carl Sagan did something unheard of and fantastic. He explained the deep questions so that everyone could understand them and created an interest in science in a world that was about to forget it. In the years after World War II, scientists were superstars; everyone knew who Albert Einstein was. Then there were all the moon landings. Just ten or fifteen years later, the wonders were forgotten and people feared the science that created the atomic bomb and the Cold War.
Nowadays people fear science again. There are horror stories on TV about GM plants. These plants are the only option for many millions if they want to survive. Sure, you must be a little scared about what we can knit together in our science labs, but still...
I have even heard people oppose genes in their food. It is a profound ignorance. It is time that we explain what science tells us and what the technological wonders we live with today actually are and what they are not.
We live in a future we could not imagine a mere twenty years ago. From (almost) anywhere on the planet you can pull out your mobile phone and call your friend on the other side of the globe. On the internet you can argue with people around the world about the most trivial things. We live in the future and did not notice when we got here. And we totally forget what created the future we live in.
It was science, nothing else.
We live in a future science fiction writers in the 70s could not have imagined, and we forget about it altogether. They thought that maybe phones would allow us to watch each other on a TV screen, but not that these phones would be carried around in our pockets and that we would be able to call each other from a mountaintop in the Alps. It's almost magical, if you do not know any better.
I find the decoupling of the wonders I see in my everyday life from the science that led to them disturbing. I fear a future where people do not understand science and how much it has done for us.
If you are not an expert in a scientific discipline you do not have an earthly chance of keeping up with the cutting edge of that science. I will freely admit that I have very little understanding of modern physics. But I understand how science works. How to evaluate ideas and assess whether they hold up or not. I trust the process and believe that scientists check each other and speak up if anything does not hold water. I know that we do in genetics, which I understand and work with.
I do not expect that people will follow us to the shores of ignorance where the exciting science unfolds. Science is so specialized today that it is an impossibility. I just hope that we can respect the method and have confidence in the process.
We have come a long way since we first invented the scientific method, and we can go much further. However, this happens only if everyone is on the bandwagon and sees the utility of science.
That is why Cosmos is so important.
That is precisely why I am very pleased to learn that there will be a new Cosmos. Seth MacFarlane, known from American Dad and Family Guy, is putting a new version together, and Neil deGrasse Tyson will be the presenter; he takes the role as this generation's Carl Sagan.
It will be fantastic and I am excited to see it. Until then, I will watch Cosmos again on YouTube.

High school

August 24th, 2013

It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way - Dickens

Middle school — or junior high, which we called it when I went — should really be classified as a form of child abuse. I recognize that it isn’t as bad for everyone as it was for me, but those two years I spent in 7th and 8th grade were easily the worst and most unhappy times of my life. - Starts with a Bang

My high school (gymnasium) just turned 90. I wasn't actually aware of this until I visited my home town this weekend, where I got the jubilee issue of the local newspaper about it from my uncle.

I can sort of relate to Ethan's (from Starts with a Bang) description of high school. Your late teens are definitely a time where you are being judged all the time for whatever you do, fair or not. It wasn't a bad time for me, though.

Working in science was far from my ideas of my future at the time. I wanted to be a musician and spent my years there playing in bands (earning a small living teaching guitar and playing at bars) and made a lot of friends doing that.

Reading this special issue of the paper about my old school is a lot of fun for me. I see people there who were good friends back then. People I played music with or just partied with. I found out for the first time that one of the professors in computer science went to my old school. Really a lot of fun.

I remember my time there very fondly. It was a time of music, philosophy and a lot of parties. Very different from my life now, focused on science. I think I would have been a very different man now if I hadn't spent three years immersed in what is essentially humanities.

So happy birthday Herning Gymnasium and I hope you will invite me to write a little piece when you turn 100.

Effective population size...

July 31st, 2013

In population genetics we have this thing called the "effective population size". I'm thinking about it right now because we are discussing it on Twitter.

It's a parameter of different mathematical models, and essentially a way of translating from one model to another.

In the Wright-Fisher model, which I've mentioned a number of times before, you imagine that you have a fixed population of N individuals. If you have diploid individuals (like you and me) you have 2N genomes in your model, so mostly you will see the model using 2N genes, and the model then describes how these 2N genes evolve in a population.

The model describes how these 2N genes change in frequency as some have fewer and some have more descendants.
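To make that concrete, here is a minimal simulation sketch of the Wright-Fisher model for a single gene variant. I am assuming the textbook version with no selection and no mutation, where each of the 2N genes in a generation picks its parent uniformly at random from the previous generation. The function names and parameters are my own, chosen just for this example.

```python
import random


def wright_fisher_step(count, two_n, rng):
    """One generation: each of the 2N genes picks a random parent, and it
    carries the variant with probability equal to the parental frequency."""
    frequency = count / two_n
    return sum(1 for _ in range(two_n) if rng.random() < frequency)


def simulate_wright_fisher(start_count, two_n, generations, seed=0):
    """Return the number of copies of the variant in each generation,
    starting with start_count copies among two_n genes."""
    rng = random.Random(seed)
    counts = [start_count]
    for _ in range(generations):
        counts.append(wright_fisher_step(counts[-1], two_n, rng))
    return counts
```

Something like `simulate_wright_fisher(50, 100, 200)` gives you one trajectory of the count over 200 generations. Different seeds give different trajectories, and once the count hits 0 or 2N it stays there.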

It's not the only model we use, however. There are lots of different models that are useful for different purposes, so we tend to pick one that fits whatever we are trying to do. One example is the coalescence model, where instead of having a population of 2N genes evolving forward in time, you consider a sample of n genes out of the 2N and look back in time at their ancestry.

In coalescence models, you don't have a population evolving one generation after another. Instead you have continuous time (moving backward in time, so time t=0 is the present and t>0 is in the past), and you model how these genes find common ancestors as you go back in time.

In the coalescence model, the rate at which two genes find a common ancestor (the rate at which they coalesce) is 1. One unit of time here corresponds to the number of generations in the WF model we expect to go back in time before two randomly selected genes last had a common ancestor (which turns out to be 2N generations), so the time scale of the coalescence model is directly related to the size of the WF population.
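You can check the 2N generations with a small simulation. This is a sketch under the standard WF assumption that each gene picks its parent uniformly at random among the 2N genes of the previous generation, so two lineages hit the same parent with probability 1/(2N) in each generation. The function names are made up for the example.

```python
import random


def pairwise_coalescence_time(two_n, rng):
    """Trace two lineages back in time and count the generations until
    both pick the same parent among the two_n genes."""
    generations = 1
    while rng.randrange(two_n) != rng.randrange(two_n):
        generations += 1
    return generations


def mean_coalescence_time(two_n, replicates, seed=0):
    """Average the pairwise coalescence time over many replicates."""
    rng = random.Random(seed)
    total = sum(pairwise_coalescence_time(two_n, rng) for _ in range(replicates))
    return total / replicates
```

With enough replicates the average should land close to 2N, which is the mean of a geometric distribution with success probability 1/(2N).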

In diffusion models we work forward in time, like the WF model, but in continuous time, like the coalescence models. This time, we model how frequencies of gene variants change over time as solutions to differential equations, but the speed at which frequencies change is directly related to the 2N in the WF. If we want to know how fast we expect to move from frequency x1 to x2 we can model this in a diffusion model, and if we consider a unit time in this model (so time t=0 to t=1) this will correspond to a time unit of 2N in the WF model.

We can move from one model to another by changing time units. A time unit of 1 in coalescence or diffusion models corresponds to a unit of 2N generations in the WF models. So N is important as a parameter because it lets us translate from one time unit to another, and relate results we can get mathematically in one model to another model where the same then holds.
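The translation itself is nothing more than multiplying or dividing by 2N. As a tiny sketch, with function names I made up and N being the population size in the WF model:

```python
def to_generations(scaled_time, n):
    """Convert coalescence/diffusion time units to WF generations."""
    return scaled_time * 2 * n


def to_scaled_time(generations, n):
    """Convert WF generations to coalescence/diffusion time units."""
    return generations / (2 * n)
```

If you assume N = 10,000, for example, one unit of time in the coalescence model is 20,000 generations, and 5,000 generations is a quarter of a time unit.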

Incidentally, the "effective" population size, Ne, is just the N in this model, except that we talk about the breeding part of a population. If only 10% of a population actually contribute to the next generation, we don't so much care about the full population size, N, but only those that actually matter for the genetics, which we then call the "effective" population size, Ne.  In the models above, though, N is just Ne.

Things get a little complicated when you don't just have a single parameter to translate from one population to another. In the models I have mentioned, the N can directly give you the expected number of mutations you see as differences between two randomly chosen individuals and at the same time the total amount of differences within a population. Those two numbers are not exactly the same measure of diversity in a population, however. If you pick a sample of n individuals and these all have their most recent common ancestor (MRCA) in exactly the same individual, then differences between them are independent. If the MRCA depends on which pair you choose, mutations will be shared between some lineages and not others.

The probability that the MRCAs are different or the same depends on demographics, and in an expanding population (like humans) you are more likely to have independent lineages than in a population that has had the same size for a long time.

I like to think of Ne as a measure of the time it takes two lineages to go back to their MRCA, which works well for a coalescence model. If you have an expanding population, though, this way of looking at it doesn't quite let you translate between models. In an expanding population you will find a MRCA faster than in a stable population but the gene frequencies change more slowly, so a coalescence model will want a smaller Ne and a diffusion model will need a larger Ne.

The effective population size is a mathematical parameter and if you understand the models it is easy enough to translate the Ne from one model to another, but when it gets to modelling populations that are not at equilibrium you really need to just consider it part of the math and not try to interpret it too much...