Note-taking apps for macOS

Way way back, a long time ago on a computer far away, I wrote about Evernote versus Simplenote. I have since abandoned Evernote and started using Bear instead. I like writing notes in Markdown, and Bear is a very nice application for it.

Anyway, someone read the old post and wanted me to link to this review of note-taking apps for macOS, so here is the link. I hope you find it useful.

Update on Functional Data Structures in R

I’ve gone through the entire manuscript of Functional Data Structures in R now and edited it.

There is definitely still some work to be done, but for now I need to put it aside for a few weeks. I’m hoping to get some feedback on it from some algorithmic people and then make a final version I can send to Apress if they decide to give me an offer.

In the meantime, I’ve started thinking about the next R book. I think I will write about embedded domain-specific languages. I already have some ideas for what to include, but not yet enough for a full book, so some more thinking is required.

It will be a lot of meta-programming, but this time around I will base it on tidyeval instead of raw quotes and eval. Tidyeval, in the rlang package, provides a lot of great tools for designing and implementing domain-specific languages, and it will be fun to play around with that.

If I can get my new combination of iA Writer and WordPress to play nice together, I will give you an example in a post very soon.

Nice to see my books around

Rene Thomsen posted this picture on Twitter with the text:

@ThomasMailund we have a lot of your ‘Beginning Data Science in R’ books available for both current and future data scientists at Scio+

It is great to see that somebody buys them at least. And there’s more there than I ever had myself.

By now, I only have one copy left of the Data Science book and one copy of the Functional Programming book, but still plenty of Metaprogramming and Object-oriented programming.

Fast admixture analysis and population tree estimation for SNP and NGS data

Jade Yu Cheng, Thomas Mailund, and Rasmus Nielsen

Bioinformatics (2017)

Motivation: Structure methods are highly used population genetic methods for classifying individuals in a sample fractionally into discrete ancestry components.
Contribution: We introduce a new optimization algorithm for the classical STRUCTURE model in a maximum likelihood framework. Using analyses of real data, we show that the new method finds solutions with higher likelihoods than the state-of-the-art method in the same computational time. The optimization algorithm is also applicable to models based on genotype likelihoods, which can account for the uncertainty in genotype-calling associated with Next Generation Sequencing (NGS) data. We also present a new method for estimating population trees from ancestry components using a Gaussian approximation. Using coalescence simulations of diverging populations, we explore the adequacy of the STRUCTURE-style models and the Gaussian assumption for identifying ancestry components correctly and for inferring the correct tree. In most cases, ancestry components are inferred correctly, although sample sizes and times since admixture can influence the results. We show that the popular Gaussian approximation tends to perform poorly under extreme divergence scenarios, e.g., with very long branch lengths, but the topologies of the population trees are accurately inferred in all scenarios explored. The new methods are implemented together with appropriate visualization tools in the software package Ohana.
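
For readers who have not met the STRUCTURE model before, here is a rough sketch of the likelihood that a maximum likelihood method tries to maximize. It is the textbook binomial admixture likelihood for unlinked, biallelic SNPs in diploid individuals, not the optimization algorithm from the paper and not code from Ohana. The function name and the toy data are made up for illustration, and the constant binomial coefficient is left out because it does not depend on the parameters.

```python
import math


def admixture_log_likelihood(genotypes, q, f):
    """Binomial admixture log-likelihood, up to an additive constant.

    genotypes[i][j]: copies (0, 1 or 2) of one allele in individual i at SNP j
    q[i][k]: fraction of individual i's ancestry from component k (rows sum to 1)
    f[k][j]: frequency of that allele at SNP j in ancestry component k
    """
    n_components = len(f)
    total = 0.0
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            # Allele frequency for this individual: ancestry-weighted mixture
            p = sum(q[i][k] * f[k][j] for k in range(n_components))
            # g copies of the allele and 2 - g copies of the alternative
            total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total


# Hypothetical toy data: two individuals, two SNPs, two ancestry components
genotypes = [[2, 0], [0, 2]]
f = [[0.9, 0.1], [0.1, 0.9]]

q_unmixed = [[1.0, 0.0], [0.0, 1.0]]  # each individual from a single component
q_even = [[0.5, 0.5], [0.5, 0.5]]     # everybody an even mixture

ll_unmixed = admixture_log_likelihood(genotypes, q_unmixed, f)
ll_even = admixture_log_likelihood(genotypes, q_even, f)
print(ll_unmixed, ll_even)
```

With these toy numbers the unmixed ancestry proportions fit the genotypes better than the even mixture, so they get the higher log-likelihood. An actual method has to search over both q and f at the same time, and that search is where the optimization algorithm comes in.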