Computer science, bioinformatics, genetics, and everything in between
Author: Thomas Mailund
My name is Thomas Mailund and I am a research associate professor at the Bioinformatics Research Center, Uni Aarhus. Before this I did a postdoc at the Dept of Statistics, Uni Oxford, and got my PhD from the Dept of Computer Science, Uni Aarhus.
A little while back I got started on a new book, on domain-specific languages in R, and on its first proper chapter. Then I had to leave it for a bit to work on another project, but while it’s been simmering I’ve started to have second thoughts about its structure.
What I had in mind when I started was another short book where I would explain a little bit of meta-programming and a little bit about operator overloading, and show how you can combine the two to write small embedded languages in R. However, I’m now halfway through the first chapter, which was supposed to be about using classes and operator overloading to do a bit of this, and it has turned out to be all about proper language parsing and syntax tree manipulation. So now I’m thinking that I might as well do a proper job of it.
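To give a flavour of the classes-plus-operator-overloading idea: R’s `Ops` group generic lets you intercept arithmetic on objects of a class and build a syntax tree instead of computing a value. This is a minimal sketch of the technique, not the book’s code, and the names `ex` and `ex_to_string` are my own.

```r
# Wrap a symbol name in an S3 class so arithmetic on it is dispatched to us.
ex <- function(x) structure(list(value = x), class = "ex")

# The Ops group generic intercepts +, -, *, / etc. on "ex" objects and
# records the operator and operands as a tree node instead of evaluating.
Ops.ex <- function(e1, e2) {
  structure(list(op = .Generic, left = e1, right = e2), class = "ex")
}

# Render the captured tree back into a fully parenthesised string.
ex_to_string <- function(e) {
  if (!is.null(e$value)) {
    as.character(e$value)
  } else {
    paste0("(", ex_to_string(e$left), " ", e$op, " ", ex_to_string(e$right), ")")
  }
}

x <- ex("x")
y <- ex("y")
expr <- x + y * x
ex_to_string(expr)  # "(x + (y * x))"
```

The expression is captured with R’s own operator precedence, since the parser builds the calls before dispatch ever happens; the tree can then be inspected, rewritten, or evaluated however the embedded language needs.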
I’m not entirely sure how to structure the book now, but I think I still want some of the easy stuff in there. Maybe start out with a chapter that shows this. I have a small example where I use delayed evaluation and some rewriting to optimise matrix multiplication, and I could use that as a motivating example. The next chapter could then go into some theory about parsing and evaluating expressions, to motivate the rest of the book.
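The core of the matrix-multiplication rewrite can be shown without the delayed-evaluation machinery. Matrix multiplication is associative, but the two ways of associating a three-factor product can have very different costs; the sketch below (my reconstruction, not the book’s code) picks the cheaper one using the usual flop count of n·m·k for an n×m times m×k product.

```r
# Multiply three matrices in whichever association order is cheaper.
mult3 <- function(A, B, C) {
  n <- nrow(A); m <- ncol(A); k <- ncol(B); p <- ncol(C)
  cost_left  <- n * m * k + n * k * p   # cost of (A %*% B) %*% C
  cost_right <- m * k * p + n * m * p   # cost of A %*% (B %*% C)
  if (cost_left <= cost_right) (A %*% B) %*% C else A %*% (B %*% C)
}

A <- matrix(1, nrow = 2,  ncol = 10)
B <- matrix(1, nrow = 10, ncol = 3)
C <- matrix(1, nrow = 3,  ncol = 1)
mult3(A, B, C)  # same result as (A %*% B) %*% C, computed the cheaper way
```

The delayed-evaluation part of the example would collect the factors into an expression first, so the rewrite can see all of them before any multiplication is forced.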
I had a mental distinction in mind where I would distinguish between standard evaluation and non-standard evaluation, but I’m not sure that is the right way to split the book now. Instead, if I am going to treat the topic properly, I will make the distinction between parsing an embedded language and evaluating it. The evaluation part can then have some (simpler) standard evaluation parts and some (perhaps harder) non-standard evaluation parts.
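In R terms, the two evaluation styles look roughly like this (a small sketch; `show_expr` is a made-up illustration, not anything from the book): standard evaluation builds an expression and evaluates it in an environment you supply, while non-standard evaluation captures an argument unevaluated so it can be inspected or rewritten first.

```r
# Standard evaluation: build an expression, then evaluate it explicitly.
e <- quote(x + y)
eval(e, list(x = 1, y = 2))  # 3

# Non-standard evaluation: capture the argument unevaluated with
# substitute(), so the function sees the expression, not its value.
show_expr <- function(expr) {
  captured <- substitute(expr)
  deparse(captured)
}
show_expr(x + y)  # "x + y" -- no x or y need exist
```

The parsing/evaluation split falls out naturally here: both styles work on the same expression objects; they differ only in when and where those expressions are evaluated.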
I’ll be sitting on a train for an hour and a half today, so I have time to think about that.
I listen to audiobooks when I’m walking to and from work, and today I started listening to this one, read by Al Franken himself. It is absolutely fantastic. If you are into audiobooks, you should listen to it.
Over the weekend I finished updating the manuscript for Functional Data Structures in R after the tech reviews and proof-reading from a friend of mine. It is now being processed by Apress, and I should get the proofs shortly. After that, it goes to print. I was happy to see that there is already a cover done, and you can pre-order it on Amazon. That is probably a good idea if you are interested in it, because the pre-order price is lower than the price tags on my other (and shorter) Advanced Statistical Programming books.
The only thing I worry about a bit is the description on Amazon. It says that the book covers implementing data structures in C/C++ and how to wrap those.
It doesn’t. I do write that this is an option, and a valid one, if you want to write really efficient R packages, but the point I am trying to make in the book is that you can get most of the performance gains you need by using appropriate data structures rather than micro-optimisations, and all the data structures I present are implemented in pure R.
The publisher and I are fixing the description. It is a pure R book, and mostly about purely functional data structures, although there are some data structures that use environments to modify state.
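The environment trick mentioned here relies on environments being the one R data type with reference semantics: closures sharing an environment can update it in place. A tiny sketch of the idea (my own example, not one from the book):

```r
# A counter whose state lives in an environment shared by two closures.
# Assigning into the environment modifies it in place -- no copying.
make_counter <- function() {
  state <- new.env()
  state$count <- 0
  list(
    increment = function() state$count <- state$count + 1,
    value     = function() state$count
  )
}

counter <- make_counter()
counter$increment()
counter$increment()
counter$value()  # 2
```

With ordinary R values each "update" would return a modified copy; the environment is what lets these data structures mutate state behind a functional-looking interface.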
I just finished preparing slides for my new computational thinking class, and since I don’t want to start on anything new half an hour before I leave the office, I went on Twitter and saw this interesting post. It’s a programming challenge described here, with an imperative solution, and with a functional solution described in the first post.
The imperative solution is rather involved and looks like this:
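The original listing doesn’t survive here; as a rough stand-in (my reconstruction, not the referenced post’s code, and assuming the challenge is the classic one of how much water collects between walls of given heights), an imperative version that scans outward from every position might look like:

```r
# For each position, find the tallest wall to its left and to its right;
# the water held there is the smaller of the two minus the wall's own
# height. This rescans the vector for every position, hence the cost.
trap_imperative <- function(heights) {
  total <- 0
  n <- length(heights)
  for (i in seq_len(n)) {
    tallest_left  <- max(heights[1:i])
    tallest_right <- max(heights[i:n])
    total <- total + min(tallest_left, tallest_right) - heights[i]
  }
  total
}

trap_imperative(c(2, 5, 1, 2, 3, 4, 7, 7, 6))  # 10
```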
The functional solution is more elegant but also a bit slower. Broken into three functions, as it is in the blog post referenced above, it looks like this:
Well, if you want to write fast code, your first thought should be: can I come up with a simpler algorithm? You can worry about implementing it later.
Here, we have a problem that involves figuring out whether the wall at any given position has higher walls to its left and to its right, and if so, how tall the smaller of those two is. So we can easily break it down into collecting, for each position, the tallest wall seen to its left and to its right. This is something we can compute with the accumulate and accumulate_right functions from purrr. After that, we can solve the problem by mapping over these two vectors and the vector of wall heights, and we have a three-line solution to the challenge.
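The exact code isn’t reproduced here, but the approach just described can be sketched like this (my rendering of it, wrapped in a function for convenience):

```r
library(purrr)

trap <- function(heights) {
  tallest_left  <- accumulate(heights, max)        # running max from the left
  tallest_right <- accumulate_right(heights, max)  # running max from the right
  sum(pmap_dbl(list(tallest_left, tallest_right, heights),
               function(l, r, h) min(l, r) - h))
}

trap(c(2, 5, 1, 2, 3, 4, 7, 7, 6))  # 10
```

Each accumulate call is a single linear pass, so this avoids rescanning the vector for every position the way the imperative scan does.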
Not only is this solution much simpler, it is also a lot faster.
Update: OK, if I had been smarter half an hour ago, I would have seen that the pmap expression was overkill. A vectorised expression will work just fine and be faster.
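In the vectorised form, base R’s cummax does the work of the accumulate calls and pmin replaces the per-element mapping (again my sketch of the idea, not the original code):

```r
# Same computation, fully vectorised: running maxima via cummax, the
# element-wise minimum via pmin, and one sum over the whole vector.
trap_vec <- function(heights) {
  tallest_left  <- cummax(heights)
  tallest_right <- rev(cummax(rev(heights)))
  sum(pmin(tallest_left, tallest_right) - heights)
}

trap_vec(c(2, 5, 1, 2, 3, 4, 7, 7, 6))  # 10
```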