Uhoo! I managed to figure out how to embed a search on Amazon for my books on my books page. It looks like this:

Cool, right?

Skip to content
# Category: Fun

## Amazon search

## Dictating instead of writing today

## Fun with non-standard evaluation and binding parameters

## Trying out the Ulysses editor

## Adding units as types to numbers in R

### Representing expressions

### Unit conversion

Computer science, bioinformatics, genetics, and everything in between

Anything that isn’t really serious… not necessarily fun, either, but I am sure that if I write under this category I am doing it for fun.

Uhoo! I managed to figure out how to embed a search on Amazon for my books on my books page. It looks like this:

Cool, right?

I have done something to my back that makes it very painful to sit at the computer today. It’s probably because I’ve been sitting too long at the computer the last couple of days. I have had this problem before, but usually only if I’ve been using the mouse too much. That hasn’t been a problem lately, but I do spend a lot of hours at the computer these days compared to some weeks ago. Anyway, I have some lecture notes that I would like to finish this week, so I can’t really give it a complete rest. So this gives me an excuse to try dictating text instead of writing.

I am dictating this blog post on my phone. I am dictating in the Ulysses editor. That is the editor I use for blogging. For some reason, I cannot dictate in Ulysses on my iPad. On the iPad I can dictate to iA Writer, but not Ulysses. On the phone I can dictate in both.

I’m surprised at how well dictation is working. When I tried it some years ago it really didn’t understand my thick Danish accent. Now it seems to understand most of the words I’m saying; I do have to change a few of them after I’ve dictated, but mostly to get stuff right. I still prefer using the keyboard over the microphone, but dictation is a working solution while my back is painful.

I am thinking about examples for something useful but simple that can be done with non-standard evaluation in R for when I, in the far future, get to that book. And preferably something I can find a use for myself so it isn’t just wasted on writing but also helps me in my programming.

One thing I miss in R is assigning to more than one variable in pattern matching, or just for tuples or lists for that matter. So that is something I could start playing with. So I did, and you can see the first result on GitHub.

Right now it is just equivalent to tuple/list assignment in Python where you can assign multiple values like this:

x, y, z = 1, 2, 3

and you can use this for vectors and lists. Nothing fancy with pattern matching at all.

The syntax is inspired by the boost implementation of this in C++ and you use a function `bind`

to specify the variables you want to bind and then an infix operator `%< -%`

to assign to the binding.

bind(x, y, z) %<-% 1:3

I briefly considered overwriting the assignment operator, `<-`

, because I just know that I am likely to write

bind(x, y, z) <- 1:3

in such expressions, but sanity prevailed and I left the assignment operator the way it is. It breaks all kinds of things that I cannot fully understand if I mess with it. So therefore a new infix operator `%<-%`

.

The implementation is pretty simple. I just collect the parameters and the enclosing environment in the `bind`

function and then put values in the environment in the `%<-%`

function.

bind <- function(...) {

bindings <- eval(substitute(alist(...)))

scope <- parent.frame()

structure(list(bindings = bindings, scope = scope), class = "bindings")

}

.unpack < - function(x) UseMethod(".unpack")

.unpack.default <- function(x) x

.unpack.list <- function(x) x[[1]]

<code>%<-%</code> <- function(bindings, value) {

if (length(bindings$bindings) > length(value))

stop("More variables than values to bind.")

for (i in seq_along(bindings$bindings)) {

variable <- bindings$bindings[[i]]

val <- .unpack(value[i])

assign(as.character(variable), val, envir = bindings$scope)

}

}

bindings <- eval(substitute(alist(...)))

scope <- parent.frame()

structure(list(bindings = bindings, scope = scope), class = "bindings")

}

.unpack < - function(x) UseMethod(".unpack")

.unpack.default <- function(x) x

.unpack.list <- function(x) x[[1]]

<code>%<-%</code> <- function(bindings, value) {

if (length(bindings$bindings) > length(value))

stop("More variables than values to bind.")

for (i in seq_along(bindings$bindings)) {

variable <- bindings$bindings[[i]]

val <- .unpack(value[i])

assign(as.character(variable), val, envir = bindings$scope)

}

}

(Sorry for the formatting of the `%<-%`

name here, I don’t know why the highlighter absolutely has to replace backticks with code-tags, but it does).

The unpack helper functions are just there so I get the value when I have a list and not a length-one list. Otherwise, there is nothing much to it.

For my next trick, I need to figure out how I can bind variables by accessing the names in a value list.

For some reason iA Writer stays at ~50% CPU on my laptop, draining the battery and heating up the entire house, so I went looking for another editor to play with.

I have been looking at Ulysses for a while and found it a bit pricy, but what the hell, if it is a great editor it is worth it. So I am trying it out, and I am writing this post on the iPad version. For some reason I had trouble connecting with WordPress from the laptop.

Actually I can not connect to WordPress.com on either version of the app but I can connect to my own server on both. There is just a timeout when I try to publish from the laptop. I will look into that later.

I am pretty happy with it so far. There is a nice distraction free mode, reasonable word count and other statistics, and the way text is organized as sheets you can move around, although something you need to get used to, is pretty neat.

My only main concern is that you only have all the features if your files are imported into the application. You *can* edit plain Markdown files but you only get a limited number of the features then. Which means that it is better to import but for my writing that destroys the workflow I have with knitr and Pandoc. I have to remember to export explicitly before I can compile a document.

Now Ulysses actually have some nice style sheets for exporting so I guess it isn’t a problem for most, but I do need to run my documents through knitr for evaluating R code so I can’t work exclusively with Ulysses. I will see how it goes during the next week where I will try to do my writing in Ulysses.

I haven’t tested things like cross reference and citations yet so I don’t knowi what f it can do that. Otherwise I definitely need the Pandoc workflow and then it is annoying if it is necessary to explicitly export.

We will see how I like it after a week, I guess.

This weekend I’ve been playing with an R package that assigns units to numerical data and checks unit consistency in arithmetic expressions.

It is something I’ve been wanting to write for a couple of weeks, after I had a chat with one of my PhD students that involved a lot of unit conversions. It is frighteningly easy to get the units wrong in an expression if you are not careful, but keeping track of units can easily be done completely automatically by a computer. Unlike my brain, the computer doesn’t have any problems keeping track of what units, which we can think of much like types, are associated with a value.

Such a package is certainly something I could find useful in my own work, when I am analysing coalescent models and need to scale between mutation rates, in years or generations, and coalescence rates, in 2N generations, where I never get it right the first time around. I could also see it as a nice example of something to include in my next book on object oriented programming in R. So looking into it has been on my todo-list for a little while.

Saturday morning I googled around and found that there was already such an R package on GitHub. I had been beaten by a couple of months. So I had a look at the existing package.

It assigns units to numerical values by setting an attribute that describes the units as a string and then overloads arithmetic operations so they check the equivalence of types when you add or subtract them, possibly by converting types to make them equal, and modifies the strings when you multiply or divide. All the manipulation of types is outsourced to the `udunit2`

package. Because the units are simply strings they are not simplified at the end of an operation.

This works fine, but isn’t a good example for me to use in a book, and it only works if you stick to units known by `udunits2`

. So I dug into the code to see if I could write something based on it that I could use for my two purposes.

My version can be found here.

The existing package creates unit holding numerical types using the standard pattern in R and has an as.units function for it. It takes a numeric type, a vector or matrix for example, and a string describing the units.

x < - as.units(1:4, "km/h")

The units are checked against the udunits2 package to make sure that they are valid units.

That check of the units won’t work if I want to allow user-defined units, so I had to get rid of that. If I allow arbitrary strings as units, though, a simple spelling mistake will introduce a new unit. Since expressions involving `units`

types will be checked for unit consistency it is not something that will go unnoticed and cause subtle bugs, but it is still worth avoiding.

So instead of having arbitrary strings as units I decided to require that units are explicitly defined, but by being able to multiply plain numbers to units we can get a natural syntax for defining numerical values with associated units.

km <- make_unit("km")

h <- make_unit("h")

x <- 1:4 * km/h

h <- make_unit("h")

x <- 1:4 * km/h

I have pulled the units defined in `udunits2`

and put them in a list, `ud_units`

, so you can also just evaluate expressions in the context of it and write

To be able to manipulate units I have an explicit representation of the unit types. Objects of the units class are just numerical values with an attribute that points to a class of type symbolic_units and it is the latter that represents the actual unit type.

So far I have just implemented it as two lists, the nominator and denominator of units. Each are just represented as a sequence of unit dimensions. This isn’t *quite* as general as one could hope for, you cannot have non-integer numbers of a unit and you cannot have a log-transformation of a unit and such, but it is a start. I believe the design can be made more general by having the `symbolic_units`

represented as expressions instead of an explicit fraction of unit symbols, but I will have to think a bit more about that.

But because I have a representation of units I can work with in R, I can simplify units so for example `m/s/m`

will be reduced to type `1/s`

. Such a reduction is simple enough, you can just run through the nominator and denominator and catch when a symbol appears both places and then remove it from both nominator and denominator. Since `m*m/m`

should have type `m`

you can only remove a symbol the right number of times, but that is not hard to get right.

Of course, you would also want to reduce something like `km/m`

since that is essentially just `1000*m/m`

and that makes it a little more tricky. For each symbol pair you compare you should check if you can convert one to the other, not just whether they are the same symbol, and you need to keep track of conversion constants.

I didn’t come up with a nice way of doing it, so I just brute-forced it. I don’t expect unit lists to particularly long, so I don’t worry too much about runtime issues (and if that becomes a problem you can get a speedup just by removing the type in time-critical parts).

My implementation looks like this:

.simplify_units <- function(sym_units) {

# This is just a brute force implementation that takes each element in the

# nominator and tries to find a value in the denominator that can be converted

# to the same unit. If so, we pull out the conversion constant, get rid of

# both terms, and move on. At the end we return a units object with the

# conversion constant and the new symbolic units type. Converting units can then

# be done as this <code>x <- as.numeric(x) * .simplify_units(units(x))</code>.

# Returning a units instead of a symbolic_units object is not idea, it means that

# you cannot simply multiply or divide symbolic units together, you need to wrap

# each pair-wise operator in units() but it is necessary when conversion constants

# must be taken into account.

conversion_constant <- 1

new_nominator <- sym_units$nominator

new_denominator <- sym_units$denominator

delete_nom < - c()

for (i in seq_along(new_nominator)) {

for (j in seq_along(new_denominator)) {

conversion <- .get_unit_conversion_constant(new_nominator[i], new_denominator[j])

if (!is.na(conversion)) {

conversion_constant <- conversion_constant * conversion

delete_nom <- c(delete_nom, i)

new_denominator <- new_denominator[-j]

break

}

}

}

if (length(delete_nom) > 0)

new_nominator < - new_nominator[-delete_nom]

as.units(conversion_constant, .symbolic_units(new_nominator, new_denominator))

}

# This is just a brute force implementation that takes each element in the

# nominator and tries to find a value in the denominator that can be converted

# to the same unit. If so, we pull out the conversion constant, get rid of

# both terms, and move on. At the end we return a units object with the

# conversion constant and the new symbolic units type. Converting units can then

# be done as this <code>x <- as.numeric(x) * .simplify_units(units(x))</code>.

# Returning a units instead of a symbolic_units object is not idea, it means that

# you cannot simply multiply or divide symbolic units together, you need to wrap

# each pair-wise operator in units() but it is necessary when conversion constants

# must be taken into account.

conversion_constant <- 1

new_nominator <- sym_units$nominator

new_denominator <- sym_units$denominator

delete_nom < - c()

for (i in seq_along(new_nominator)) {

for (j in seq_along(new_denominator)) {

conversion <- .get_unit_conversion_constant(new_nominator[i], new_denominator[j])

if (!is.na(conversion)) {

conversion_constant <- conversion_constant * conversion

delete_nom <- c(delete_nom, i)

new_denominator <- new_denominator[-j]

break

}

}

}

if (length(delete_nom) > 0)

new_nominator < - new_nominator[-delete_nom]

as.units(conversion_constant, .symbolic_units(new_nominator, new_denominator))

}

The function `.get_unit_conversion_constant`

function is responsible for checking if I can convert one unit into another. If so, I get the conversion constant, if not I get `NA`

. This function is also used when explicitly changing a type in expressions like this:

That conversion simply looks like this:

There is also some parameter checking at the beginning but I have cut that here to just show the relevant code. The `.get_conversion_constant`

function checks if I can convert an entire units definition into another and it does that separately for nominator and denominator. If `units`

always have their unit representation simplified that should give the right result. And getting the conversion constant is just a question of matching up unit names once more.

.get_conversion_constant_sequence <- function(s1, s2) {

conversion_constant <- 1

remaining_s2 <- s2

for (i in seq_along(s1)) {

for (j in seq_along(remaining_s2)) {

convert <- .get_unit_conversion_constant(s1[i], remaining_s2[j])

if (!is.na(convert)) {

conversion_constant <- conversion_constant * convert

remaining_s2 <- remaining_s2[-j]

break

}

}

}

# if we make it through these loops and there are still units left in s2

# then there are some we couldn't convert, and then we return NA

if (length(remaining_s2) > 0)

NA

else

conversion_constant

}

.get_conversion_constant <- function(u1, u2) {

# if the expressions are well formed, and can be converted, we can convert

# nominator and denominator independently. If either cannot be converted

# then the function call returns NA which will also be returned (since NA and /)

# will convert to NA.

.get_conversion_constant_sequence(u1$nominator, u2$nominator) /

.get_conversion_constant_sequence(u1$denominator, u2$denominator)

}

conversion_constant <- 1

remaining_s2 <- s2

for (i in seq_along(s1)) {

for (j in seq_along(remaining_s2)) {

convert <- .get_unit_conversion_constant(s1[i], remaining_s2[j])

if (!is.na(convert)) {

conversion_constant <- conversion_constant * convert

remaining_s2 <- remaining_s2[-j]

break

}

}

}

# if we make it through these loops and there are still units left in s2

# then there are some we couldn't convert, and then we return NA

if (length(remaining_s2) > 0)

NA

else

conversion_constant

}

.get_conversion_constant <- function(u1, u2) {

# if the expressions are well formed, and can be converted, we can convert

# nominator and denominator independently. If either cannot be converted

# then the function call returns NA which will also be returned (since NA and /)

# will convert to NA.

.get_conversion_constant_sequence(u1$nominator, u2$nominator) /

.get_conversion_constant_sequence(u1$denominator, u2$denominator)

}

The remaining thing for me to figure out is how to implement that pair-of-units conversion in a general way. Right now I simply use `udunits2`

still, but for user-defined units that won’t work.

.get_unit_conversion_constant <- function(u1, u2) {

# FIXME: Unit conversion only has limited support right now

# I always just ask ud to convert units.

su1 <- as.character(u1)

su2 <- as.character(u2)

if (!udunits2::ud.are.convertible(su1, su2)) return(NA)

ud.convert(1, su1, su2)

}

# FIXME: Unit conversion only has limited support right now

# I always just ask ud to convert units.

su1 <- as.character(u1)

su2 <- as.character(u2)

if (!udunits2::ud.are.convertible(su1, su2)) return(NA)

ud.convert(1, su1, su2)

}

This, besides dealing with more complex unit expressions than fractions, is what I haven’t figured out how to do yet. Obviously I can do something with having a table of conversions, but I would need the transitive closure of such conversions and that could be a big table. I don’t expect it to be *very* bit, a typical program doesn’t deal with *that* many units, after all, but the user needs to be able to update it and I don’t want to compute the transitive closure every time I need to look something up, so I have to think a bit more about that. There would probably also need to be some consistency checking when a conversion constant is defined so you don’t get different values depending on how the transitive closure is computed.

I don’t expect it will be hard to program, I just haven’t come up with a nice solution yet. It is something to think about in the shower tomorrow.