## Lazy queues

Today I implemented, in R, one of the data structures from C. Okasaki, "Simple and efficient purely functional queues and deques", Journal of Functional Programming, 5(4), 583-592 (1995), and wrote about it in the chapter I am working on for my next book.

This is a queue implementation based on lazy lists that combines list concatenation and reversal to move elements from a “back” list to a “front” list one step at a time as the queue is modified, through a “rotate” function. It requires that the rotations be constructed as lazy operations, and in its simplest form the rotation function could look like this:

```r
rot <- function(front, back, a) {
  if (is_nil(front)) cons(car(back), a)
  else {
    lazy_thunk <- function(lst) function() lst()
    lazy_thunk(cons(car(front), rot(cdr(front), cdr(back), cons(car(back), a))))
  }
}
```

The function does a combination of concatenation and reversal that makes sense if you read my two previous posts. The back list is guaranteed to be one element longer than the front list when the function is called, so the base case of the recursion just puts the single remaining element from the back list into an accumulator, and the recursive call puts the first element of the front list at the front of a list that contains the concatenation of the rest of the front list and the reversal of the back list.

Because there is no lazy evaluation of arbitrary expressions in R, we have to wrap the result in a thunk. And because R does evaluate function arguments lazily, just not in the way you would expect, the function doesn’t work. It was interesting (although also a little frustrating) to figure out why.

The three parameters to the function are passed as promises, that is, unevaluated expressions, and the accumulator is updated by prepending an element to it with a cons call. If this expression is passed along in the recursion as an unevaluated expression, each step builds a promise containing another cons call that needs to be evaluated later. Once you get to the end of the recursion and actually do need to access the accumulator, you have to evaluate this expression, which means evaluating a chain of cons calls as long as the accumulator itself. For long lists this will exceed the stack size in R.
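A contrived sketch of the general problem, with names of my own invention rather than the queue code itself: if each step only wraps another deferred computation around the accumulator, then forcing the final result has to walk the entire chain recursively in one go.

```r
# Wrap a deferred "+ 1" around an accumulator thunk; nothing is
# computed until the outermost thunk is finally called.
add1_deferred <- function(x) {
  force(x)  # without this, every wrapper would see the final binding of th
  function() x() + 1
}

th <- function() 0
for (i in 1:10) th <- add1_deferred(th)
th()  # forces a chain of 10 nested calls -> 10

# With 100,000 wrappers instead of 10, that single call th() must
# recurse through all of them at once and blows the stack.
```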

You can get around the problem in different ways. You can explicitly make sure that the cons calls are evaluated immediately:

```r
rot <- function(front, back, a) {
  if (is_nil(front)) cons(car(back), a)
  else {
    lazy_thunk <- function(lst) function() lst()
    tail <- cons(car(back), a)
    lazy_thunk(cons(car(front), rot(cdr(front), cdr(back), tail)))
  }
}
```

or you can just force the accumulator:

```r
rot <- function(front, back, a) {
  force(a)
  if (is_nil(front)) cons(car(back), a)
  else {
    lazy_thunk <- function(lst) function() lst()
    lazy_thunk(cons(car(front), rot(cdr(front), cdr(back), cons(car(back), a))))
  }
}
```

Usually, it is a good idea to force all parameters that you return inside a closure; that is the conservative approach, and probably what you should aim for whenever you can get away with it.
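A minimal illustration of why, with function names of my own choosing: a closure that returns an unforced parameter sees the value the variable has when the closure is first called, not the value it had when the closure was created.

```r
wrap_lazy  <- function(x) function() x            # x stays an unevaluated promise
wrap_eager <- function(x) { force(x); function() x }

i <- 1
f_lazy  <- wrap_lazy(i)
f_eager <- wrap_eager(i)
i <- 2
f_lazy()   # 2 -- the promise for x is only evaluated now, after i changed
f_eager()  # 1 -- x was forced while i was still 1
```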

Anyway, you have to wait a little bit to see how this all fits in with functional queues. I need to implement one more data structure before I’m finished with the current chapter.

## More lazy lists

As a follow-up to my previous post, I want to look a bit more at my lazy lists in R.

The implementation I showed delays evaluation of some expressions, but not as much as it could. Well, at least not for list concatenation.

I had these implementations of concatenation and reversal:

```r
reverse <- function(lst) {
  do_reverse <- function(lst) {
    result <- nil
    while (!is_nil(lst)) {
      result <- cons(car(lst), result)
      lst <- cdr(lst)
    }
    result
  }
  force(lst)
  lazy_thunk <- function(lst) {
    function() lst()
  }
  lazy_thunk(do_reverse(lst))
}
```

```r
cat <- function(l1, l2) {
  do_cat <- function(l1, l2) {
    rev_l1 <- nil
    while (!is_nil(l1)) {
      rev_l1 <- cons(car(l1), rev_l1)
      l1 <- cdr(l1)
    }
    result <- l2
    while (!is_nil(rev_l1)) {
      result <- cons(car(rev_l1), result)
      rev_l1 <- cdr(rev_l1)
    }
    result
  }
  force(l1)
  force(l2)
  lazy_thunk <- function(lst) {
    function() lst()
  }
  lazy_thunk(do_cat(l1, l2))
}
```

They delay evaluation, but only until the first operation that touches the result. For concatenation, we can actually delay evaluation a bit more, so that all operations, both the concatenation itself and access to the concatenated list, can be done in constant time.

```r
lazy_cat <- function(l1, l2) {
  force(l1)
  force(l2)
  first <- l1()
  if (is.null(first)) l2
  else {
    lazy_thunk <- function(lst) function() lst()
    lazy_thunk(cons(first$car, lazy_cat(first$cdr, l2)))
  }
}
```

```r
microbenchmark(lst <- cat(l1, reverse(l2)), times = 1) # fast operation
microbenchmark(car(lst), times = 1) # slow operation -- needs to copy l1
microbenchmark(lst <- lazy_cat(l1, l2), times = 1) # fast operation
microbenchmark(car(lst), times = 1) # fast
```

We can use a similar trick for reversal, but we don’t gain much from it. Lazy reversal can be implemented like this:

```r
lazy_reverse <- function(lst) {
  rev <- function(l, t) {
    force(l)
    force(t)
    first <- l()
    if (is.null(first)) t
    else {
      lazy_thunk <- function(lst) function() lst()
      lazy_thunk(rev(first$cdr, cons(first$car, t)))
    }
  }
  rev(lst, nil)
}
```

But the first time we access the head of a reversed list, we still need to go all the way down the recursion; we cannot get the first element of the reversed list without going to the end of the original list. So in this case the imperative solution I had before is actually still better. Plus, it won’t run out of stack space on deep recursions, which can easily happen with the lazy version.

## Lazy lists in R

Playing around with functional implementations of queues, I want to implement a version with O(1) amortised running time that also works as a persistent data structure, meaning that any earlier version of the queue can be used again. Persistence is, in general, a problem for amortised analysis: the analysis assumes a mix of cheap and expensive operations such that any sequence of n operations takes time O(n), but if such a data structure is used persistently, nothing prevents you from repeating the expensive operations many times, breaking the complexity. It can be achieved for queues, however, by a combination of lazy evaluation and memoisation (C. Okasaki, Simple and efficient purely functional queues and deques, Journal of Functional Programming, 5(4), 583-592, 1995).

So, I need to be able to delay evaluation of list operations. R doesn’t do lazy evaluation of arbitrary expressions, though. It does, however, evaluate function arguments lazily, and this can be exploited to implement delayed evaluation through thunks.

My lazy lists will therefore be implemented as thunks with the invariant that they always evaluate to either NULL, for the empty list, or a list object with a head and a tail (car and cdr in Lisp terminology). Functions for constructing and accessing lists look like this:

```r
nil <- function() NULL

cons <- function(car, cdr) {
  force(car)
  force(cdr)
  function() list(car = car, cdr = cdr)
}

is_nil <- function(lst) is.null(lst())
car <- function(lst) lst()$car
cdr <- function(lst) lst()$cdr
```

The force() calls in the cons function are needed because of how lazy evaluation of function arguments is handled in R, but since cdr is a thunk, forcing it doesn’t evaluate the body of that thunk; it only evaluates the promise down to the underlying function.
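As a quick sanity check of these definitions (repeating them here so the snippet runs on its own):

```r
nil <- function() NULL
cons <- function(car, cdr) { force(car); force(cdr); function() list(car = car, cdr = cdr) }
is_nil <- function(lst) is.null(lst())
car <- function(lst) lst()$car
cdr <- function(lst) lst()$cdr

lst <- cons(1, cons(2, cons(3, nil)))
car(lst)                    # 1
car(cdr(lst))               # 2
is_nil(cdr(cdr(cdr(lst))))  # TRUE
```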

Operations on lists can now be delayed by wrapping them in thunks. I can give such a thunk an expression whose evaluation I want to delay. This expression will evaluate to a list, and to make the thunk behave as if it were the list, I just need to evaluate the expression in the body of the thunk.

Reversing a list, and concatenating two lists, can thus be implemented like this:

```r
reverse <- function(lst) {
  do_reverse <- function(lst) {
    result <- nil
    while (!is_nil(lst)) {
      result <- cons(car(lst), result)
      lst <- cdr(lst)
    }
    result
  }
  force(lst)
  lazy_thunk <- function(lst) {
    function() lst()
  }
  lazy_thunk(do_reverse(lst))
}
```

```r
cat <- function(l1, l2) {
  do_cat <- function(l1, l2) {
    rev_l1 <- nil
    while (!is_nil(l1)) {
      rev_l1 <- cons(car(l1), rev_l1)
      l1 <- cdr(l1)
    }
    result <- l2
    while (!is_nil(rev_l1)) {
      result <- cons(car(rev_l1), result)
      rev_l1 <- cdr(rev_l1)
    }
    result
  }
  force(l1)
  force(l2)
  lazy_thunk <- function(lst) {
    function() lst()
  }
  lazy_thunk(do_cat(l1, l2))
}
```

Again, we need to force() the arguments we give to the functions before we use them in the thunk we create, including when we use them in expressions we give to the thunk, because they might refer to variables that change between the time we call the functions and the time we access the thunk. Do not force the list you give to the thunk, though. That would defeat the purpose of making the thunk in the first place: we explicitly do not want that argument evaluated yet.

We can build some lists to test it all out:

```r
vector_to_list <- function(v) {
  lst <- nil
  for (x in v) lst <- cons(x, lst)
  reverse(lst)
}
```

```r
l1 <- vector_to_list(1:10000)
l2 <- vector_to_list(1:10000)
```

First, we can try to concatenate two lists. If you want to try this at home, install the microbenchmark package and paste the code into R.

```r
library(microbenchmark)
microbenchmark(lst <- cat(l1, l2), times = 1) # fast operation
microbenchmark(car(lst), times = 1) # slow operation -- needs to copy l1
microbenchmark(car(lst), times = 1) # fast operation
microbenchmark(lst <- cat(l1, reverse(l2)), times = 1) # fast operation
microbenchmark(car(lst), times = 1) # slow operation -- needs to copy l1
microbenchmark(car(lst), times = 1) # fast operation
```

```r
length <- function(lst) {
  n <- 0
  while (!is_nil(lst)) {
    lst <- cdr(lst)
    n <- n + 1
  }
  n
}
```

```r
microbenchmark(length(lst), times = 1) # slow operation -- needs to reverse l2
microbenchmark(length(lst), times = 1) # faster operation
```

This is slower still, since we need to both copy l1 and reverse l2:

```r
microbenchmark(length(cat(l1, l2)), times = 1)
```

## Looking for collaborators

There are a few R packages I originally wrote just to have examples for my books, but I now think they might be generally useful. I just haven’t taken them to the point where they are, and I know I am lazy enough to leave them as they are. But if I could get collaborators who would like to work on them with me, I think we could make something out of them.

There is the bindr package that makes it just a tad easier to work with functions that return more than a single value.

Then there is the dfdr package, which computes the derivative of a function with a bit of metaprogramming; I can see it greatly helping in numerical optimisation.

And, of course, there is the ralgo package I am working on right now for efficient data structures in R; I implement data structures in it as I write my next R book.

If anyone out there would like to be involved in a little bit of R programming with me, I would love to collaborate.

## Evaluating queue data structures

Working on my book on functional data structures in R, I have implemented three versions of functional queues: two that are not really functional, since they need to update a data structure through side effects, and one that is. As far as I can tell, they should have roughly the same performance, but they don’t.

Asymptotically, they all work in $$O(1)$$ amortised time for all operations; this nice figure explains why:

The implementations have two lists and need to move elements from one to the other when you access the front of the queue, so you can imagine each enqueue operation costing twice as many computrones as the other operations; that surplus can then pay for moving the elements later.
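A minimal sketch of the two-list idea, with names of my own choosing rather than any of the book's implementations: enqueue conses onto the back list, and when the front list runs dry we reverse the back list once.

```r
# front holds elements ready to be dequeued; back holds newly
# enqueued elements in reverse order
queue_empty <- function() list(front = list(), back = list())

enqueue <- function(q, x) list(front = q$front, back = c(list(x), q$back))

# returns list(value = ..., rest = <queue>); the one-off reversal is
# paid for by the enqueues, since each element is moved at most once
dequeue <- function(q) {
  if (length(q$front) == 0)
    q <- list(front = rev(q$back), back = list())
  list(value = q$front[[1]],
       rest  = list(front = q$front[-1], back = q$back))
}

q <- queue_empty()
for (x in 1:3) q <- enqueue(q, x)
r <- dequeue(q)
r$value                 # 1
dequeue(r$rest)$value   # 2
```

This plain version is exactly the kind whose amortised bound breaks under persistent use — if you keep a queue with an empty front list around, you can repeat the expensive reversal as often as you like — which is what the lazy machinery is meant to fix.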

However, the three implementations I have in R have different performances when I measure them. Two of them modify an environment and are not persistent; the third remembers one element more than the others and gives me a persistent data structure. For some reason, the last version has a performance that is about twice as bad as the other two.

I am stumped as to why.

You can reproduce the experiments with the code below—there might be a beer in it for you if you can explain to me why the functional version is so much slower.