Lazy lists in R

Playing around with functional implementations of queues, I want to implement a version with O(1) amortised running time that also works as a persistent data structure, meaning that any earlier version of the queue can be used again. This is, in general, a problem with amortised analysis, that consists of cheap and expensive operations where any sequence of n operations will take time O(n). If such a data structure is used as a persistent data structure, there is nothing preventing you from repeating expensive operations many times, breaking the complexity. It can be achieved for queues, however, by a combination of lazy evaluation and memorisation (C. Okasaki, Simple and efficient purely functional queues and deques, J. of Functional Programming, 5(4), 583-592, 1995).

So, I need to be able to delay evaluation of list operations. R doesn’t do lazy evaluation of expressions, though. It does, however, evaluate function arguments lazily, and this can be exploited to implement delayed evaluation through thunks.

My lazy lists will therefore be implemented as thunks with the invariant that they always evaluate to either NULL—for the empty list—or a list-object with a head and tail (car and cdr in lisp terminology). Functions for constructing and accessing lists look like this:

nil <- function() NULL
cons <- function(car, cdr) {
  force(car)
  force(cdr)
  function() list(car = car, cdr = cdr)
}

is_nil <- function(lst) is.null(lst())
car <- function(lst) lst()$car
cdr <- function(lst) lst()$cdr

The force() calls in the cons function are needed because of how lazy evaluation of function arguments are handled in R, but since cdr is a thunk, it doesn’t evaluate the actual body of the function, it just evaluates the parameter into the underlying function.

Operations on lists can now be delayed by wrapping them in thunks. I can give these thunks an expression that I want to delay the evaluation of. This will be an expression that evaluates to a list, and to make the thunk behave as if it was the list, I just need to evaluate the list in the body of the thunk.

Reversing a list, and concatenating two lists, can thus be implemented like this:

reverse <- function(lst) {
  do_reverse <- function(lst) {
    result <- nil
    while (!is_nil(lst)) {
      result <- cons(car(lst), result)
      lst <- cdr(lst)
    }
    result
  }
  force(lst)
  lazy_thunk <- function(lst) {
    function() lst()
  }
  lazy_thunk(do_reverse(lst))
}

cat <- function(l1, l2) {
  do_cat <- function(l1, l2) {
    rev_l1 <- nil
    while (!is_nil(l1)) {
      rev_l1 <- cons(car(l1), rev_l1)
      l1 <- cdr(l1)
    }
    result <- l2
    while (!is_nil(rev_l1)) {
      result <- cons(car(rev_l1), result)
      rev_l1 <- cdr(rev_l1)
    }
    result
  }
  force(l1)
  force(l2)
  lazy_thunk <- function(lst) {
    function() lst()
  }
  lazy_thunk(do_cat(l1, l2))
}

Again, we need to use force() for the arguments we give to the functions before we use them in the thunk we create—also when we use them in expressions we give to the thunk—because they might be referring to variables that change between the time we call the functions and the time we access the thunk. Do not force the list you give to the thunk, though. That would defy the purpose of making the thunk in the first place—we explicitly do not want the argument evaluated yet.

We can build some lists to test it all out:

vector_to_list <- function(v) {
  lst <- nil
  for (x in v) lst <- cons(x, lst)
  reverse(lst)
}

l1 <- vector_to_list(1:10000)
l2 <- vector_to_list(1:10000)

First, we can try to concatenate two lists. If you want to try this at home, install the microbenchmark package and paste the code into R.

library(microbenchmark)
microbenchmark(lst <- cat(l1, l2), times = 1) # fast operation
microbenchmark(car(lst), times = 1) # slow operation -- needs to copy l1
microbenchmark(car(lst), times = 1) # fast operation
microbenchmark(lst <- cat(l1, reverse(l2)), times = 1) # fast operation
microbenchmark(car(lst), times = 1) # slow operation -- needs to copy l1
microbenchmark(car(lst), times = 1) # fast operation

length <- function(lst) {
  n <- 0
  while (!is_nil(lst)) {
    lst <- cdr(lst)
    n <- n + 1
  }
  n
}

microbenchmark(length(lst), times = 1) # slow operation -- needs to reverse l2
microbenchmark(length(lst), times = 1) # faster operation

This is slower still — we need to both copy l1 and reverse l2:

microbenchmark(length(cat(l1, l2)), times = 1)

Author: Thomas Mailund

My name is Thomas Mailund and I am a research associate professor at the Bioinformatics Research Center, Uni Aarhus. Before this I did a postdoc at the Dept of Statistics, Uni Oxford, and got my PhD from the Dept of Computer Science, Uni Aarhus.

Leave a Reply