Playing around with functional implementations of queues, I want to implement a version with O(1) amortised running time that also works as a persistent data structure, meaning that any earlier version of the queue can be used again. This is, in general, a problem with amortised analysis, that consists of cheap and expensive operations where any sequence of *n* operations will take time O(*n*). If such a data structure is used as a persistent data structure, there is nothing preventing you from repeating expensive operations many times, breaking the complexity. It can be achieved for queues, however, by a combination of lazy evaluation and memorisation (C. Okasaki, Simple and efficient purely functional queues and deques, J. of Functional Programming, 5(4), 583-592, 1995).

So, I need to be able to delay evaluation of list operations. R doesn’t do lazy evaluation of expressions, though. It does, however, evaluate function arguments lazily, and this can be exploited to implement delayed evaluation through thunks.

My lazy lists will therefore be implemented as thunks with the invariant that they always evaluate to either NULL—for the empty list—or a list-object with a head and tail (car and cdr in lisp terminology). Functions for constructing and accessing lists look like this:

[cce_rsplus]

nil <- function() NULL

cons <- function(car, cdr) {

force(car)

force(cdr)

function() list(car = car, cdr = cdr)

}

is_nil <- function(lst) is.null(lst())

car <- function(lst) lst()$car

cdr <- function(lst) lst()$cdr

[/cce_rsplus]

The force() calls in the cons function are needed because of how lazy evaluation of function arguments are handled in R, but since cdr is a thunk, it doesn’t evaluate the actual body of the function, it just evaluates the parameter into the underlying function.

Operations on lists can now be delayed by wrapping them in thunks. I can give these thunks an expression that I want to delay the evaluation of. This will be an expression that evaluates to a list, and to make the thunk behave as if it *was* the list, I just need to evaluate the list in the body of the thunk.

Reversing a list, and concatenating two lists, can thus be implemented like this:

[cce_rsplus]

reverse <- function(lst) {

do_reverse <- function(lst) {

result <- nil

while (!is_nil(lst)) {

result <- cons(car(lst), result)

lst <- cdr(lst)

}

result

}

force(lst)

lazy_thunk <- function(lst) {

function() lst()

}

lazy_thunk(do_reverse(lst))

}

cat <- function(l1, l2) {

do_cat <- function(l1, l2) {

rev_l1 <- nil

while (!is_nil(l1)) {

rev_l1 <- cons(car(l1), rev_l1)

l1 <- cdr(l1)

}

result <- l2

while (!is_nil(rev_l1)) {

result <- cons(car(rev_l1), result)

rev_l1 <- cdr(rev_l1)

}

result

}

force(l1)

force(l2)

lazy_thunk <- function(lst) {

function() lst()

}

lazy_thunk(do_cat(l1, l2))

}

[/cce_rsplus]

Again, we need to use force() for the arguments we give to the functions before we use them in the thunk we create—also when we use them in expressions we give to the thunk—because they might be referring to variables that change between the time we call the functions and the time we access the thunk. Do *not* force the list you give to the thunk, though. That would defy the purpose of making the thunk in the first place—we explicitly do *not* want the argument evaluated yet.

We can build some lists to test it all out:

[cce_rsplus]

vector_to_list <- function(v) {

lst <- nil

for (x in v) lst <- cons(x, lst)

reverse(lst)

}

l1 <- vector_to_list(1:10000)

l2 <- vector_to_list(1:10000)

[/cce_rsplus]

First, we can try to concatenate two lists. If you want to try this at home, install the microbenchmark package and paste the code into R.

[cce_rsplus]

library(microbenchmark)

microbenchmark(lst <- cat(l1, l2), times = 1) # fast operation

microbenchmark(car(lst), times = 1) # slow operation — needs to copy l1

microbenchmark(car(lst), times = 1) # fast operation

[/cce_rsplus]

[cce_rsplus]

microbenchmark(lst <- cat(l1, reverse(l2)), times = 1) # fast operation

microbenchmark(car(lst), times = 1) # slow operation — needs to copy l1

microbenchmark(car(lst), times = 1) # fast operation

length <- function(lst) {

n <- 0

while (!is_nil(lst)) {

lst <- cdr(lst)

n <- n + 1

}

n

}

microbenchmark(length(lst), times = 1) # slow operation — needs to reverse l2

microbenchmark(length(lst), times = 1) # faster operation

[/cce_rsplus]

This is slower still — we need to both copy l1 and reverse l2:

[cce_rsplus]

microbenchmark(length(cat(l1, l2)), times = 1)

[/cce_rsplus]