Purely Functional Data Structures, in Julia

Wherein I work my way through the book Purely Functional Data Structures, but in Julia instead of ML/haskell

What is a persistent data structure?

Consider the following bit of code:

julia> x = 42;
julia> y = x;
julia> y += 143
julia> x42

Compare to the similar looking code:

julia> x = [1,2,3];
julia> y = x;
julia> push!(y, 4)4-element Vector{Int64}: 1 2 3 4
julia> x4-element Vector{Int64}: 1 2 3 4

In the first example, changing the value of y did not affect the value of x. However, in the second, I was able to change the contents of x without re-assigning it directly! This can lead to unexpected program behavior: any time you pass your object as an argument to a function, you can't know for sure if the called function had the side effect of changing the contents of your object. A workaround when you want to avoid that possibility is defensive copying. If we're writing multiple small functions as recommended in the Julia docs, this can get expensive.

Julia's base arrays, sets, and dictionaries are all mutable. This package provides data structures that are immutable, so they can be treated as values.

julia> using PureFun
julia> x = PureFun.Linked.List(1:3);
julia> y = x;
julia> y = pushfirst(y, 4)4-element PureFun.Linked.NonEmpty{Int64} 4 1 2 3
julia> x3-element PureFun.Linked.NonEmpty{Int64} 1 2 3

Overall design

PureFun.jl provides a bunch of different container types, each has an empty constructor to create a new empty container of that type, and a constructor that takes an iterable. All of the collection types satisfy the iteration interface and can be used in loops, with iterators, etc. There are several functors that allow you to create more complicated container types from simpler container types.

Whenever an immutable method has a mutating analogue in Base, the immutable version of the function has the same name as the mutating one without the !. For example, push is a non-mutating version of push!, instead of modifying its input argument it returns a new collection with an element added.

  • Lists: insert/remove from the front of the list, get the element at the front of the lists, and iterate inserted elements in LIFO order

  • Queues: insert at the end, remove from the front, iterate in FIFO order

  • Heaps: insert elements, retrieve or remove the minimum, and iterate in sorted order (wrt an Ordering)

  • Dictionaries: associate keys with values

  • Sets: insert keys, test for membership, various set operations including intersect and union

Customizable container types

These use generic design strategies to assemble more powerful data structures from simpler ones:

  • PureFun.Batched.@deque takes a list implementation and adds efficient access (push/pop/last) to the rear of the list, making it into a double-ended queue. The resulting deque maintains the advantages of the list used to create it, so for example a deque made from PureFun.RandomAccess.List will maintain fast indexing (get/set index) operations.

  • PureFun.Chunky.@list takes any list implementation, and uses it to store chunks of elements contiguously in memory, rather than single elements, in order to improve iteration speed.

  • PureFun.Tries.@trie builds efficient dictionaries for aggregate key types by chaining together dictionaries of single keys.

  • PureFun.HashMaps.@hashmap nests dictionaries and combines them with hash functions to produce dictionary types that achieve efficient lookups and insertions for arbitrary key types.

PureFun.Contiguous: small size optimizations

Tries and chunky lists assume the availability of efficient small collections that can be linked together to build general purpose collections. The PureFun.Contiguous module provides a variety of performant data structures that make use of data locality

See also