Data parallelism
We'll look at two examples of using data parallelism to speed up computation using multiple CPUs
Example 1: reducing a list
using PureFun, FLoops, BenchmarkTools
using PureFun.Chunky
We are tasked with optimizing the following function:
sequential(xs) = mapreduce(round ∘ sin, +, xs);
For any applications that require reducing over an entire collection, it makes sense to reach for a PureFun.Chunky.@list
:
Chunky.@list(ChunkyRandomAccessList,
list = PureFun.RandomAccess.List,
chunk = PureFun.Contiguous.StaticChunk{8})
We start out by benchmarking the current solution:
julia> @btime(sequential(xs),
setup = xs = ChunkyRandomAccessList(rand(Int, 5_000)),
evals = 10, samples=300);
107.854 μs (1256 allocations: 58.81 KiB)
The lists in PureFun.jl implement the SplittablesBase
interface, and as a result can be used with a variety of parallel algorithms. Here we use Floops.jl
to parallelize our reduction:
function parallel(xs)
@floop for x in xs
@reduce s += round(sin(x))
end
return s
end;
A quick test to make sure the function works as expected:
test = ChunkyRandomAccessList(rand(Int, 100_000))
@assert sequential(test) == parallel(test)
And indeed, we see a decent speedup:
julia> @btime(parallel(xs),
setup = xs = ChunkyRandomAccessList(rand(Int, 5_000)),
evals = 10, samples = 300)
53.725 μs (793 allocations: 28.46 KiB)
Example 2: parallel heap sort
using PureFun, Folds, BenchmarkTools
using PureFun.Pairing
If comparisons are expensive, then constructing a heap can get pretty slow:
xs = rand(Int, 1_000);
julia> @btime Pairing.Heap($xs, Base.Order.By(sin));
45.333 μs (2998 allocations: 93.69 KiB)
In How to Think about Parallel Programming: Not! Guy Steele describes a useful strategy for parallel programming:
- from each input construct a singleton solution
- merge solutions using an associative merge operation
We can use that strategy to construct a heap:
const ∅ = Pairing.Heap{Int}(Base.Order.By(sin))
singleton(x) = push(∅, x)
test_seq = Pairing.Heap(xs, Base.Order.By(sin))
test_par = Folds.mapreduce(singleton, merge, xs, init=∅)
@assert all(test_seq .== test_par)
And once again, we see a decent speedup:
julia> @btime Folds.mapreduce(singleton, merge, $xs, init=∅);
18.625 μs (7016 allocations: 221.66 KiB)
This page was generated using Literate.jl.