Efficient Lock-Free Work-stealing Iterators for Data-Parallel Collections

Aleksandar Prokopec, Dmitry Petrashko and Martin Odersky

Data-parallel

					        
                            hugeCollectionOfInts.sum()
                            
                            

has low work per element

Perfect world

Regular workload

Real life:

irregularities

Irregular workload

					        
                                collection.map(x => fun(x))
                            
                        

Irregular workload

WorkStealing scheduler

Threads steal work from each other.

No cooperation from target thread needed, lock free.

A. Prokopec and M. Odersky. Near optimal work-stealing tree scheduler for highly irregular data-parallel workloads. LCPC 2013

Even more real life:

Overheads

Irregular workload

Types of overhead

  • Data access overhead,
  • Scheduling overhead,
  • Invocation overhead. Won't be covered here.

Data access overhead

Data consumer abstracts over data source:
Inefficiencies:
  • iteration scheme(eg iterate over tree as if over indexed collection),
  • redundant data copying,
  • boxing (Jvm, Python, Ruby, JavaScript).

How severe is it?

unboxedboxed
a + bbox(unbox(a) + unbox(a))
~13x slowdown

Scheduling overhead

Lets say we have a HashMap:
Most HashMaps by nature create irregular workload.
Efficient scheduling has to be data-structure aware.

Work-stealing Iterators

Define:
  • how to iterate;
  • how to steal work.

Transparent for user.
Easy to implement for developer.

Benchmarks

For cycle:

Benchmarks

Irregular workload:

More details?

See paper :-).
Or ask questions.

Takeaway message

Eliminating abstraction penalties is necessary to realistically assess the scheduling quality.
High abstraction penalties could entirely mask scheduling penalties.

Thank you.

See my other talks.