The leading manufacturers of microprocessors promote the development of multicore architectures. And indeed, the current generation of dual and quadcore processors has already captured the market. This means that parallel computation is not only becoming possible on ordinary desktop computers, it will become the standard of the future. This provides opportunities for high-performance applications from discrete optimization that cannot be ignored.

The project investigates the parallelization of a column generation algorithm for duty scheduling in public transit, see also project BMBF-DS: Duty Scheduling in Public Transit.

"Embarrassingly parallel" is what the computer science community calls an application in which a substantial part of the workload can obviously be distributed among several processors. An amazing number of algorithms in integer and combinatorial optimization seem to fall in this category: Dantzig-Wolfe decomposition, Lagrangean relaxation, stochastic programming, and branch-and-bound are just a few striking examples that immediately come to ones mind.

The devil with parallelization, however, is in the details. Hidden dependencies in the propagation of information and the increase in signalling information that is needed to synchronize the individual threads of a parallel process can nullify expected performance boosts. One prominent example for this is the simplex algorithm for linear programming, which seems to have ample potential for parallelization in its linear algebra. However, earlier attemtps on parallel LU-factorization as well as on parallel pricing turned out to be failures, see Grammel, Hege & Wunderling [1993] and Bixby & Martin [2000].

We undertake a new attempt on a coarser algorithmic level, that looks less vulnerable to subtile effects such as communication latencies. The aim of the project is to parallelize a column generation algorithm for duty scheduling in public transit. The bulk of the workload in such a method deals with the following three tasks:

  • solving linear programs, LPs, here with the bundle method,
  • pricing new duties, which amounts to solving a bunch of constraint shortest path problems in acyclic graphs,
  • and computing an integer solution, in our case using a branch-and-generate heuristic.
A more detailed description of the method can be fond in the article Borndörfer, Grötschel & Löbel [2003].

A good part of the bundle method consists of a simple matrix multiplication; this is known to parallelize well. Duty pricing involves solving several dozen constraints shortest path problems sequentially; it seems that there is not much that can go wrong in doing this simultaneously. Branch-and-generate searches a partial branch-and-bound tree; of course, several branches can be searched in parallel, although it is in this case not completely clear how to proceed.

Our goal is to achieve a speedup that scales to substantially more than four processors.