Chapter 6
Parallel computers and the Message Passing Interface
Context: This chapter sets the stage for discussing parallelization of the molecular dynamics simulation method introduced in the previous chapters. We first need to talk about parallel hardware architectures and how to program for them. The specific programming model that we will employ is known under the term Single Program Multiple Data. The Message Passing Interface (MPI) is a library that facilitates programming for massively parallel machines under this programming model.
6.1 Parallel hardware architectures
Parallel hardware has become ubiquitous over the past decade. Most central processing units (CPUs) in computers, phones or other hardware have multiple cores that can execute instructions in parallel. Massively parallel computing systems combine multiple CPUs into nodes that share a common memory. These nodes are then combined into the full compute system through a network interconnect.
Parallel architectures are often hierarchical and offer parallelism at different levels: vectorization at the level of a single core, shared-memory parallelization across the cores of a multicore node, and distributed-memory parallelization for large computing systems whose nodes communicate via an interconnect (a network connection).
6.2 Scaling considerations
Software that runs on parallel computers needs to scale. Scaling describes how the time to obtain the result changes as the number of available compute units (cores) changes. The simplest model for scaling assumes that our code can be divided into a fraction \(f_s\) that needs to be executed on a single core while a fraction \(f_p\) scales perfectly, i.e. its execution time is \(\propto 1/p\) where \(p\) is the number of available processes or cores. (Note that \(f_s+f_p=1\) since they are fractions.) This leads to Amdahl’s law that describes the speedup \(S\) as a function of \(p\): \begin{equation} S = \frac{1}{f_s + f_p/p} = \frac{p}{f_s p + f_p} \end{equation}
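For illustration (the numbers are chosen here purely as an example): with a serial fraction \(f_s = 0.05\) and \(p = 64\) cores, the speedup is \(S = 64/(0.05 \cdot 64 + 0.95) \approx 15.4\), and even in the limit \(p \to \infty\) the speedup saturates at \(S \to 1/f_s = 20\). A small serial fraction therefore severely limits the benefit of adding more cores.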
6.3 Programming model
The Message Passing Interface (MPI) is an application programming interface (API) for distributed memory parallelization. (A code parallelized with MPI also works on shared memory machines!) The programming model underlying MPI is called single program multiple data (SPMD): The identical program is executed multiple times but operates on different data.
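To make the SPMD model concrete, the following minimal sketch (our own illustration; the file name, process count and output are not taken from the text) starts the same program on several processes. Each copy executes identical code and only learns its identity through its rank:

\begin{verbatim}
// hello_mpi.cpp -- minimal SPMD sketch (illustrative, not the book's code)
// Compile with an MPI wrapper compiler, e.g.:  mpic++ hello_mpi.cpp -o hello_mpi
// Run with, e.g.:                              mpirun -n 4 ./hello_mpi
#include <mpi.h>
#include <iostream>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);                 // start the MPI runtime

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // which copy of the program am I?
    MPI_Comm_size(MPI_COMM_WORLD, &size);   // how many copies are running in total?

    // Every process executes the same code; only the value of `rank` differs.
    std::cout << "Hello from rank " << rank << " of " << size << std::endl;

    MPI_Finalize();                         // shut down the MPI runtime
    return 0;
}
\end{verbatim}

Every process runs main() from start to finish; it is only the value of the rank that lets us later assign different data or work to the different copies of the program.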
6.3.1 Example: Monte-Carlo estimate of the number \(\pi \)
As the simplest example of a parallel computation, we consider a Monte-Carlo estimate of the number \(\pi \).
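The idea behind the estimate: draw points uniformly at random in the unit square; the fraction that lands inside the quarter circle \(x^2 + y^2 \le 1\) approaches \(\pi/4\). One possible MPI parallelization is sketched below under our own assumptions (each rank draws its own, differently seeded random samples, and the per-rank hit counts are summed on rank 0 with MPI_Reduce; the sample count and variable names are illustrative):

\begin{verbatim}
// pi_mpi.cpp -- sketch of a parallel Monte-Carlo estimate of pi (illustrative)
#include <mpi.h>
#include <cstdio>
#include <random>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long nb_samples = 1000000;        // samples per process (illustrative value)
    std::mt19937_64 rng(42 + rank);         // seed differs per rank so the samples differ
    std::uniform_real_distribution<double> uniform(0.0, 1.0);

    // Count points of the unit square that fall inside the quarter circle x^2 + y^2 <= 1.
    long local_hits = 0;
    for (long i = 0; i < nb_samples; ++i) {
        double x = uniform(rng), y = uniform(rng);
        if (x * x + y * y <= 1.0) local_hits++;
    }

    // Sum the hit counts of all processes on rank 0.
    long total_hits = 0;
    MPI_Reduce(&local_hits, &total_hits, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        double pi_estimate = 4.0 * total_hits / (nb_samples * size);
        std::printf("pi is approximately %.6f\n", pi_estimate);
    }

    MPI_Finalize();
    return 0;
}
\end{verbatim}

Because the samples are independent, the only communication is the single reduction at the end; in the language of the previous section, the serial fraction \(f_s\) of such a program is tiny and it scales almost perfectly.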