Parallel computing
Hardware architectures
Parallel hardware has become ubiquitous over the past decade. Most central processing units (CPUs) in computers, phones, and other devices have multiple cores that can execute instructions in parallel. Massively parallel computing systems combine multiple CPUs into nodes that share a common memory; these nodes are then combined into the full compute system through a network interconnect.
Parallel architectures are often hierarchical and exploit parallelism at different levels: vectorization at the core level, shared-memory parallelization across the cores of a multicore processor, and distributed-memory parallelization for large computing systems whose nodes communicate via an interconnect (a network connection). The following video gives an overview of common architectures.
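As a rough illustration of two of these levels on a single machine (a minimal sketch of my own, not part of the course material), the snippet below combines vectorized, core-level computation via NumPy with multicore parallelism via a pool of worker processes; the chunk sizes and worker count are arbitrary choices for the example.

```python
import numpy as np
from multiprocessing import Pool

def work(chunk):
    # Core-level parallelism: NumPy evaluates this vectorized expression
    # using the CPU's vector (SIMD) units internally.
    return np.sum(np.sqrt(chunk))

if __name__ == "__main__":
    data = np.random.rand(1_000_000)
    chunks = np.array_split(data, 4)      # one chunk per worker
    # Multicore parallelism: the pool distributes the chunks across
    # separate worker processes running on different cores.
    with Pool(processes=4) as pool:
        partial_sums = pool.map(work, chunks)
    print(sum(partial_sums))
```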
Parallel scaling
Software that runs on parallel computers needs to scale. Scaling describes how the time to obtain the result changes as the number of available compute units (cores) changes. The simplest model for scaling assumes that our code can be divided into a fraction \(f_s\) that must be executed on a single core and a fraction \(f_p\) that scales perfectly, i.e. its execution time is \(\propto 1/p\), where \(p\) is the number of available processes or cores. (Note that \(f_s+f_p=1\) since they are fractions.) This leads to Amdahl’s law, which describes the speedup \(S\) as a function of \(p\): \[S = \frac{p}{f_p + f_s p}\] The following video explains the background of Amdahl’s law.
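To get a feel for what the formula implies, the small helper below (my own illustration, not from the text) evaluates Amdahl’s law for a given serial fraction; the chosen value \(f_s = 0.05\) is just an example.

```python
def amdahl_speedup(p, f_s):
    """Speedup S = p / (f_p + f_s * p) on p cores, with serial fraction f_s."""
    f_p = 1.0 - f_s
    return p / (f_p + f_s * p)

# Even with only 5% serial code, the speedup on 64 cores stays well below 64
# (and can never exceed 1/f_s = 20, no matter how many cores are added).
for p in (1, 4, 16, 64):
    print(p, round(amdahl_speedup(p, f_s=0.05), 2))
```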