I was trying to understand how Amdahl's law works, but I got confused in the process. Consider the following problem:
Suppose
a program has a part at the beginning that is sequential in nature (it must be executed by only one processor) and takes 3 ms. There is also a sequential part at the end of the program (again executed by only one processor) that takes 4 ms. The rest of the code is divided into 5 equal parts that are executed in parallel on 5 processors, and each of these parts takes 16 ms. Calculate the speedup using Amdahl's law.
Here is how I approached this problem. I first calculated the serial and parallel fraction, where 0.3 is the serial part and 0.7 is the parallel part calculated from the following logic:
Serial Part = 3 ms + 4 ms = 7 ms
Parallel Part = 16 ms (taken only once, since the code executes in parallel on 5 processors)
Total = 7 ms + 16 ms = 23 ms
Serial Fraction = 7 ms / 23 ms = 0.3 (approx)
Parallel Fraction = 16 ms / 23 ms = 0.7 (approx)
Now putting values in Amdahl's law:
Speedup = 1 / (S + P/N) (where N = Processors, S = Serial Fraction, P = Parallel Fraction)
Speedup = 1 / (0.3 + 0.7/5) = 2.27 (approx)
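For reference, the plug-in above reproduced as a quick Python sketch (this only checks the arithmetic, not whether the 0.3 / 0.7 fractions are the right ones to use):

```python
# Amdahl's law: Speedup = 1 / (S + P/N), with the fractions as
# computed above (S and P taken from the 23 ms total).
S, P, N = 0.3, 0.7, 5
speedup = 1 / (S + P / N)
print(round(speedup, 2))   # 2.27
```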
So is my approach correct or is there any other value of speedup for this problem?
Let's start with a basic flow-of-work schedule, as if there were no additional resources, so the whole amount of work runs as a single, pure-[SERIAL] stream. This baseline schedule, not using any sort of concurrent or parallel orchestration, shows that an initial 3 ms sprint is followed by the consecutive execution of five independent 16 ms sprints (the P-able blocks), and the whole workflow terminates after a final 4 ms sprint, completing the baseline computing topology in 3 + 5 × 16 + 4 = 87 ms.

Amdahl's law defines the maximum speedup that is fair to expect if all [PARALLEL]-able units of work can and do run on sufficient, free-in-time additional processing resources (the five CPUs given in the problem). A schedule that uses those 5 free CPU resources on an otherwise non-blocking processing fabric completes the same amount of work in only 3 + 16 + 4 = 23 ms, for a speedup of 87 / 23 ≈ 3.78. This is not the 2.27 obtained above: the serial and parallel fractions in Amdahl's law must be taken from the 87 ms serial baseline, S = 7/87 ≈ 0.08 and P = 80/87 ≈ 0.92, not from the 23 ms parallel run time.
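Both schedules can be reproduced numerically; a minimal Python sketch, using only the timings from the problem statement:

```python
# Compare the pure-[SERIAL] baseline schedule against the 5-CPU
# schedule and derive the speedup (timings from the problem).
pre_ms, post_ms = 3, 4        # sequential head and tail sprints
chunk_ms, n_chunks = 16, 5    # five independent P-able blocks

# Baseline: one processor runs everything back to back.
t_serial = pre_ms + n_chunks * chunk_ms + post_ms    # 87 ms

# With 5 CPUs the five 16 ms blocks overlap completely.
t_parallel = pre_ms + chunk_ms + post_ms             # 23 ms

speedup = t_serial / t_parallel
print(t_serial, t_parallel, round(speedup, 2))       # 87 23 3.78
```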
This is due to the advantage of running all the P-able blocks in true [PARALLEL] fashion (each having, in due time, free and non-blocking access to one of the 5 CPU resources). Further, we can see that no matter how many additional CPU resources were made available beyond those very 5 CPUs for the very said 5 P-able sections, no further speedup would ever appear: the P-able sections are already mapped onto CPU resources [A:E], and any other CPU will not help them do anything faster or complete the whole computing topology any sooner.
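To illustrate the diminishing returns, here is a small sketch that computes the speedup for growing CPU counts, assuming each 16 ms block is atomic (it cannot be split across CPUs), so extra CPUs beyond 5 simply sit idle:

```python
import math

def speedup(n_cpus, pre=3, post=4, chunk=16, n_chunks=5):
    # With n_cpus CPUs, the atomic chunks run in ceil(n_chunks / n_cpus)
    # consecutive waves of 16 ms each.
    waves = math.ceil(n_chunks / n_cpus)
    t_parallel = pre + waves * chunk + post
    t_serial = pre + n_chunks * chunk + post
    return t_serial / t_parallel

for n in (1, 2, 5, 10, 100):
    print(n, round(speedup(n), 2))
# speedup plateaus at ~3.78 for every n >= 5
```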
Q.E.D.
For more details on Amdahl's law of diminishing returns (adding more CPUs brings zero additional speedup), on the effects of the atomicity of P-able work-unit execution, and on the effects of setup/termination add-on overheads, you might want to read this.