The trapezoidal rule algorithm performs numerical integration of any arbitrary function. We have learned how to parallel
Posted: Sun May 15, 2022 12:59 pm
The trapezoidal rule algorithm performs numerical integration of
any arbitrary function. We
have learned how to parallelize the trapezoidal rule algorithm.
Here, let us parallelize a simple
numerical differentiation algorithm across many processors using
MPI. Below is the serial code.
// Func(x) is the function to be differentiated
// Diff(x) below estimates the derivative at point x.
#define DELTA 0.00001
double Diff(double x) {
return (Func(x + DELTA) - Func(x))/ DELTA;
}
// input_array is an array of input points
// output_array is an array of derivatives at these points
// both arrays are of size N
for (int i = 0; i < N; i++){
output_array = Diff(input_array)
}
a) Please write the code snippet that distributes the
input_array across P
processes, computes the derivatives in the P processes in parallel,
and then collects the
estimated derivatives from the P processes to the process 0. Please
assume that the array size,
N, is evenly divisible by P. Create any variable that you need to
use or assume that they have
been created elsewhere. The code snippet should just be ~10 lines
long.
b) Let us assume that each invocation of a collective
communication across the P
processes takes 𝛼∙log 𝑃 microseconds, where 𝛼 is a constant in
microseconds. Let us also
assume that each invocation of the numerical differentiation
function, Diff(x), takes fixed 𝛽
microseconds to return.
- Please derive the efficiency of your parallel MPI program for the
array size, N. Please
just consider the computational cost in the code snippets.
- In order for this program to be strongly scalable with an
efficiency approaching 1, what
value should the ratio of 𝛽 to 𝛼 approach? In other words, how
should computation cost
determined by 𝛽 compare relative to the communication cost
determined by 𝛼 in order
for this program to be strongly scalable?
c) In the question (5b) above, we assume that each
invocation of Diff(x) takes the
same amount of time to complete (i.e., 𝛽 is a constant). In this
question, let us assume that the
computational cost of Diff(x) increases quadratically with the
value of x (i.e., 𝛽~𝑥2) and let us
assume that the values in the input array are sorted in an
ascending order. Under these two
new assumptions, can your code snippet in the question (5a) can
still reach the parallelization
efficiency derived in the question (5b)? If no, please describe a
strategy to mitigate the
problem. If yes, please describe why.
any arbitrary function. We
have learned how to parallelize the trapezoidal rule algorithm.
Here, let us parallelize a simple
numerical differentiation algorithm across many processors using
MPI. Below is the serial code.
// Func(x) is the function to be differentiated
// Diff(x) below estimates the derivative at point x.
#define DELTA 0.00001
double Diff(double x) {
return (Func(x + DELTA) - Func(x))/ DELTA;
}
// input_array is an array of input points
// output_array is an array of derivatives at these points
// both arrays are of size N
for (int i = 0; i < N; i++){
output_array = Diff(input_array)
}
a) Please write the code snippet that distributes the
input_array across P
processes, computes the derivatives in the P processes in parallel,
and then collects the
estimated derivatives from the P processes to the process 0. Please
assume that the array size,
N, is evenly divisible by P. Create any variable that you need to
use or assume that they have
been created elsewhere. The code snippet should just be ~10 lines
long.
b) Let us assume that each invocation of a collective
communication across the P
processes takes 𝛼∙log 𝑃 microseconds, where 𝛼 is a constant in
microseconds. Let us also
assume that each invocation of the numerical differentiation
function, Diff(x), takes fixed 𝛽
microseconds to return.
- Please derive the efficiency of your parallel MPI program for the
array size, N. Please
just consider the computational cost in the code snippets.
- In order for this program to be strongly scalable with an
efficiency approaching 1, what
value should the ratio of 𝛽 to 𝛼 approach? In other words, how
should computation cost
determined by 𝛽 compare relative to the communication cost
determined by 𝛼 in order
for this program to be strongly scalable?
c) In the question (5b) above, we assume that each
invocation of Diff(x) takes the
same amount of time to complete (i.e., 𝛽 is a constant). In this
question, let us assume that the
computational cost of Diff(x) increases quadratically with the
value of x (i.e., 𝛽~𝑥2) and let us
assume that the values in the input array are sorted in an
ascending order. Under these two
new assumptions, can your code snippet in the question (5a) can
still reach the parallelization
efficiency derived in the question (5b)? If no, please describe a
strategy to mitigate the
problem. If yes, please describe why.