Answer Happy

Posted: **Sun May 15, 2022 2:00 pm**

: Question 5 The Trapezoidal Rule Algorithm Performs Numerical Integration Of Any Arbitrary Function We Have Learned How 1 (47.87 KiB) Viewed 63 times

Question 5: The trapezoidal rule algorithm performs numerical integration of any arbitrary function. We have learned how to parallelize the trapezoidal rule algorithm. Here, let us parallelize a simple numerical differentiation algorithm across many processors using MPI. Below is the serial code. // Func(x) is the function to be differentiated // Diff(x) below estimates the derivative at point x. #define DELTA 0.00001 double Diff (double x) { return (Func(x + DELTA) Func(x))/ DELTA; } // input.array is an array of input points // output.array is an array of derivatives at these points // both arrays are of size N for (int i = 0; i < N; i++) { 0 output.array Diff.linput.array) } == (5a) (20 points} Please write the code snippet that distributes the input array across P processes, computes the derivatives in the P processes in parallel, and then collects the estimated derivatives from the P processes to the process 0. Please assume that the array size, N, is evenly divisible by P. Create any variable that you need to use or assume that they have been created elsewhere. The code snippet should just be ~10 lines long.

(5b) (10 points} Let us assume that each invocation of a collective communication across the P processes takes a · log P microseconds, where a is a constant in microseconds. Let us also assume that each invocation of the numerical differentiation function, Diff(x), takes fixed B microseconds to return. Please derive the efficiency of your parallel MPI program for the array size, N. Please just consider the computational cost in the code snippets. In order for this program to be strongly scalable with an efficiency approaching 1, what value should the ratio of B to a approach? In other words, how should computation cost determined by B compare relative to the communication cost determined by a in order for this program to be strongly scalable? . Answer: (5c) {5 points} In the question (5b) above, we assume that each invocation of Diff(x) takes the same amount of time to complete (i.e., ß is a constant). In this question, let us assume that the computational cost of Diff(x) increases quadratically with the value of x (i.e.B~x~) and let us assume that the values in the input array are sorted in an ascending order. Under these two new assumptions, can your code snippet in the question (5a) can still reach the parallelization efficiency derived in the question (5b)? If no, please describe a strategy to mitigate the problem. If yes, please describe why.