Consider the following code: For(i=0;i<128;i++){ c[i] = a[i]*b[i] - d[i]*e[i]; f[i] = a[i]*e[i] + d[i]*b[i];} The above
Posted: Tue Apr 12, 2022 10:19 am
Consider the following code: For(i=0;i<128;i++){ c = a*b - d*e; f = a*e + d*b;} The above code is translated into RV64V assembly and placed into the following 6 convoys: 1. vmul vld 2. vld vmul 3. vsub vst 4. vmul vld 5. vmul vld 6. vadd vst #a*b (assume a,b pre-loaded), load d # load e, d*e # subtract and store e #a*e, load next a vector #d*b, load next b vector # add and store f Problem #3a (10 pts) Using the "chime" approximation how long would it take to execute on your vector machine. Problem #3b (10 pts). Why can't the first two convoys be merged into a single convoy?