Question 3 (40 points) (40%) Consider the following ARMv8 code: loop: LDUR X9, [X1, #0) LSL X9, X9, #2 ADD X9,X9, X10 ST
-
answerhappygod
- Site Admin
- Posts: 899604
- Joined: Mon Aug 02, 2021 8:13 am
Question 3 (40 points) (40%) Consider the following ARMv8 code:

    loop: LDUR X9, [X1, #0]
          LSL  X9, X9, #2
          ADD  X9, X9, X10
          STUR X9, [X2, #0]
          SUB  X10, X10, #1
          ADD  X1, X1, #8
          ADD  X2, X2, #8
          SUB  X12, X1, X11
          CBNZ X12, loop

Registers X1 and X2 initially store the base addresses of the arrays, X9 and X10 are temporary registers, and X11 stores the size of the array.

a) Assume that there is no data forwarding in the execution of the code. List all the stalls in the instructions and show the total number of stalls. You don't need to draw the whole pipeline diagram, but you need to list the stalls between instructions. (Assume there are 2 stalls for branch instructions.)

b) Assume that we apply normal data forwarding (results of R-type instructions are forwarded, but results of LDUR are not). List all the stalls in the instructions and show the total number of stalls. You don't need to draw the whole pipeline diagram, but you need to list the stalls between instructions. (Assume there are 2 stalls for branch instructions.)

c) Unroll the loop two times, then reorder the code to optimize the cycles for the execution of the instructions.

Unroll the loop two times:

Reorder the code:
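The stall counting asked for in parts a) and b) can be sketched as a small Python model. This is only an illustration, not an official solution, and it rests on assumptions that the question does not state: a classic five-stage pipeline (IF, ID, EX, MEM, WB); a register file that is written in the first half of a cycle and read in the second half; a single loop iteration looked at in isolation; forwarded R-type results usable by the very next instruction; and LDUR results available only through the register file. The names `LOOP`, `BRANCH_STALLS` and `count_stalls` are hypothetical helpers made up for this sketch.

```python
# Each entry: (text, destination register or None, source registers, is_load)
LOOP = [
    ("LDUR X9, [X1, #0]", "X9",  ["X1"],        True),
    ("LSL  X9, X9, #2",   "X9",  ["X9"],        False),
    ("ADD  X9, X9, X10",  "X9",  ["X9", "X10"], False),
    ("STUR X9, [X2, #0]", None,  ["X9", "X2"],  False),
    ("SUB  X10, X10, #1", "X10", ["X10"],       False),
    ("ADD  X1, X1, #8",   "X1",  ["X1"],        False),
    ("ADD  X2, X2, #8",   "X2",  ["X2"],        False),
    ("SUB  X12, X1, X11", "X12", ["X1", "X11"], False),
    ("CBNZ X12, loop",    None,  ["X12"],       False),
]

BRANCH_STALLS = 2  # given in the question


def count_stalls(program, forward_alu):
    """Return (per-instruction stalls, data stalls, data + branch stalls).

    Assumed model: an instruction decoded in cycle t writes the register
    file in cycle t + 3, and a consumer may decode in that same cycle.
    With forwarding, a non-load result is usable by the next instruction.
    """
    ready = {}   # register -> earliest decode cycle at which it can be read
    cycle = 0    # decode cycle of the previous instruction
    stalls = []  # (consumer instruction, stall cycles inserted before it)
    for text, dest, srcs, is_load in program:
        natural = cycle + 1
        needed = max([natural] + [ready.get(r, 0) for r in srcs])
        if needed > natural:
            stalls.append((text, needed - natural))
        cycle = needed
        if dest is not None:
            latency = 1 if (forward_alu and not is_load) else 3
            ready[dest] = cycle + latency
    data = sum(n for _, n in stalls)
    return stalls, data, data + BRANCH_STALLS


if __name__ == "__main__":
    for label, fwd in (("no forwarding", False), ("R-type forwarding", True)):
        per_instr, data, total = count_stalls(LOOP, fwd)
        print(label)
        for text, n in per_instr:
            print(f"  {n} stall(s) before {text}")
        print(f"  data stalls = {data}, with branch = {total}")
```

Under these assumptions the model reports 9 data stalls per iteration without forwarding (2 before each of LSL, ADD, STUR and CBNZ, and 1 before SUB X12) and 2 with R-type forwarding (both before LSL, waiting on LDUR), to which the 2 branch stalls are added. A course that uses different conventions, for example no same-cycle register write and read, would get different counts.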