A.2 TLB and Cache based on a previous final exam question You are designing a paged memory system for a high-performance
Posted: Mon May 02, 2022 11:59 am
Please answer A.2.6. I will give a thumbs-up if the answer is helpful.
A.2 TLB and Cache (based on a previous final exam question)

You are designing a paged memory system for a high-performance 16-bit embedded processor. The initial virtual memory subsystem design has the following characteristics:
• 16-bit virtual address
• 30-bit physical address, with 1 GiB physical memory always installed
• 4 KiB page size
• 8-entry, 2-way set-associative TLB (i.e. 4 sets with 2 ways each), true LRU
• A single linear page table for each user process
• Page fault penalty: 7000 cycles
• On TLB hit and cache hit, combined TLB and cache access takes 1 cycle
• Page tables are not cached
• Single-level cache: physically tagged, direct-mapped, 4-word line size, 1 MiB capacity
• Cache hit time: 1 cycle
• Cache miss penalty: 400 cycles
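The address fields implied by the parameters above can be worked out with a short Python sketch. This is only an illustration, not part of the exam question: the helper names (`log2_exact`, `cache_fields`) are made up here, and the word size is an assumption, since the question says "16-bit processor" but never states how many bytes a cache "word" holds.

```python
# Parameters taken directly from the question statement.
VA_BITS = 16               # virtual address width
PA_BITS = 30               # physical address width (1 GiB)
PAGE_BYTES = 4 * 1024      # 4 KiB pages
TLB_ENTRIES = 8
TLB_WAYS = 2
CACHE_BYTES = 1024 * 1024  # 1 MiB direct-mapped cache
WORDS_PER_LINE = 4


def log2_exact(n):
    """Return log2(n) for a positive power of two (hypothetical helper)."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


# Page-level split of the addresses.
page_offset_bits = log2_exact(PAGE_BYTES)   # bits that pass through untranslated
vpn_bits = VA_BITS - page_offset_bits       # virtual page number width
ppn_bits = PA_BITS - page_offset_bits       # physical page number width
pages_per_process = 1 << vpn_bits           # entries in one linear page table

# TLB organization: sets = entries / ways; the VPN splits into index + tag.
tlb_sets = TLB_ENTRIES // TLB_WAYS
tlb_index_bits = log2_exact(tlb_sets)
tlb_tag_bits = vpn_bits - tlb_index_bits


def cache_fields(word_bytes):
    """Return (offset, index, tag) bit widths for the direct-mapped cache.

    ASSUMPTION: word_bytes is not given in the question; 2 bytes is a guess
    based on the processor being described as 16-bit.
    """
    line_bytes = WORDS_PER_LINE * word_bytes
    offset_bits = log2_exact(line_bytes)
    index_bits = log2_exact(CACHE_BYTES // line_bytes)
    tag_bits = PA_BITS - index_bits - offset_bits
    return offset_bits, index_bits, tag_bits


if __name__ == "__main__":
    print("page offset bits:", page_offset_bits)
    print("VPN bits:", vpn_bits, "-> pages per process:", pages_per_process)
    print("PPN bits:", ppn_bits)
    print("TLB sets:", tlb_sets, "index bits:", tlb_index_bits, "tag bits:", tlb_tag_bits)
    print("cache (offset, index, tag) with 2-byte words:", cache_fields(2))
```

One thing this makes visible: with a 16-bit virtual address and 4 KiB pages, each process has only 16 virtual pages, which matters when comparing TLB sizes in A.2.5.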
A.2.5 You now have an opportunity to redesign the TLB, with 2 new possible designs. Together with the original TLB, you now have 3 design choices:
(i) 16-entry direct-mapped TLB
(ii) 4-entry fully associative TLB
(iii) (original) 8-entry, 2-way set-associative TLB (4 sets of 2 entries each)
In all cases, the physically tagged cache remains the same. You do not know what programs will be running on this processor, but it is confirmed that they generally make repeated accesses to all of the addressable memory. Also, most of the time there are at most 3 processes running at the same time. Now, considering the access time of both the TLB and the cache, which of the 3 TLB designs (the 2 new ones and the original) would likely result in the best performance in general? Give examples and quantitative measurements (based on the size and organization of the TLB, the cache, hit/miss timing, etc.) to support your answer.

A.2.6 Your project partner argues that since there is plenty of cache memory on board, it is possible to remove the TLB entirely. Instead, the processor can treat the in-memory page table just like any other data memory locations and allow it to be cached in the data cache. Your partner further argues that since most accesses to the page table entries will hit in the cache, the overall average memory access time will not be affected significantly. Explain the following:
(i) Can a paged virtual memory system function correctly without a TLB?
(ii) If it works, is the performance expected to be similar to, better than, or worse than the original scheme with a TLB?
(iii) If it does not work, give a counterexample showing why a TLB is needed for a VM system to be functionally correct.
When making your argument for this question, you can assume the hit rate of the data cache is 90%, which includes both page table entries and any other normal data accesses. Also, you can assume that the overall probability of a page fault is 0.05%.
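For part (ii) of A.2.6, the numbers given above are enough for a rough average memory access time (AMAT) comparison. The Python sketch below is one possible model, not an official solution, and it rests on assumptions the question does not state: every memory reference without a TLB needs one page table entry (PTE) read through the data cache followed by the data access; in the original design a TLB miss costs one uncached PTE read, taken here to equal the 400-cycle miss penalty; and the TLB hit rate, which is not given, is left as a free parameter. All function and constant names are hypothetical.

```python
# Parameters from the question statement.
CACHE_HIT_TIME = 1         # cycles
CACHE_MISS_PENALTY = 400   # cycles
PAGE_FAULT_PENALTY = 7000  # cycles
CACHE_MISS_RATE = 0.10     # 90% hit rate, PTEs and normal data alike
PAGE_FAULT_RATE = 0.0005   # 0.05%

# ASSUMPTION: an uncached page table read costs the same as a cache miss.
UNCACHED_PTE_READ = 400    # cycles


def cached_access_cycles():
    """Average cost of one access that goes through the data cache."""
    return CACHE_HIT_TIME + CACHE_MISS_RATE * CACHE_MISS_PENALTY


def page_fault_cycles():
    """Average page fault cost per memory reference."""
    return PAGE_FAULT_RATE * PAGE_FAULT_PENALTY


def amat_without_tlb():
    """No TLB: each reference does a cached PTE read, then the data access."""
    pte_read = cached_access_cycles()
    data_access = cached_access_cycles()
    return pte_read + data_access + page_fault_cycles()


def amat_with_tlb(tlb_hit_rate):
    """Original scheme: translation is free on a TLB hit (overlapped with the
    cache access); a TLB miss adds one uncached PTE read (assumed cost)."""
    walk = (1.0 - tlb_hit_rate) * UNCACHED_PTE_READ
    return walk + cached_access_cycles() + page_fault_cycles()


if __name__ == "__main__":
    print("AMAT without TLB:", amat_without_tlb())
    for rate in (1.0, 0.95, 0.90):
        print("AMAT with TLB, hit rate", rate, ":", amat_with_tlb(rate))
```

Under these assumptions the no-TLB design comes out at about 85.5 cycles per reference, against about 44.5 cycles for a TLB that always hits, so the page table lookup roughly doubles the average access time even though 90% of PTE reads hit in the cache. The gap narrows as the assumed TLB hit rate drops, so the conclusion depends on that unstated parameter.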