CS 3853 Computer Architecture Notes on Appendix B Section 4
Read Appendix B.4
Today's News: October 9, 2015
Assignment 2 is still available
B.4: Virtual Memory
With virtual memory, main memory is used as a cache for disk.
The traditional reason for using a cache is to reduce latency, so a smaller, faster memory is used for the cache.
Another way of looking at it is to think of the cache as being too small, and using a slower, larger memory to
extend capacity. This is what we did when we added an L2 cache to supplement the L1 cache.
This is the way you should think of virtual memory: extending the size of main memory.
Virtual memory also provides other services such as protection, but these are discussed mainly in OS.
How is virtual memory different from the other forms of cache we have looked at?
The size of the faster memory is orders of magnitude (roughly 1000 times) larger than typical cache sizes.
The size of the slower memory is orders of magnitude (roughly 1000 times) larger than the faster memory.
For typical caches, the faster memory is 3-100 times faster than the slower memory.
For virtual memory, the faster memory (RAM) is at least 100,000 times faster than the slower memory (disk).
Because of this difference in speed, the hit rate needs to be very close to 1.
Suppose the main memory access time is 50 ns.
Typical disk access times are 5 ms = 5,000,000 ns.
If the hit rate is 99.99%, the miss rate is .0001
The average access time is 50 ns + .0001 × 5,000,000 ns = 550 ns.
This is not good enough.
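The arithmetic above can be checked with a short sketch. The formula and numbers are exactly the ones from the example; nothing else is assumed.

```python
# Reproduces the note's average-access-time arithmetic:
#   average time = hit time + miss rate * miss penalty
hit_time_ns = 50            # main memory access time
disk_time_ns = 5_000_000    # typical disk access time (5 ms)
miss_rate = 0.0001          # hit rate of 99.99%

average_ns = hit_time_ns + miss_rate * disk_time_ns
print(average_ns)  # 550.0 -- 11x slower than a pure RAM access
```

Even a 99.99% hit rate makes the average access 11 times slower than RAM, which is why the miss rate must be driven far lower still.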
Direct mapping and set associative mapping are not good enough; to keep the miss rate low, pages are placed fully associatively.
Main memory is not designed to support parallel lookup of tags.
Block sizes need to be relatively large (4-16KB).
Since placement is fully associative, bit selection cannot be used to locate a block.
Since access times are very large, we can manage misses with software.
Paged Virtual Memory
We will only look at paged virtual memory and not discuss segmented memory.
Virtual Memory Terminology
blocks of physical memory are called frames
blocks on disk are called pages
a memory address generated by the CPU is called a virtual (or logical) address
a memory address used by the memory system is called a physical address
tags are called page numbers
the conversion of a logical address to a physical address is called address translation
Where can a block be placed in memory?
anywhere
Today's News: October 12, 2015
No news
How is a block found in main memory?
A page table is used:
an array of frame numbers indexed by the page number
page tables can be very large, so they are stored in memory.
to speed address translation, a cache for the page table is used.
this cache is called the translation lookaside buffer.
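The lookup path described above can be sketched in a few lines. The page size, page-table contents, and TLB here are invented for illustration; a real page table is an array walked by hardware or the OS, not a Python dict.

```python
# Illustrative sketch of address translation with a page table and a TLB.
# The 4 KB page size and the table contents are assumptions for this example.
PAGE_SIZE = 4096
OFFSET_BITS = 12                  # log2(4096)

page_table = {0: 7, 1: 3, 2: 9}   # page number -> frame number
tlb = {}                          # small cache of recent translations

def translate(virtual_addr):
    page = virtual_addr >> OFFSET_BITS
    offset = virtual_addr & (PAGE_SIZE - 1)
    if page in tlb:               # TLB hit: no extra memory access needed
        frame = tlb[page]
    else:                         # TLB miss: walk the page table in memory
        frame = page_table[page]  # a miss to disk would instead trap to the OS
        tlb[page] = frame
    return (frame << OFFSET_BITS) | offset

print(hex(translate(0x1ABC)))  # page 1 maps to frame 3, so 0x3abc
```

Note that the offset bits pass through translation unchanged; only the page number is replaced by a frame number.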
TLB and Memory Access Time
Example: Suppose memory access time is 10 ns. What is the effective access time of paging
with no TLB
with a TLB having access time 1 ns and hit ratio 90%
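A worked answer to the example above, assuming a one-level page table (so a translation without a TLB costs one full memory access) and that the TLB is consulted on every access:

```python
mem_ns = 10      # memory access time from the example
tlb_ns = 1       # TLB access time
hit_ratio = 0.9  # TLB hit ratio

# No TLB: every access pays a page-table read plus the data access itself.
no_tlb = mem_ns + mem_ns

# With TLB: a hit skips the page-table read; a miss pays for both.
with_tlb = (hit_ratio * (tlb_ns + mem_ns)
            + (1 - hit_ratio) * (tlb_ns + mem_ns + mem_ns))

print(no_tlb, with_tlb)  # 20 ns without a TLB, roughly 12 ns with one
```

So the TLB cuts the effective access time from 20 ns to about 12 ns in this example.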
Figure B.24
shows the data TLB for the Opteron processor.
Since this TLB (cache) is fully associative, the virtual address does not have an index.
The field labeled tag is the virtual page number.
The field labeled Physical Address <28> is the physical frame number.
A 32-bit virtual address with a 4 KB page size requires a tag (page number) of 20 bits so the page table has 1 million entries.
Some processors hash the page table since most entries are never needed. We will not discuss this here.
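The sizing claim above is easy to verify. The 4-byte entry size below is an assumption for illustration (real entries also hold protection and status bits):

```python
# Sizing the flat page table from the example above.
virtual_bits = 32
page_size = 4 * 1024
offset_bits = (page_size - 1).bit_length()        # 12 offset bits for 4 KB pages

entries = 2 ** (virtual_bits - offset_bits)       # one entry per virtual page
entry_bytes = 4                                   # assumed per-entry size

print(entries)                # 1048576 entries (about 1 million)
print(entries * entry_bytes)  # 4194304 bytes = 4 MB per process
```

A 4 MB table per process, mostly unused, is what motivates the hashed (inverted) page tables mentioned above.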
Which block (page) should be replaced on a virtual memory miss?
A virtual memory miss causes an interrupt and the OS handles it. Take the OS course.
What happens on a write?
write-back and write-allocate are always used.
Typically, caches use physical addresses.
Before you can look up something in the L1 cache, you need to convert the logical address to a physical address.
Both the address translation and L1 cache lookup need to be completed in 1 cycle.
This is typically done by overlapping part of the address translation and cache lookup.
There are 2 parts to a cache lookup:
1. Read the appropriate cache tags from the cache
2. Compare the cache tags to the memory address tag.
How can you start the first of these before you have the physical address?
An example: Figure B.17
shows a hypothetical processor with:
a 64-bit virtual address
a 40-bit physical address
a 16KB page size
A 2-way set associative TLB with 256 entries
A 16KB direct mapped L1 cache with 64-byte blocks
A 4-way set associative L2 4MB cache with 64-byte blocks
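For the Figure B.17 parameters, the bit arithmetic shows why step 1 of the cache lookup can start before translation finishes: the L1 index and block offset together fit inside the page offset, and those bits are identical in the virtual and physical addresses. A sketch of that arithmetic:

```python
# Bit arithmetic for the Figure B.17 parameters.
from math import log2

page_size = 16 * 1024   # 16 KB pages
l1_size = 16 * 1024     # 16 KB direct-mapped L1
block_size = 64         # 64-byte blocks
associativity = 1       # direct mapped

page_offset_bits = int(log2(page_size))            # 14
sets = l1_size // (block_size * associativity)     # 256
index_bits = int(log2(sets))                       # 8
block_offset_bits = int(log2(block_size))          # 6

# Index + block offset fit inside the page offset, so the cache can read its
# tags (step 1) from the virtual address while the TLB translates; only the
# tag compare (step 2) must wait for the physical address.
assert index_bits + block_offset_bits <= page_offset_bits
print(page_offset_bits, index_bits, block_offset_bits)  # 14 8 6
```

This is why the L1 size here is tied to the page size: a larger direct-mapped L1 would push index bits above the page offset, and they would no longer be translation-invariant.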