Read Appendix B.5
B.5: Virtual Memory Protection and Examples
We will concentrate on the paged virtual memory examples and skip the segmentation examples.
The 64-bit Opteron Memory Management
- Supports page sizes of 4K, 2MB, and 4MB
- We will look at the 4K page
- Uses up to 64-bit virtual addresses and up to 52-bit physical addresses
- We will look at an example that uses a 48-bit virtual address and a 40-bit physical address
- How much memory can be addressed by 40 bits? 2^40 bytes = 1024 gigabytes (1 terabyte)
- With a 48-bit virtual address and a 4K page size, a single-level page table would need 2^36 (64 billion) entries.
- To manage this size, the Opteron uses a 4-level translation of virtual address to physical address.
- Figure B.27 shows how a 48-bit virtual address is mapped into a physical address (a small address-decomposition sketch follows the TLB table below).
- Each page table entry is 64 bits (8 bytes), and each page table has 512 entries, for a total of 4K (the page size).
- A frame number is 40 bits - 12 bits = 28 bits.
- Each page table entry contains a frame number and some additional bits.
- Additional bits include
- presence (valid)
- read/write
- user/superuser
- dirty
- accessed
- no execute
- Separate TLBs are used for instruction and data translation, each with 2 levels
- block size: 1 PTE (page table entry of 8 bytes)
- block replacement: LRU
- other parameters:
  level | hit time | size        | placement
  L1    | 1 cycle  | 40 entries  | fully associative
  L2    | 7 cycles | 512 entries | 4-way set associative
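To make the 4-level split concrete, here is a minimal C sketch (my own illustration, not from the appendix) that pulls the four 9-bit table indices and the 12-bit page offset out of a 48-bit virtual address (4 * 9 + 12 = 48), then forms a 40-bit physical address from the 28-bit frame number. The field names follow AMD's terminology; the example values are made up.

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint64_t va = 0x00007F1234567ABCULL;  /* example 48-bit virtual address */

        unsigned offset = va & 0xFFF;          /* bits 11..0:  page offset       */
        unsigned pt     = (va >> 12) & 0x1FF;  /* bits 20..12: page table index  */
        unsigned pd     = (va >> 21) & 0x1FF;  /* bits 29..21: page directory    */
        unsigned pdp    = (va >> 30) & 0x1FF;  /* bits 38..30: page dir pointer  */
        unsigned pml4   = (va >> 39) & 0x1FF;  /* bits 47..39: page map level 4  */

        printf("pml4=%u pdp=%u pd=%u pt=%u offset=0x%X\n",
               pml4, pdp, pd, pt, offset);

        /* The final page table entry supplies a 28-bit frame number; the
         * 40-bit physical address is that frame number concatenated with
         * the unchanged 12-bit page offset. (Frame value is hypothetical.) */
        uint64_t frame = 0x0ABCDEF;
        uint64_t pa    = (frame << 12) | offset;
        printf("physical address = 0x%010llX\n", (unsigned long long)pa);
        return 0;
    }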
Relationship Between Virtual Memory and Cache Design
- Traditionally, caches use physical addresses, not virtual addresses
- This requires address translation before cache lookup
- This is not a problem for L2 or higher level caches.
- For L1 caches, this might add to the cycle time.
- In Figures B.17 and B.25 we saw that if the cache index bits fit in the page offset,
part of the cache lookup (the indexing) can be done in parallel with the address
translation (by the TLB).
- For these, the index bits are part of the virtual address and the cache is called
virtually indexed.
- The tag still comes from the physical address, so this is a virtually indexed, physically tagged cache.
- This puts a restriction on the size of the L1 cache relative to the page size.
- If the L1 cache is direct mapped, it cannot be larger than the page size.
- 2-way set associativity allows for doubling the cache size.
- Since page sizes are typically 16K or less, this can be a severe restriction on the size of an L1 cache (see the sketch after this list).
- Two alternatives to allow larger caches: physically indexed, and virtually tagged.
- With physically indexed caches, we need to do the address translation before any L1 cache lookup starts.
- This can slow the processor, either by increasing the cycle time or the pipeline depth.
- With virtually indexed, virtually tagged caches, the cache uses virtual addresses.
- The TLB only needs to be accessed on an L1 cache miss.
- The TLB access time does not affect the cycle time, just the L1 miss penalty.
- Two different processes can use the same virtual addresses to refer to different physical locations, so a virtual tag alone is ambiguous.
- This can be handled by flushing the L1 cache when a context switch occurs.
- Or it can be handled by storing an ASID (address space identifier) with the cache tag.
The ASID is like a process ID, but is used by the hardware (see the sketch below).
- Other problems need to be handled:
- Two different virtual addresses of the same process can reference the same physical memory.
- The same virtual address in different processes can reference the same physical memory (think fork).
- Two different virtual addresses in different processes can reference the same physical memory (shared memory).
- The problem is more complicated with multiprocessors (multi-core CPUs)
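The size restriction above follows from simple arithmetic: a virtually indexed, physically tagged cache must keep its index and block-offset bits inside the page offset, so its capacity is at most the page size times the associativity. A minimal sketch, assuming 4K pages and 64-byte blocks (both values are illustrative):

    #include <stdio.h>

    int main(void) {
        unsigned page_size  = 4096;  /* bytes: 12 page-offset bits  */
        unsigned block_size = 64;    /* bytes: 6 block-offset bits  */

        for (unsigned assoc = 1; assoc <= 8; assoc *= 2) {
            /* Keeping sets <= page_size / block_size keeps the index and
             * block offset within the page offset, so indexing can start
             * before (in parallel with) the TLB translation. */
            unsigned max_sets  = page_size / block_size;
            unsigned max_bytes = max_sets * block_size * assoc;
            printf("%u-way: max VIPT cache = %u KB\n", assoc, max_bytes / 1024);
        }
        return 0;
    }

This prints 4 KB for direct mapped, 8 KB for 2-way, and so on: each doubling of associativity doubles the allowed cache size, matching the discussion above.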
ClassQue: Cache Indexing and Tagging
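Here is a minimal sketch of the ASID idea (the structure is hypothetical, not from the text): a virtually tagged cache line stores the owner's ASID next to the virtual tag, and a hit requires both to match, so a context switch need not flush the cache.

    #include <stdint.h>
    #include <stdbool.h>

    struct vcache_line {
        bool     valid;  /* line holds data */
        uint16_t asid;   /* address space identifier of the owning process */
        uint64_t vtag;   /* virtual tag */
    };

    /* Hit only when both the virtual tag and the current ASID match,
     * so lines belonging to other processes are simply misses. */
    static bool vcache_hit(const struct vcache_line *line,
                           uint64_t vtag, uint16_t cur_asid)
    {
        return line->valid && line->vtag == vtag && line->asid == cur_asid;
    }

    int main(void)
    {
        struct vcache_line line = { true, 7, 0x1234 };  /* made-up contents */
        return vcache_hit(&line, 0x1234, 7) ? 0 : 1;    /* 0 means hit */
    }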
Another Example
Figure B.25 shows the first two levels of a 64-bit memory system.
- virtual address: 64 bits
- physical address: 41 bits
- page size 8KB
- TLB: direct mapped with 256 entries
- L1 cache: 8KB direct mapped, block size 64 bytes: 128 blocks
- L2 cache: 4MB direct mapped, block size 64 bytes: 64K blocks
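As a quick check (assuming only the parameters listed above), the fields add up: with 8KB pages the page offset is 13 bits, and the 8KB direct-mapped L1 with 64-byte blocks needs 7 index + 6 block-offset bits = 13, so it can be indexed in parallel with the TLB lookup.

    #include <stdio.h>

    int main(void) {
        unsigned page_offset = 13;               /* log2(8K page size)   */
        unsigned l1_blocks   = 8192 / 64;        /* 128 blocks           */
        unsigned l1_index    = 7;                /* log2(128 blocks)     */
        unsigned block_off   = 6;                /* log2(64-byte block)  */
        unsigned l2_blocks   = (4u << 20) / 64;  /* 64K blocks           */
        unsigned tlb_index   = 8;                /* log2(256 TLB entries)*/

        printf("L1 blocks = %u, L2 blocks = %u\n", l1_blocks, l2_blocks);
        printf("L1 index + block offset = %u (page offset = %u)\n",
               l1_index + block_off, page_offset);
        printf("TLB index = %u bits, TLB tag = %u bits\n",
               tlb_index, 64 - page_offset - tlb_index);
        return 0;
    }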