Memory Management

Background: Computer Architecture

Program executable starts out on disk.
The OS loads the program into memory.
CPU fetches instructions and data from memory while executing the program.

Terminology

Segment: A chunk of memory assigned to a process.
Physical Address: A real address in memory
Virtual Address: An address relative to the start of a process's address space.
Frame: A frame is a fixed-sized block of physical memory that corresponds to a page in the virtual memory. When a process's virtual memory page is brought into physical memory, it is stored in a frame.
Page Table: A page table is a data structure used in paging to map virtual addresses to physical addresses. It contains the base address of each page in physical memory.
Segmentation: Segmentation is a memory management technique that divides the memory into segments of varying lengths, typically reflecting the logical divisions of a program, such as functions, data, and stack.
Fragmentation: Fragmentation refers to the inefficient use of memory space, which can occur in two forms: internal and external. Internal fragmentation happens when allocated memory may have small unused portions, while external fragmentation occurs when free memory is scattered throughout.
Compaction: Compaction is the process of rearranging memory contents to place all free memory together in one large block, thereby reducing external fragmentation.
Demand Paging: Demand paging is a paging technique where pages are loaded into memory only when they are needed during program execution, rather than loading the entire process into memory at the start

Where do addresses come from?

How do programs generate instruction and data addresses?

Compile time: The compiler generates the exact physical location in memory starting from some fixed starting position k. The OS does nothing.
Load time: Compiler generates an address, but at load time the OS determines the process starting position. Once the process loads, it does not move in memory.
Execution time: Compiler generates an address, and OS can place it any where it wants in memory.

Memory Allocation

As processes enter the system, grow, and terminate, the OS must keep track of which memory is available and utilized.

Holes: Pieces of free memory.

Given a new process, the OS must decide which hole to use for the process.

Memory Allocation Policies

First-Fit: Allocate the first one in the list in which the process fits. The search can start with the first hole, or where the previous first fit search ended.
Best-Fit: Allocate the smallest hole that is big enough to hold the process. The OS must search the entire list or store the list sorted by size hole list.
Worst-Fit: Allocate the largest hole to the process. Again the OS must search the entire list or keep the list sorted.

Simulations show first-fit and best-fit usually yield better storage utilization than worst-fit, first-fit is generally faster than best-fit

Initializing Memory when Starting a Process

Process needing k pages arrives.
If k page frames are free, then allocate these frames to pages. Else free frames that are no longer needed.
The OS puts each page in a frame and then puts the frame number in the corresponding entry in the page table.
OS marks all TLB entries as invalid (flushes the TLB).
OS starts process.
As process executes, OS loads TLB entries as each page is accessed, replacing an existing entry if the TLB is full.

Saving/Restoring Memory on a Context Switch

The Process Control Block (PCB) must be extended to contain:

The page table
Possibly a copy of the TLB

On a context switch:

Copy the page table base register value to the PCB.
Copy the TLB to the PCB (optionally).
Flush the TLB.
Restore the page table base register.
Restore the TLB if it was saved.

Virtual Memory

Not enough memory was a problem, CPUs only supported 32 bit registers which stores max of 2^32 = 4 GB, modern CPUs have 64 bit address space, but we'll use 32 bit for easier calculations.

Problems

What happens when a program tries to access first view bytes, but when program tries to use more than 2 GB of memory for example, the program will crash.

Now the second problem is memory fragmentation, lets take this as an example:

We first run youtube and the game which runs without any problem, but we now want to open the editing software so we will close youtube. Now here is the problem, the memory is now split and we can't open the editing software even though there is enough space in total, this is called memory fragmentation.

Why can't we split our editing software on the 2 memory locations?

Its possible, but it would be too hard to implement.

Our third problem is security, lets say a music player and a game are accessing a memory location 64, this is dangerous because maybe the music player changes the player health value to 0 instead of stopping the song and changing its value to 0.

Summary of problems:

Not Enough Memory: Crash if not enough RAM.
Memory Fragmentation: Runs out of space.
Security: Corrupt other programs data.

1st problem solution

What if we give each program its own memory space, this is called virtual memory and each program has its own virtual memory which doesn't overlap with other programs.

But this means we eventually need to map the virtual memory address of each program to a physical address (RAM).

Physical memory isn't only RAM, it could be registers, disk etc...

Now each virtual address space has 4 GB size and there is a mapper in the middle for each program so the data can be mapped to a physical address.

If a program tries to access more data that can fit in RAM, then the data can be stored somewhere else which is on disk (swap memory):

The program mapper tells us that the data is not in memory its on hard disk for example.
The operating system will find the oldest data and replace it with the data we want from disk.

When data is not found in RAM and the OS needs to go and read it from disk, this is called a page fault and this process is expensive. That's why having more RAM makes your laptop faster as the OS doesn't need to swap that much.

2nd problem solution

Now with virtual memory, we can map parts of the program into each of the available chunks of physical memory, from program perspective its still contiguous.

3rd problem solution

For solving the security problem, each program now has its own virtual address space, so if two programs use same location it will work as they are seperated and each one is mapped to a different physical address.

Although we still need to share some memory to read the same shared library, for example accessing glibc or win32 API or the 2 programs can use a shared memory space in RAM to exchange data without going to other devices such as disk or network interface for example.

Virtual memory implementation

Let's say a program wants to load a program for address 64 into R1 register.

The program stores the data in its virtual address, then maps it to a physical address and stores the virtual address and physical address destination into a page table and then reads data from the physical address.

Each mapping is called a page table entry (PTE).
CPUs work with words which are the size of a CPU register.

How much space do we need to store mapping for each word? There are 2^32 addresses for each byte = 2^30 words which is approximately 1 billion PTEs which is equivalent to 4 GB per page table.

This is clearly a lot, so we can divide the memory into chunks which are called pages and map pages of memory instead of mapping each individual word.

For example, virtual addresses 0-4095 are mapped to some physical addresses, say 16384-20479 this means we don't need a single page entry for each word anymore, instead we need one entry for every 4 KB of data which is 1024 words.

To get the page size of your Linux machine, you can run getconf PAGE_SIZE

Page Table Mapping

Now, our page table tells us that our range of virtual addresses are mapped to a range of physical addresses.

Lets say that a virtual page 1 is mapped to the physical page 2, what is the physical address for virtual addresses 4200?

To get the physical addresses, we can get the offset which is the same in both pages, in our case its 4200 - 4096 = 104, then we can add 8192 + 104 = 8296, so the physical address is 8296.

Address Translation

Lets say we have 32 bit of virtual address space and 30 bit physical addresses and page sizes is 4 KB which is 12 bits, to translate the virtual address to a physical address we keep the offset which is the first 12 bits from physical and virtual address space.

To map the remaining 20 bits of virtual address, CPU asks the page table and gets back the remaining 18 bits of the physical address.

Part of virtual address without the offset is called Virtual Page Number
Part of physical address without the offset is called Physical Page Number

Page tables map virtual to physical page numbers.

Lets take an example of translating virtual address 0x12345678:

Last 12 bits remain unchanged which are 678 and we just copy them to the physical address.

Then we look up the virtual page number 0x12345 in the page table which returns the physical page number 0x4321

To get the full physical address, we concatenate the physical page number with the offset.

Address Translation Example #1

Address Translation Example #2

To read more about address spacing:

Understanding Address Spacing in Detailtbhaxor

Page Faults

Page Fault:
- Occurs when a program accesses a memory page not in RAM.
- Triggers the OS to pause the program and handle the fault.
- Page faults are slow because accessing the disk is much slower than accessing RAM.
OS Checks Disk:
- The OS checks if the requested page is on the disk (swap space).
Swapping Out a Page:
- If RAM is full, the OS may need to swap out a page to make space.
- Dirty Page: If the page was modified, it is saved to the disk.
- Clean Page: If the page was not modified, it doesn't need to be saved.
Loading Requested Page:
- The OS loads the requested page from the disk into RAM.
Resuming Execution:
- The OS resumes the program’s execution at the same instruction that caused the page fault.

Modern architectures use Direct Memory Access (DMA) to handle data transfers between RAM and disk, allowing the CPU to perform other tasks simultaneously.

Page Replacement Algorithms

First In First Out (FIFO)

This algorithm replaces the first reference string that was added and replaces it.

3 frames (3 pages can be in memory at a time per process).

if we continue we will find out that it will give us 15 page faults which is not the best algorithm to consider, also adding more frames can cause more page faults which is called Belady's Anomaly.

Optimal Page Replacement

Replaces the page that will not be used for the longest time in the future.
Not implementable: Requires future knowledge of page requests.
Benchmark: Used to measure the performance of other algorithms.

Least Recently Used (LRU)

Replaces the page that has not been used for the longest time.
Use past knowledge rather than future.
Replace page that has not been used in the most amount of time.
Associate time of last use with each page.

Using LRU, we look at the reference bits from the left and check which was not recently used and we replace it.

It has lower page faults than FIFO, but not the most optimal solution as its also hard to implement.

Second Chance

A simpler approximation of LRU, at its core its FIFO.
Mechanism: Uses a circular buffer (clock) and a reference bit.
Reference Bit:
- If the reference bit is 0, the page is replaced.
- If the bit is 1, the bit is cleared, and the page is given a "second chance."

We will assume all reference bits are set to 0 at first.

When a bit gets referenced while its in the page, it gets changed to bit value of 1

Now, we reached 8 which we need is a page fault, we follow FIFO and look for which one was first in, in our case its 4 and it has a bit of 0 so we can replace it with 8.

After that we just continue and eventually we will have same values but all of them will have a reference bit of 1, now we reached 4 and we follow FIFO again, 5 was the first one but it has a reference bit of 1, so we give it a second chance and change it to 0, after that we follow FIFO again to eventually changing all of them to 0s, then we follow FIFO again and 5 is replaced by 4 because now it has a reference bit of 0.

Enhanced Second Chance (Clock with Dirty Bit)

An enhancement of the Second Chance algorithm.
Mechanism: Uses a circular buffer and two bits per page: a reference bit and a modification (dirty) bit.
Categories:
- (0, 0): Not recently used, not modified.
- (0, 1): Not recently used, modified.
- (1, 0): Recently used, not modified.
- (1, 1): Recently used, modified.
Replacement Preference:
- Prefers to replace pages that are neither recently used nor modified ((0, 0)).
- Followed by those that are not recently used but modified ((0, 1)).

Reference String: 1, 2, 3, 4, 1, 2, 5

Frames: 3

Steps:

1:
- Load 1: [1(1,0), -, -]
- Clock: Frame 2
2:
- Load 2: [1(1,0), 2(1,0), -]
- Clock: Frame 3
3:
- Load 3: [1(1,0), 2(1,0), 3(1,0)]
- Clock: Frame 1
4:
- Clear 1(1,0) -> 1(0,0)
- Clear 2(1,0) -> 2(0,0)
- Clear 3(1,0) -> 3(0,0)
- Replace 1 with 4: [4(1,0), 2(0,0), 3(0,0)]
- Clock: Frame 2
1:
- Replace 2 with 1: [4(1,0), 1(1,0), 3(0,0)]
- Clock: Frame 3
2:
- Replace 3 with 2: [4(1,0), 1(1,0), 2(1,0)]
- Clock: Frame 1
5:
- Clear 4(1,0) -> 4(0,0)
- Clear 1(1,0) -> 1(0,0)
- Clear 2(1,0) -> 2(0,0)
- Replace 4 with 5: [5(1,0), 1(0,0), 2(0,0)]
- Clock: Frame 2

Translation Lookaside Buffer (TLB)

Address translation involves converting virtual addresses used by programs into physical addresses in RAM. This process can be slow because it requires looking up the page table, which maps virtual addresses to physical addresses.

To speed this up, modern CPUs use a hardware component called the Translation Lookaside Buffer (TLB). TLB is a fast fully associative memory that stores page numbers (key) and the frame (value) in which they are stored.

If memory accesses have locality, address translation has locality too.
Typical TLB sizes range from 8 to 2048 entries.

Locality is when a program tries to use data and instructions with addresses near or equal to those they have used recently.

If the translation is found (a TLB hit), the physical address is quickly retrieved. If not (a TLB miss), the CPU accesses the page table to get the translation and then stores it in the TLB for future use.

Address Translation

Virtual Addresses: Used by programs to access memory.
Physical Addresses: Actual addresses in RAM where data is stored.
Page Table: Maps virtual addresses to physical addresses.
Overhead: Can be slow due to multiple memory accesses.

Translation Lookaside Buffer (TLB)

Purpose: Speeds up address translation.
Function:
- TLB Hit: The virtual address translation is found in the TLB, so it's quickly retrieved.
- TLB Miss: The translation is not in the TLB; requires a page table lookup and updates the TLB with new translation.
Advantage: Reduces the time spent on address translation, improving overall performance.

Valid/Invalid Bits

Page Table Entries: Include valid/invalid bits to manage memory.
- Valid Bit: Indicates that the page is currently in RAM and the mapping is valid.
- Invalid Bit: Indicates that the page is not in RAM or the entry is not in use.
Page Fault: Occurs when a page is invalid, triggering a need to load the page from disk.

Summary

Address Translation: Necessary but slow; involves converting virtual to physical addresses.
TLB: A hardware cache that speeds up address translation by storing recent mappings.
Valid/Invalid Bits: Help manage and verify the presence of pages in memory.

The original piece of hardware that is responsible for address translations and generating page faults is MMU (Memory Management Unit) and its programmed by the OS.

How much RAM is required just for page tables to run 50 programs?

4 MB for each table, which is 4 x 50 = 200 MB

PreviousCPU Scheduling NextFlight Physics

Last updated 11 months ago