Operating Systems
The more you know about the operating system, the better you can work with it.
Operating System Overview
Role
Role 1
The Operating System is an Abstract Machine
- Extends the basic hardware with added functionality
- Provides high-level abstractions
- More programmer friendly
- Common core for all applications
- It hides the details of the hardware
- Makes application code portable
Role 2
The Operating System is a Resource Manager
- Responsible for allocating resources to users and processes
- Must ensure
- No Starvation
- Progress
- Allocation is according to some desired policy
- First-come, first-served; Fair share; Weighted fair share; limits (quotas), etc…
- Overall, that the system is efficiently used
Structural (Implementation) View
The operating system is the software that runs in privileged mode.
Operating System Kernel
- Portion of the operating system that is running in privileged mode
- Usually resident (stays) in main memory
- Contains fundamental functionality
- Whatever is required to implement other services
- Whatever is required to provide security
- Contains most-frequently used functions
- Also called the nucleus or supervisor
The Operating System is Privileged
Applications should not be able to interfere with or bypass the operating system
OS can enforce the “extended machine”
OS can enforce its resource allocation policies
Prevent applications from interfering with each other
The Structure of a Computer System

- Applications interact with each other and with system libraries via function calls to library procedures
- The OS and applications interact via system calls
A Note on System Libraries
System libraries are just that, libraries of support functions (procedures, subroutines)
Only a subset of library functions are actually system calls, and system call functions are in the library for convenience
Privilege-less OS
Can implement OS functionality, but cannot enforce it.
Some Embedded OSs have no privileged component
- All software runs together
- No isolation
- One fault potentially brings down entire system
Operating System Software
Fundamentally, OS functions the same way as ordinary computer software.
- It is machine code that is executed (same machine instructions as application).
- It has more privileges (extra instructions and access).
The operating system relinquishes control of the processor to execute other programs; it regains control after
- System calls
- and Interrupts (especially timer interrupts).
Operating System Internal Structure
The monolithic operating system structure is also called the “spaghetti nest” approach: everything is tangled up with everything else.
Processes and Threads
Major Requirements of an Operating System
- Interleave the execution of several processes to maximize processor utilization while providing reasonable response time
- Allocate resources to processes
- Support interprocess communication and user creation of processes
Processes and Threads
Processes
- Also called a task or job
- Execution of an individual program
- “Owner” of resources allocated for program execution
- The OS tracks a process’s resource usage and cleans up (e.g. memory) when execution finishes
- Encompasses one or more threads
Threads
The sequence of execution through an application
- Unit of execution
- Can be traced
- list the sequence of instructions that execute
- Belongs to a process
- The process provides the environment, and all threads running in the process share that environment
- Executes within it
Process
The Process Model
A single-processor machine
- Multiprogramming of four programs
- Conceptual model of 4 independent, sequential processes(with a single thread each)
- Only one program active at any instant

In the figure, the box represents a process, the line represents the thread, and the dotted lines represent what the OS can do
Process and thread models of selected OSes
- Single process, single thread
- MSDOS
- Single process, multiple threads
- OS/161 as distributed
- Multiple processes, single thread
- Traditional UNIX
- Multiple processes, multiple threads
- Modern Unix(Linux, Solaris), Windows
Process Creation
Principal events that cause process creation
- System initialization
- Foreground processes(interactive programs)
- Background processes
- Email server, web server, print server, etc.
- Called a daemon(unix) or service(Windows)
- Execution of a process creation system call by running a process
- New login shell for an incoming ssh connection
- User request to create a new process
- Initiation of a batch job
Note: technically, all these cases use the same system mechanism to create new processes
Process Termination
Conditions which terminate processes
- Normal exit(voluntary)
- Error exit(voluntary)
- Fatal error(involuntary)
- e.g. segmentation fault
- Killed by another process(involuntary)
Implementation of Processes
- A process’s information is stored in a process control block (PCB)
- The PCBs form a process table
- Reality can be complex(hashing, chaining, allocation bitmaps,…)
- Size can be dynamic
Example fields of a process table entry
| Process Management | Memory management | File management |
|---|---|---|
| Registers | Pointer to text segment | Root directory |
| Program counter | Pointer to data segment | Working directory |
| Program status word | Pointer to stack segment | File descriptors |
| Stack pointer | User ID | |
| Process state | Group ID | |
| Priority | ||
| Scheduling parameters | ||
| Process ID | ||
| Parent process | ||
| process group | ||
| Signals | ||
| time when process started | ||
| CPU time used | ||
| Children’s CPU time | ||
| Time of next alarm |
Process/Thread States
Three-state process model (the minimal model OSes implement)
In the image the numbers represent:
- Process blocks for input
- Scheduler picks another process
- Scheduler picks this process
- Input becomes available
- Possible process/thread states
- running
- blocked
- ready
Generally, if there is no I/O, almost all processes sit in the running and ready states
- Transitions between states shown
Some transition-causing events
- Running -> Ready
- Voluntary Yield()
- End of timeslice
- Running -> Blocked
- Waiting for input
- File, network
- Waiting for a timer (alarm signal)
- Waiting for a resource to become available
Scheduler
Responsible for deciding what to run next
- Sometimes also called the dispatcher
- The literature is also a little inconsistent with this terminology
- Has to choose a Ready process to run
- It is inefficient to search through all processes
The Ready Queue
What about blocked processes?
When an unblocking event occurs, we also wish to avoid scanning all processes to select one to make Ready
- One option: use two queues (ready and blocked)
- Another option: use a separate queue for each event being waited on
Implementation
- A process minimally consists of three segments
- Text
- contains the code (instructions)
- Data
- Global variables
- Stack
- Activation records of procedure/function/method
- Local variables
- User-mode
- Processes (programs) scheduled by the kernel
- Isolated from each other
- No concurrency issues between each other
- System-calls transition into and return from the kernel
- Kernel-mode
- Nearly all activities still associated with a process
- Kernel memory shared between all processes
- Concurrency issues exist between processes concurrently executing in a system call
Thread
The Thread Model

(a) Three processes each with one thread
(b) One process with three threads
Separating execution from the environment
| Per process items | Per thread items |
|---|---|
| Address space | Program counter |
| Global variables | Registers |
| Open files | Stack |
| Child processes | State |
| Pending alarms | |
| Signals and signal handlers | |
| Accounting information |
Items in the left column are shared by all threads in a process; items in the right column are private to each thread.
The Thread Model
- State implicitly stored on the stack

- Local variables are per thread
- Allocated on the stack
- Don’t have concurrency issues
- Global variables are shared between all threads
- Allocated in the data section
- Concurrency control is an issue
- Dynamically allocated memory (malloc) can be shared or local
- Whether it is shared is program-defined: the pointer to it can be stored in a global or a local variable
- Memory reachable through a global pointer has concurrency issues, while memory reachable only through locals does not (see the sketch below)
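A minimal sketch (assuming POSIX threads; not part of the original notes) of where thread data lives: the global sits in the data segment and is shared, each thread's local lives on its own stack, and the malloc'd integer is shared only because both threads are handed the same pointer.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

int global = 42;                      /* data segment: one copy, shared by all threads */

void *worker(void *arg)
{
    int local = 0;                    /* stack: every thread gets its own copy          */
    int *heap = arg;                  /* heap: shared here only because both threads
                                         were passed the same pointer (program choice)  */
    local = global + *heap;           /* reading shared data is safe;                   */
    printf("local = %d\n", local);    /* concurrent writes would need synchronisation   */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    int *heap = malloc(sizeof *heap); /* dynamically allocated memory */
    *heap = 8;

    pthread_create(&t1, NULL, worker, heap);
    pthread_create(&t2, NULL, worker, heap);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    free(heap);
    return 0;
}
```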
Thread Usage
Example
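A rough sketch of the dispatcher/worker pattern using POSIX threads (the request queue, its size, and the integer "requests" are invented for illustration; a real web server would accept network connections and parse requests):

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NWORKERS   4
#define QUEUE_SIZE 32                  /* large enough for this toy run */

static int queue[QUEUE_SIZE];          /* pending "requests" (just integers here) */
static int head = 0, tail = 0, count = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;

static void handoff_work(int request)  /* called by the dispatcher */
{
    pthread_mutex_lock(&lock);
    queue[tail] = request;
    tail = (tail + 1) % QUEUE_SIZE;
    count++;
    pthread_cond_signal(&nonempty);    /* wake one sleeping worker */
    pthread_mutex_unlock(&lock);
}

static int wait_for_work(void)         /* called by the workers */
{
    pthread_mutex_lock(&lock);
    while (count == 0)
        pthread_cond_wait(&nonempty, &lock);   /* block, don't busy-wait */
    int request = queue[head];
    head = (head + 1) % QUEUE_SIZE;
    count--;
    pthread_mutex_unlock(&lock);
    return request;
}

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        int request = wait_for_work();
        printf("worker handling request %d\n", request);   /* "serve the page" */
    }
    return NULL;
}

int main(void)                          /* main() acts as the dispatcher thread */
{
    pthread_t workers[NWORKERS];
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&workers[i], NULL, worker, NULL);

    for (int request = 0; request < 20; request++)
        handoff_work(request);          /* in a real server: one per accepted connection */

    sleep(1);                           /* crude: let the workers drain the queue */
    return 0;
}
```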
| Model | Characteristics |
|---|---|
| Threads | Parallelism, blocking system calls |
| Single-threaded process | No parallelism, blocking system calls |
| Finite-state machine | Parallelism, nonblocking system calls, interrupts |
Why Threads?
- Simpler to program than a state machine
- Fewer resources are associated with a thread than with a complete process
- Cheaper to create and destroy
- Threads share resources (especially memory) between them
- Performance: threads waiting for I/O can be overlapped with computing threads
- Note: if all threads are compute-bound, there is no performance improvement (on a uniprocessor)
- Threads can take advantage of the parallelism available on machines with more than one CPU (multiprocessor)
User-level Threads
- Implementation at user-level
- User-level Thread Control Block (TCB), ready queue, blocked queue, and dispatcher
- Kernel has no knowledge of the threads (it only sees a single process)
- If a thread blocks waiting for a resource held by another thread inside the same process, its state is saved and the dispatcher switches to another ready thread
- Thread management (create, exit, yield, wait) are implemented in a runtime support library
- Pros
- Thread management and switching at user level is much faster than doing it at kernel level
- No need to trap (take a syscall exception) into the kernel and back just to switch
- Dispatcher algorithm can be tuned to the application
- Can be implemented on any OS (thread-aware or not)
- Can easily support massive numbers of threads on a per-application basis
- Uses normal application virtual memory
- Kernel memory is more constrained; it is difficult to efficiently support wildly differing numbers of threads for different applications
- Cons
- Threads have to yield() manually (no timer interrupt delivery to user level)
- This is co-operative multithreading: a single poorly designed/implemented thread can monopolise the available CPU time
- There are work-arounds (e.g. a timer signal per second to enable preemptive multithreading), but they are coarse-grained and a kludge
- Does not take advantage of multiple CPUs (in reality, we still have a single-threaded process as far as the kernel is concerned)
- If a thread makes a blocking system call (or takes a page fault), the process (and all its internal threads) blocks
- Can’t overlap I/O with computation
Kernel-level threads
Cons
- Thread creation and destruction, and blocking and unblocking threads, require kernel entry and exit
- More expensive than the user-level equivalent
Pros
- Preemptive multithreading
- Parallelism
- Can overlap blocking I/O with computation
- Can take advantage of a multiprocessor
Context Switch
A context switch can refer to a switch between threads, involving saving and restoring the state associated with a thread; or a switch between processes, involving the above plus the extra state associated with a process.
Context Switch Occurrence
A switch between process/threads can happen any time the OS is invoked
- On a system call
- Mandatory if system call blocks or on exit()
- On an exception
- Mandatory if offender is killed
- On an interrupt
- Triggering a dispatch is the main purpose of the timer interrupt
A thread switch can happen between any two instructions
Note instructions do not equal program statements
Context Switch
- Context switch must be transparent for processes/threads
- When dispatched again, process/thread should not notice that something else was running in the meantime (except for elapsed time)
- OS must save all state that affects the thread
- This state is called the process/thread context
- Switching between process/threads consequently results in a context switch.
Concurrency and Synchronisation
Concurrency Example
Two threads both execute count++ on a global variable count that they share.
This gives a race condition: race conditions occur when shared (global) data can be updated by more than one thread at the same time.
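A minimal sketch of the race (using POSIX threads; not from the original notes): count++ compiles to a load, an add and a store, and the two threads' load/store sequences can interleave so that updates are lost.

```c
#include <pthread.h>
#include <stdio.h>

int count = 0;                       /* global: shared between the two threads */

void *increment(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        count++;                     /* not atomic: load count, add 1, store count */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* Expected 200000; typically prints less, because both threads can load the
       same old value, each add 1, and one store then overwrites the other. */
    printf("count = %d\n", count);
    return 0;
}
```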
There is in-kernel concurrency even for single-threaded processes: the OS has to deal with concurrency even if every application is single-threaded.
Critical Region
- We can control access to the shared resource by controlling access to the code that accesses the resource
A critical region is a region of code where shared resources are accessed(e.g. Variables, memory, files, etc…)
- Uncoordinated entry to the critical region results in a race condition
- => Incorrect behavior, deadlock, lost work,…
Identifying critical regions
- Critical regions are regions of code that:
- Access a shared resource,
- and correctness relies on the shared resource not being concurrently modified by another thread/process/entity.
In the count++ example above, the statements that read and update the shared global count form the critical region.
If we can ensure that executions of the critical regions never overlap, we won’t have a concurrency problem.
Example critical regions
Consider a shared linked list with an insert() operation.
- Race example
Two threads call insert() concurrently; label thread 1’s read and update of the list head as steps 1–2 and thread 2’s as steps 3–4.
One possible sequence is 1 -> 3 -> 2 -> 4; the item inserted by thread 1 is then lost.
Both 1–2 and 3–4 are critical regions.
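A sketch of the list and the racy insert() described above (names invented for illustration); the comments mark the numbered steps from the interleaving 1 → 3 → 2 → 4.

```c
#include <stdlib.h>

struct node {
    int data;
    struct node *next;
};

struct node *head = NULL;       /* shared list head */

void insert(int item)           /* NOT thread-safe */
{
    struct node *n = malloc(sizeof *n);
    n->data = item;
    n->next = head;             /* steps (1) and (3): read the current head         */
    head = n;                   /* steps (2) and (4): publish the node as the head  */
}

/* With the interleaving 1 -> 3 -> 2 -> 4, both new nodes point at the same old
 * head; thread 2's store to head (step 4) overwrites thread 1's (step 2), so
 * thread 1's node is no longer reachable and its item is lost. Steps 1-2 and
 * 3-4 must therefore execute as critical regions. */
```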
Critical regions solutions
- We seek a solution to coordinate access to critical regions
- Also called critical sections
- Conditions required of any solution to the critical region problem
- Mutual Exclusion: No two processes simultaneously in critical region
- No assumptions made about speeds or numbers of CPUs
- Progress
- No process running outside its critical region may block another process
- Bounded waiting
- No process waits forever to enter its critical region
A solution?
- A lock variable
- if lock == 1, somebody is in the critical section and we must wait
- if lock == 0, nobody is in the critical section and we are free to enter
A problematic execution sequence: both threads can observe lock == 0 before either sets it to 1, so both enter the critical section.
There is a race condition between observing that the lock variable is 0 and setting it to 1 to keep others out.
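A sketch of the broken lock-variable approach (not from the original notes): the gap between testing the lock and setting it is itself an unprotected access to shared data.

```c
int lock = 0;                    /* shared: 0 = free, 1 = somebody is inside */

void enter_region(void)          /* broken: do not use */
{
    while (lock == 1)
        ;                        /* wait while the region is busy */
    /* <-- both threads can observe lock == 0 and reach this point together */
    lock = 1;                    /* ... and then both "acquire" the lock */
}

void leave_region(void)
{
    lock = 0;
}
```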
Mutual Exclusion by Taking Turns
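A minimal sketch of strict alternation for two threads sharing a turn variable; critical_region() and noncritical_region() are placeholder functions, not part of the original notes.

```c
static volatile int turn = 0;           /* whose turn it is to enter: 0 or 1 */

static void critical_region(void)    { /* access the shared resource */ }
static void noncritical_region(void) { /* unrelated work */ }

static void thread0(void)
{
    for (;;) {
        while (turn != 0)
            ;                           /* busy-wait until it is our turn     */
        critical_region();
        turn = 1;                       /* hand the turn to the other thread  */
        noncritical_region();
    }
}

static void thread1(void)               /* symmetric: waits for turn == 1 */
{
    for (;;) {
        while (turn != 1)
            ;
        critical_region();
        turn = 0;
        noncritical_region();
    }
}
```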
This works because the turn variable is only updated by the thread whose turn it is.
- Works due to strict alternation
- Each process takes turns
- Cons
- Busy waiting (looping)
- Possible workaround: when it is not my turn, do something else and come back when it is
- A process must wait its turn even while the other process is doing something else
- With many processes, each must wait for everyone else to have a turn
- Does not guarantee progress if a process no longer needs a turn
- Poor solution when processes require the critical section at differing rates
Mutual Exclusion by Disabling Interrupts
Only suitable for very short critical sections, because while interrupts are disabled the other devices in the machine cannot be serviced, making the system less responsive.
- Before entering a critical region, disable interrupts
- After leaving the critical region, enable interrupts
- Pros
- simple
- Cons
- Only available in the kernel
- Blocks everybody else, even with no contention
- slows interrupt response time
- Does not work on a multiprocessor
Hardware Support for mutual exclusion
- Test and set instruction
- Can be used to implement lock variables correctly
- It loads the value of the lock
- If lock == 0, it sets the lock to 1 and returns 0 – we have acquired the lock
- If lock == 1, it returns 1 – another thread/process has the lock
- Hardware guarantees that the instruction executes atomically(atomically: as an indivisible unit)
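A sketch of enter_region/leave_region built on an atomic test-and-set. Here GCC's __sync_lock_test_and_set builtin stands in for the machine's test-and-set instruction (TSL, XCHG, LL/SC); the builtin atomically stores 1 and returns the previous value.

```c
static volatile int lock = 0;            /* 0 = free, 1 = held */

void enter_region(void)
{
    /* Atomically set lock to 1 and fetch its old value.
       If the old value was 1, somebody else already holds it: spin. */
    while (__sync_lock_test_and_set(&lock, 1) == 1)
        ;                                /* busy-wait (spin lock) */
}

void leave_region(void)
{
    __sync_lock_release(&lock);          /* atomically store 0 */
}
```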
- Pros
- Simple(easy to show it’s correct)
- Available at user-level
- To any number of processors
- To implement any number of lock variables
- Cons
- Busy waits(also termed a spin lock)
- consumes CPU
- starvation is possible when a process leaves its critical section and more than one process is waiting
- Busy waits(also termed a spin lock)
Tackling the Busy-Wait Problem
- Sleep / Wake up
- The idea
- When process is waiting for an event, it calls sleep to block, instead of busy waiting
- The event happens, the event generator(another process) calls wakeup to unblock the sleeping process.
- Waking a ready/running process has no effect
The Producer-Consumer Problem
Also called the bounded buffer problem
A producer produces data items and stores the items in a buffer
A consumer takes the items out of the buffer and consumes them.
- Two problems: concurrency (the buffer is a shared resource), and the producer must block when the buffer is full while the consumer must block when the buffer is empty.
- We must keep accurate count of items in the buffer
- The consumer can call wakeup when it consumes the first entry of the full buffer
- The Producer can call wakeup when it adds the first item to the buffer
Semaphores
- Dijkstra introduced two primitives that are more powerful than simple sleep and wakeup alone.
- P(): proberen, from Dutch to test
- V(): verhogen, from Dutch to increment
- Also called wait & signal, down & up
How do they work
- If a resource is not available, the corresponding semaphore blocks any process waiting for the resource
- Blocked processes are put into a process queue maintained by the semaphore(avoids busy waiting)
- When a process releases a resource, it signals this by means of the semaphore
- Signalling resumes a blocked process if there is any
- Wait(P) and signal(V) operations cannot be interrupted
- Complex coordination can be implemented by multiple semaphores
Semaphore implementation
- Define a semaphore as a record
Each primitive (wait and signal) is atomic; inside the OS, the implementation typically disables interrupts to achieve this.
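A user-level sketch of the semaphore record and its wait/signal operations. Here POSIX mutex and condition-variable calls stand in for the kernel's interrupt disabling and sleep/wakeup queue; the structure and the wait/signal logic are the point, not the exact primitives.

```c
#include <pthread.h>

typedef struct {
    int count;                        /* how many waits may proceed before blocking */
    pthread_mutex_t lock;             /* makes wait/signal atomic in this sketch    */
    pthread_cond_t  queue;            /* where blocked threads sleep                */
} semaphore;

void semaphore_init(semaphore *s, int value)
{
    s->count = value;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->queue, NULL);
}

void semaphore_wait(semaphore *s)     /* P(), down() */
{
    pthread_mutex_lock(&s->lock);
    while (s->count <= 0)
        pthread_cond_wait(&s->queue, &s->lock);   /* block; no busy waiting */
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

void semaphore_signal(semaphore *s)   /* V(), up() */
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->queue);   /* resume one blocked thread, if any */
    pthread_mutex_unlock(&s->lock);
}
```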
Semaphore as a General Synchronization Tool
- Execute B in Pj only after A executed in Pi
- Use a semaphore whose count is initialised to 0
- Code:

- If A executes first: signal makes the count 1, so when Pj reaches wait the count drops back to 0 and B runs immediately (A before B, as desired)
- If Pj reaches wait first: the count is 0, so Pj sleeps; when A later runs and signals, B is woken up (still A before B)
Semaphore implementation of a Mutex
- Mutex is short for Mutual Exclusion
- Can also be called a lock
Notice that the initial count determines how many waits can progress before blocking and requiring a signal; for a mutex, count is initialised to 1.
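A small sketch reusing the semaphore_* functions above as a mutex (initial count 1); initialising the count to 0 instead gives the "B only after A" ordering from the previous example.

```c
semaphore mutex;                     /* call semaphore_init(&mutex, 1) before use */

void critical_function(void)
{
    semaphore_wait(&mutex);          /* lock: blocks if another thread is inside  */
    /* ... critical region: access the shared resource ... */
    semaphore_signal(&mutex);        /* unlock: lets one blocked thread continue  */
}
```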
Solve the producer-consumer problem
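A sketch of the classic semaphore solution, reusing the semaphore_* sketch above; the item type and the produce/insert/remove/consume helpers are placeholders. mutex starts at 1, empty at N (the number of slots) and full at 0. Note the ordering: wait on empty (or full) before wait on mutex, otherwise a full or empty buffer can leave producer and consumer deadlocked.

```c
#define N 100                          /* slots in the buffer */

typedef int item_t;                    /* placeholder item type          */
item_t produce_item(void);             /* placeholder helpers, declared  */
void   insert_item(item_t item);       /* only so the sketch is complete */
item_t remove_item(void);
void   consume_item(item_t item);

semaphore mutex;                       /* initialised to 1: guards the buffer  */
semaphore empty;                       /* initialised to N: counts empty slots */
semaphore full;                        /* initialised to 0: counts full slots  */

void producer(void)
{
    for (;;) {
        item_t item = produce_item();
        semaphore_wait(&empty);        /* block while no slot is free      */
        semaphore_wait(&mutex);        /* enter the critical region        */
        insert_item(item);
        semaphore_signal(&mutex);      /* leave the critical region        */
        semaphore_signal(&full);       /* one more item for the consumer   */
    }
}

void consumer(void)
{
    for (;;) {
        semaphore_wait(&full);         /* block while the buffer is empty  */
        semaphore_wait(&mutex);
        item_t item = remove_item();
        semaphore_signal(&mutex);
        semaphore_signal(&empty);      /* one more free slot               */
        consume_item(item);
    }
}
```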
Summary
- Semaphores can be used to solve a variety of concurrency problems
- However, programming with them can be error-prone
- E.g. for a mutex, every wait must be matched by a signal
- Too many or too few signals or waits, or signals and waits in the wrong order, can have catastrophic results
Monitors
- To ease concurrent programming, Hoare(1974) proposed monitors
- A higher level synchronisation primitive
- Programming language construct
- Idea
- A set of procedures, variables, data types are grouped in a special kind of module, a monitor.
- Variables and data types only accessed from within the monitor
- Only one process/thread can be in the monitor at any one time
- Mutual exclusion is implemented by the compiler(which should be less error prone)

- When a thread calls a monitor procedure that has a thread already inside, it is queued and it sleeps until the current thread exits the monitor
Example: a monitor groups shared data (e.g. a buffer and its counters) with the only procedures allowed to touch them; a C approximation using pthreads appears after the condition-variable discussion below.
How do we block waiting for an event?
- We need a mechanism to block waiting for an event( in addition to ensuring mutual exclusion)
- e.g. in the producer-consumer problem, when the buffer is full or empty
- Condition variables
- To allow a process to wait within the monitor, a condition variable must be declared, as condition x, y;
- Condition variables can only be used with the operations wait and signal
- The operation x.wait() means that the process invoking it is suspended until another process invokes x.signal()
- Another thread can enter the monitor while the original is suspended
- The operation x.signal() resumes exactly one suspended process; if no process is suspended, the signal operation has no effect
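C has no monitor construct, so as an illustration only: a pthreads mutex plays the role of the compiler-enforced "one thread inside at a time", and condition variables provide x.wait()/x.signal(). The bounded buffer here is the producer-consumer problem recast as a monitor; all names are invented for the sketch.

```c
#include <pthread.h>

#define N 16

static int buffer[N];
static int count = 0, in = 0, out = 0;
static pthread_mutex_t monitor_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

void monitor_insert(int item)
{
    pthread_mutex_lock(&monitor_lock);                   /* enter the monitor    */
    while (count == N)
        pthread_cond_wait(&not_full, &monitor_lock);     /* x.wait(): releases the
                                                            lock while sleeping  */
    buffer[in] = item;
    in = (in + 1) % N;
    count++;
    pthread_cond_signal(&not_empty);                     /* x.signal(): wake one */
    pthread_mutex_unlock(&monitor_lock);                 /* leave the monitor    */
}

int monitor_remove(void)
{
    pthread_mutex_lock(&monitor_lock);
    while (count == 0)
        pthread_cond_wait(&not_empty, &monitor_lock);
    int item = buffer[out];
    out = (out + 1) % N;
    count--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&monitor_lock);
    return item;
}
```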
The Readers and Writers Problem
- Models access to a data base
- E.g. airline reservation system
- Can have more than one concurrent reader
- To check schedules and reservations
- Writers must have exclusive access
- To book a ticket or update a schedule
A solution
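A sketch of the classic solution (reusing the semaphore sketch from earlier; the database access functions are placeholders): a shared read_count protected by mutex tracks the readers, and the first reader in / last reader out acquires and releases db, which writers need exclusively. This variant can starve writers if readers keep arriving.

```c
semaphore mutex;                  /* initialised to 1: protects read_count              */
semaphore db;                     /* initialised to 1: exclusive access to the database */
int read_count = 0;               /* number of readers currently accessing the database */

void read_database(void);         /* placeholder database operations */
void write_database(void);

void reader(void)
{
    for (;;) {
        semaphore_wait(&mutex);
        read_count++;
        if (read_count == 1)      /* first reader in: lock out the writers */
            semaphore_wait(&db);
        semaphore_signal(&mutex);

        read_database();          /* many readers may be here concurrently */

        semaphore_wait(&mutex);
        read_count--;
        if (read_count == 0)      /* last reader out: let writers in again */
            semaphore_signal(&db);
        semaphore_signal(&mutex);
    }
}

void writer(void)
{
    for (;;) {
        semaphore_wait(&db);      /* writers require exclusive access */
        write_database();
        semaphore_signal(&db);
    }
}
```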
Deadlock
Resources
- Examples of computer resources
- printers
- tape drives
- Tables in a database
- Processes need access to resources in reasonable order
- Preemptable resources
- e.g. virtual memory
- can be taken away from a process with no ill effects
- Non-preemptable resources
- will cause the process to fail if taken away
Resources & Deadlocks
- Suppose a process holds resource A and requests resource B
- at same time another process holds B and requests A
- both are blocked and remain so – Deadlock
- Deadlocks occur when …
- processes are granted exclusive access to devices, locks, tables, etc..
- we refer to these entities generally as resources
Resource Access
- Sequence of events required to use a resource
- request the resource
- use the resource
- release the resource
- Must wait if request is denied
- requesting process may be blocked
- may fail with error code
Deadlock
A set of processes is deadlocked if each process in the set is waiting for an event that only another process in the set can cause.
- Usually the event is release of a currently held resource
- None of the processes can …
- run
- release resources
- be awakened
Four Conditions for Deadlock
- Mutual exclusion condition
- each resource assigned to 1 process or is available
- Hold and wait condition
- process holding resources can request additional
- No preemption condition
- previously granted resources cannot be forcibly taken away
- Circular wait condition
- must be a circular chain of 2 or more processes
- each is waiting for resource held by next member of the chain
Strategies for dealing with Deadlocks
- Just ignore the problem altogether
- Prevention: negate one of the four necessary conditions
- Detection and recovery
- Dynamic avoidance: careful resource allocation
Approach 1: The Ostrich Algorithm
- Pretend there is no problem
- Reasonable if
- deadlocks occur very rarely
- Cost of prevention is high
- Example of “cost”, only one process runs at a time
- UNIX and Windows take this approach for some of the more complex resource relationships they manage
- It’s a trade off between
- Convenience (engineering approach)
- Correctness(mathematical approach)
Approach 2: Deadlock Prevention
The most commonly used strategy
Resource allocation rules prevent deadlock by preventing one of the four conditions required for deadlock from occurring
Mutual exclusion
- Not feasible in general
- Some devices/resources are intrinsically not shareable
Hold and wait
Require processes to request resources before starting
- a process never has to wait for what it needs
Issues
may not know required resources at start of run
must be determined at run time or by user input(e.g. using Word to open a file)
$\Rightarrow$ not always possible
Also ties up resources other processes could be using
Variations:
Process must give up all resources if it would block holding a resource
then request all immediately needed
prone to livelock
Livelocked processes are not blocked, change state regularly, but never make progress.
No preemption
- This is not a viable option
Circular Wait
- Order resources globally and always acquire them in the same order
Approach 3: Detection and Recovery
- Need a method to determine if a system is deadlocked
- Assuming deadlock is detected, we need a method of recovery to restore progress to the system.
Modeling resources and processes as a directed graph
(a) represents everything in the system
(b) is a deadlock
What about resources with multiple units?
- Some examples of multi-unit resources
- RAM
- Blocks on a hard disk drive
- Slots in a buffer
- We need an approach for dealing with resources that consist of more than a single unit
Modeling using matrix

Data structures needed by the deadlock detection algorithm: the sum of the current resource allocations plus the resources available equals the resources that exist,
$$
\sum_{i=1}^{n} C_{ij} + A_j = E_j
$$
Example

- Process 1: has 1 scanner
- Process 2: has 2 tape drives and 1 CD-ROM
- Process 3: has 1 plotter and 2 scanners
Detection Algorithm
- Look for an unmarked process $P_i$ for which the $i$-th row of R is less than or equal to A
- If found, add the $i$-th row of C to A, mark $P_i$, and go to step 1
- If no such process exists, terminate
Remaining processes are deadlocked.
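A sketch of this algorithm over the C (allocation), R (request) and A (available) structures; the array sizes are arbitrary example values, and A is updated in place as marked processes "return" their resources.

```c
#include <stdbool.h>

#define NPROC 3                 /* number of processes (example size)      */
#define NRES  4                 /* number of resource classes (example)    */

/* C[i][j]: units of resource j currently allocated to process i
 * R[i][j]: units of resource j still requested by process i
 * A[j]   : units of resource j currently available
 * Returns true if any process is deadlocked; deadlocked[i] marks which.  */
bool detect_deadlock(int C[NPROC][NRES], int R[NPROC][NRES], int A[NRES],
                     bool deadlocked[NPROC])
{
    bool marked[NPROC] = { false };
    bool progress = true;

    while (progress) {
        progress = false;
        for (int i = 0; i < NPROC; i++) {
            if (marked[i])
                continue;
            /* Can process i's outstanding request be satisfied right now? */
            bool satisfiable = true;
            for (int j = 0; j < NRES; j++)
                if (R[i][j] > A[j]) { satisfiable = false; break; }
            if (satisfiable) {
                /* Assume it runs to completion and returns its resources. */
                for (int j = 0; j < NRES; j++)
                    A[j] += C[i][j];
                marked[i] = true;
                progress = true;
            }
        }
    }

    bool any = false;
    for (int i = 0; i < NPROC; i++) {
        deadlocked[i] = !marked[i];       /* unmarked processes are deadlocked */
        any = any || deadlocked[i];
    }
    return any;
}
```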
Recovery from Deadlock
- Recovery through preemption
- Take a resource from some other process
- Depends on nature of the resource
- Recovery through rollback
- Checkpoint a process periodically
- This does not come for free and is often not practical
- Use this saved state
- Restart the process if it is found deadlocked
- No guarantee it won’t deadlock again
- Recovery through killing processes
- Crudest but simplest way to break a deadlock
- Kill one of the processes in the deadlock cycle
- The other processes get its resources
- Choose process that can be rerun from the beginning
Approach 4: Deadlock Avoidance
Not practical for all systems, need to know enough information in advance

Safe and Unsafe States
- A state is safe if
- The system is not deadlocked
- There exists a scheduling order that results in every process running to completion, even if they all request their maximum resources immediately
- Unsafe states are not necessarily deadlocked
- with a lucky sequence, all processes may complete
- However, we cannot guarantee that they will complete(not deadlock)
- Safe states guarantee we will eventually complete all processes
- Deadlock avoidance algorithm
- Only grant requests that result in safe states
Bankers Algorithm
- Modelled on a Banker with Customers
- The banker has a limited amount of money to loan customers
- Limited number of resources
- Each customer can borrow money up to the customer’s credit limit
- Maximum number of resources required
- The banker has a limited amount of money to loan customers
- Basic idea
- keep the bank in a safe state
- So all customers are happy even if they all request to borrow up to their credit limit at the same time
- Customers wishing to borrow such that the bank would enter an unsafe state must wait until somebody else repays their loan such that the transaction becomes safe.
- keep the bank in a safe state
- Bankers Algorithm is not commonly used in practice
- It is difficult (sometimes impossible ) to know in advance
- the resources a process will require
- the number of processes in a dynamic system
- It is difficult (sometimes impossible ) to know in advance
Starvation
A process never receives the resource it is waiting for: despite the resource (repeatedly) becoming free, it is always allocated to another waiting process.
One solution: a first-come, first-served policy
System Call
System Calls
- Can be viewed as special function calls
- Provides for a controlled entry into the kernel
- While in kernel, they perform a privileged operation
- Returns to original caller with the result
- The system call interface represents the abstract machine provided by the operating system.
A brief overview
From the user’s perspective
- Process Management
- File I/O
- Directory management
- Some other selected Calls
- There are many more
- On Linux, see man syscalls for a list
Process Management
| Call | Description |
|---|---|
| pid = fork() | Create a child process identical to the parent |
| pid = waitpid(pid, &statloc, options) | Wait for a child to terminate |
| s = execve(name, argv, environp) | Replace a process’ core image |
| exit(status) | Terminate process execution and return status |
File Management
| Call | Description |
|---|---|
| fd = open(file, how, …) | Open a file for reading, writing or both |
| s = close(fd) | Close an open file |
| n = read(fd, buffer, bytes) | Read data from a file into a buffer |
| n = write(fd, buffer, bytes) | Write data from a buffer into a file |
| position = lseek(fd, offset, whence) | Move the file pointer |
| s = stat(name, &buf) | Get a file’s status information |
A stripped down shell:
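A hedged reconstruction of the stripped-down shell loop; type_prompt() and read_command() are placeholders for printing a prompt and parsing a command line, and error handling is omitted.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define TRUE 1

void type_prompt(void);                        /* placeholder: print a prompt        */
void read_command(char **cmd, char ***argv);   /* placeholder: read and parse a line */

int main(void)
{
    char *command;
    char **parameters;
    int status;

    while (TRUE) {                             /* repeat forever */
        type_prompt();
        read_command(&command, &parameters);

        if (fork() != 0) {
            /* Parent code: wait for the child to terminate. */
            waitpid(-1, &status, 0);
        } else {
            /* Child code: replace this process's image with the command. */
            execve(command, parameters, NULL);
        }
    }
}
```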
System Call Implementation
A Simple Model of CPU Computation
The fetch-execute cycle
- Load memory contents from address in program counter (PC)
- The instruction
- Execute the instruction
- increment PC
- Repeat
A Simple Model of CPU Computation
Stack Pointer(SP)
Status Register
Condition codes
- Positive result
- Zero result
- Negative result
General-purpose registers
- Hold operands of most instructions
- Enable programmers (the compiler) to minimise memory references
Privileged-mode Operation
To protect operating system execution, two or more CPU modes of operation exist
- Privileged mode (system, kernel mode)
- All instructions and registers are available
- User mode
- Only a ‘safe’ subset of the instruction set is available
- These instructions only affect the state of the application itself
- They cannot be used to uncontrollably interfere with the OS
- Only ‘safe’ registers are accessible
System Call
System Call Mechanism Overview
- System call transitions triggered by special processor instructions
- User to Kernel
- System call instruction
- Kernel to User
- Return from privileged mode instruction
- Processor mode
- Switched from user-mode to kernel-mode
- Switched back when returning to user mode
- Stack Pointer (SP)
- User-level SP is saved and a kernel SP is initialised
- User-level SP restored when returning to user-mode
- Program Counter (PC)
- User-level PC is saved and PC set to kernel entry point
- User-level PC restored when returning to user-level
- Kernel entry via the designated entry point must be strictly enforced
- Registers
- Set at user-level to indicate system call type and its arguments
- A convention between applications and the kernel
- Some registers are preserved at user-level or kernel-level in order to restart user-level execution
- Depends on language calling convention etc
- Result of system call placed in registers when returning to user-level
- Another convention
Why do we need system calls?
Why not simply jump into the kernel via a function call?
Function calls do not
- Change from user to kernel mode
- and eventually back again
- Restrict possible entry points to secure locations
- To prevent entering the kernel at a point that bypasses security checks
Steps in Making a System Call

File Management
Overview of the FS abstraction
| User’s view | Under the hood |
|---|---|
| Uniform namespace | Heterogeneous collection of storage devices |
| Hierarchical structure | Flat address space (block numbers) |
| Arbitrarily-sized files | Fixed-size blocks |
| Symbolic file names | Numeric block addresses |
| Access control | No access control |
| | Tools for formatting, defragmentation, backup, consistency checking |
File Names
File system must provide a convenient naming scheme
- Textual Names
- May have restrictions
- Only certain characters
- Limited length
- Only certain format
- Case (in)sensitive
- Names may obey conventions
- Interpreted by tools
- interpreted by operating system
File Structure
Sequence of Bytes
- The OS considers a file to be an unstructured sequence of bytes
- Applications can impose their own structure
- Used by UNIX, Windows, and most modern OSes
File Types
Regular File
Directories
Device Files
- May be divided into
- Character Devices – stream of bytes
- Block Devices
- Some systems distinguish between regular file types
- ASCII text files, binary files
File Access Types(Patterns)
Sequential access
- read all bytes/records from the beginning
- cannot jump around, could rewind or back up
- convenient when medium was magnetic tape
Random access
- bytes/records read in any order
- essential for data base systems
- read can be done in two ways:
- move the file pointer (seek), then read: lseek(location,…); read(…)
- or have each read specify the file pointer: read(location,…)
Typical File Operations
Create Delete Open Close Read Write Append Rename Seek Get attributes Set Attributes
File Organisation and Access
(Programmer’s Perspective)
Given an operating system supporting unstructured files that are a stream of bytes, how can one organise the contents of the files?
E.g. Executable Linkable Format (ELF)
Some possible access patterns:
- Read the whole file
- Read individual records from a file
- record = sequence of bytes containing the record
- Read records preceding or following the current one
- Retrieve a set of records
- Write a whole file sequentially
- Insert/delete/update records in a file
Programmers are free to structure the file to suit the application
Criteria for File Organization
Things to consider when designing file layout
- Rapid access
- Needed when accessing a single record
- Not needed for batch mode
- read from start to finish
- Ease of update
- File on CD-ROM will not be updated, so this is not a concern
- Economy of storage
- Should be minimum redundancy in the data
- Redundancy can be used to speed access such as an index
File Directories
Provide mapping between file names and the files themselves
Contain information about files
- Attributes
- Location
- Ownership
Directory itself is a file owned by the operating system
Hierarchical (Tree-Structured) Directory
Files can be located by following a path from the root, or master, directory down various branches
- This is the absolute pathname for the file
Can have several files with the same file name as long as they have unique path names
Current Working Directory
Always specifying the absolute pathname for a file is tedious!
Introduce the idea of a working directory
- Files are referenced relative to the working directory
Relative and Absolute Pathnames
Absolute pathname
- A path specified from the root of the file system to the file
A Relative pathname
- A pathname specified from the cwd
Note: ‘.’ (dot) and ‘..’ (dotdot) refer to current and parent directory
Typical Directory Operations
Create Delete Opendir Closedir Readdir Rename Link Unlink
Nice properties of UNIX naming
Simple, regular format
- Names referring to different servers, objects, etc., have the same syntax.
- Regular tools can be used where specialised tools would otherwise be needed.
Location independent
- Objects can be distributed or migrated, and continue with the same names.
File Sharing
In a multiuser system, files can be shared among users
Two issues
- Access rights
- None
- User may not know of the existence of the file
- User is not allowed to read the directory that includes the file
- Knowledge
- User can only determine that the file exists and who its owner is
- Execution
- The user can load and execute a program but cannot copy it
- Reading
- The user can read the file for any purpose, including copying and execution
- Appending
- The user can add data to the file but cannot modify or delete any of the file’s contents
- Updating
- The user can modify, delete, and add to the file’s data. This includes creating the file, rewriting it, and removing all or part of the data
- Changing protection
- User can change access rights granted to other users
- Deletion
- User can delete the file
- Owners
- Has all rights previously listed
- May grant rights to others using the following classes of users
- Specific user
- User groups
- All for public files
- None
- Management of simultaneous access
- Most OSes provide mechanisms for users to manage concurrent access to files
- Examples: the flock() and lockf() system calls
- Typically
- User may lock entire file when it is to be updated
- User may lock the individual records (i.e. ranges) during the update
- Mutual exclusion and deadlock are issues for shared access
- Most OSes provide mechanisms for users to manage concurrent access to files
File System
The FS must map symbolic file names into a collection of block addresses
The FS must keep track of
- which blocks belong to which files
- in what order the blocks form the file
- which blocks are free for allocation
Given a logical region of a file, the FS must track the corresponding block(s) on disk
- Stored in file system metadata
File Allocation Methods
A file is divided into “blocks” – the unit of transfer to storage
External and internal fragmentation
External fragmentation
- The space wasted external to the allocated memory regions
- Memory space exists to satisfy a request but it is unusable as it is not contiguous
Internal fragmentation
- The space wasted internal to the allocated memory regions
- Allocated memory may be slightly larger than requested memory; this size difference is wasted memory internal to a partition
Contiguous Allocation
✅ Easy bookkeeping (need to keep track of the starting block and length of the file)
✅ Increases performance for sequential operations
❌ Need the maximum size for the file at the time of creation
❌ As files are deleted, free space becomes divided into many small chunks (external fragmentation)
Dynamic Allocation Strategies
Disk space allocated in portions as needed
Allocation occurs in fixed-size blocks
✅ No external fragmentation
✅ Does not require pre-allocating disk space
❌ Partially filled blocks (internal fragmentation)
❌ File blocks are scattered across the disk
❌ Complex metadata management (maintain the collection of blocks for each file)
Dynamic allocation: Linked list allocation
Each block contains a pointer to the next block in the chain. Free blocks are also linked in a chain.
✅ Only single metadata entry per file
✅ Best for sequential files
❌ Poor for random access
❌ Blocks end up scattered across the disk due to free list eventually being randomised
Dynamic Allocation: File Allocation Table (FAT)
Keep a map of the entire FS in a separate table
- A table entry contains the number of the next block of the file
- The last block in a file and empty blocks are marked using reserved values
The table is stored on the disk and is replicated in memory
✅ Random access is fast (following the in-memory list)
❌ Requires a lot of memory for large disks
❌ Free block lookup is slow
Dynamic Allocation: inode-based FS structure
Idea: separate table (index-node or i-node) for each file.
- Only keep table for open files in memory
- Fast random access
The most popular FS structure today
❌ i-nodes occupy one or several disk areas
❌ i-nodes are allocated dynamically, hence free-space management is required for i-nodes
- Use fixed-size i-nodes to simplify dynamic allocation
- Reserve the last i-node entry for a pointer (a block number) to an extension i-node.
i-node implementation issues
Free-space management
- Approach 1: a linked list of free blocks, stored in the free blocks themselves on disk
- List of all unallocated blocks
- Background jobs can re-order list for better contiguity
- Store in free blocks themselves
- Does not reduce disk capacity
- Only one block of pointers need be kept in the main memory
- Approach 2: keep bitmaps of free blocks and free i-nodes on disk
- Individual bits in a bit vector flags used/free blocks
- May be too large to hold in main memory
- Expensive to search
- Concentrating (de)allocations in a portion of the bitmap has desirable effect of concentrating access
- Simple to find contiguous free space
Implementing directories
Directories are stored like normal files
- directory entries are contained inside data blocks
The FS assigns special meaning to the content of these files
- a directory file is a list of directory entries
- a directory entry contains file name, attributes, and the file i-node number
- maps human-oriented file name to a system-oriented name
Fixed-size vs variable-size directory entries
- Fixed-size directory entries
- Either too small
- Or waste too much space
- Variable-size directory entries
- Freeing variable length entries can create external fragmentation in directory blocks
- Can compact when block is in RAM
Searching Directory Listings
Locating a file in a directory
- Linear scan
- Implement a directory cache in software to speed-up search
- Hash lookup
- B-tree (100’s of thousands entries)
Storing file attributes
(a) disk addresses and attributes in directory entry –FAT
(b) directory in which each entry just refers to an i-node –UNIX
Inode Contents
Size
- Offset of the highest byte written
Block count
- Number of disk blocks used by the file.
- Note that number of blocks can be much less than expected given the file size
Files can be sparsely populated
- E.g. write(f,“hello”); lseek(f, 1000000); write(f, “world”);
- Only needs to store the start and end of file, not all the empty blocks in between.
- Size = 1000005
- Blocks = 2 + any indirect blocks
Direct Blocks
- Block numbers of first 12 blocks in the file
- Most files are small – We can find blocks of file directly from the inode
Single Indirect Block – Block number of a block containing block numbers
- Requires two disk accesses to read – one for the indirect block, one for the target block
- Max File Size – Assume 1Kbyte block size, 4 byte block numbers
- 12 * 1K + 1K/4 * 1K = 268 KiB
- For large majority of files (< 268 KiB), given the inode, only one or two further accesses required to read any block in file
Double Indirect Block – Block number of a block containing block numbers of blocks containing block numbers
Triple Indirect
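Under the same assumptions as above (1 KiB blocks and 4-byte block numbers, so 256 block numbers per indirect block), each level contributes:
$$
\begin{aligned}
\text{direct} &: 12 \times 1\,\text{KiB} = 12\,\text{KiB}\\
\text{single indirect} &: 256 \times 1\,\text{KiB} = 256\,\text{KiB}\\
\text{double indirect} &: 256^2 \times 1\,\text{KiB} = 64\,\text{MiB}\\
\text{triple indirect} &: 256^3 \times 1\,\text{KiB} = 16\,\text{GiB}
\end{aligned}
$$
so the maximum file size with all levels is roughly 16 GiB + 64 MiB + 268 KiB.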
Hard Links
Note that an inode can have more than one name – this is called a hard link
Symbolic links
A symbolic link is a file that contains a reference to another file or directory
- Has its own inode and data block, which contains a path to the target file
- Marked by a special file attribute
- Transparent for some operations
- Can point across FS boundaries
FS reliability
Disk writes are buffered in RAM
- OS crash or power outage ==> lost data
- Commit writes to disk periodically
- Use the sync command to force a FS flush
FS operations are non-atomic
- Incomplete transaction can leave the FS in an inconsistent state
Journaling file systems
- Keep a journal of FS updates
- Before performing an atomic update sequence
- write it to the journal
- Replay the last journal entries upon an unclean shutdown
The ext3 journal
Option1: Journal FS data structure updates
✅ Efficient use of journal space; hence faster journaling
❌ Individual updates are applied separately
❌ The journaling layer must understand FS semantics
Option2: Journal disk block updates
Ext3 implements Option 2
❌ Even a small update adds a whole block to the journal
✅ Multiple updates to the same block can be aggregated into a single update
✅ The journaling layer is FS-independent (easier to implement)
Journaling Block Device (JBD)
JBD interface (continued)
- Commit: write transaction data to the journal (persistent storage)
- Multiple FS transactions are committed in one go
- Checkpoint: flush the journal to the disk
- Used when the journal is full or the FS is being unmounted
Trade-off in FS block size
- File systems deal with 2 types of blocks
- Disk blocks or sectors (usually 512 bytes)
- File system blocks 512 * 2^N bytes
- Larger blocks require less FS metadata
- Smaller blocks waste less disk space (less internal fragmentation)
- Sequential Access
- The larger the block size, the fewer I/O operations required
- Random Access
- The larger the block size, the more unrelated data loaded
- Spatial locality of access improves the situation
- Choosing an appropriate block size is a compromise
Virtual File System(VFS)
- Provides single system call interface for many file systems
- Transparent handling of network file systems
- File-based interface to arbitrary device drivers
- File-based interface to kernel data structures
- Provides an indirection layer for system calls
- File operation table set up at file open time
- Points to actual handling code for particular type
- Further file operations redirected to those functions
VFS Interface
Two major data types
VFS
- Represents all file system types
- Contains pointers to functions to manipulate each file system as a whole (e.g. mount, unmount)
Vnode
- Represents a file (inode) in the underlying filesystem
- Points to the real inode
- Contains pointers to functions to manipulate files/inodes (e.g. open, close, read, write,…)
File Descriptors
- Each open file has a file descriptor
- Read/Write/lseek/…. use them to specify which file to operate on.
State associated with a file descriptor
- File pointer
- Determines where in the file the next read or write is performed
- Mode
- Was the file opened read-only, etc….
Memory Management
OS Memory Management
Keeps track of what memory is in use and what memory is free
Allocates free memory to process when needed
- And deallocates it when they no longer need it
Manages the transfer of memory between RAM and disk.
Two broad classes of memory management systems
- Those that transfer processes to and from external storage during execution.
- Called swapping or paging
- Those that don’t
- Simple
- Might find this scheme in an embedded device, dumb phone, or smartcard.
Basic Memory Management
Monoprogramming without Swapping or Paging

Three simple ways of organizing memory
Monoprogramming
Okay if
- Only have one thing to do
- Memory available approximately equates to memory required
- Otherwise,
- Poor CPU utilisation in the presence of I/O waiting
- Poor memory utilisation with a varied job mix
Fixed partitioning
Simple MM: Fixed, equal-sized partitions
- Any unused space in the partition is wasted
- Called internal fragmentation
- Processes smaller than main memory, but larger than a partition cannot run
Simple MM: Fixed, variable-sized partitions
Divide memory at boot time into a selection of different sized partitions
- Can base sizes on expected workload
Each partition has queue:
- Place process in queue for smallest partition that it fits in.
- A process waits until its assigned partition is empty before it can start
Issue
- Some partitions may be idle
- Small jobs available, but only a large partition free
- Workload could be unpredictable
Alternative queue strategy
Single queue, search for any jobs that fit
Small jobs in large partition if necessary
Increases internal memory fragmentation
Fixed Partition Summary
- Simple
- Easy to implement
- Can result in poor memory utilisation
- Due to internal fragmentation
- Used on IBM System 360 operating system (OS/MFT)
- Announced 6 April, 1964
- Still applicable for simple embedded systems
- Static workload known in advance
Dynamic Partitioning
Partitions are of variable length
- Allocated on-demand from ranges of free memory
Process is allocated exactly what it needs
- Assumes a process knows what it needs
We end up with unusable holes (external fragmentation)
Classic Approach
Represent available memory as a linked list of available “holes” (free memory ranges).
- Base, size
- Kept in order of increasing address
- Simplifies merging of adjacent holes into larger holes.
- List nodes are stored in the “holes” themselves
Dynamic Partitioning Placement Algorithm
First-fit algorithm
- Scan the list for the first entry that fits
- If greater in size, break it into an allocated and free part
- Intent: Minimise amount of searching performed
- Aims to find a match quickly
- Biases allocation to one end of memory
- Tends to preserve larger blocks at the end of memory
Next-fit
- Like first-fit, except it begins its search from the point in list where the last request succeeded instead of at the beginning.
- (Flawed) Intuition: spread allocation more uniformly over entire memory to avoid skipping over small holes at start of memory
- Performs worse than first-fit as it breaks up the large free space at end of memory.
Best-fit algorithm
- Chooses the block that is closest in size to the request
- Performs worse than first-fit
- Has to search complete list
- does more work than first-fit
- Since the smallest suitable block is chosen, the left-over fragment is as small as possible
- This creates lots of unusable holes
Worst-fit algorithm
- Chooses the block that is largest in size (worst-fit)
- (whimsical) idea is to leave a usable fragment left over
- Poor performer
- Has to do more work (like best fit) to search complete list
- Does not result in significantly less fragmentation
Dynamic Partition Allocation Algorithm Summary
- First-fit generally better than the others and easiest to implement
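A minimal first-fit sketch over the hole list described above (illustrative only: a real allocator stores the list nodes in the holes themselves and merges adjacent holes on free; names are invented for the sketch).

```c
#include <stddef.h>
#include <stdlib.h>

struct hole {
    size_t base;            /* start address of the hole */
    size_t size;            /* length of the hole        */
    struct hole *next;      /* next hole, by increasing address */
};

/* Returns the base address of the allocated range, or (size_t)-1 on failure. */
size_t first_fit_alloc(struct hole **list, size_t request)
{
    for (struct hole **pp = list; *pp != NULL; pp = &(*pp)->next) {
        struct hole *h = *pp;
        if (h->size < request)
            continue;                  /* too small: keep scanning            */
        size_t base = h->base;
        if (h->size == request) {
            *pp = h->next;             /* exact fit: unlink and free the node */
            free(h);
        } else {
            h->base += request;        /* split: shrink the hole              */
            h->size -= request;
        }
        return base;                   /* allocated [base, base + request)    */
    }
    return (size_t)-1;                 /* no hole large enough                */
}
```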
Compaction
We can reduce external fragmentation by compaction
- Shuffle memory contents to place all free memory together in one large block.
- Only possible if we can relocate running programs
- What about pointers?
Generally requires hardware support
When are memory addresses bound?
- Compile/link time
- Compiler/Linker binds the addresses
- Must know “run” location at compile time
- Recompile if location changes
- Load time
- Compiler generates relocatable code
- Loader binds the addresses at load time
- Run time
- Logical compile-time addresses translated to physical addresses by special hardware.
Hardware Support for Runtime Binding and Protection
- For process B to run using logical addresses
- Process B expects to access addresses from zero to some limit of memory size
- Need to add an appropriate offset to its logical addresses
- Achieve relocation
- Protect memory “lower” than B
- Must limit the maximum logical address B can generate
- Protect memory “higher” than B

Base and Limit Register
- Also called
- Base and bound registers
- Relocation and limit registers
- Base and limit registers
- Restrict and relocate the currently active process
- Base and limit registers must be changed at
- Load time
- Relocation (compaction time)
- On a context switch
- Pro
- Supports protected multi-processing (-tasking)
- Cons
- Physical memory allocation must still be contiguous
- The entire process must be in memory
- Do not support partial sharing of address spaces
- No shared code, libraries, or data structures between processes
Thus far, we have a system suitable for a batch system
- Limited number of dynamically allocated processes
- Enough to keep CPU utilised
- Relocated at runtime
- Protected from each other
Swapping
A process can be swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution.
Backing store – fast disk large enough to accommodate copies of all memory images for all users; must provide direct access to these memory images.
Can prioritize – lower-priority process is swapped out so higher-priority process can be loaded and executed.
Major part of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped.
- slow
we assume a process is smaller than memory
Virtual Memory – Paging Overview
Partition physical memory into small equal sized chunks
- Called frames
Divide each process’s virtual (logical) address space into same size chunks
- Called pages
- Virtual memory addresses consist of a page number and offset within the page
OS maintains a page table
contains the frame location for each page
Used by hardware to translate each virtual address to physical address
The relation between virtual addresses and physical memory addresses is given by page table
Process’s physical memory does not have to be contiguous
No external fragmentation
Small internal fragmentation (in last page)
Allows sharing by mapping several pages to the same frame
Abstracts physical organisation
- Programmer only deal with virtual addresses
Minimal support for logical organisation
- Each unit is one or more pages
Virtual Memory
Memory Management Unit(or TLB)

Page-based VM
Virtual Memory
- Divided into equal-sized pages
- A mapping is a translation between
- A page and a frame
- A page and null
- Mappings defined at runtime – They can change
- Address space can have holes
- Process does not have to be contiguous in physical memory
Physical Memory
- Divided into equal-sized frames
Typical Address Space Layout
- Stack region is at top, and can grow down
- Heap has free space to grow up
- Text is typically read-only
- Kernel is in a reserved, protected, shared region
- 0-th page typically not used
Page Faults
Referencing an invalid page triggers a page fault – An exception handled by the OS
Broadly, two standard page fault types
- Illegal Address (protection error)
- Signal or kill the process
- Page not resident
- Get an empty frame
- Load page from disk
- Update page (translation) table (enter frame #, set valid bit, etc.)
- Restart the faulting instruction
Shared Pages
- Private code and data
- Each process has own copy of code and data
- Code and data can appear anywhere in the address space
- Shared code
- Single copy of code shared between all processes executing it
- Code must not be self modifying
- Code must appear at same address in all processes
Page Table Structure
Page table is (logically) an array of frame numbers
- Index by page number
Each page-table entry (PTE) also has other bits
- Present/Absent bit
- Also called valid bit, it indicates a valid mapping for the page
- Modified bit
- Also called dirty bit, it indicates the page may have been modified in memory
- Reference bit
- Indicates the page has been accessed
- Protection bits
- Read permission, Write permission, Execute permission
- Or combinations of the above
- Caching bit
- Used to indicate the processor should bypass the cache when accessing this memory
- Example: to access device registers or memory
Address Translation
Every (virtual) memory address issued by the CPU must be translated to physical memory
- Every load and every store instruction
- Every instruction fetch
Need Translation Hardware
In a paging system, translation involves replacing the page number with a frame number; the offset within the page is unchanged
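A small worked example (assuming 4 KiB pages and 32-bit addresses; the frame number here is a made-up stand-in for the page-table lookup):

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS 12                      /* 4 KiB pages => 12 offset bits */
#define PAGE_SIZE (1u << PAGE_BITS)

int main(void)
{
    uint32_t vaddr  = 0x00403A10;                  /* example virtual address    */
    uint32_t page   = vaddr >> PAGE_BITS;          /* page number = high bits    */
    uint32_t offset = vaddr & (PAGE_SIZE - 1);     /* offset = low 12 bits       */

    /* The page table maps page -> frame; the offset is copied unchanged. */
    uint32_t frame = 0x00123;                      /* pretend lookup result      */
    uint32_t paddr = (frame << PAGE_BITS) | offset;

    printf("page=0x%05x offset=0x%03x -> paddr=0x%08x\n",
           (unsigned)page, (unsigned)offset, (unsigned)paddr);
    return 0;
}
```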
Page Tables
- Page tables are implemented as data structures in main memory
- Most processes do not use the full 4GB address space
- We need a compact representation that does not waste space
- Three basic schemes
- Use data structures that adapt to sparsity
- Use data structures which only represent resident pages
- Use VM techniques for page tables (details left to extended OS)
Two-level Page Table
2nd-level page tables representing unmapped pages are not allocated – the corresponding entries in the top-level page table are null
Alternative: Inverted Page Table
“Inverted page table” is an array of page numbers sorted (indexed) by frame number (it’s a frame table).
Algorithm
- Compute hash of page number
- Extract index from hash table
- Use this to index into inverted page table
- Match the PID and page number in the IPT entry
- If match, use the index value as frame # for translation
- If no match, get next candidate IPT entry from chain field
- If NULL chain entry $\Rightarrow$ page fault
Properties of IPTs
- IPT grows with size of RAM, NOT virtual address space
- Frame table is needed anyway (for page replacement, more later)
- Need a separate data structure for non-resident pages
- Saves a vast amount of space (especially on 64-bit systems)
- Used in some IBM and HP workstations
Improving the IPT: Hashed Page Table
- Retain fast lookup of IPT – A single memory reference in best case
- Retain page table sized based on physical memory size (not virtual)
- Enable efficient frame sharing
- Support more than one mapping for same frame
Sizing the Hashed Page Table
- HPT sized based on physical memory size
- With sharing
- Each frame can have more than one PTE
- More sharing increases number of slots used – Increases collision likelihood
- However, we can tune HPT size based on:
- Physical memory size
- Expected sharing
- Hash collision avoidance.
- HPT size is a power-of-2 multiple of the number of physical memory frames
VM Implementation Issue
Performance
Each virtual memory reference can cause two physical memory accesses
- One to fetch the page table entry
- One to fetch/store the data
$\Rightarrow$ Intolerable performance impact!!
Solution:
- High-speed cache for page table entries (PTEs)
- Called a translation look-aside buffer (TLB)
- Contains recently used page table entries
- Associative, high-speed memory, similar to cache memory
- May be under OS control (unlike memory cache)
TLB
Translation Lookaside Buffer
- Given a virtual address, processor examines the TLB
- If matching PTE found (TLB hit), the address is translated
- Otherwise (TLB miss), the page number is used to index the process’s page table
- If PT contains a valid entry, reload TLB and restart
- Otherwise, (page fault) check if page is on disk
- If on disk, swap it in
- Otherwise, allocate a new page or raise an exception
TLB properties
Page table is (logically) an array of frame numbers
TLB holds a (recently used) subset of PT entries
- Each TLB entry must be identified (tagged) with the page # it translates
- Access is by associative lookup:
- All TLB entries’ tags are concurrently compared to the page #
- TLB is associative (or content-addressable) memory
TLB may or may not be under direct OS control
- Hardware-loaded TLB
- On miss, hardware performs PT lookup and reloads TLB
- Example: x86, ARM
- Software-loaded TLB
- On miss, hardware generates a TLB miss exception, and exception handler reloads TLB
- Example: MIPS, Itanium (optionally)
TLB size: typically 64-128 entries
Can have separate TLBs for instruction fetch and data access
TLBs can also be used with inverted page tables (and others)
TLB and context switching
- TLB is a shared piece of hardware
- Normal page tables are per-process (address space)
- TLB entries are process-specific
- On context switch need to flush the TLB (invalidate all entries)
- high context-switching overhead (Intel x86)
- or tag entries with address-space ID (ASID)
- called a tagged TLB
- used (in some form) on all modern architectures
- TLB entry: ASID, page #, frame #, valid and write-protect bits
TLB effect
Without TLB
Average number of physical memory references per virtual reference
= 2
With TLB (assume 99% hit ratio)
Average number of physical memory references per virtual reference
= 0.99 * 1 + 0.01 * 2
= 1.01
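More generally, with TLB hit ratio $h$ the same calculation gives $h \cdot 1 + (1-h) \cdot 2 = 2 - h$ physical memory references per virtual reference.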
Demand Paging/Segmentation
- With VM, only parts of the program image need to be resident in memory for execution
- Can transfer presently unused pages/segments to disk
- Reload non-resident pages/segment on demand.
- Reload is triggered by a page or segment fault
- Faulting process is blocked and another scheduled
- When page/segment is resident, faulting process is restarted
- May require freeing up memory first
- Replace current resident page/segment
- How do we determine the replacement “victim”?
- If victim is unmodified (“clean”) can simply discard it
- This is reason for maintaining a “dirty” bit in the PT
Why does demand paging/segmentation work?
- Program executes at full speed only when accessing the resident set.
- TLB misses introduce delays of several microseconds
- Page/segment faults introduce delays of several milliseconds
Answer
- Less physical memory required per process
- Can fit more processes in memory
- Improved chance of finding a runnable one
- Principle of locality
Principle of Locality
An important observation comes from empirical studies of the properties of programs.
- Programs tend to reuse data and instructions they have used recently.
- 90/10 rule – “A program spends 90% of its time in 10% of its code”
We can exploit this locality of references
An implication of locality is that we can reasonably predict what instructions and data a program will use in the near future based on its accesses in the recent past
Two different types of locality have been observed:
- Temporal locality: states that recently accessed items are likely to be accessed in the near future
- Spatial locality: says that items whose addresses are near one another tend to be referenced close together in time
Working Set
The pages/segments required by an application in a time window ($\Delta$) is called its memory working set.
Working set is an approximation of a program’s locality
- if $\Delta$ is too small, it will not encompass the entire locality
- if $\Delta$ is too large, it will encompass several localities
- if $\Delta = \infty$ $\Rightarrow$ it will encompass the entire program
- $\Delta$’s size is an application-specific tradeoff
System should keep resident at least a process’s working set – Process executes while it remains in its working set
Working set tends to change gradually
- Get only a few page/segment faults during a time window
- Possible (but hard) to make intelligent guesses about which pieces will be needed in the future – May be able to pre-fetch page/segments
Thrashing
CPU utilisation tends to increase with the degree of multiprogramming – number of processes in system
Higher degrees of multiprogramming – less memory available per process
Some processes’ working sets may no longer fit in RAM – implies an increasing page fault rate
Eventually many processes have insufficient memory
- Can’t always find a runnable process
- Decreasing CPU utilisation
- System becomes I/O limited
This is called thrashing
$\sum$ working set sizes > total physical memory size
Recovery From Thrashing
In the presence of increasing page fault frequency and decreasing CPU utilisation
- Suspend a few processes to reduce degree of multiprogramming
- Resident pages of suspended processes will migrate to backing store
- More physical memory becomes available
- Resume suspended processes later when memory pressure eases
VM Management Policies
Operation and performance of VM system is dependent on a number of policies:
- Page table format (may be dictated by hardware)
- Multi-level
- Inverted/Hashed
- Page size (may be dictated by hardware)
- Fetch Policy
- Replacement policy
- Resident set size
- Minimum allocation
- Local versus global allocation
- Page cleaning policy
Page Size
Increasing page size
❌ Increases internal fragmentation – reduces adaptability to working set size
✅ Decreases number of pages – Reduces size of page tables
✅ Increases TLB coverage – Reduces number of TLB misses
❌ Increases page fault latency – Need to read more from disk before restarting process
✅ Increases swapping I/O throughput – Small I/Os are dominated by seek/rotation delays
Optimal page size is a (work-load dependent) trade-off
Multiple page sizes provide flexibility to optimise the use of the TLB
- Large page sizes can be used for code
- Small page size for thread stacks
Most operating systems support only a single page size
- Dealing with multiple page sizes is hard!
Fetch Policy
Determines when a page should be brought into memory
Demand paging only loads pages in response to page faults – Many page faults when a process first starts
Pre-paging brings in more pages than needed at the moment
✅ Improves I/O performance by reading in larger chunks
❌ Wastes I/O bandwidth if pre-fetched pages aren’t used
❌ Especially bad if we eject pages in working set in order to pre-fetch unused pages.
❌ Hard to get right in practice
Replacement Policy
- Page removed should be the page least likely to be referenced in the near future
- Most policies attempt to predict the future behaviour on the basis of past behaviour
- Constraint: locked frames
- Kernel code
- Main kernel data structure
- I/O buffers
- Performance-critical user-pages (e.g. for DBMS)
- Frame table has a lock (or pinned) bit
Optimal Replacement policy
Toss the page that won’t be used for the longest time
❌ Impossible to implement
❌ Only good as a theoretical reference point: the closer a practical algorithm gets to optimal, the better
FIFO Replacement Policy
First-in, first-out: Toss the oldest page
✅ Easy to implement
❌ Age of a page isn’t necessarily related to usage
Least Recently Used (LRU)
Toss the least recently used page
Assumes that a page that has not been referenced for a long time is unlikely to be referenced in the near future
⚪ Will work if locality holds
⚪ Implementation requires a time stamp to be kept for each page, updated on every reference
❌ Impossible to implement efficiently
⚪ Most practical algorithms are approximations of LRU
Clock Page Replacement
Clock policy, also called second chance.
Employs a usage or reference bit in the frame table.
Set to one when page is used
While scanning for a victim, clear the reference bit of each page passed over
Toss the first page found with a zero reference bit
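A minimal sketch of that scan over a frame table (the array layout and field names are assumptions for illustration):

```c
#include <stddef.h>

struct frame {
    int referenced;      /* reference/usage bit, set when the page is used */
    /* ... page number, dirty bit, etc. ... */
};

/* Advance the clock hand until a frame with a clear reference bit is found,
 * clearing the bit on every frame passed over (the "second chance"). */
size_t clock_select_victim(struct frame *frames, size_t nframes, size_t *hand)
{
    for (;;) {
        struct frame *f = &frames[*hand];
        if (!f->referenced) {
            size_t victim = *hand;             /* first page with a zero bit */
            *hand = (*hand + 1) % nframes;
            return victim;
        }
        f->referenced = 0;                     /* give it a second chance */
        *hand = (*hand + 1) % nframes;
    }
}
```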
Issue
How do we know when a page is referenced?
Use the valid bit in the PTE:
- When a page is mapped (valid bit set), set the reference bit
- When resetting the reference bit, invalidate the PTE
- On page fault
- Turn on valid bit in PTE
- Turn on reference bit
- We thus simulate a reference bit in software
Performance
In terms of selecting the most appropriate replacement, the policies rank as follows:
- Optimal
- LRU
- Clock
- FIFO
Resident Set Size
How many frames should each process have?
Fixed Allocation
Gives a process a fixed number of pages within which to execute.
Isolates processes’ memory usage from each other
When a page fault occurs, one of the pages of that process must be replaced.
❌ Achieving high utilisation is an issue – Some processes have high fault rate while others don’t use their allocation.
Variable Allocation
Number of pages allocated to a process varies over the lifetime of the process
Global Scope
Operating system keeps global list of free frames
Free frame is added to resident set of process when a page fault occurs
If no free frame, replaces one from any process
✅ Easiest to implement
✅ Adopted by many operating systems
✅ Automatic balancing across system
❌ Does not provide guarantees for important activities
Local Scope
Allocate number of page frames to a new process based on
- Application type
- Program request
- Other criteria (priority)
When a page fault occurs, select a page from among the resident set of the process that suffers the page fault
Re-evaluate allocation from time to time!
Cleaning Policy
Observation – Clean pages are much cheaper to replace than dirty pages
Demand cleaning
- A page is written out only when it has been selected for replacement
- High latency between the decision to replace and availability of free frame.
Precleaning
- Pages are written out in batches (in the background, the pagedaemon)
- Increases likelihood of replacing clean frames
- Overlap I/O with current activity
Multiprocessor Systems
Amdahl’s law
Given a proportion $P$ of a program that can be made parallel, and the remaining serial portion $(1-P)$, the speedup from using $N$ processors is $\frac{1}{(1-P)+\frac{P}{N}}$
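For example (illustrative numbers, not from the slides): with $P = 0.9$ and $N = 8$, speedup $= \frac{1}{(1-0.9)+\frac{0.9}{8}} = \frac{1}{0.2125} \approx 4.7$, and even with unlimited processors the speedup is bounded by $\frac{1}{1-P} = 10$.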
Types of Multiprocessors (MPs)
- UMA MP (Uniform Memory Access)
- Access to all memory occurs at the same speed for all processors.
- NUMA MP (Non-uniform memory access)
- Access to some parts of memory is faster for some processors than other parts of memory
Bus Based UMA
- The simplest MP is more than one processor on a single bus connected to memory
- Bus bandwidth becomes a bottleneck with more than just a few CPUs
- Each processor has a cache to reduce its need for access to memory
- Hope is most accesses are to the local cache
- Bus bandwidth still becomes a bottleneck with many CPUs
- With only a single shared bus, scalability can be limited by the bus bandwidth of the single bus - Caching only helps so much
- Alternative bus architectures do exist.
- They improve bandwidth available
- Don’t eliminate constraint that bandwidth is limited
Multi-core Processor
Cores in a multi-core processor share the same bus interface
Multiprocessors can
- Increase computation power beyond that available from a single CPU
- Share resources such as disk and memory
However
- Assumes parallelizable workload to be effective
- Assumes not I/O bound
- Shared buses (bus bandwidth) limits scalability
- Can be reduced via hardware design
- Can be reduced by carefully crafted software behaviour
- Good cache locality together with limited data sharing where possible
How do we construct an OS for a multiprocessor?
Each CPU has its own OS?
- Statically allocate physical memory to each CPU
- Each CPU runs its own independent OS
- Share peripherals
- Each CPU (OS) handles its processes’ system calls
Used in early multiprocessor systems to ‘get them going’
- Simpler to implement
- Avoids CPU-based concurrency issues by not sharing
- Scales – no shared serial sections
- Modern analogy, virtualisation in the cloud.
Issues
- Each processor has its own scheduling queue
- We can have one processor overloaded, and the rest idle
- Each processor has its own memory partition
- We can have one processor thrashing while the others have free memory
- No way to move free memory from one OS to another
Symmetric Multiprocessors (SMP)
- OS kernel runs on all processors
- Load and resources are balanced across all processors
- Including kernel execution
Issue: Real concurrency in the kernel
- Need carefully applied synchronisation primitives to avoid disaster
- One alternative: A single mutex that makes the entire kernel a large critical section
- Only one CPU can be in the kernel at a time
- The “big lock” becomes a bottleneck when in-kernel processing exceeds what can be done on a single CPU
- Better alternative: identify largely independent parts of the kernel and make each of them their own critical section
- Allows more parallelism in the kernel
- Issue: Difficult task
- Code is mostly similar to uniprocessor code
- Hard part is identifying independent parts that don’t interfere with each other
- Remember all the inter-dependencies between OS subsystems.
- Lock contention can limit overall system performance
Test-and-Set
Hardware guarantees that the instruction executes atomically on a CPU.
Atomically: As an indivisible unit.
The instruction cannot stop halfway through
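A minimal spinlock built on an atomic exchange – a sketch using C11 atomics standing in for a TSL instruction (the names and use of `stdatomic.h` are assumptions; real kernels use the architecture’s own instruction):

```c
#include <stdatomic.h>

typedef struct { atomic_flag locked; } spinlock_t;
#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

void spin_acquire(spinlock_t *l)
{
    /* atomic_flag_test_and_set plays the role of TSL: it atomically sets
     * the flag and returns its previous value. */
    while (atomic_flag_test_and_set(&l->locked))
        ;   /* busy-wait (spin) until the previous value was clear */
}

void spin_release(spinlock_t *l)
{
    atomic_flag_clear(&l->locked);
}
```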
Test-and-Set on SMP
It does not work without some extra hardware support
A solution:
Hardware blocks all other CPUs from accessing the bus during the TSL instruction to prevent memory accesses by any other CPU.
- TSL has mutually exclusive access to memory for duration of instruction.
Test-and-Set is a busy-wait synchronisation primitive – called a spinlock
Issue:
- Lock contention leads to spinning on the lock
- Spinning on a lock requires blocking the bus which slows all other CPUs down
- Independent of whether other CPUs need the lock or not
- Caching does not help reduce bus contention
- Either TSL still blocks the bus
- Or TSL requires exclusive access to an entry in the local cache
Reducing Bus Contention
- Read before TSL
- Spin reading the lock variable waiting for it to change
- When it does, use TSL to acquire the lock
- Allows lock to be shared read-only in all caches until it’s released
- no bus traffic until actual release
- No race conditions, as acquisition is still with TSL.
- Test-and-Set performs poorly once there are enough CPUs to cause contention for the lock
- Expected
- Read before Test and Set performs better
- Performance less than expected
- Still significant contention on lock when CPUs notice release and all attempt acquisition
- Critical section performance degenerates
- Critical section requires bus traffic to modify shared structure
- The lock holder competes with waiting CPUs that are doing test-and-set, so the lock holder runs slower
- Slower lock holder results in more contention
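A sketch of the read-before-TSL acquire described above (“test-and-test-and-set”), again using C11 atomics as a stand-in for the TSL instruction; names and memory orderings are illustrative assumptions:

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_bool locked; } ttas_lock_t;

void ttas_acquire(ttas_lock_t *l)
{
    for (;;) {
        /* Spin reading the lock variable: it can sit read-only in every
         * CPU's cache, so no bus traffic until it is released. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
        /* Only then attempt the atomic acquisition (the TSL step). */
        if (!atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
            return;             /* previous value was false -> lock acquired */
    }
}

void ttas_release(ttas_lock_t *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```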
Cache Consistency
Cache consistency is usually handled by the hardware.
Spinning versus Blocking and Switching
- Spinning (busy-waiting) on a lock makes no sense on a uniprocessor
- There is no other running process to release the lock
- Blocking and (eventually) switching to the lock holder is the only sensible option.
- On SMP systems, the decision to spin or block is not as clear.
- The lock is held by another running CPU and will be freed without necessarily switching away from the requestor
Spinning versus Switching
Blocking and switching
- to another process takes time
- Save context and restore another
- Cache contains current process not new process
- Adjusting the cache working set also takes time
- TLB is similar to cache
- Switching back when the lock is free encounters the same again
Spinning wastes CPU time directly
Trade off
If lock is held for less time than the overhead of switching to and back
$\Rightarrow$ It’s more efficient to spin
$\Rightarrow$ Spinlocks expect critical sections to be short
$\Rightarrow$ No waiting for I/O within a spinlock
$\Rightarrow$ No nesting locks within a spinlock
Preemption and Spinlocks
- Critical sections synchronised via spinlocks are expected to be short
- Avoid other CPUs wasting cycles spinning
- What happens if the spinlock holder is preempted at end of holder’s timeslice
- Mutual exclusion is still guaranteed
- Other CPUs will spin until the holder is scheduled again!!!!!
$\Rightarrow$ Spinlock implementations disable interrupts in addition to acquiring locks to avoid lock-holder preemption
Scheduling
The scheduler decides which process or thread to run next; this decision-making is called scheduling.
Application behaviour
Bursts of CPU usage alternate with periods of I/O wait
a) CPU-Bound process
- Spends most of its time computing
- Time to completion largely determined by received CPU time
b) I/O-Bound process
- Spends most of its time waiting for I/O to complete
- Small bursts of CPU to process I/O and request next I/O
- Time to completion largely determined by I/O request time
- We need a mix of CPU-bound and I/O-bound processes to keep both CPU and I/O systems busy
- Process can go from CPU- to I/O-bound (or vice versa) in different phases of execution
Key Insights
Choosing to run an I/O-bound process delays a CPU-bound process by very little
Choosing to run a CPU-bound process prior to an I/O-bound process delays the next I/O request significantly
- No overlap of I/O waiting with computation
- Results in device (disk) not as busy as possible
- Generally, favour I/O-bound processes over CPU-bound processes
- Generally, a scheduling decision is required when a process (or thread) can no longer continue, or when an activity results in more than one ready process
Preemptive versus Non-preemptive Scheduling
Non-preemptive
- Once a thread is in the running state, it continues until it completes, blocks on I/O, or voluntarily yields the CPU
- A single process can monopolise the entire system
Preemptive Scheduling (responsive system) - Current thread can be interrupted by OS and moved to ready state.
- Usually after a timer interrupt and process has exceeded its maximum run time
- Can also be as a result of higher priority process that has become ready (after I/O interrupt).
- Ensures fairer service as single thread can’t monopolise the system
- Requires a timer interrupt
Categories of Scheduling Algorithms
Batch Systems
- No users directly waiting, can optimise for overall machine performance
Interactive Systems
- Users directly waiting for their results, can optimise for users’ perceived performance
Realtime Systems
- Jobs have deadlines, must schedule such that all jobs (predictably) meet their deadlines
Goals of Scheduling Algorithms
All Algorithms
- Fairness
- Give each process a fair share of the CPU
- Policy Enforcement
- Whatever policy is chosen, the scheduler should ensure it is carried out
- Balance/Efficiency
- Try to keep all parts of the system busy
- Interactive Algorithms
- Minimise response time
- Response time is the time difference between issuing a command and getting the result
- E.g. selecting a menu, and getting the result of that selection
- Response time is important to the user’s perception of the performance of the system
- Provide Proportionality
- Proportionality is the user expectation that short jobs will have a short response time, and long jobs can have a long response time.
- Generally, favour short jobs
- Real-time Algorithms
- Must meet deadlines
- Each job/task has a deadline.
- A missed deadline can result in data loss or catastrophic failure
- Aircraft control system missed deadline to apply brakes
- Provide Predictability
- For some apps, an occasional missed deadline is okay – E.g. DVD decoder
- Predictable behaviour allows smooth DVD decoding with only rare skips
Interactive scheduling
Round Robin Scheduling
- Each process is given a timeslice to run in
- When the timeslice expires, the next process preempts the current process, and runs for its timeslice, and so on
- The preempted process is placed at the end of the queue
- Implemented with
- A ready queue
- A regular timer interrupt
Pros – Fair, easy to implement
Con – Assumes everybody is equal
What should the timeslice be?
Too short
- Waste a lot of time switching between processes
- Example: timeslice of 4ms with 1 ms context switch = 20% round robin overhead
Too long
- System is not responsive
- Example: timeslice of 100ms – If 10 people hit “enter” key simultaneously, the last guy to run will only see progress after 1 second.
- Degenerates into FCFS if timeslice longer than burst length
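A sketch of the mechanism described above – a ready queue plus a timer tick that preempts the running thread and puts it at the tail of the queue (the queue representation and names are assumptions for illustration):

```c
#include <stddef.h>

#define MAX_THREADS 64

/* Circular ready queue of thread IDs. */
static int    queue[MAX_THREADS];
static size_t head, tail, count;

static void enqueue(int tid) { queue[tail] = tid; tail = (tail + 1) % MAX_THREADS; count++; }
static int  dequeue(void)    { int t = queue[head]; head = (head + 1) % MAX_THREADS; count--; return t; }

static int current = -1;          /* thread currently running, -1 if none */

/* Called from the regular timer interrupt when the timeslice expires. */
void timeslice_expired(void)
{
    if (current >= 0)
        enqueue(current);         /* preempted thread goes to the end of the queue */
    if (count > 0)
        current = dequeue();      /* next thread runs for its own timeslice */
}
```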
Priorities
- Each Process (or thread) is associated with a priority
- Provides basic mechanism to influence a scheduler decision:
- Scheduler will always choose a thread of higher priority over lower priority
- Priorities can be defined internally or externally
- Internal: e.g. I/O bound or CPU bound
- External: e.g. based on importance to the user
- Usually implemented by multiple priority queues, with round robin on each queue
- Con
- Low priorities can starve
- Need to adapt priorities periodically
- Based on ageing or execution history
Traditional UNIX Scheduler
Two-level scheduler
- High-level scheduler schedules processes between memory and disk
- Low-level scheduler is CPU scheduler
- Based on a multilevel queue structure with round robin at each level
- The thread with the highest priority (lowest number) is scheduled
- Priorities are re-calculated once per second, and re-inserted in appropriate queue
- Avoid starvation of low priority threads
- Penalise CPU-bound threads
- Priority = CPU_usage + nice + base
- CPU_usage = number of clock ticks
- Nice is a value given to the process by a user to permanently boost or reduce its priority (e.g. to reduce the priority of background jobs)
- Base is a set of hardwired, negative values used to boost the priority of I/O-bound system activities
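A sketch of the once-per-second recalculation; the decay factor and field names are assumptions (real UNIX variants differ in the exact formula):

```c
/* Recalculate priorities once per second: lower value = higher priority. */
struct proc {
    int cpu_usage;   /* clock ticks accumulated while running               */
    int nice;        /* user-supplied adjustment (positive lowers priority) */
    int base;        /* hardwired negative boost for I/O-bound system work  */
    int priority;
};

void recalc_priority(struct proc *p)
{
    p->cpu_usage /= 2;                               /* decay history (assumed factor) */
    p->priority = p->cpu_usage + p->nice + p->base;  /* Priority = CPU_usage + nice + base */
    /* the process is then re-inserted into the queue for its new priority */
}
```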
Single Shared Ready Queue
Pros
- Simple
- Automatic load balancing
Cons
- Lock contention on the ready queue can be a major bottleneck
- Due to frequent scheduling or many CPUs or both
- Not all CPUs are equal
- The last CPU a process ran on is likely to have more related entries in the cache
Affinity Scheduling
Basic Idea – Try hard to run a process on the CPU it ran on last time
One approach: Multiple Queue Multiprocessor Scheduling
Multiple Queue SMP Scheduling
- Each CPU has its own ready queue
- Coarse-grained algorithm assigns processes to CPUs
- Defines their affinity, and roughly balances the load
- The bottom-level fine-grained scheduler:
- Is the frequently invoked scheduler (e.g. on blocking on I/O, a lock, or exhausting a timeslice)
- Runs on each CPU and selects from its own ready queue
- Ensures affinity
- If nothing is available from the local ready queue, it runs a process from another CPU’s ready queue rather than going idle
- Termed “Work stealing”
✅ No lock contention on per-CPU ready queues in the (hopefully) common case
✅ Load balancing to avoid idle queues
✅ Automatic affinity to a single CPU for more cache friendly behaviour
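A sketch of the per-CPU fine-grained scheduler with work stealing; the data structures, fixed queue sizes, and use of pthread mutexes (each assumed to be initialised with `pthread_mutex_init`) are simplifying assumptions:

```c
#include <stddef.h>
#include <pthread.h>

#define NCPUS 4

struct ready_queue {
    pthread_mutex_t lock;          /* per-queue lock, rarely contended */
    int    tids[64];
    size_t count;
};

struct ready_queue rq[NCPUS];      /* one ready queue per CPU */

static int try_pop(struct ready_queue *q)
{
    int tid = -1;
    pthread_mutex_lock(&q->lock);
    if (q->count > 0)
        tid = q->tids[--q->count];
    pthread_mutex_unlock(&q->lock);
    return tid;
}

/* Fine-grained scheduler for this CPU: prefer the local queue (affinity),
 * otherwise steal from another CPU's queue rather than going idle. */
int pick_next(int cpu)
{
    int tid = try_pop(&rq[cpu]);
    for (int victim = 0; tid < 0 && victim < NCPUS; victim++)
        if (victim != cpu)
            tid = try_pop(&rq[victim]);   /* "work stealing" */
    return tid;                           /* -1 means go idle */
}
```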
I/O Management
I/O Devices
There exists a large variety of I/O devices:
- Many of them with different properties
- They seem to require different interfaces to manipulate and manage them
- We don’t want a new interface for every device
- Diverse but similar interfaces lead to code duplication
- Challenge:
- Uniform and efficient approach to I/O
Device Drivers
Drivers classified into similar categories
- Block devices and character (stream of data) device
OS defines a standard (internal) interface to the different classes of devices
- Example: USB Human Input Device (HID) class specifications
- human input devices follow a set of rules making it easier to design a standard interface.
Device drivers job
- Translate requests received through the device-independent standard interface (open, close, read, write) into an appropriate sequence of commands (register manipulations) for the particular hardware
- Initialise the hardware at boot time, and shut it down cleanly at shutdown
After issuing the command to the device, the device either
- Completes immediately and the driver simply returns to the caller
- Or, device must process the request and the driver usually blocks waiting for an I/O complete interrupt.
Drivers must be thread-safe, as they can be called by another process while a process is already blocked in the driver
- Thread-safe: synchronised…
Device-Independent I/O Software
There is commonality between drivers of similar classes
Divide I/O software into device-dependent and device-independent I/O software
Device independent software includes
- Buffer or Buffer-cache management
- TCP/IP stack
- Managing access to dedicated devices
- Error reporting
I/O Device Handling
Data rate
- May be differences of several orders of magnitude between the data transfer rates
Driver $\Leftrightarrow$ Kernel Interface
Major Issue is uniform interfaces to devices and kernel
- Uniform device interface for kernel code
- Allows different devices to be used the same way
- Allows internal changes to a device driver without fear of breaking kernel code
- Uniform kernel interface for device code
- Drivers use a defined interface to kernel services (e.g. kmalloc, install IRQ handler, etc.)
- Allows kernel to evolve without breaking existing drivers
- Together, the two uniform interfaces avoid a lot of programming when implementing new interfaces
- Retains compatibility as drivers and kernels change over time.
Interrupts

- Devices connected to an Interrupt Controller via lines on an I/O bus (e.g. PCI)
- Interrupt Controller signals interrupt to CPU and is eventually acknowledged.
- Exact details are architecture specific.
I/O Interaction
Programmed I/O
- Also called polling, or busy waiting
- I/O module (controller) performs the action, not the processor
- Sets appropriate bits in the I/O status register
- No interrupts occur
- Processor checks status until operation is complete
- Wastes CPU cycles
Interrupt-Driven I/O
- Processor is interrupted when I/O module (controller) ready to exchange data
- Processor is free to do other work
- No needless waiting
- Consumes a lot of processor time because every word read or written passes through the processor
Direct Memory Access (DMA)
- Transfers data directly between Memory and Device
- CPU not needed for copying
- Transfers a block of data directly to or from memory
- An interrupt is sent when the task is complete
- The processor is only involved at the beginning and end of the transfer
DMA Considerations
✅ Reduces number of interrupts
- Fewer (expensive) context switches or kernel entry-exits
❌ Requires contiguous regions (buffers)
- Copying
- Some hardware supports “Scatter-gather”
Synchronous/Asynchronous
Shared bus must be arbitrated (hardware)
– CPU cache reduces (but does not eliminate) the CPU’s need for the bus
I/O Management Software
I/O Software Layers
Interrupt Handlers
- Interrupt handlers
- Can execute at (almost) any time
- Raise (complex) concurrency issues in the kernel
- Can propagate to userspace (signals, upcalls), causing similar issues
- Generally structured so I/O operations block until interrupts notify them of completion
kern/dev/lamebus/lhd.c
Interrupt Handler Steps
- Save Registers not already saved by hardware interrupt mechanism
- (Optionally) set up context for interrupt service procedure
- Typically, handler runs in the context of the currently running process
- No expensive context switch
- Set up stack for interrupt service procedure
- Handler usually runs on the kernel stack of current process
- Or “nests” if already in kernel mode running on kernel stack
- Ack/Mask interrupt controller, re-enable other interrupts
- Implies potential for interrupt nesting.
- Run interrupt service procedure
- Acknowledges interrupt at device level
- Figures out what caused the interrupt
- Received a network packet, disk read finished, UART transmit queue empty
- If needed, it signals blocked device driver
- In some cases, will have woken up a higher priority blocked thread
- Choose newly woken thread to schedule next.
- Set up MMU context for process to run next
- What if we are nested?
- Load new/original process’ registers
- Re-enable interrupt; Start running the new process
Sleeping in Interrupts
An interrupt generally has no context (it runs on the current kernel stack)
- Unfair to sleep on interrupted process (deadlock possible)
- Where to get context for long running operation?
- What goes into the ready queue?
What to do?
- Top and Bottom Half
- Linux implements with tasklets and workqueues
- Generically, in-kernel thread(s) handle long running kernel operations.
Top Half / Bottom Half
- Top Half
- Interrupt handler
- remains short
- Bottom half
- Is preemptable by top half (interrupts)
- performs deferred work (e.g. IP stack processing)
- Is checked prior to every kernel exit
- signals blocked processes/threads to continue
- Enables low interrupt latency
- Bottom half can’t block
Deferring Work on In-kernel Threads
- Interrupt handler defers work onto an in-kernel thread
- In-kernel thread handles deferred work (DW)
- Scheduled normally
- Can block
- Both low interrupt latency and blocking operations
Buffering
No Buffering
- Process must read/write a device a byte/word at a time
- Each individual system call adds significant overhead
- Process must wait until each I/O is complete
- Blocking/interrupt/waking adds to overhead.
- Many short runs of a process is inefficient (poor CPU cache temporal locality)
User-level Buffering
- Process specifies a memory buffer that incoming data is placed in until it fills
- Filling can be done by interrupt service routine
- Only a single system call, and block/wakeup per data buffer
- Much more efficient
- Issues
- What happens if buffer is paged out to disk
- Could lose data while unavailable buffer is paged in
- Could lock the buffer in memory (needed for DMA anyway), but many processes doing I/O reduce the RAM available for paging; can cause deadlock as RAM is a limited resource
- Consider write case
- When is buffer available for re-use?
- Either the process must block until the (potentially slow) device drains the buffer
- or deal with asynchronous signals indicating buffer drained
Single Buffer
Operating system assigns a buffer in kernel’s memory for an I/O request
In a stream-oriented scenario
- Used a line at a time
- User input from a terminal is one line at a time with carriage return signaling the end of the line
- Output to the terminal is one line at a time
Block-oriented
- Input transfers made to buffer
- Block copied to user space when needed
- Another block is written into the buffer
- Read ahead
User process can process one block of data while next block is read in
Swapping can occur since input is taking place in system memory, not user memory
Operating system keeps track of assignment of system buffers to user processes
Assume
- T is transfer time for a block from device
- C is computation time to process incoming block
- M is time to copy kernel buffer to user buffer
Computation and transfer can be done in parallel
Speed up with buffering
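The slide leaves the expression implicit; a common way to quantify it (assuming the transfer of the next block overlaps with processing of the current one) is $\text{speedup} \approx \frac{T + C}{\max(T, C) + M}$, compared with $T + C$ per block when unbuffered.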
What happens if kernel buffer is full
- The user buffer is swapped out, or
- The application is slow to process the previous buffer and more data is received
- $\Rightarrow$ We start to lose characters or drop network packets
Double Buffer
- Use two system buffers instead of one
- A process can transfer data to or from one buffer while the operating system empties or fills the other buffer
Double Buffer Speed Up
Computation and Memory copy can be done in parallel with transfer
Speed up with double buffering
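Again the expression is left implicit on the slide; assuming the copy to user space can also be overlapped with the transfer into the other buffer, the per-block time becomes roughly $\max(T, C + M)$, giving $\text{speedup} \approx \frac{T + C}{\max(T, C + M)}$.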
Usually M is much less than T giving a favourable result
May be insufficient for really bursty traffic
- Lots of application writes between long periods of computation
- Long periods of application computation while receiving data
- Might want to read-ahead more than a single block for disk
Circular Buffer
- More than two buffers are used
- Each individual buffer is one unit in a circular buffer
- Used when I/O operation must keep up with process
Notice that buffering, double buffering, and circular buffering are all Bounded-Buffer Producer-Consumer Problems
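A minimal single-producer/single-consumer circular buffer sketch; the slot count, slot size, and lack of locking are simplifying assumptions (a real driver would guard it with the synchronisation primitives discussed earlier):

```c
#include <stddef.h>
#include <stdbool.h>

#define NBUF 8                     /* number of buffer slots (assumption) */

struct circ_buf {
    char   data[NBUF][512];        /* each slot holds one block/unit      */
    size_t head;                   /* next slot the consumer will read    */
    size_t tail;                   /* next slot the producer will fill    */
    size_t count;                  /* slots currently full                */
};

/* Producer (e.g. the interrupt handler) fills a slot if one is free. */
bool circ_put(struct circ_buf *b)
{
    if (b->count == NBUF)
        return false;              /* buffer full: data would be dropped  */
    /* ... copy incoming data into b->data[b->tail] ... */
    b->tail = (b->tail + 1) % NBUF;
    b->count++;
    return true;
}

/* Consumer (the process) drains the oldest full slot. */
bool circ_get(struct circ_buf *b)
{
    if (b->count == 0)
        return false;              /* nothing to read yet                 */
    /* ... copy b->data[b->head] out to the user buffer ... */
    b->head = (b->head + 1) % NBUF;
    b->count--;
    return true;
}
```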
Reference:
UNSW COMP3231 2020 lecture slides