Threads and synchronization

Introduction to threads and synchronization.


Definitions

Concurrency

Concurrency refers to the ability of different parts or units of a program, algorithm, or problem to be executed out-of-order or in partial order, without affecting the final outcome. This allows for parallel execution of the concurrent units, which can significantly improve overall speed of the execution in multi-processor and multi-core systems. 1

  • Two tasks are concurrent if the order in which the two tasks are executed in time is not predetermined.
  • Tasks are actually split up into chunks that share the processor with chunks from other tasks by interleaving the execution in a time-slicing way. Tasks can start, run, and complete in overlapping time periods.

Process

To run a program the operating system must allocate memory and create a process. A process has a stack, a heap, a data segment and a text segment in user space. The CPU context and file descriptor table are kept in kernel space. When executing the program, the program counter (PC) jumps around in the text segment.

Threads

A process can have one or more threads of execution. Threads share the heap, data segment, text segment and file descriptor table but have separate stacks and CPU contexts. Each thread has a private program counter and threads execute concurrently. Depending on how threads are implemented, the CPU contexts can be stored in user space or kernel space. In the figure below a process with three threads is shown.

Concurrent tasks

If not explicitly stated otherwise, the term task will be used to refer to a concurrent unit of execution such as a thread or a process.

Atomic operations

In concurrent programming, an operation (or set of operations) is atomic if it appears to the rest of the system to occur at once without being interrupted. Other words used synonymously with atomic are: linearizable, indivisible or uninterruptible. 2

Non-atomic operations

Incrementing and decrementing a variable are examples of non-atomic operations. The following C expression,

X++;

is translated by the compiler into three instructions:

  1. load X from memory into a CPU register.
  2. increment X and save result in a CPU register.
  3. store result back to memory.

Similarly, the following C expression,

X--;

is translated by the compiler into three instructions:

  1. load X from memory into a CPU register.
  2. decrement X and save result in a CPU register.
  3. store the new value of X back to memory.

Race condition

A race condition or race hazard is the behaviour of an electronic, software or other system where the output is dependent on the sequence or timing of other uncontrollable events. It becomes a bug when events do not happen in the intended order. The term originates with the idea of two signals racing each other to influence the output first. 3

Data race

A data race occurs when two instructions from different threads access the same memory location and:

  • at least one of these accesses is a write
  • and there is no synchronization that is mandating any particular order among these accesses.

Critical section

Concurrent accesses to shared resources can lead to unexpected or erroneous behavior, so parts of the program where the shared resource is accessed are protected. This protected section is the critical section or critical region. Typically, the critical section accesses a shared resource, such as a data structure, a peripheral device, or a network connection, that would not operate correctly in the context of multiple concurrent accesses. 4

Mutual exclusion (mutex)

In computer science, mutual exclusion is a property of concurrency control, which is instituted for the purpose of preventing race conditions; it is the requirement that one thread of execution never enter its critical section at the same time that another concurrent thread of execution enters its own critical section. 5

Often the word mutex is used as a short form of mutual exclusion.

Clone repository

Before you continue, you must clone the threads-synchronization-deadlock repository.

Use the git command

From the terminal, navigate to a directory where you want the cloned directory to be created and execute the following command.

git clone https://github.com/os-assignments/threads-synchronization-deadlock.git

Now you should see something similar to this in the terminal.

Cloning into 'threads-synchronization-deadlock'...
remote: Counting objects: 23, done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 23 (delta 1), reused 23 (delta 1), pack-reused 0
Unpacking objects: 100% (23/23), done.

Use the tree command

To get an overview of the cloned repository, use the tree -d command.

tree -d threads-synchronization-deadlock

Now you should see a tree view of the directory structure.

threads-synchronization-deadlock
├── examples
│   ├── bin
│   ├── obj
│   └── src
├── higher-grade
│   ├── bin
│   ├── obj
│   └── src
└── mandatory
    ├── bin
    ├── obj
    ├── psem
    └── src

14 directories

Install tree on macOS

If you run macOS and tree is not installed, use Homebrew to install tree.

brew install tree

The all utility

In the threads-synchronization-deadlock directory you find the all utility script.

The all utility can be used to run a shell command in all subdirectories, i.e., run a command in each of the three directories examples, higher-grade and mandatory. In the terminal, navigate to the threads-synchronization-deadlock directory.

For example, you can use the all utility together with make to compile all programs.

./all make

The all utility can also be used to delete all object files and executables.

./all make clean

Mutual exclusion

Mandatory assignment

In this assignment you will study different methods to enforce mutual exclusion.

Linux or macOS on Intel CPUs only

This assignment makes use of atomic operations available only on computers with Intel CPUs running Linux or macOS.

The department Linux system

If you are working on Windows or on Linux or macOS but with a non-Intel CPU, you can use the department Linux system for this assignment. You may still use your private computer to access the department Linux system using SSH but make sure to log in to one of the Linux hosts.

Overview

File to use
mandatory/src/mutex.c
Description
In this program, a process creates a number of threads and waits for their termination. The threads are divided into two sets I (increment) and D (decrement). A shared variable which works as a counter is initialized to 0 and then incremented by the threads in set I and decremented by the threads in set D. The sum of all the increments is equal to the absolute value of the sum of all the decrements.
Example
There are five threads in set I, each incrementing the counter 20 times by 2. In total the counter will be incremented by 5*20*2 = 200. There are two threads in set D, each decrementing the counter 20 times by 5. In total the counter will be decremented by 2*20*5 = 200.
Desired behavior
When all threads are done executing, the value of the shared counter should be 0 (the same value it was initialized with).

A collection of tests

The provided program will execute 4 different tests. For each test, a different approach to synchronization will be used by the threads updating the shared counter variable.

Compile and run

In the terminal, navigate to the mandatory directory. Use make to compile the program.

make

The executable will be named mutex and placed in the bin subdirectory. Run the program from the terminal.

./bin/mutex

Summary of all the tests

At the end of the program output, a summary of the four tests will be printed.

Questions

Run the program several times and study the provided source code.

  1. Does the program work according to the specification?
  2. Can you predict the output before execution? Why?

Identify the critical sections

Identify the critical sections in the program. The critical sections should be as small as possible.

Mutual exclusion

Your task is to make sure that at any time, at most one thread executes in a critical section, i.e., access to the critical sections should be mutually exclusive. You will use three different methods to ensure that updates to the shared counter variable appear to the system to occur instantaneously, i.e., make sure updates are atomic.

Test 0 - no synchronization

In test 0 the threads use the functions inc_no_sync and dec_no_sync to update the shared variable counter. Updates are done with no synchronization.

You don’t have to change or add any code to these two functions. This test is used to demonstrate the effects of unsynchronized updates to shared variables.

Test 1 - pthread mutex locks

The first synchronization option you will study is to use the mutex lock provided by the pthreads library to enforce mutual exclusion when accessing the shared counter variable. Add the needed synchronization in the functions inc_mutex and dec_mutex to make the program execute according to the desired behavior.

Reference

Test 2 - Spin lock using test-and-set

An alternative to mutex locks is to use the atomic test-and-set built-in provided by the GCC compiler to implement a spinlock.

Linux or macOS on Intel CPUs

This part of the assignment makes use of atomic operations available only on computers with Intel CPUs running Linux or macOS.

If you are working on Windows or on Linux or macOS but with a non-Intel CPU, you can use the department Linux system for this assignment. You may still use your private computer to access the department Linux system using SSH but make sure to log in to one of the Linux hosts.

You will use the following function to access the underlying test-and-set operation.

int __sync_lock_test_and_set(int *lock, int value)
This atomic operation sets *lock to value and returns the previous contents of *lock.

At the beginning of mutex.c you find the following declaration.

/* Shared variable used to implement a spinlock */
volatile int lock = false;

Todo:

  • Use the shared lock variable to implement the spin_lock() and spin_unlock() operations.
  • Change the functions inc_tas_spinlock and dec_tas_spinlock to use the test-and-set spinlock to enforce mutual exclusion.

Reference

Test 3 - atomic addition and subtraction

A third option is to use the atomic addition and subtraction GCC built-ins.

Linux or macOS on Intel CPUs

This part of the assignment makes use of atomic operations available only on computers with Intel CPUs running Linux or macOS.

If you are working on Windows or on Linux or macOS but with a non-Intel CPU, you can use the department Linux system for this assignment. You may still use your private computer to access the department Linux system using SSH but make sure to log in to one of the Linux hosts.

You will use the following functions to access the underlying atomic increment and decrement operations.

int __sync_fetch_and_add(int *counter, int n)
This atomic operation increments *counter by n and returns the previous value of *counter.
int __sync_fetch_and_sub(int *counter, int n)
This atomic operation decrements *counter by n and returns the previous value of *counter.

Change the functions inc_atomic and dec_atomic to use atomic addition and subtraction to make the program execute according to the desired behavior.

Reference

Evaluation

Study the test summary table printed after all tests have completed.

  • Make sure all tests but test 0 (no synchronization) are successful.
  • Compare the execution times for the 4 tests.
    • Which method of synchronization is the fastest?
    • Which method of synchronization is the slowest?
    • Which method of synchronization is neither the fastest, nor the slowest?
    • Why do you think the speeds of the three methods of synchronization are ordered like this?

Code grading questions

Here are a few examples of questions that you should be able to answer, discuss and relate to the source code of your solution during the code grading.

Explain the following concepts and relate them to the source code and the behavior of the program.

  • Critical section.
  • Mutual exclusion (mutex).
  • Race condition.
  • Data race.

Locks and semaphores:

  • What is the purpose of mutex locks?
  • If you had to make a choice between using a semaphore or a mutex lock to enforce mutual exclusion, what would you recommend and why?
  • How do you construct a spinlock using the atomic test-and-set instruction?

Performance analysis:

  • Discuss and analyze the results in the test summary table.

Portable semaphores

Linux and macOS natively support different types of semaphores. In order to have a single (simple) semaphore API that works on both Linux and macOS, you will use a portable semaphore library named psem.

In the psem/psem.h header file you find the documentation of the portable semaphore API.

Example usage

A small example program that demonstrates how to use the semaphores provided by the psem library.

#include "psem.h"

int main(void) {
  // Create a new semaphore and initialize the semaphore counter to 3. 
  psem_t *sem = psem_init(3);

  // Wait on the semaphore. 
  psem_wait(sem);

  // Signal on the semaphore. 
  psem_signal(sem);

  // Destroy the semaphore (deallocation)
  psem_destroy(sem);
}  

Example program

In mandatory/src/psem_test.c you find a complete example program with two threads (main and a pthread) that synchronize their execution using a psem semaphore.

Compile

In the terminal, navigate to the examples directory. Use make to compile the program.

make

Run

Run the program from the terminal.

./bin/psem_test

Two thread barrier synchronization

Mandatory assignment

Background

A mutex lock is meant to first be taken and then released, always in that order, by each task that uses the shared resource it protects. In general, semaphores should not be used to enforce mutual exclusion. The correct use of a semaphore is signaling from one task to another, i.e., a task either signals or waits on a semaphore, not both.

In this assignment you will solve the two thread barrier problem using semaphores. This problem clearly demonstrates the type of synchronization that can be provided by semaphores.

Supported platforms

The provided code has been developed and tested on the department Linux system and macOS. Most likely you will be able to use any reasonably modern version of Linux.

Overview

File to use
mandatory/src/two_thread_barrier.c
Description
In this program, a process creates two threads A and B, and waits for their termination. Each thread performs N iterations. In each iteration the threads print their name (A or B) and sleep for some random amount of time. For each iteration the order between the threads is not restricted.

Compile and run

In the terminal, navigate to the mandatory directory. Use make to compile the program.

make

The executable will be named two_thread_barrier and placed in the bin subdirectory. Run the program from the terminal.

./bin/two_thread_barrier

Questions

Run the programs several times and study the source code. Think about the following questions.

  • Could you predict the output before execution? Why?
  • Adjust the sleeping duration of one thread (or both threads) to have a different output (i.e., another interleaving of the task traces). Could you predict this output? Why?

Rendezvous

Rendezvous or rendez-vous (French pronunciation: [ʁɑ̃devu]) refers to a planned meeting between two or more parties at a specific time and place. 1

Barrier synchronization (desired behavior)

We now want to add a barrier in each thread that forces the two threads to rendezvous at the barrier in each iteration.

For each iteration the order between the threads should not be restricted. The first thread to reach the barrier must wait for the other thread to also reach the barrier. Once both threads have reached the barrier the threads are allowed to continue with the next iteration. Examples of execution traces:

  • AB BA BA AB BA (valid barrier synchronization)
  • AB BB AB AA AB (invalid synchronization)

Lock-step and rendezvous

Other names for barrier synchronization are lock-step and rendezvous.

Two thread barrier implementation

Your task is to use semaphores to enforce barrier synchronization between the two threads A and B.

Before you continue think about the following questions.

  • What needs to be initialized?
  • What needs to be done by each thread inside the barrier?

Compile and run

Compile:

make

, and run the program:

./bin/two_thread_barrier

Example incorrect synchronization

This is an example of an invalid execution trace when running the two_thread_barrier program.

Two threads A and B doing 5 iterations each in lockstep.

Iteration 0

  A
  B

Iteration 1

  B
  A

Iteration 2

  B
  B <===== ERROR: should have been A

Example of correct synchronization

This is an example of a valid execution trace when running the two_thread_barrier program.

Two threads A and B doing 5 iterations each in lockstep.

Iteration 0

  A
  B

Iteration 1

  A
  B

Iteration 2

  B
  A

Iteration 3

  A
  B

Iteration 4

  A
  B

SUCCESS: All iterations done in lockstep!

Configuration

In the beginning of two_thread_barrier.c you find the following definitions.

#define ITERATIONS      10  // Number of iterations each thread will execute. 
#define MAX_SLEEP_TIME  3   // Max sleep time (seconds) for each thread. 

Experiment by changing the number of iterations and the max sleep time.

  • Does the output change?
  • What are the possible outputs?

Portable semaphores

Use the psem semaphores to enforce rendezvous between the two threads. This is how you declare and initialize a portable psem semaphore.

psem_t *semaphore;

semaphore = psem_init(0);

Number of semaphores

How many semaphores are needed to enforce barrier synchronization between two threads?

  • Can this be achieved using a single semaphore?
  • Do you need to use two semaphores?
  • Do you need more than two semaphores?

Declare global semaphore variables

For simplicity, declare all psem_t semaphore variables globally.

Initial semaphore values

Initialize your semaphore(s) in the beginning of main(). Should the semaphore(s) be initialized with counter value:

  • 0?
  • 1?
  • some other value?

Destroy semaphores after use

At the end of main(), don’t forget to destroy any semaphores you have initialized.

Automatic error detection

Instead of printing their identity (A or B) directly, the threads use the provided trace function. In each iteration thread A calls trace('A') and thread B calls trace('B').

The trace function prints the thread identity and keeps track of the next valid thread identity in the traced sequence. If the wrong thread identity is traced, an error is reported and the process is terminated.

Code grading questions

Here are a few examples of questions that you should be able to answer, discuss and relate to the source code of your solution during the code grading.

  • How are mutex locks different compared to semaphores?
  • In general, what will happen when you wait on a semaphore?
  • In general, what will happen when you signal on a semaphore?
  • Explain the barrier synchronization concept.
  • How can semaphores be used to enforce barrier synchronization between two threads?
  • Why can’t mutex locks be used as drop-in replacements for the semaphores in your solution with lock instead of wait and unlock instead of signal?

Simple bank simulation

Optional assignment for higher grade (1 point)

New assignment

This is a new 1 point higher grade assignment replacing the N Thread barrier assignment.

In this higher grade assignment you will do a simple simulation of money transfers between bank accounts.

Git pull

In order to get the new source files and the updated Makefile for this new assignment, you need to pull the updates from the GitHub repo.

git pull

Overview

In the higher-grade/src directory you find the following files.

bank.h
Header file with the API of the simple bank.
bank.c
The implementation of the bank API.
bank_test.c
A program testing the implemented bank API.

The bank API

The simple bank API is defined in bank.h.

A single bank account is represented by the following C structure.

typedef struct {
  int balance;

} account_t;

What more needs to be added to the structure?

  • One or more pthread mutex locks?
  • Anything else?

These are the functions in the simple bank API.

/**
 * account_new()
 *   Creates a new bank account.
 *
 * balance
 *   The initial balance for the new account.
 *
 * NOTE: You may need to add more parameters.
 *
 */
account_t* account_new(unsigned int balance);

/**
 * account_destroy()
 *   Destroys a bank account, freeing the resources it might hold.
 *
 * account:
 *   Pointer to a bank account.
 *
 */
void account_destroy(account_t* account);

/**
 * transfer()
 *   Attempts to transfer money from one account to another.
 *
 * amount:
 *   Amount to transfer.
 *
 * from:
 *   The account to transfer money from. Money should only be transferred if
 *   there are sufficient funds.
 *
 * to:
 *   The account to transfer the money to.
 *
 * return value:
 *   Returns -1 if there are insufficient funds in the from account. Otherwise returns 0.
 *
 */
int transfer(int amount, account_t* from, account_t* to);

Add synchronization

You must add the needed synchronization.

Prevent deadlocks

You must make sure to prevent deadlocks.

Pthread mutex locks

This is an example of how you can declare and initialize a Pthread mutex lock.

pthread_mutex_t mutex;

if (pthread_mutex_init(&mutex, NULL) != 0) {
  perror("Init mutex lock");
  exit(EXIT_FAILURE);
}

Testing the bank API implementation

In bank_test.c you find a working program testing your implementation.

Compile and run the test of the simple bank API.

Compile:

make

, and run the test program:

./bin/bank_test

Example of correct synchronization

This is an example showing two rounds of correct synchronization.

Account Amount
------------------
0	      500
1	      0
2	      0
3	      200
4	      0
5	      0
6	      0
7	      0
8	      0
9	      0
------------------
     Sum: 700

Round 1 of 100

From  Amount    To  Result

7 ---- 200 ---> 8   Insufficient funds
1 ---- 200 ---> 7   Insufficient funds
2 ---- 200 ---> 7   Insufficient funds
0 ---- 150 ---> 4   Ok
3 ---- 200 ---> 0   Ok

Account Amount
------------------
0	      550
1	      0
2	      0
3	      0
4	      150
5	      0
6	      0
7	      0
8	      0
9	      0
------------------
     Sum: 700

Total amount of money was initially 700 and is now 700.

System invariant (conservation of money) not broken.

Example of incorrect synchronization

In this example, everything looks ok in round 4, but in round 5 a race condition is detected.

Round 4 of 100

From  Amount    To  Result

8 ---- 050 ---> 3   Insufficient funds
1 ---- 200 ---> 7   Insufficient funds
4 ---- 100 ---> 6   Insufficient funds
2 ---- 150 ---> 6   Insufficient funds
4 ---- 150 ---> 7   Insufficient funds

Account Amount
------------------
0	      500
1	      50
2	      0
3	      50
4	      0
5	      0
6	      100
7	      0
8	      0
9	      0
------------------
     Sum: 700

Total amount of money was initially 700 and is now 700.

System invariant (conservation of money) not broken.

Round 5 of 100

From  Amount    To  Result

4 ---- 250 ---> 9   Insufficient funds
9 ---- 150 ---> 3   Insufficient funds
8 ---- 100 ---> 0   Insufficient funds
0 ---- 100 ---> 5   Ok
0 ---- 100 ---> 9   Ok

Account Amount
------------------
0	      400
1	      50
2	      0
3	      50
4	      0
5	      100
6	      100
7	      0
8	      0
9	      100
------------------
     Sum: 800

Total amount of money was initially 700 and is now 800.

RACE CONDITION: System invariant (conservation of money) broken!

Deadlock

You must make sure the simulation never deadlocks. If a deadlock occurs, the simulation will halt, for example like this.

Round 57 of 100

From  Amount    To  Result

1 ---- 250 ---> 7   Insufficient funds
6 ---- 150 ---> 3   Insufficient funds
6 ---- 100 ---> 3   Insufficient funds
9 ---- 100 ---> 7   Insufficient funds
8 ---- 100 ---> 5   Ok

Account Amount
------------------
0	      50
1	      50
2	      50
3	      50
4	      0
5	      100
6	      0
7	      100
8	      300
9	      0
------------------
     Sum: 700

Total amount of money was initially 700 and is now 700.

System invariant (conservation of money) not broken.

Round 58 of 100

From  Amount    To  Result

0 ---- 200 ---> 1   Insufficient funds
2 ---- 050 ---> 7   Ok
3 ---- 200 ---> 7   Insufficient funds

Bounded buffer

Mandatory assignment

Introduction

The bounded-buffer problem (aka the producer-consumer problem) is a classic example of concurrent access to a shared resource. A bounded buffer lets multiple producers and multiple consumers share a single buffer. Producers write data to the buffer and consumers read data from the buffer.

  • Producers must block if the buffer is full.
  • Consumers must block if the buffer is empty.

Preparations

Among the slides you find self study material about classic synchronization problems. In these slides the bounded buffer problem is described. Before you continue, make sure to read the self study slides about the bounded buffer problem.

Synchronization

A bounded buffer with capacity N can store N data items. The places used to store the data items inside the bounded buffer are called slots. Without proper synchronization the following errors may occur.

  • A producer doesn’t block when the buffer is full.
  • Two or more producers write into the same slot.
  • A consumer doesn’t block when the buffer is empty.
  • A consumer consumes an empty slot in the buffer.
  • A consumer attempts to consume a slot that is only half-filled by a producer.
  • Two or more consumers read the same slot.
  • And possibly more …

Overview of the provided source code

You will implement the bounded buffer as an abstract datatype. These are the files you will use to implement and test the bounded buffer.

bounded_buffer.h
The internal representation of the bounded buffer and the public API is defined in mandatory/src/bounded_buffer.h.
bounded_buffer.c
The implementation of the API will be in the mandatory/src/bounded_buffer.c file.
bounded_buffer_test.c
To test various aspects of your implementation, a collection of tests is provided in the mandatory/src/bounded_buffer_test.c file. Here you can add your own tests if you want.
bounded_buffer_stress_test.c
The mandatory/src/bounded_buffer_stress_test.c program makes it easy to test the implementation for different sizes of the buffer and different numbers of producers and consumers.

Supported platforms

The provided code has been developed and tested on the department Linux system and macOS. Most likely you will be able to use any reasonably modern version of Linux.

Semaphores

You will use the psem semaphores to synchronize the bounded buffer.

Data structures

To implement the bounded buffer you will use two C structures and an array of C structures.

Tuple

The buffer will store tuples (pairs) with two integer elements a and b. A tuple is represented by the following C struct.

typedef struct {
  int a;
  int b;
} tuple_t;

Buffer array

To implement the bounded buffer a finite size array in memory is shared by a number of producer and consumer threads.

  • Producer threads “produce” an item and place the item in the array.
  • Consumer threads remove an item from the array and “consume” it.

In addition to the shared array, information about the state of the buffer must also be shared.

  • Which slots in buffer are free?
  • To which slot should the next data item be stored?
  • Which parts of the buffer are filled?
  • From which slot should the next data item be read?

In the example below, three data items B, C and D are currently in the buffer. On the next write, data will be written to index in = 4. On the next read, data will be read from index out = 1.

Buffer struct

The buffer is represented by the following C struct.

typedef struct {
  tuple_t *array;
  int     size;
  int     in;
  int     out;
  psem_t  *mutex;
  psem_t  *data;
  psem_t  *empty;
} buffer_t;

The purpose of the struct members are described below.

tuple_t *array
This pointer will point to a dynamically allocated array of buffer items of type tuple_t.
int size
The number of elements in the array, i.e., the size of the buffer.
int in
The index in the array where the next item should be produced.
int out
The index in the array from where the next item should be consumed.
psem_t *mutex
A binary semaphore used to enforce mutually exclusive updates to the buffer.
psem_t *data
A counting semaphore used to count the number of items in the buffer. This semaphore will be used to block consumers if the buffer is empty.
psem_t *empty
A counting semaphore used to count the number of empty slots in the buffer. This semaphore will be used to block producers if the buffer is full.

Critical sections and mutual exclusion

All updates to the buffer state must be done in a critical section. More specifically, mutual exclusion must be enforced between the following critical sections:

  • A producer writing to a buffer slot and updating in.
  • A consumer reading from a buffer slot and updating out.

A binary semaphore can be used to protect access to the critical sections.

Synchronize producers and consumers

Producers must block if the buffer is full. Consumers must block if the buffer is empty. Two counting semaphores can be used for this.

Use one semaphore named empty to count the empty slots in the buffer.

  • Initialize this semaphore to N.
  • A producer must wait on this semaphore before writing to the buffer.
  • A consumer will signal this semaphore after reading from the buffer.

Use one semaphore named data to count the number of data items in the buffer.

  • Initialize this semaphore to 0.
  • A consumer must wait on this semaphore before reading from the buffer.
  • A producer will signal this semaphore after writing to the buffer.

Initialization example

A new bounded buffer with 10 elements will be represented as follows.

The empty semaphore counts the number of empty slots in the buffer and is initialized to 10. Initially there are no data elements to be consumed in the buffer and the data semaphore is initialized to 0. The mutex semaphore will be used to enforce mutual exclusion when updating the buffer and is initialized to 1.

Implementation

To complete the implementation you must add code at a few places.

Pointers to C structs

Make sure you know how to use pointers to C structs before you continue.

Step by step workflow

You will complete the implementation step by step.

For each step you will need to add code to a single function in the mandatory/src/bounded_buffer.c source file. For each step there is also a test in the mandatory/src/bounded_buffer_test.c source file that should pass without any failed assertions.

In the terminal, navigate to the mandatory directory. Use make to compile the program.

make

Run the test(s).

./bin/bounded_buffer_test

An example of a failed test where the assertion that the buffer size should be 10 fails.

==== init_test ====

Assertion failed: (buffer.size == 10), function init_test, file src/bounded_buffer_test.c, line 24.

When a test passes you will see the following output.

Test SUCCESSFUL :-)

Step 1 - Buffer initialization

You must complete the buffer initialization.

Function
buffer_init
Test
init_test

Make sure to initialize all members of the buffer_t struct.

Step 2 - Buffer destruction

You must complete the buffer destruction.

Function
buffer_destroy
Test
destroy_test

Deallocate all resources allocated by buffer_init().

  • Use free() to deallocate the buffer array.
  • Use psem_destroy() to deallocate all semaphores.
  • Set all pointers to NULL after deallocation to avoid dangling pointers.

Step 3 - Print the buffer

Function
buffer_print
Test
print_test

Add code to print all elements of the buffer array.

Step 4 - Put an item into the buffer

Function
buffer_put
Test
put_test

Here you need to:

  • add the needed synchronization
  • update the buffer->in index and make sure it wraps around.

Step 5 - Get an item out of the buffer

Function
buffer_get
Test
get_test

Here you need to:

  • add the needed synchronization
  • update the buffer->out index and make sure it wraps around.

Step 6 - Concurrent puts and gets

For this step you only need to run the concurrent_put_get_test. If the test doesn’t terminate, you have most likely made a mistake in the synchronization of buffer_put and/or buffer_get that results in a deadlock.

Stress testing

It is now time to test the bounded buffer with multiple concurrent producers and consumers. In mandatory/src/bounded_buffer_stress_test.c you find a complete test program that creates:

  • a bounded buffer of size s
  • p producer threads, each producing n items into the buffer
  • c consumer threads, each consuming m items from the buffer

For the test program to terminate, the total number of consumed items (c*m) must equal the total number of items produced (p*n). Run the stress test.

./bin/bounded_buffer_stress_test

Default stress test

The default stress test will now be executed. On success you should see output similar to the following.

Test buffer of size 10 with:

 20 producers, each producing 10000 items.
 20 consumers, each consuming 10000 items.

Verbose: false

The buffer when the test ends. 

---- Bounded Buffer ----

size: 10
  in: 0
 out: 0

array[0]: (7, 9994)
array[1]: (15, 9999)
array[2]: (13, 9998)
array[3]: (4, 9999)
array[4]: (7, 9995)
array[5]: (13, 9999)
array[6]: (7, 9996)
array[7]: (7, 9997)
array[8]: (7, 9998)
array[9]: (7, 9999)

------------------------


====> TEST SUCCESS <====

Note the ====> TEST SUCCESS <==== at the end.

Overriding the default parameters

The default values for test parameters can be overridden using the following flags.

Flag   Description                                  Argument
-s     Size of the buffer                           positive integer
-p     Number of producer threads                   positive integer
-n     Number of items produced by each producer    positive integer
-c     Number of consumer threads                   positive integer
-m     Number of items consumed by each consumer    positive integer
-v     Turns on verbose output                      no argument

In the following example, the default buffer size 10 is changed to 3.

./bin/bounded_buffer_stress_test -s 3

Experiment with different values for the test parameters.

Code grading questions

Here are a few examples of questions that you should be able to answer, discuss and relate to the source code of your solution during the code grading.

  • What do we mean by a counting semaphore?
  • What happens when you do wait on a counting semaphore?
  • What happens when you do signal on a counting semaphore?
  • Explain how producers and consumers are synchronized in order to:
    • block consumers if the buffer is empty
    • block producers if the buffer is full.
  • Explain why mutex locks cannot be used to synchronize the blocking of consumers and producers.
  • Explain why you must ensure mutual exclusion (mutex) when updating the buffer array and the in and out array indexes.
  • Explain how you achieve mutual exclusion (mutex) when updating the buffer array and the in and out array indexes.

Implementing threads

A thread library provides programmers with an API for creating and managing threads. Support for threads must be provided either at the user level or by the kernel.

  • Kernel level threads are supported and managed directly by the operating system.
  • User level threads are supported above the kernel in user space and are managed without kernel support.

Kernel level threads

Kernel level threads are supported and managed directly by the operating system.

  • The kernel knows about and manages all threads.
  • One process control block (PCB) per process.
  • One thread control block (TCB) per thread in the system.
  • Provide system calls to create and manage threads from user space.

Advantages

  • The kernel has full knowledge of all threads.
  • Scheduler may decide to give more CPU time to a process having a large number of threads.
  • Good for applications that frequently block.

Disadvantages

  • The kernel manages and schedules all threads.
  • Significant overhead and increase in kernel complexity.
  • Kernel level threads are slow and inefficient compared to user level threads.
  • Thread operations are hundreds of times slower compared to user-level threads.

User level threads

User level threads are supported above the kernel in user space and are managed without kernel support.

  • Threads managed entirely by the run-time system (user-level library).
  • Ideally, thread operations should be as fast as a function call.
  • The kernel knows nothing about user-level threads and manages them as if they were single-threaded processes.

Advantages

  • Can be implemented on an OS that does not support kernel-level threads.
  • Does not require modifications of the OS.
  • Simple representation: PC, registers, stack and small thread control block all stored in the user-level process address space.
  • Simple management: Creating, switching and synchronizing threads done in user-space without kernel intervention.
  • Fast and efficient: switching threads not much more expensive than a function call.

Disadvantages

  • Not a perfect solution (a trade off).
  • Lack of coordination between the user-level thread manager and the kernel.
  • OS may make poor decisions like:
    • scheduling a process with idle threads
    • blocking a process due to a blocking thread even though the process has other threads that can run
    • giving a process as a whole one time slice irrespective of whether the process has 1 or 1000 threads
    • unscheduling a process with a thread holding a lock.
  • May require communication between the kernel and the user-level thread manager (scheduler activations) to overcome the above problems.

User-level thread models

In general, user-level threads can be implemented using one of four models.

  • Many-to-one
  • One-to-one
  • Many-to-many
  • Two-level

All models map user-level threads to kernel-level threads. A kernel thread is similar to a process in a non-threaded (single-threaded) system. The kernel thread is the unit of execution that is scheduled by the kernel to execute on the CPU. The term virtual processor is often used instead of kernel thread.

Many-to-one

In the many-to-one model all user level threads execute on the same kernel thread. The process can only run one user-level thread at a time because there is only one kernel-level thread associated with the process.

The kernel has no knowledge of user-level threads. From its perspective, a process is an opaque black box that occasionally makes system calls.

One-to-one

In the one-to-one model every user-level thread executes on a separate kernel-level thread.

In this model the kernel must provide a system call for creating a new kernel thread.

Many-to-many

In the many-to-many model the process is allocated m kernel-level threads to execute n user-level threads.

Two-level

The two-level model is similar to the many-to-many model but also allows for certain user-level threads to be bound to a single kernel-level thread.

Scheduler activations

In both the many-to-many model and the two-level model there must be some way for the kernel to communicate with the user level thread manager to maintain an appropriate number of kernel threads allocated to the process. This mechanism is called scheduler activations.

The kernel provides the application with a set of kernel threads (virtual processors), and then the application has complete control over what threads to run on each of the kernel threads (virtual processors). The number of kernel threads (virtual processors) in the set is controlled by the kernel, in response to the competing demands of different processes in the system.1

The kernel notifies the user-level thread manager of important kernel events using upcalls from the kernel to the user-level thread manager. Examples of such events include a thread making a blocking system call and the kernel allocating a new kernel thread to the process.1

Example

Let’s study an example of how scheduler activations can be used. The kernel has allocated one kernel thread (1) to a process with three user-level threads (2). The three user-level threads take turns executing on the single kernel-level thread.

The executing thread makes a blocking system call (3) and the kernel blocks the calling user-level thread and the kernel-level thread used to execute it (4). Scheduler activation: the kernel decides to allocate a new kernel-level thread to the process (5). Upcall: the kernel notifies the user-level thread manager which user-level thread is now blocked and that a new kernel-level thread is available (6). The user-level thread manager moves the other threads to the new kernel thread and resumes one of the ready threads (7).

While one user-level thread is blocked (8) the other threads can take turns executing on the new kernel thread (9).

User-level thread scheduling

Scheduling of the user-level threads among the available kernel-level threads is done by a thread scheduler implemented in user space. There are two main methods: cooperative and preemptive thread scheduling.

Cooperative scheduling of user-level threads

The cooperative model is similar to multiprogramming, where a process executes on the CPU until making an I/O request. Cooperative user-level threads execute on the assigned kernel-level thread until they voluntarily give back the kernel thread to the thread manager.

In the cooperative model, threads yield to each other, either explicitly (e.g., by calling a yield() function provided by the user-level thread library) or implicitly (e.g., by requesting a lock held by another thread). In the below figure a many-to-one system with cooperative user-level threads is shown.

Preemptive scheduling of user-level threads

The preemptive model is similar to multitasking (aka time sharing). Multitasking is a logical extension of multiprogramming where a timer is set to cause an interrupt at a regular time interval and the running process is replaced if it requests I/O or is interrupted by the timer. This way, the running job is given a time slice of execution that cannot be exceeded.

In the preemptive model, a timer is used to cause execution flow to jump to a central dispatcher thread, which chooses the next thread to run. In the below figure a many-to-one system with preemptive user-level threads is shown.

Cooperative and preemptive (hybrid) scheduling of user-level threads

A hybrid model between cooperative and preemptive scheduling is also possible, where a running thread may call yield() or be preempted by a timer.

Simple Threads

Optional assignment for higher grade (3 points)

In this assignment you will implement a simplified version of many-to-one user-level threads in the form of a library that we give the name Simple Threads.

Preparations

Before you continue make sure you’ve read the implementing threads page. Among the slides in Studium you also find self study material about implementing threads.

Git

You should already have cloned the threads-and-synchronization repository.

Manage execution contexts

The ucontext.h header file defines the ucontext_t type as a structure that can be used to store the execution context of a user-level thread in user space. The same header file also defines a small set of functions used to manipulate execution contexts. Read the following manual pages.

Example

In examples/src/contexts.c you find an example program demonstrating how to create and manipulate execution contexts. Study the source. To compile, navigate to the examples directory in the terminal, type make and press enter.

make

To run:

./bin/contexts

Get started

In higher-grade/src you find the following files.

sthreads.h
Header file specifying the Simple Threads API.
sthreads.c
Implementation of the Simple Threads API.
sthreads_test.c
A program used to test the Simple Threads API.

Study the source and pay attention to all comments.

First compile and test run

In the terminal, navigate to the higher-grade directory. Use make to compile.

make

Run the test program.

./bin/sthreads_test

The program prints the following to the terminal and terminates.

==== Test program for the Simple Threads API ====

Cooperative scheduling (2 points)

For 2 points you must implement all the functions in the Simple Threads API except done() and join(). This includes cooperative scheduling where threads yield() control back to the thread manager.

The test program’s main() thread and all threads created by main() share a single kernel-level thread. The threads must call yield() to pass control to the thread manager. When calling yield() one of the ready threads is selected to execute and changes state from ready to running. The thread calling yield() changes state from running to ready.

Non-terminating threads

For the cooperative scheduling part you may assume the main thread and all other threads are non-terminating loops.

Preemptive scheduling (3 points)

For 3 points, you must also implement the done() and join() functions in the Simple Threads API. In addition to cooperative scheduling with yield() you must also implement preemptive scheduling.

If a thread doesn’t call yield() within its time slice it will be preempted and one of the threads in the ready queue will be resumed. The preempted thread changes state from running to ready. The resumed thread changes state from ready to running.

Timer

To implement preemptive scheduling you need to set a timer. When the timer expires the kernel will send a signal to the process. You must register a signal handler for the timer signal. In the signal handler you must figure out a way to suspend the running thread and resume one of the threads in the ready queue.

Timer example

In examples/src/timer.c you find an example of how to set a timer. When the timer expires the kernel sends a signal to the process. A signal handler is used to catch the timer signal.