The process concept and IPC

The fundamentals of program execution, processes and inter-process communication (IPC).

Clone repository

Before you continue, you must clone the processes-and-ipc repository.

Use the git command

From the terminal, navigate to a directory where you want the cloned directory to be created and execute the following command.

git clone https://github.com/os-assignments/processes-and-ipc.git

Now you should see something similar to this in the terminal.

Cloning into 'processes-and-ipc'...
remote: Counting objects: 38, done.
remote: Compressing objects: 100% (26/26), done.
remote: Total 38 (delta 8), reused 38 (delta 8), pack-reused 0
Unpacking objects: 100% (38/38), done.
Checking connectivity... done.
Checking out files: 100% (32/32), done.

Use the tree command

To get an overview of the cloned repository, use the tree command.

tree processes-and-ipc

Now you should see a tree view of all files and directories in the processes-and-ipc directory.

processes-and-ipc
├── examples
│   ├── bin
│   ├── data
│   │   └── data.txt
│   ├── Makefile
│   ├── obj
│   └── src
│       ├── child.c
│       ├── execlp_ls.c
│       ├── execv_ls.c
│       ├── execvp_ls.c
│       ├── fork.c
│       ├── fork_exec.c
│       ├── fork_exit_wait.c
│       ├── fork_exit_wait_status.c
│       ├── fork-template.c
│       ├── fork_zombie.c
│       ├── ls_pipe_wc.c
│       ├── open_read.c
│       ├── perror.c
│       ├── pipe.c
│       └── random_mystery.c
├── higher-grade
│   ├── bin
│   ├── Makefile
│   ├── obj
│   └── src
│       ├── parser.c
│       ├── parser.h
│       └── shell.c
├── mandatory
│   ├── bin
│   ├── Makefile
│   ├── obj
│   └── src
│       ├── pipeline.c
│       └── signals.c
└── tools
    ├── monitor
    └── my-ps

14 directories, 26 files

Install tree on macOS

If you run macOS and tree is not installed, use Homebrew to install tree.

brew install tree

The exec family of system calls

The exec family of system calls replaces the program executed by a process. When a process calls exec, all code (text) and data in the process are lost and replaced with the executable of the new program. Although all data is replaced, all open file descriptors remain open after calling exec unless explicitly set to close-on-exec. In the diagram below, a process is executing Program 1. The program calls exec to replace the program executed by the process with Program 2.

execlp

The execlp system call duplicates the actions of the shell in searching for an executable file if the specified file name does not contain a slash (/) character. The search path is the path specified in the environment by the PATH variable. If this variable isn’t specified, the default path ":/bin:/usr/bin" is used.

The execlp system call can be used when the number of arguments to the new program is known at compile time. If the number of arguments is not known at compile time, use execvp.

#include <unistd.h>

int execlp(const char *file, const char *arg, ...);
file
Name of the program to execute.
Remaining arguments
The const char *arg and subsequent ellipses can be thought of as arg0, arg1, ..., argn. Together they describe a list of one or more pointers to null-terminated strings that represent the argument list available to the executed program. The first argument, by convention, should point to the filename associated with the file being executed. The list of arguments must be terminated by a NULL pointer.

Example

In processes-and-ipc/examples/src/execlp_ls.c you find the following example program demonstrating how execlp can be used.

#include <unistd.h> // execlp()
#include <stdio.h>  // perror()
#include <stdlib.h> // EXIT_SUCCESS, EXIT_FAILURE

int main(void) {
  execlp("ls", "ls", "-l", NULL);
  perror("Return from execlp() not expected");
  exit(EXIT_FAILURE);
}

The program uses execlp to search the PATH for an executable file named ls and passes -l as an argument to the new program. The new program is the same program used by the shell command ls to list files in a directory.

Use make to compile:

make

Run the program.

./bin/execlp_ls

You should see something similar to this in the terminal.

-rw-r--r--@  1 karl  staff  410 Jan 27 21:16 Makefile
drwxr-xr-x  17 karl  staff  578 Jan 28 22:08 bin
drwxr-xr-x   3 karl  staff  102 Dec  1  2016 data
drwxr-xr-x   2 karl  staff   68 Jan 28 22:08 obj
drwxr-xr-x  17 karl  staff  578 Jan 28 22:08 src

execvp

The execvp system call will duplicate the actions of the shell in searching for an executable file if the specified file name does not contain a slash (/) character. The search path is the path specified in the environment by the PATH variable. If this variable isn’t specified, the default path ":/bin:/usr/bin" is used. In addition, certain errors are treated specially.

#include <unistd.h>

int execvp(const char *file, char *const argv[]);
file
Name of the program to execute.
argv
Argument vector. An array of pointers to null-terminated strings that represent the argument list available to the new program. The first argument, by convention, should point to the filename associated with the file being executed. The array of pointers must be terminated by a NULL pointer.

Example

In processes-and-ipc/examples/src/execvp_ls.c you find the following example program demonstrating how execvp can be used.

#include <unistd.h> // execvp()
#include <stdio.h>  // perror()
#include <stdlib.h> // EXIT_SUCCESS, EXIT_FAILURE

int main(void) {
  char *const cmd[] = {"ls", "-l", NULL};
  execvp(cmd[0], cmd);
  perror("Return from execvp() not expected");
  exit(EXIT_FAILURE);
}

The program uses execvp to search the PATH for an executable file named ls and passes -l as an argument to the new program. The new program is the same program used by the shell command ls to list files in a directory. In comparison to execv, we don’t have to provide the full path to ls when using execvp, only the name of the executable.

Use make to compile:

make

Run the program.

./bin/execvp_ls

You should see something similar to this in the terminal.

total 8
-rw-r--r--@ 1 abcd1234  staff  410 Jan 27 21:16 Makefile
drwxr-xr-x  5 abcd1234  staff  170 Jan 27 21:17 bin
drwxr-xr-x  2 abcd1234  staff   68 Jan 27 21:17 obj
drwxr-xr-x  5 abcd1234  staff  170 Jan 27 21:16 src

execv

In comparison to execvp, the execv system call doesn’t search the PATH. Instead, the full path to the new executable must be specified.

#include <unistd.h>

int execv(const char *path, char *const argv[]);
path
The path to the new program executable.
argv
Argument vector. The argv argument is an array of character pointers to null-terminated strings. The last member of this array must be a null pointer. These strings constitute the argument list available to the new process image. The value in argv[0] should point to the filename of the executable for the new program.

Example

In processes-and-ipc/examples/src/execv_ls.c you find the following example program demonstrating how execv can be used.

#include <unistd.h> // execv()
#include <stdio.h>  // perror()
#include <stdlib.h> // EXIT_SUCCESS, EXIT_FAILURE

int main() {
  char *const argv[] = {"/bin/ls", "-l", NULL};
  execv(argv[0], argv);
  perror("Return from execv() not expected");
  exit(EXIT_FAILURE);
}

The program uses execv to replace itself with the /bin/ls program passing -l as argument to the new program. The new program is the same program used by the shell command ls to list files in a directory.

Use make to compile:

make

Run the program.

./bin/execv_ls

You should see something similar to this in the terminal.

total 8
-rw-r--r--@ 1 abcd1234  staff  410 Jan 27 21:16 Makefile
drwxr-xr-x  5 abcd1234  staff  170 Jan 27 21:17 bin
drwxr-xr-x  2 abcd1234  staff   68 Jan 27 21:17 obj
drwxr-xr-x  5 abcd1234  staff  170 Jan 27 21:16 src

Process management

One of the philosophies behind Unix is the motto do one thing and do it well. In this spirit, basic process management is done with a number of system calls, each with a single (simple) purpose. These system calls can then be combined to implement more complex behaviors.

The following system calls are used for basic process management.

fork
A parent process uses fork to create a new child process. The child process is a copy of the parent. After fork, both parent and child execute the same program but in separate processes.
exec
Replaces the program executed by a process. The child may use exec after a fork to replace the process’ memory space with a new program executable making the child execute a different program than the parent.
exit
Terminates the process with an exit status.
wait
The parent may use wait to suspend execution until a child terminates. Using wait the parent can obtain the exit status of a terminated child.

Parent and child

The process invoking fork is called the parent. The new process created as the result of a fork is the child of the parent.

After a successful fork, the child process is a copy of the parent. The parent and child processes execute the same program but in separate processes.

Fork

The fork system call is the primary (and historically, only) method of process creation in Unix-like operating systems.

#include <unistd.h>

pid_t fork(void);
Return value
On success, the PID of the child process is returned in the parent, and 0 is returned in the child. On failure, -1 is returned in the parent, no child process is created, and errno is set appropriately.

Fork returns twice on success

On success fork returns twice: once in the parent and once in the child. After calling fork, the program can use the return value to tell whether it is executing in the parent or in the child.

  • If the return value is 0 the program executes in the new child process.
  • If the return value is greater than zero, the program executes in the parent process and the return value is the process ID (PID) of the created child process.
  • On failure fork returns -1.

Template program

In the file examples/src/fork-template.c you find a template for a typical program using fork.

 1  #include <stdio.h>  // perror()
 2  #include <stdlib.h> // exit(), EXIT_SUCCESS, EXIT_FAILURE
 3  #include <unistd.h> // fork()
 4
 5  int main(void) {
 6
 7    pid_t pid;
 8
 9    switch (pid = fork()) {
10
11      case -1:
12        // On error fork() returns -1.
13        perror("fork failed");
14        exit(EXIT_FAILURE);
15
16      case 0:
17        // On success fork() returns 0 in the child.
18
19        // Add code for the child process here.
20
21        exit(EXIT_SUCCESS);
22
23      default:
24        // On success fork() returns the pid of the child to the parent.
25
26        // Add code for the parent process here.
27
28        exit(EXIT_SUCCESS);
29    }
30  }

Header files

On lines 1-3 a number of header files are included to get access to a few functions and constants from the C Standard library.

pid_t

On line 7 the variable pid of type pid_t is declared. The pid_t data type is the data type used for process IDs.

fork

On line 9 the parent process calls fork and stores the return value in the variable pid.

switch

On line 9 a switch-statement is used to check the return value of fork.

Error (case -1)

On failure fork returns -1 and execution continues in the case -1 branch of the switch statement (line 11). The operating system was not able to create a new process. The parent uses perror to print an error message (line 13) and then terminates with exit status EXIT_FAILURE (line 14).

Child (case 0)

On success fork returns 0 in the new child process and execution continues in the case 0 branch of the switch statement (line 16). Any code to be executed only by the child is placed here (line 19). The child terminates with exit status EXIT_SUCCESS (line 21).

Parent (default)

If the value returned by fork was neither -1 (error) nor 0 (child), execution continues in the parent process in the default branch of the switch statement (line 23). In this case, the value returned by fork is the process ID (PID) of the newly created child process.

A first fork example

In the file examples/src/fork.c you find a program with the following main function.

int main(void) {
  pid_t pid;

  switch (pid = fork()) {
    case -1:
      // On error fork() returns -1.
      perror("fork failed");
      exit(EXIT_FAILURE);
    case 0:
      // On success fork() returns 0 in the child.
      child();
    default:
      // On success fork() returns the pid of the child to the parent.
      parent(pid);
  }
}

The code for the child is in the function child and the code for the parent in the function parent.

void child() {
  printf(" CHILD <%ld> I'm alive! My PID is <%ld> and my parent got PID <%ld>.\n",
         (long) getpid(), (long) getpid(), (long) getppid());
  printf(" CHILD <%ld> Goodbye!\n",
         (long) getpid());
  exit(EXIT_SUCCESS);
}

void parent(pid_t pid) {
  printf("PARENT <%ld> My PID is <%ld> and I spawned a child with PID <%ld>.\n",
         (long) getpid(), (long) getpid(), (long) pid);
  printf("PARENT <%ld> Goodbye!\n",
         (long) getpid());
  exit(EXIT_SUCCESS);
}

Both parent and child print two messages and then terminate. Navigate to the examples directory. Compile using make.

make

Run the program.

./bin/fork 

You should see output similar to this in the terminal.

PARENT <87628> Spawned a child with PID = 87629.
PARENT <87628> Goodbye.
 CHILD <87629> I'm alive and my PPID = 1.
 CHILD <87629> Goodbye.

Run the program multiple times and look specifically at the PPID value reported by the child. Sometimes the child reports PPID = 1 but sometimes it is equal to the PID of the parent. Clearly the PID of the parent is not 1. Why doesn’t the child report the “correct” PPID value every time?

Orphans

An orphan process is a process whose parent process has terminated, though it remains running itself. Any orphaned process will be immediately adopted by the special init system process with PID 1.

Processes execute concurrently

Both the parent process and the child process compete for the CPU with all other processes in the system. The operating system decides which process to execute, when, and for how long. The processes in the system execute concurrently.

In our example program:

  • most often the parent terminates before the child and the child becomes an orphan process adopted by init (PID = 1) and therefore reports PPID = 1
  • sometimes the child process terminates before its parent and then the child is able to report PPID equal to the PID of the parent.

Wait

The wait system call blocks the caller until one of its child processes terminates. If the caller doesn’t have any child processes, wait returns immediately without blocking the caller. Using wait the parent can obtain the exit status of the terminated child.

#include <sys/types.h>
#include <sys/wait.h>

pid_t wait(int *status); 
status
If status is not NULL, wait stores the exit status of the terminated child in the int to which status points. This integer can be inspected using the WIFEXITED and WEXITSTATUS macros.
Return value
On success, wait returns the PID of the terminated child. On failure (no child), wait returns -1.

WIFEXITED

#include <sys/types.h>
#include <sys/wait.h>
 
WIFEXITED(status);
status
The integer status value set by the wait system call.
Return value
Returns true if the child terminated normally, that is, by calling exit or by returning from main.

WEXITSTATUS

#include <sys/types.h>
#include <sys/wait.h>
       
int WEXITSTATUS(status);
status
The integer status value set by the wait system call.
Return value
The exit status of the child. This consists of the least significant 8 bits of the status argument that the child specified in a call to exit or as the argument for a return statement in main. This macro should be employed only if WIFEXITED returned true.

Example using wait

In the examples/src/fork_exit_wait.c example program the parent executes the parent function.

 1  void parent(pid_t pid) {
 2
 3    printf("PARENT <%ld> Spawned a child with PID = %ld.\n",
 4           (long) getpid(), (long) pid);
 5
 6    wait(NULL);
 7
 8    printf("PARENT <%ld> Child with PID = %ld terminated.\n",
 9           (long) getpid(), (long) pid);
10
11    printf("PARENT <%ld> Goodbye.\n",
12           (long) getpid());
13
14    exit(EXIT_SUCCESS);
15  }

On line 6 the parent calls wait(NULL) to wait for the child process to terminate.

Compile and run the program. Now the parent should always wait for the child to terminate before terminating itself. As a consequence the child should:

  • never be adopted by init
  • never report PPID = 1
  • always report PPID equal to the PID of the parent.

Example using wait to obtain the exit status of the child

In the examples/src/fork_exit_wait_status.c example program the parent executes the parent function.

 1  void parent(pid_t pid) {
 2    int status;
 3
 4    printf("PARENT <%ld> Spawned a child with PID = %ld.\n",
 5           (long) getpid(), (long) pid);
 6
 7    wait(&status);
 8
 9    if (WIFEXITED(status)) {
10      printf("PARENT <%ld> Child with PID = %ld and exit status = %d terminated.\n",
11             (long) getpid(), (long) pid, WEXITSTATUS(status));
12    }
13
14    printf("PARENT <%ld> Goodbye.\n",
15           (long) getpid());
16
17    exit(EXIT_SUCCESS);
18  }

On line 2 the parent creates the variable status. On line 7 the parent calls wait(&status) to wait for the child process to terminate. The & is the address-of operator and &status returns the address of the status variable. When the child terminates the exit status of the child will be stored in variable status.

Compile using make.

make 

Run the program.

./bin/fork_exit_wait_status

In the output you should be able to see that the parent obtained the exit status of the child.

PARENT <99340> Spawned a child with PID = 99341.
 CHILD <99341> I'm alive and my PPID = 99340.
 CHILD <99341> Goodbye, exit with status 42.
PARENT <99340> Child with PID = 99341 and exit status = 42 terminated.
PARENT <99340> Goodbye.

Zombies

A terminated process is said to be a zombie or defunct until its parent calls wait on it.

  • When a process terminates all of the memory and resources associated with it are deallocated so they can be used by other processes.
  • However, the exit status is maintained in the PCB until the parent picks up the exit status using wait and deletes the PCB.
  • A terminated child process always first becomes a zombie.
  • In most cases, under normal system operation zombies are immediately waited on by their parent.
  • Processes that stay zombies for a long time are generally an error and cause a resource leak.

An example with a zombie process

In the examples/src/fork_zombie.c example program the child terminates before the parent calls wait on it and becomes a zombie process. The parent executes the parent function.

 1  void parent(pid_t pid) {
 2
 3    printf("PARENT <%ld> Spawned a child with PID = %ld.\n",
 4           (long) getpid(), (long) pid);
 5
 6    printf("PARENT <%ld> Press any key to reap a zombie!\n",
 7           (long) getpid());
 8
 9    getchar();
10
11    pid = wait(NULL);
12
13    printf("PARENT <%ld> Zombie child with PID = %ld reaped!\n",
14           (long) getpid(), (long) pid);
15
16    exit(EXIT_SUCCESS);
17  }

On line 9 the parent uses getchar to block itself until the user presses a key on the keyboard.

When the child terminates, the exit status of the child is stored in the child process control block (PCB). The operating system deallocates all memory used by the child but the PCB cannot be deallocated until the parent does wait on the child.

Compile using make.

make 

Run the program.

./bin/fork_zombie

In the output you should be able to see that the child terminates and that the parent blocks waiting for a keypress.

PARENT <4636> Spawned a child with PID = 4637.
PARENT <4636> Press any key to reap a zombie!
 CHILD <4637> I'm alive and my PPID = 4636.
 CHILD <4637> Goodbye.

The child process has terminated but the parent has not yet read the exit status of the child using wait. The child process has now become a zombie process.

Monitor

On Unix-like systems, the top command produces an ordered list of running processes selected by user-specified criteria, and updates it periodically.

There are a few differences between the top command used by OS X and Linux. The tools/monitor tool provides a simple but portable alternative for monitoring processes with a specified command name.

Open a second terminal and navigate to the processes-and-ipc directory. The tools/monitor tool can be used to view status information about processes. Use the --help flag to see the documentation.

./tools/monitor --help

This is the built in documentation for the monitor tool.

Usage: monitor [-s delay] [-p pid] cmd

A top-like command that only lists USER, PID, STAT and COMM for the
current user and and processes with a command name with a grep match of cmd.

Options:
 -s delay    Delay in seconds between refresh, default = 1.
 -p pid      Include process with PID pid.
 

The cmd argument is the name of the program executed by the processes we want to monitor.

Now let’s try and use the monitor tool to view process status information for the parent and child, both executing the fork_zombie example program.

./tools/monitor fork_zombie

On Linux you should see something similar to this.

Monitoring processes matching: fork_zombie

Press Ctrl-C to exit

USER       PID  PPID S COMMAND
abcd1234  4636  4311 S fork_zombie
abcd1234  4637  4636 Z fork_zombie <defunct>

In the PID column you see the PID of the listed processes. The first line shows information about the parent and the second line shows information about the child.

The S column shows the status of the process.

  • The parent has status S (sleep), meaning the process is waiting for an event to complete. In this case the parent is blocked waiting for the call to getchar() to return, i.e., the parent is blocked waiting for a key to be pressed on the keyboard.
  • The child has status Z (zombie), meaning the process has terminated but has not yet been reaped by its parent, i.e., the parent is alive but has not yet called wait on the terminated child.

Another name used for a zombie process is defunct.

Reap the zombie

From the terminal used to run the fork_zombie program, press any key to make the parent do wait on the child.

PARENT <4636> Spawned a child with PID = 4637.
PARENT <4636> Press any key to reap a zombie!
 CHILD <4637> I'm alive and my PPID = 4636.
 CHILD <4637> Goodbye.

PARENT <4636> Zombie child with PID = 4637 reaped!
PARENT <4636> Press any key to terminate!

In the terminal used to run monitor the zombie process should have disappeared, leaving only the parent process.

Monitoring processes matching: fork_zombie

Press Ctrl-C to exit

USER       PID  PPID S COMMAND
abcd1234  4636  4311 S fork_zombie

The parent is now blocked, waiting for user input.

Terminate the parent

From the terminal used to run the fork_zombie program, press any key to make the parent terminate.

PARENT <4636> Spawned a child with PID = 4637.
PARENT <4636> Press any key to reap a zombie!
 CHILD <4637> I'm alive and my PPID = 4636.
 CHILD <4637> Goodbye.

PARENT <4636> Zombie child with PID = 4637 reaped!
PARENT <4636> Press any key to terminate!

PARENT <4636> Goodbye!

Execute a new program in the child

If you don’t want to execute the same program in both the parent and the child, you will need to use a system call of the exec family. The exec system calls will replace the currently executing program with a new executable.

In examples/src/child.c you find this small program.

#include <stdio.h>    // puts(), printf(), perror(), getchar()
#include <stdlib.h>   // exit(), EXIT_SUCCESS, EXIT_FAILURE
#include <unistd.h>   // getpid(), getppid()

int main(void) {

  printf(" CHILD <%ld> I'm alive and my PPID = %ld.\n",
       (long) getpid(), (long) getppid());

  printf(" CHILD <%ld> Press any key to make me terminate!\n",
         (long) getpid());

  getchar();

  printf(" CHILD <%ld> Goodbye!\n",
         (long) getpid());

  exit(127);
}

Compile using make.

make 

Run the program.

./bin/child

At first this program simply prints two messages to the terminal and then waits for a key-press.

 CHILD <33172> I'm alive and my PPID = 81166.
 CHILD <33172> Press any key to make me terminate!

After you press any key in the terminal the program terminates.

 CHILD <33172> I'm alive and my PPID = 81166.
 CHILD <33172> Press any key to make me terminate!
  
 CHILD <33172> Goodbye!

The examples/src/fork_exec.c program uses execv to make the child process execute the examples/bin/child executable. After fork the child executes the child function.

 1  void child() {
 2    char *const argv[] = {"./bin/child", NULL};
 3
 4    printf(" CHILD <%ld> Press any key to make me call exec!\n",
 5           (long) getpid());
 6
 7    getchar();
 8
 9    execv(argv[0], argv);
10
11    perror("execv");
12    exit(EXIT_FAILURE);
13  }

On line 2 the needed argument vector is constructed. On line 7 the child waits for a key-press. After the key-press, on line 9, the child uses execv to replace the program executed by the child process with the child executable. If execv is successful, control never returns and lines 11 and 12 are not reached.

Compile using make.

make 

Run the program.

./bin/fork_exec
PARENT <33422> Spawned a child with PID = 33423.
 CHILD <33423> Press any key to make me call exec!

Open a second terminal and use the ps command with the -p option to see information about the child process.

ps -p 33423
  PID TTY           TIME CMD
33423 ttys023    0:00.00 ./bin/fork_exec

Note that the child process is currently executing the ./bin/fork_exec executable.

In the first terminal, press any key.

 CHILD <33423> I'm alive and my PPID = 33422.
 CHILD <33423> Press any key to make me terminate!

From the other terminal, use the ps command with the -p option again to see information about the child process.

ps -p 33423
  PID TTY           TIME CMD
33423 ttys023    0:00.00 ./bin/child

Note that the child process now executes the ./bin/child executable.

In the first terminal, press any key to make the child process terminate. Now the parent performs wait on the child and reports the child exit status.

  CHILD <33423> Goodbye!
 PARENT <33422> Child with PID = 33423 and exit status = 127 terminated.
 PARENT <33422> Goodbye!

Signals

Mandatory assignment

Signals are a limited form of inter-process communication (IPC), typically used in Unix, Unix-like, and other POSIX-compliant operating systems. [1] A signal is used to notify a process of a synchronous or asynchronous event.

When a signal is sent, the operating system interrupts the target process' normal flow of execution to deliver the signal. If the process has previously registered a signal handler, that routine is executed. Otherwise, the default signal handler is executed. [1]

Each signal is represented by an integer value. Instead of using the numeric values directly, the named constants defined in signal.h should be used.

Clone repository

If you haven’t done so already, you must clone the processes-and-ipc repository.

Open file

Open the file mandatory/src/signals.c in the source code editor of your choice.

Study the source code

Study the C source code.

Header files

First a number of header files are included to get access to a few functions and constants from the C Standard library.

Global variable done

A global variable done is initialized to false.

bool done = false;

Later this variable is going to be updated by a signal handler.

divide_by_zero

The divide_by_zero function attempts to divide by zero.

segfault

The function segfault attempts to dereference a NULL pointer causing a segmentation fault.

signal_handler

The signal_handler function will handle signals sent to the process. A switch statement is used to determine which signal has been received. An alternative is to use one signal handling function for each signal but here a single signal handling function is used.

main

All C programs start executing in the main function.

  • The process ID (PID) is obtained using the getpid function and printed to the terminal with printf.
  • A number of lines are commented out; we’ll get back to these later.
  • The function puts is used to print the string I'm done! on a separate line to the terminal.
  • Finally, exit is used to terminate the process with exit status EXIT_SUCCESS defined in stdlib.h.

Program, executable and process

Let’s repeat the differences between a program, an executable and a process.

Program
A set of instructions in human-readable form. A passive entity stored on secondary storage.
Executable
A compiled form of a program including machine instructions and static data that a computer can load and execute. A passive entity stored on secondary storage.
Process
An executable loaded into memory and executing or waiting. A process typically executes for only a short time before it either finishes or needs to perform I/O (waiting). A process is an active entity and needs resources such as CPU time and memory to execute.

The make build tool

The make build tool is used together with the Makefile to compile all programs in the mandatory/src directory.

Compile all programs

From a terminal, navigate to the mandatory directory. To compile all programs, type make and press enter.

make

When compiling, make places all executables in the bin directory.

First test run

Run the signals program.

./bin/signals

You should now see output similar to this in the terminal.

My PID = 81430
I'm done!

Note that the PID value you see will be different.

New process

Run the program a few times. Note that each time you run the same program the process used to execute the program gets a new process ID (PID).

C comments

In C, // is used to start a comment reaching to the end of the line.

Division by zero

To make the program divide by zero, uncomment the following line.

// divide_by_zero();

Compile with make.

make

Run the program.

./bin/signals

In the terminal you should see something similar to this.

My PID = 81836
[2]    81836 floating point exception  ./bin/signals

Division by zero causes an exception. When the OS handles the exception it sends the SIGFPE (fatal arithmetic error) signal to the process executing the division by zero. The default handler for the SIGFPE signal terminates the process and this is exactly what happened here.

Division by zero on Mac

On some versions of Mac hardware, integer division by zero does not cause a SIGFPE signal to be sent, instead a SIGILL (illegal instruction) signal is sent. On other combinations of Mac hardware and C compiler, some other signal might be sent, or no signal is sent at all.

If you run on Mac hardware and the process does not terminate when dividing by zero, you can:

  • try to catch the SIGILL signal instead of the SIGFPE signal
  • or, you could simply skip the whole division by zero part of this assignment.

Run the program a few times. Each time you run the program the same error (division by zero) happens, causing an exception, causing the OS to send the process the SIGFPE signal, causing the process to terminate.

Synchronous signals

Synchronous signals are delivered to the same process that performed the operation that caused the signal. Division by zero makes the OS send the process the synchronous signal SIGFPE.

Installing a signal handler

A program can install a signal handler using the signal function.

signal(sig, handler);
sig
The signal you want to specify a signal handler for.
handler
The function you want to use for handling the signal.

Handling SIGFPE

Uncomment the following line to install the signal_handler function as the signal handler for the SIGFPE signal.

// signal(SIGFPE,  signal_handler);

Compile with make.

make

Run the program.

./bin/signals

In the terminal you should see something similar to this.

My PID = 81979
Caught SIGFPE: arithmetic exception, such as divide by zero.

This time the signal doesn’t terminate the process immediately. When the process receives the SIGFPE signal the function signal_handler is executed with the signal number as argument. After printing a message to the terminal the signal handler terminates the process with status EXIT_FAILURE.

No more division by zero

Comment out the following line.

divide_by_zero();

Compile and run the program. Make sure you see output similar to this in the terminal.

My PID = 82040
I'm done!

Segfault

A segmentation fault (aka segfault) is caused by a program trying to read or write an illegal memory location. To make the program cause a segfault, uncomment the following line.

// segfault();

Compile with make.

make

Run the program.

./bin/signals

In the terminal you should see something similar to this.

My PID = 82084
[2]    82084 segmentation fault  ./bin/signals

The illegal memory access causes an exception. When the OS handles the exception it sends the SIGSEGV signal to the process executing the illegal memory access. The default handler for the SIGSEGV signal terminates the process and this is exactly what happened here.

Run the program a few times. Each time you run the program the same error (illegal memory access) happens, causing an exception, causing the OS to send the process the SIGSEGV signal, causing the process to terminate.

Synchronous signals

Synchronous signals are delivered to the same process that performed the operation that caused the signal. An illegal memory access makes the OS send the process the synchronous signal SIGSEGV.

Handling SIGSEGV

Add code to install the function signal_handler as the signal handler for the SIGSEGV signal. When you run the program you should see output similar to this in the terminal.

My PID = 82161
Caught SIGSEGV: segfault.

No more segfault

Comment out the following line.

segfault();

Compile and run the program. Make sure you see output similar to this in the terminal.

My PID = 82040
I'm done!

Wait for a signal

The pause function is used to block a process until it receives a signal (any signal will do). Uncomment the following line.

// pause();

Compile and run the program. You should see output similar to this in the terminal.

My PID = 82249

The process is now blocked, waiting for any signal to be sent to the process.

Ctrl+C

To terminate the process, press Ctrl+C in the terminal.

My PID = 82249
^C

Note that once the process terminates you get the terminal prompt back.

Asynchronous signals

Asynchronous signals are generated by an event external to a running process. Pressing Ctrl+C is an external event causing the OS to send the asynchronous SIGINT (terminal interrupt) signal to the process.

The default handler for the SIGINT signal terminates the process.

Handling SIGINT

Add code to install the function signal_handler as the signal handler for the SIGINT signal. When you run the program the process blocks waiting for any signal. When you press Ctrl+C you should now see output similar to this in the terminal.

My PID = 82477
^CCaught SIGINT: interactive attention signal, probably a ctrl+c.
I'm done!

Open a second terminal

Open a second terminal.

Sending signals from the terminal

Compile and run the program in one of the terminals. The program should block waiting for any signal. Note the PID of the blocked process.

My PID = 82629

The command kill can be used to send signals to processes from the terminal. To send the SIGINT signal to the blocked process, execute the following command in the terminal where you replace <PID> with the PID of the blocked process.

kill -s INT <PID>

In the other terminal you should now see the blocked process execute the signal handler, then continue in main after pause(), print I'm done! and terminate.

My PID = 82629
Caught SIGINT: interactive attention signal, probably a ctrl+c.
I'm done!

Handle SIGUSR1

Add code to make the program print “Hello!” when receiving the SIGUSR1 signal.

  • Compile and run the program from one terminal.
  • Send the SIGUSR1 signal to the process from the other terminal using the kill command where you replace <PID> with the PID of the blocked process.
kill -s USR1 <PID>

Don’t terminate on SIGUSR1

How can you make the program print Hello! every time the signal SIGUSR1 is received without terminating?

Set the global variable done to true

In the signal_handler function, set the global variable done to true when handling the SIGINT signal.

Block until done

In main, replace the line

pause();

with the following while loop:

while (!done);

In C ! is the logical not operator. This while loop repeatedly checks the global variable done until it becomes true.

Compile, run and test

Compile and run the program from one terminal and send signals to the process from the other terminal.

  • Are you able to send multiple SIGUSR1 signals to the process?
  • Are you able to break out of the while loop and terminate the process by sending the signal SIGINT to the process, or by pressing Ctrl+C from the terminal?

Bug?

Depending on your compiler, the program may not break out of the while(!done) loop.

An optimizing compiler

An optimizing compiler may detect that the variable done is never changed inside the while(!done); loop. It may then read done only once and, if done is false, replace the loop with an infinite loop such as while(true);.

Volatile

Do you remember the volatile keyword?

Volatile

The volatile keyword is used to make sure that the value of a variable is always read from memory, instead of being kept in a register and reused.

Make the global variable done volatile.

Compile, run and test

Compile and run the program from one terminal and send signals to the process from the other terminal.

  • Make sure you are able to send multiple SIGUSR1 signals to the process.
  • Make sure you can terminate the process by sending the signal SIGINT to the process, or by pressing Ctrl+C from the terminal.

sig_atomic_t

The data type sig_atomic_t guarantees that reading and writing a variable happen in a single instruction, so there’s no way for a signal handler to run “in the middle” of an access. In general, any global variable changed by a signal handler should be of the data type sig_atomic_t and should still be declared volatile.

Change the datatype of the global variable done from bool to sig_atomic_t.

Use pause instead

Using a while loop to repeatedly check the global variable done is not a very efficient use of the CPU. A better way is to change the loop to:

while (pause()) {
  if (done) break;
}

Why is this more efficient?

Code grading questions

Here are a few examples of questions that you should be able to answer, discuss and relate to the source code of your solution during the code grading.

  • What is a signal?
  • How do signals relate to exceptions and interrupts?
  • What is a signal handler?
  • How do you register a signal handler?
  • What happens if you don’t register a signal handler?
  • What causes a segfault?
  • What is meant by a synchronous signal?
  • What does the system call pause() do?
  • What happens when you press Ctrl-C in the controlling terminal of a process?
  • How do you send signals to other processes?
  • Why is the keyword volatile needed when declaring the global variable done?
  • Why is the datatype sig_atomic_t needed when declaring the global variable done?
  • Why is it more efficient to use pause() instead of simply looping and checking the done variable?

Pipeline

Mandatory assignment

Your task is to create a system with one parent process and two child processes where the children communicate using a pipe.

Open a terminal

Open a terminal and navigate to the mandatory directory.

List directory contents (ls)

The ls shell command lists directory contents. Execute the ls command in the terminal.

ls

You should now see this:

Makefile     bin     obj     src

The -F option appends a slash / to directory entries.

ls -F

The only file in the directory is Makefile. The directory contains the subdirectories bin, obj and src.

Makefile     bin/     obj/     src/

Add the -1 option to print each entry on a separate line.

ls -F -1

Now each entry is printed on its own line:

Makefile
bin/
obj/
src/

Line numbering filter (nl)

The nl shell command is a filter that reads lines from stdin and echoes each line prefixed with a line number to stdout. In the terminal, type nl and press enter.

nl

The nl command now waits for stdin input. Type Some text and press enter.

Some text
     1 Some text

Type More text and press enter.

More text
     2 More text

For each line you type, the line is echoed back with a line number. Play around some more with the nl command. Press Ctrl+D (End Of File) to make the nl command exit.

Pipeline

One of the philosophies behind Unix is the motto "do one thing and do it well". In this spirit, each shell command usually has a very specific purpose. More complicated commands can be constructed by combining simpler commands in a pipeline such that the output of one command becomes the input to another command.

The standard shell syntax for pipelines is to list multiple commands, separated by vertical bars | (the pipe character). In the example below, the output from the ls -F -1 command is piped to the input of the nl command.

ls -F -1 | nl

Now the result of ls -F -1 is printed with line numbers added at the front of each line.

    1 Makefile
    2 bin/
    3 obj/
    4 src/

Shell commands are ordinary programs

All shell commands are ordinary programs. When the shell executes a command, the shell first forks a new process and uses exec to make the new process execute the command program.

Which programs

The which utility locates a program executable in the user’s PATH. Use which to see which program executables the shell uses for the ls and nl commands.

which ls

The executable for the ls command is found in the /bin directory.

/bin/ls

What about the nl command?

which nl

The executable for the nl command is also located in the /bin directory.

/bin/nl

System overview

Your task is to create a program that mimics the ls -F -1 | nl pipeline. The parent process uses fork to create two child processes. Child A uses exec to execute the ls -F -1 command and Child B uses exec to execute the nl command. The children use a pipe to communicate. The stdout of Child A is redirected to the write end of the pipe. The stdin of Child B is redirected to the read end of the pipe.

Pipe

How can you make the child processes share a pipe? Which process should create the pipe? When should the pipe be created?

Parent forks twice

The parent must use fork twice, once for each child process.

Parent waits twice

The parent should use wait twice to wait for both child processes to terminate.

The children use execlp

The child processes should use execlp to execute their commands.

The children use dup2

The child processes should use dup2 to redirect stdout and stdin.

  • Process A redirects stdout to write to the pipe.
  • Process B redirects stdin to read from the pipe.

Close unused pipe file descriptors

Don’t forget to close unused pipe file descriptors, otherwise a reader or writer might be blocked.

Attempt   Conditions                        Result
Read      Empty pipe, writers attached      Reader blocked
Read      Empty pipe, no writer attached    End Of File (EOF) returned
Write     Full pipe, readers attached       Writer blocked
Write     No readers attached               SIGPIPE

Closing a read descriptor too early may cause a SIGPIPE.

Beware of SIGPIPE

The default SIGPIPE signal handler causes the process to terminate.

Check return values

You should check the return values of all system calls to detect errors. On error use perror to print an error message and then terminate with exit(EXIT_FAILURE).

pipeline.c

Use the file src/pipeline.c to implement your solution.

  • You must add code to the main function. You must add code here.
  • After fork, Child A should execute the child_a function. You must add code here.
  • After fork, Child B should execute the child_b function. You must add code here.
  • You may add your own functions to further structure your solution.

Compile and run

Use make to compile.

make

Run the program.

./bin/pipeline

When running the finished program you should see output similar to this, where $ is your shell prompt.

$ ./bin/pipeline
    1 Makefile
    2 bin/
    3 obj/
    4 src/
$

Make sure you get the shell prompt $ back.

Code grading questions

Here are a few examples of general questions not directly related to the source code that you should be able to answer and discuss during the code grading.

  • What do we mean by a process pipeline?
  • What is a pipe?
  • How are pipes used to construct process pipelines?
  • Show an example of a process pipeline in the terminal (shell).
  • How are shell (terminal) commands implemented?
  • What happens to the file descriptor table after a successful creation of a new pipe?

Here are a few examples of questions that you should be able to answer, discuss and relate to the source code of your solution during the code grading.

  • How many times does the parent call fork and why?
  • Why do the children need to call execlp()?
  • Explain how each child is able to redirect stdin or stdout from or to the pipe.
  • How will the consumer know when there is no more data to expect from the pipe?
  • Why is it important for a process to close any pipe file descriptors it does not intend to use?
  • What could happen if you close a read descriptor too early?

Shell

Optional assignment for higher grade (3 points)

A shell is an interface between a user and the operating system. It lets us give commands to the system and start other programs. Your task is to program a simple shell similar to, for example, Bash, which is probably the command shell you normally use on a Unix/Linux system.

Preparations

When programming a shell, several of the POSIX system calls you have already studied will be useful. Before you continue, make sure you have a basic understanding of at least the following system calls: fork(), execvp(), getpid(), getppid(), wait(), pipe(), dup2() and close().

Files to use

In the higher-grade/src directory you find the following files.

parser.h
Header file for a provided command line parser.
parser.c
Implementation of the provided command line parser.
shell.c
Here you will implement your shell.

Compile and run

Navigate to the higher-grade directory. Use make to compile.

make

Run the shell.

./bin/shell

The provided version of the shell uses >>>   as the prompt and is able to execute single commands, for example ls.

>>> ls
>>> Makefile	bin		obj		src

Note that something is wrong with the shell prompt >>> . When executing a command, the prompt is printed immediately after the command, and not after the command has completed. This is something you need to fix.

Let’s try to pipe two commands together.

>>> ls | nl
>>> Makefile	bin		obj		src

When trying to pipe two commands together, only the first command is executed. The second command is not executed. The output of the first command is not piped as input to the second command. This is something you need to fix.

Higher grade points (max 3)

For 1 point your shell must be able to handle a single command and a pipeline with two commands. When executing a command line, the prompt >>>   must be printed after the execution of the command line has finished.

For 3 points, in addition to the 1 point requirements above, the shell must be able to handle a pipeline with two, three or more commands.

You must also make sure that after a command line has finished, all descriptors to created pipes are closed. Otherwise the operating system will not be able to deallocate the pipes and potentially re-use the descriptor values.

Command data

In the file parser.h the following C structure is defined.

/**
 * Structure holding all data for a single command in a command pipeline.
 */
typedef struct {
  char*  argv[MAX_ARGV];  // Argument vector.
  position_t pos;         // Position within the pipeline (single, first, middle or last).
  int in;                 // Input file descriptor.
  int out;                // Output file descriptor.
} cmd_t;

The above structure is used to hold data for a single command in a command pipeline.

Command array

In the file shell.c command data for all commands in a command pipeline is stored in a global array.

/**
 * For simplicity we use a global array to store data of each command in a
 * command pipeline.
 */
cmd_t commands[MAX_COMMANDS];

Parser

In parser.h you find the following prototype.

/**
 *  parses the string str and populates the commands array with data for each
 *  command.
 */
int parse_commands(char *str, cmd_t* commands);

This function parses a command line string str such as "ls -l -F | nl" and populates the commands array with command data.

Program design

The below figure shows the overall structure of the shell.

When a user types a command line of the form A | B | C, the parent parses the user input and creates one child process for each of the commands in the pipeline. The child processes communicate using pipes. Child A redirects stdout to the write end of pipe 1. Child B redirects stdin to the read end of pipe 1 and stdout to the write end of pipe 2. Child C redirects stdin to the read end of pipe 2.

parser.c

In the parser.c file you must complete the implementation of the following function.

/**
 * n - number of commands.
 * i - index of a command (the first command has index 0).
 *
 * Returns the position (single, first, middle or last) of the command at index
 * i.
 */
position_t cmd_position(int i, int n) {

shell.c

Use shell.c to implement your solution. This file already implements the most basic functionality but it is far from complete.

Feel free to make changes

When implementing your solution, you are allowed to:

  • add your own functions
  • modify existing functions
  • add (or remove) arguments to existing functions
  • modify existing data structures
  • add your own data structures.

Alternative program design

The provided code is based on a design where the shell (parent) process creates one child process for each command in the pipeline.

An alternative design is for the first child process to create the second child, the second child to create the third child, and so on. If you prefer this design, you may need to modify the provided source code to fit it.