

CSC322
C Programming and UNIX

Stephan Schulz
Department of Computer Science
University of Miami
schulz@cs.miami.edu
http://www.cs.miami.edu/~schulz/CSC322.html

Prerequisites: CSC220 or EEN218


Hackers!

Hacker [originally, someone who makes furniture with an axe] 1. A person who enjoys exploring the details of programmable systems and how to stretch their capabilities, as opposed to most users, who prefer to learn only the minimum necessary. 2. One who programs enthusiastically (even obsessively) or who enjoys programming rather than just theorizing about programming. 3. A person capable of appreciating hack value. 4. A person who is good at programming quickly. 5. An expert at a particular program, or one who frequently does work using it or on it; as in ‘a Unix hacker’. (Definitions 1 through 5 are correlated, and people who fit them congregate.) 6. An expert or enthusiast of any kind. One might be an astronomy hacker, for example. 7. One who enjoys the intellectual challenge of creatively overcoming or circumventing limitations. 8. [deprecated] A malicious meddler who tries to discover sensitive information by poking around. Hence ‘password hacker’, ‘network hacker’. The correct term for this sense is cracker.

The New Hacker’s Dictionary (aka the Jargon File)

Stephan Schulz 2


The UNIX Operating System

[Figure, built up over four slides: a diagram with the labels “The UNIX Operating System”, “UNIX and You”, “C”, and “Our AIM”. The final version also shows the Internet, a file system tree (/ with etc, home, usr, and dev; joe, jack, and jill; bin with ls, man, and cat; hda, mouse, and mta), and two processes (PID: 182 and PID: 512).]


The Myth

UNIX is a big-iron operating system
UNIX is complicated
UNIX is hard to use
UNIX has been created by SUN, IBM, HP, and other large companies
UNIX is monolithic


Counterpoint

UNIX was developed on small machines and became popular on the “killer micros”. UNIX dialects now run on everything from a PDA to CRAY supercomputers

UNIX is based on simple and elegant principles (but has added some cruft over the years)

UNIX is not particularly hard to use (compared to the power it gives to the user), but has a reasonably steep learning curve. It’s not a “show-me” operating system, but a “tell-me” operating system

UNIX was created in a research environment, and much of it has been developed in informal settings by hackers. Much of the impetus for UNIX comes from free versions (Linux, Net-, Open-, FreeBSD), although many companies contribute to its development

Many UNIX kernels are monolithic, but the UNIX system is extremely modular


UNIX

First portable operating system (NetBSD: 18 processor architectures, ≈ 50 computer architectures)

Written in a “high-level” language (C)

Small (for what it does):
  – Recent LINUX kernel: 2.4 million LOC (1.4 million for drivers, 0.4 million architecture-dependent stuff for 16 ports)
  – Windows 2000: Estimates range from 29 million to 65 million LOC, supports just 1.5 architectures

Modular (though often on a monolithic kernel)
  – Separate windowing system (X) and window managers
  – Various desktop solutions (CDE, KDE, Gnome)
  – Toolbox philosophy: Combine (lots of) simple tools
  – Underneath: Strong and simple abstraction (“Everything is a file”)


C

“Pragmatic” high-level language:
  – Handles characters, numbers, addresses as implemented by most computers
  – Small core language, much functionality provided by libraries (mostly in C!)
  – Compilers are easy to write
  – Compilers are easy to port
  – Even naive compilers produce reasonably efficient code

Hacker-friendly
  – Straightforward compilation (nothing is hidden)
  – Compact source code (fewer keystrokes, fast to read)
  – Typed, but no bondage-and-discipline language

Adequate support for building abstractions
  – Structures (composing objects), unions, enumerations
  – Arrays and pointers
  – Support for defining new types


UNIX history tree (simplified)

For a fuller tree see http://www.levenez.com/unix/


A Short History of UNIX and C

1969  Ken Thompson writes the first UNIX (in assembler) on a PDP7 at AT&T Bell Labs, allegedly to play Space Travel
1970  Brian Kernighan coins the name UNIX. The UNIX project gets a PDP11 and a task: Writing a text processing system
1971-72  Creation of C (Dennis Ritchie), UNIX rewritten in C
1972  Pipes arrive, UNIX installed on 10 (!) systems
1975  AT&T UNIX “Version 6” distributed with sources under academic licenses
1976  Ken Thompson in Berkeley, leading to BSD UNIX
1977  1BSD release
1978  UNIX “Version 7”, leading to System V (AT&T)


1978  3BSD, adding virtual memory
1980  Microsoft XENIX brand of UNIX
1982  4.2BSD, adding TCP/IP
1982  SGI IRIX
1983  Bjarne Stroustrup creates C++ (at AT&T Bell Labs)
1983  GNU Project announced (Aim: Free UNIX-like system)
1983-1984  Berkeley Internet Name Domain (BIND) created
1984  SUN introduces NFS (Network File System)
1985  Free Software Foundation (Stallman), GNU manifesto, GNU Emacs


1986  HP-UX, SunOS 3.2 (from BSD Unix), “attack of the killer micros”
1986  MIT Project Athena creates X11 (Network window system)
1986  POSIX.1 (Portable operating system interface standard)
1988  GNU GPL
1988  System VR4 “One UNIX to rule them all” (AT&T+SUN)
1988  NeXTCUBE with NeXTSTEP operating system
1989  ANSI-C Standard “C89” (adds prototypes, standard library)
1989  SunOS 4.0x
1990  Net/1 Release (free BSD UNIX)
1990  IBM AIX


1991  Linux 0.01, “attack of the killer PCs” (continuing till this day)
1991  World Wide Web born
1991–1992  Lawsuits around BSD UNIX Net/1 and Net/2 releases
1992  SunOS 5 aka Solaris-2 (from System VR4)
1993  FreeBSD 1.0
1994  Linux 1.0
1994  NetBSD 1.0, 4.4BSD Lite (unencumbered by AT&T copyrights, becomes new base for all non-commercial BSD flavours)
1995  “UNIX wars” are over
1996  Tux the Penguin becomes Linux mascot


1998  UNIX-98 branding (Single UNIX specification)
2000  New ANSI “C99”
2001  IBM runs prime time TV ads for Linux
2001  UNIX-based MacOS X
2002  Linux is at version 2.4, Emacs is version 21.2, SunOS is at 5.9 (aka Solaris 9), BIND is version 9.2.1


Another Opinion

UNIX is not an operating system...
... but is the collected folklore of the hacker community!


Spot the Even Ones<br />

Upshot

You don’t have to grow a beard to become a world-class UNIX hacker...
... but it does seem to help!


CSC322
C Programming and UNIX

UNIX from a User’s Perspective


UNIX Architecture

[Figure, built up over two slides: architecture diagram with the labels Application, Shell, Libraries, UNIX Kernel, and Hardware.]


Some Concepts

UNIX is a multi-user system. Each user has:
  – User name (mine is schulz on most machines)
  – Numerical user id (e.g. 500)
  – Home directory: A place where (most of) his or her files are stored

UNIX is a multi-tasking system, i.e. it can run multiple programs at once. A running program (with its data) is called a process. Each process has:
  – Owner (a user)
  – Working directory (a place in the file system)
  – Various resources

A shell is a command interpreter, i.e. a process accepting and executing commands from a user.
  – A shell is typically owned by the user using it
  – The initial working directory of a shell is typically the user’s home directory (but can be changed by commands)
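The user and process attributes listed under "Some Concepts" can be inspected from a shell. The following is a minimal sketch, assuming a POSIX shell with the standard id utility; the variable names are chosen only for illustration.

```shell
#!/bin/sh
# Attributes of the current user.
user_name=$(id -un)      # user name
user_id=$(id -u)         # numerical user id
home_dir=$HOME           # home directory, normally set at login

# Attributes of the shell process that runs this script.
shell_pid=$$             # process id of this shell
work_dir=$(pwd)          # current working directory of this shell

echo "user:    $user_name (uid $user_id)"
echo "home:    $home_dir"
echo "process: $shell_pid, working directory $work_dir"
```

The values differ from machine to machine, so no particular output is shown here.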


More on Users

There are two kinds of users:
  – Normal users
  – Super users (“root”)

Super users:
  – Have unlimited access to all files and resources
  – Always have numerical user id 0
  – Normally have user name “root” (but there can be more than one user name associated with UID 0)
  – Can seriously damage the system!

Normal users:
  – Can only access files if they have the appropriate permissions
  – Can belong to one or more groups. Users within a group can share files
  – Usually cannot damage the system or other users’ files!


The User Interface

UNIX: Provide Tools, Not Policy
  – Most tools operate on all (ASCII) file formats
  – Extremely configurable environment: different users have different user experiences
  – No restrictions ⇔ Little consistency
  – We will assume the default environment on the lab machines for examples

X Window System: Provide Mechanisms, Not Policy
  – Windowing system offers (networked) drawing primitives
  – Different GUIs built on top of this
  – GUI conventions may even differ from one application to the other!
  – Modern desktop environments (GNOME/KDE) try to change this, but you are bound to use many legacy applications anyway!


My Graphical Desktop<br />



Default KDE Desktop (SuSE Linux)<br />



Desktop Discussion

My Desktop
  – Uses windowing mostly to provide a better text-based interface (compared to pure text terminals)
  – Text editor and shell (command line) windows
  – (Can also run graphical applications)

KDE Desktop
  – Graphical, mouse-based user experience
  – Mostly a launcher for GUI-based programs
    ∗ Office programs
    ∗ Graphics programs
    ∗ Web browser
  – Can also run shell windows!


KDE Desktop with Terminal Application<br />



Exploring the Text Interface

Convention: System output is shown in typewriter font, user input is written in bold face, and comments (not to be entered) are written in italics; here, comments appear in parentheses at the end of a line.

whoami prints the user name of the current user (more exactly: it prints the first user name associated with the effective user id):

    [schulz@gettysburg ~]$ whoami
    schulz

pwd prints the current working directory (more later):

    [schulz@gettysburg ~]$ pwd
    /lee/home/graph/schulz        (Non-standard setup!)

ls lists the files in the current working directory:

    [schulz@gettysburg ~]$ ls
    core  Desktop                 (Not much there at the moment)


Text Interface Example (contd.)

Most UNIX programs accept options to modify their behavior. One-letter (“short”) options start with a single dash, followed by a letter:

    [schulz@gettysburg ~]$ ls -a        (Show all files, even hidden ones)
    .                                   .gnome
    ..                                  .ICEauthority
    .bash_logout                        .kde
    .bash_profile                       .mcop
    .bashrc                             .MCOP-random-seed
    core                                .mcoprc
    .DCOPserver_hopewell.cs.miami.edu   .screenrc
    .DCOPserver_potomac.cs.miami.edu    .ssh
    .DCOPserver_richmond.cs.miami.edu   .tcshrc
    Desktop                             .xauth
    .emacs                              .Xauthority
    .first_start_kde                    .xsession-errors

As you can see, hidden files start with a dot.
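The effect of the -a option can be reproduced safely in a scratch directory. This is a small sketch, assuming the mktemp utility is available (it is common but not required by POSIX); the file names visible and .hidden are made up for the example.

```shell
#!/bin/sh
# Work in a scratch directory so that no real files are touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

touch visible .hidden

plain=$(ls)        # lists only the file named visible
all=$(ls -a)       # additionally lists ., .. and .hidden

echo "ls:"
echo "$plain"
echo "ls -a:"
echo "$all"

# Clean up.
cd / && rm -r "$scratch"
```

Plain ls skips every name that starts with a dot, which is why configuration files in a home directory normally stay out of sight.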


The UNIX File System

In UNIX, all files are organized in a single directory tree, regardless of where they are stored physically

There are two main types of files:
  – Plain files (containing data)
  – Directories (“folders”), containing both plain files (optionally) and other directories

Each file in a directory is identified by its name and has a number of attributes:
  – Name
  – Type
  – Owner
  – Group (each file belongs to one group, even if the owner belongs to multiple groups)
  – Access rights
  – Access dates


Typical File System Layout

    /                 (Root directory)
      bin             (System programs): cp, ls, ps
      dev             (Devices): hda, hdb, kbd
      etc             (Configuration): passwd, hosts
      home            (Home directories): joe, jane, schulz
        schulz        (Private files): core, Desktop
      tmp             (Temporary files)
      usr             (User programs)
        local         (Site-installed): lib, bin
        lib           (Vendor)
        bin           (Vendor)

Files in the directory tree are described by pathnames
  – Pathnames consist of file names, separated by slashes (/)
  – Absolute pathnames start with a /. /bin/cp denotes cp
  – Relative pathnames are interpreted relative to the current working directory. If /home is the current working directory, then schulz/core denotes core
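The difference between absolute and relative pathnames can be tried out in a scratch directory. The sketch below assumes mktemp is available and rebuilds a tiny home/schulz/core hierarchy under a temporary directory rather than under the real /home.

```shell
#!/bin/sh
scratch=$(mktemp -d)
mkdir -p "$scratch/home/schulz"
echo "data" > "$scratch/home/schulz/core"

# Absolute pathname: starts with / and works from any working directory.
via_absolute=$(cat "$scratch/home/schulz/core")

# Relative pathname: resolved against the current working directory.
cd "$scratch/home" || exit 1
via_relative=$(cat schulz/core)

echo "absolute: $via_absolute"
echo "relative: $via_relative"

cd / && rm -r "$scratch"
```

Both reads reach the same file; only the starting point of the name lookup differs.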


Moving Through the File System

We can use the command cd to change our working directory:

    [schulz@gettysburg ~]$ pwd
    /lee/home/graph/schulz
    [schulz@gettysburg ~]$ cd /
    [schulz@gettysburg /]$ pwd
    /
    [schulz@gettysburg /]$ cd bin
    [schulz@gettysburg /bin]$ pwd
    /bin
    [schulz@gettysburg /bin]$ cd /lee/home/graph/schulz
    [schulz@gettysburg ~]$ pwd
    /lee/home/graph/schulz

Each directory contains two special entries: . and ..
  – . represents the directory itself. cd . is a NOP
  – .. normally represents the parent directory. cd .. moves the working directory up one level. In /, .. points to / itself
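The behavior of . and .. can be checked with a few lines of shell. This is a sketch under the assumption that mktemp is available; pwd -P is used so that symbolic links in the temporary path do not confuse the comparison.

```shell
#!/bin/sh
scratch=$(mktemp -d)
mkdir "$scratch/sub"
cd "$scratch/sub" || exit 1

start=$(pwd -P)

cd .                     # . is the directory itself: nothing changes
after_dot=$(pwd -P)

cd ..                    # .. is the parent: one level up
after_dotdot=$(pwd -P)

cd / && cd ..            # in /, .. points back to / itself
at_root=$(pwd -P)

echo "start:        $start"
echo "after cd .:   $after_dot"
echo "after cd ..:  $after_dotdot"
echo "cd .. in /:   $at_root"

rm -r "$scratch"
```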


More about files

We can use the -l (“long format”) option to ls to show us all attributes

    [schulz@gettysburg ~]$ ls -l
    -rw-------    1 schulz   users     1531904 Aug 29 10:55 core
    drwxr-xr-x    3 schulz   users        4096 Aug 29 10:55 Desktop

The long format of ls shows us more about the files:
  – The first letter tells us the file type. d is a directory, - means a plain file
  – The next nine letters describe access rights, i.e. who is allowed to read, write, and execute the file. More on those later!
  – The next number is the number of (hard) links to a file. More on that much later!
  – Next is the user that owns the file
  – After that, the group that owns the file
  – Next comes the file size in bytes
  – Then the date and time the file was last changed
  – Finally, the name of the file
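Individual fields of the long format can be picked apart with standard tools. The sketch below assumes mktemp, cut, and awk are available and that owner and group names contain no blanks; ls -ld lists a directory itself instead of its contents.

```shell
#!/bin/sh
scratch=$(mktemp -d)
touch "$scratch/plain"
mkdir "$scratch/dir"

# The first character of a long-format line is the file type.
plain_type=$(ls -ld "$scratch/plain" | cut -c1)   # - for a plain file
dir_type=$(ls -ld "$scratch/dir" | cut -c1)       # d for a directory

# The fifth blank-separated field is the size in bytes.
plain_size=$(ls -ld "$scratch/plain" | awk '{print $5}')

echo "plain file: type '$plain_type', size $plain_size bytes"
echo "directory:  type '$dir_type'"

rm -r "$scratch"
```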


UNIX Online Documentation 1

The UNIX Programmer’s Manual (“man pages”)
  – Traditionally available on every UNIX system, quite terse
  – Usage: man [section] <name>
  – Sections (may differ by UNIX flavour):
      1. User commands
      2. System calls
      3. C library routines
      4. Device drivers and network interfaces
      5. File formats
      6. Games and demos
      7. Misc. (ASCII, macro packages, tables, etc.)
      8. Commands for system administration
      9. Locally installed manual pages (e.g. X11)
  – man -k <keyword> gives you a list of pages relevant to <keyword>
  – To leave the man program (or rather the pager it uses), hit q


UNIX Online Documentation 2

GNU info files
  – Available with most Linux systems and most GNU packages
  – Usage: info <topic>, then browse interactively
  – You can also use the info reader built into GNU Emacs
    ∗ Enter emacs, then type C-h i, then select topic
    ∗ If you do not use Emacs, you should ;-)
    ∗ ... but we will introduce it later on


Exercises

Move through the file system using cd. You can inspect most files using more if they are ASCII text. Try e.g. /etc/passwd and /etc/hosts.

Try man man and info info

Read the man and info documentation for
  – ls
  – whoami
  – cd
  – pwd

Don’t worry if you don’t understand everything!
(Do worry if you understand nothing!)


CSC322
C Programming and UNIX

UNIX from a User’s Perspective II


Command Format

Normal UNIX command format: <command> <argument> ... <argument>
  – The first word is interpreted as a command
  – The remaining words (separated by spaces or blanks) are arguments
  – The implementation of a command is free in how it treats the arguments
  – Convention: Arguments starting with a dash - are options

Many characters have special meaning in most shells, including $, (, ), [, ], *, &, |, ;, \, , ', ", ' ' (blank, the argument separator)
  – Arguments may be enclosed in single quotes (' ') or in double quotes (" ") to suppress most special meanings
    ∗ Single quotes suppress (nearly) all special meanings
    ∗ Double quotes suppress most special meanings
    ∗ In particular, both suppress the meaning of blank: A string 'a a' will appear as a single argument to a command
    ∗ Quotes are not passed on to the command!
  – The backslash \ can be used to suppress the special meaning of individual characters. \" represents a double quote, \\ a backslash character
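The quoting rules can be observed by counting the arguments a command actually receives. In this sketch, count_args is a hypothetical helper function defined only for the example; it is not a standard command.

```shell
#!/bin/sh
# count_args (hypothetical helper) prints how many arguments it received.
count_args() { echo "$#"; }

unquoted=$(count_args a a)      # the blank separates two arguments
single=$(count_args 'a a')      # single quotes: one argument
double=$(count_args "a a")      # double quotes: one argument
escaped=$(count_args a\ a)      # backslash suppresses the blank: one argument

# Double quotes still expand variables, single quotes do not.
name=world
in_double=$(echo "hello $name")
in_single=$(echo 'hello $name')

echo "unquoted=$unquoted single=$single double=$double escaped=$escaped"
echo "$in_double"
echo "$in_single"
```

The quotes and the backslash never reach count_args; the shell removes them after it has split the line into words.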


Command Types

There are different types of commands a shell can execute:

Shell built-in commands are executed directly by the shell
  – Examples: cd, pwd, echo, alias

Shell functions are user-defined shell extensions
  – Particularly useful in scripting, rare in interactive use

Executable programs (the normal case) are loaded from the disk and executed
  – Examples: ls, whoami, man
  – If a pathname is given, that file is executed (if possible)
  – If just a filename is given, bash searches in all directories specified in the variable $PATH
  – Note that neither . nor ~ is necessarily in $PATH!
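How a shell resolves a command name can be queried with the standard command -v built-in. This is a sketch for a non-interactive POSIX shell without aliases; the exact pathname reported for ls depends on the system.

```shell
#!/bin/sh
# command -v reports how the shell would resolve a command name.
cd_kind=$(command -v cd)     # a built-in: only the name is printed
ls_path=$(command -v ls)     # an executable: the pathname found via $PATH

echo "cd resolves to: $cd_kind"
echo "ls resolves to: $ls_path"

# The directories searched for executables, one per line, in search order.
echo "$PATH" | tr ':' '\n'
```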


UNIX User Commands: echo and touch

echo <argument> ... prints its arguments to the screen
  – echo is often a shell built-in command. To guarantee the behavior described in the man page, use /bin/echo
  – Example:

    [schulz@gettysburg ~]$ echo "Hello World"
    Hello World                   (simplest "Hello World" program in UNIX)
    [schulz@gettysburg ~]$ echo '$PATH = ' $PATH
    $PATH = .:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/java/jdk1.3.1_01/bin:/home/graph/schulz/bin:/usr/X11R6/bin

touch <file> ... sets the access and modification time of the given files to the current time
  – If one of the files does not exist, touch will create an empty file of that name
  – Important option:
    ∗ -c: Do not create non-existing files (long form --no-create is supported by modern implementations (GNU))
    ∗ Other options: man touch
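The difference between touch with and without -c shows up when the named file does not exist yet. A minimal sketch, assuming mktemp is available; the file names are made up for the example.

```shell
#!/bin/sh
scratch=$(mktemp -d)
cd "$scratch" || exit 1

touch created            # the file does not exist yet: touch creates it, empty
touch -c not_created     # -c: a missing file stays missing

if [ -e created ]; then created_exists=yes; else created_exists=no; fi
if [ -e not_created ]; then skipped_exists=yes; else skipped_exists=no; fi

echo "created:     $created_exists"
echo "not_created: $skipped_exists"

cd / && rm -r "$scratch"
```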


UNIX User Commands: rm, mkdir, rmdir

rm <file> ... will delete the named files
  – Important options:
    ∗ -f: Force removal, never ask the user (even if the user has withdrawn write permission for that file)
    ∗ -i: Interactively ask the user about each file to be deleted
    ∗ -r: If some of the files are directories, recursively delete their contents first, then delete them

mkdir <directory> ... will create the directories named (if the user has the permission to do so)

rmdir <directory> ... will delete the directories named (if the user has the permission to do so and if they are empty)
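The interplay of mkdir, rmdir, and rm -r can be tried without risk in a scratch directory. A small sketch, assuming mktemp is available; project and notes are hypothetical names.

```shell
#!/bin/sh
scratch=$(mktemp -d)
cd "$scratch" || exit 1

mkdir project
touch project/notes

# rmdir refuses to delete a directory that still has entries.
if rmdir project 2>/dev/null; then
    rmdir_result=removed
else
    rmdir_result=refused
fi

# rm -r deletes the contents first, then the directory itself.
rm -r project
if [ -e project ]; then after_rm=present; else after_rm=gone; fi

echo "rmdir on non-empty directory: $rmdir_result"
echo "after rm -r: $after_rm"

cd / && rm -r "$scratch"
```

Since rm -r asks no questions by default, it is worth double-checking the argument before pressing return.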


Effective Shell Use: History

Modern shells like bash or tcsh keep a history of your previous commands
  – Type history to see these commands
  – Type !<number> to re-execute the command with the given number
  – Type !<word> to re-execute the most recent command starting with the (partial) word <word>

Example:

    [schulz@gettysburg ~]$ history
    (... many entries omitted)
    194  more CSC322.tex
    195  gv CSC322_1.pdf
    196  ls
    197  ll CSC322_1.pdf
    198  history

  – !197 will execute ll CSC322_1.pdf
  – !g will execute gv CSC322_1.pdf


Effective Shell Use: Editing/Completion

While typing commands, bash offers you many ways to ease your task:
  – [Backspace] will delete the character preceding the cursor
  – [C-d] (hold down [CTRL], then press [d]) will delete the character under the cursor (if there is such a character)
  – [C-k] will delete all characters under and right of the cursor
  – Left arrow and right arrow move the cursor in the command line (alternatively, try [C-b] and [C-f])
  – [C-a] and [C-e] move to the beginning and end of the line, respectively
  – Up arrow and down arrow will move you through the history (as will [C-p] and [C-n])!
  – In general, default bash key bindings are inspired by emacs editing commands

One of the more intriguing features: Name completion
  – At any time, hit [TAB], and bash will complete the current word as far as possible. Hitting [C-d] at the end of a non-empty line will list possible completions
  – It is quite smart (configurably smart, in fact) about this


Effective Shell Use: Globbing

Idea: Use simple patterns to describe sets of filenames

A string is a wildcard pattern if it contains one of ?, * or [

A wildcard pattern expands into all file names matching it
  – A normal letter in a pattern matches itself
  – A ? in a pattern matches any one letter
  – A * in a pattern matches any string
  – A pattern [l1...ln] matches any one of the enclosed letters (exception: ! as the first letter)
  – A pattern [!l1...ln] matches any one of the characters not in the set
  – A leading . in a filename is never matched by anything except an explicit leading dot
  – For more: man 7 glob

Important: Globbing is performed by the shell!


Example: File Handling and Globbing

    $ mkdir TEST_DIR
    $ cd TEST_DIR
    $ touch a ba bba bbba bbbba bbbbba LongFilename .LongHiddenFile
    $ ls -a
    .  ..  a  ba  bba  bbba  bbbba  bbbbba  LongFilename  .LongHiddenFile
    $ echo *a*                    (Everything with an a anywhere)
    a ba bba bbba bbbba bbbbba LongFilename
    $ echo *Long*
    LongFilename                  (Note: Does not match .LongHiddenFile)
    $ echo .*                     (all hidden files)
    . .. .LongHiddenFile
    $ echo [ab]*
    a ba bba bbba bbbba bbbbba
    $ echo *[ae]                  (everything that ends in a or e)
    a ba bba bbba bbbba bbbbba LongFilename
    $ echo ?*[ae]                 (everything that ends in a or e and has at least one more letter)
    ba bba bbba bbbba bbbbba LongFilename


Example: File Handling and Globbing (Contd.)

    $ cd ..
    $ rmdir TEST_DIR
    rmdir: ‘TEST_DIR’: Directory not empty
    $ rm TEST_DIR/*
    $ rmdir TEST_DIR
    rmdir: ‘TEST_DIR’: Directory not empty
    $ rm TEST_DIR/.L*
    $ rmdir TEST_DIR

Alternative:

    $ mkdir TEST_DIR
    $ touch TEST_DIR/.HiddenFile
    $ rmdir TEST_DIR
    rmdir: ‘TEST_DIR’: Directory not empty
    $ rm -r TEST_DIR
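That globbing is performed by the shell, and not by the command, can be made visible by counting arguments. In this sketch, count_args is a hypothetical helper, mktemp is assumed to be available, and the file names are chosen for the example.

```shell
#!/bin/sh
# count_args (hypothetical helper) prints how many arguments it received.
count_args() { echo "$#"; }

scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch a ba bba .hidden

expanded=$(count_args *a)     # the shell passes a, ba, bba: three arguments
quoted=$(count_args '*a')     # quotes suppress globbing: one literal argument
not_b=$(echo [!b]*)           # names whose first letter is not b
dotfiles=$(echo .h*)          # an explicit leading dot matches hidden files

echo "expanded=$expanded quoted=$quoted"
echo "not starting with b: $not_b"
echo "starting with .h:    $dotfiles"

cd / && rm -r "$scratch"
```

The command never sees the pattern itself unless it is quoted, which is why every UNIX program handles wildcards in the same way.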


UNIX User Commands: cat/more/less

cat <file> ... will concatenate the named files and print them to standard output (by default, your terminal)
  – It’s usually just used to display short files ;-)

more and less are pagers
  – Each will show you a text (e.g. the contents of a file given on the command line) by pages, stopping after each page and waiting for a key press (normally [space])
  – Major differences:
    ∗ more will automatically exit at the end of the data, less requires explicit termination with [q]
    ∗ less allows you to scroll backwards (using [b]), more only allows scrolling forward
  – For more (or less): man more, man less
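The name cat comes from concatenation, which is easy to see with two small files. A minimal sketch, assuming mktemp is available; the file names one and two are made up.

```shell
#!/bin/sh
scratch=$(mktemp -d)
printf 'first\n' > "$scratch/one"
printf 'second\n' > "$scratch/two"

# cat prints the files one after the other, in the order given.
joined=$(cat "$scratch/one" "$scratch/two")
echo "$joined"

rm -r "$scratch"
```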


Text Editing under UNIX

There are 3 ways to edit text under UNIX:
  1. The vi way
  2. The emacs way
  3. The wrong way

vi (the visual editor) is the text editor written by Bill Joy for BSD UNIX (published about 1978)
  – Screen-oriented WYSIWYG editor (for plain text)
  – Available on just about any UNIX system
  – About 35% of all serious UNIX hackers still prefer vi (or a derivative)!
  – Current version on Lab machines: vim 5.8.7 (Vi Improved)

emacs (editing macros) started in 1976 as a set of TECO macros on ITS
  – Currently popular emacs versions (GNU Emacs and XEmacs) go back to the 1985 GNU Emacs by Stallman. Both are basically a LISP system with a large text editing library and an editor-like user interface
  – About 35% of all serious UNIX hackers use Emacs. Also widespread use on other operating systems
  – emacs on the lab machines is GNU Emacs 20.7.1


vi flyby

Getting into it: vi <file>

Modal interface: Normally letters denote editing commands, only in insert mode can actual letters be typed into the file

The editor starts in command mode (see next slide)

Insert mode (shows {-- INSERT --} in bottom line):

    Key               Effect
    [ESC]             Go back to command mode
    Any normal key    Insert corresponding letter
    [Backspace]       Delete last typed letter

Tutorials e.g. at http://www.cfm.brown.edu/Unixhelp/vi_.html.


vi flyby II<br />

Command mode (commands marked (*) change into insert mode):<br />

Key(s) Effect<br />

Cursor keys Move around<br />

:r <file> Insert file content at cursor position<br />

:w Write file<br />

:q Leave vi<br />

:wq Write file and leave<br />

:q! Leave vi even if there are unsaved changes<br />

:h Help!<br />

i Insert text at the cursor position (*)<br />

a Insert text after the cursor position (*)<br />

A Insert text at the end of the current line (*)<br />

o Open a new line and insert text (*)<br />

J Join two lines into one<br />

x Delete character under cursor<br />

dd Delete current line<br />

. Repeat last command<br />

:<n> Go to line number <n><br />



Emacs for Everyone<br />

Getting into it: emacs <file> or just emacs & (remark: Normally, emacs is only started once, and you visit different files from within the editor. Emacs can work on many files at once)<br />

Emacs is extremely configurable and extensible:<br />

– Special modes support nearly all programming languages<br />

∗ Indentation<br />

∗ Compilation/Error correcting<br />

∗ Debugging<br />

– You can read email <strong>and</strong> USENET news in emacs<br />

– Emacs can be used as a web browser<br />

An Emacs window normally has different sub-regions:<br />

– Menu bar (operate with a mouse, many frequently used commands)<br />

– One or more text windows, each displaying a buffer (a text editing area)<br />

– One mode line for each text window, displaying various pieces of information<br />

– Finally, the mini-buffer for typing complex commands and dialogs<br />





Emacs for Everyone III<br />

Emacs is non-modal: normal keys always insert the corresponding letter<br />

Commands are typed by using [CTRL] or [ALT] in combination with normal keys. We write e.g. [C-a] or [M-a] to denote [a] pressed with [CTRL] or [ALT] (M for meta). [C-h t] is [C-h] followed by plain [t].<br />

Key(s) What it does<br />

[C-h t] Enter the emacs tutorial<br />

[C-x C-c] Leave emacs<br />

Cursor keys Move around<br />

[C-x C-f] Open a new file (*)<br />

[C-x C-s] Save current file<br />

[C-x s] Save all changed files (*)<br />

[M-x] Call arbitrary LISP function by name (*)<br />

[C-s] Incremental search (try it!) (*)<br />

Entries marked with (*) will ask for additional information in the mini-buffer<br />



Exercises<br />

Experiment with bash command line editing and history<br />

Create some files and play with globbing<br />

Write a short text in both vi and emacs<br />

Read the vi and emacs tutorials<br />

Note: You are strongly encouraged to learn the basics of both editors, and to become proficient in at least one of them. I will not examine you on either, but don't complain if you have trouble with any other editor<br />



ed is the st<strong>and</strong>ard text editor<br />

When I log into my Xenix system with my 110 baud teletype, both vi<br />

*<strong>and</strong>* Emacs are just too damn slow. They print useless messages like,<br />

’C-h for help’ <strong>and</strong> ’"foo" File is read only’. So I use the editor<br />

that doesn’t waste my VALUABLE time.<br />

Ed, man! !man ed<br />

ED(1) <strong>UNIX</strong> Programmer’s Manual ED(1)<br />

NAME<br />

ed - text editor<br />

SYNOPSIS<br />

ed [ - ] [ -x ] [ name ]<br />

DESCRIPTION<br />

Ed is the st<strong>and</strong>ard text editor.<br />

- ---<br />

<strong>Computer</strong> Scientists love ed, not just because it comes first<br />

alphabetically, but because it’s the st<strong>and</strong>ard. Everyone else loves ed<br />

because it’s ED!<br />



"Ed is the st<strong>and</strong>ard text editor."<br />

And ed doesn’t waste space on my Timex Sinclair. Just look:<br />

- -rwxr-xr-x 1 root 24 Oct 29 1929 /bin/ed<br />

- -rwxr-xr-t 4 root 1310720 Jan 1 1970 /usr/ucb/vi<br />

- -rwxr-xr-x 1 root 5.89824e37 Oct 22 1990 /usr/bin/emacs<br />

Of course, on the system *I* administrate, vi is symlinked to ed.<br />

Emacs has been replaced by a shell script which 1) Generates a syslog<br />

message at level LOG_EMERG; 2) reduces the user’s disk quota by 100K;<br />

<strong>and</strong> 3) RUNS ED!!!!!!<br />

"Ed is the st<strong>and</strong>ard text editor."<br />

Let’s look at a typical novice’s session with the mighty ed:<br />

golem> ed<br />

?<br />

help<br />

?<br />



?<br />

?<br />

quit<br />

?<br />

exit<br />

?<br />

bye<br />

?<br />

hello?<br />

?<br />

eat flaming death<br />

?^C<br />

?<br />

^C<br />

?<br />

^D<br />

?<br />

- ---<br />

Note the consistent user interface <strong>and</strong> error reportage. Ed is<br />

generous enough to flag errors, yet prudent enough not to overwhelm<br />

the novice with verbosity.<br />



"Ed is the st<strong>and</strong>ard text editor."<br />

Ed, the greatest WYGIWYG editor <strong>of</strong> all.<br />

ED IS THE TRUE PATH TO NIRVANA! ED HAS BEEN THE CHOICE OF EDUCATED<br />

AND IGNORANT ALIKE FOR CENTURIES! ED WILL NOT CORRUPT YOUR PRECIOUS<br />

BODILY FLUIDS!! ED IS THE STANDARD TEXT EDITOR! ED MAKES THE SUN<br />

SHINE AND THE BIRDS SING AND THE GRASS GREEN!!<br />

When I use an editor, I don’t want eight extra KILOBYTES <strong>of</strong> worthless<br />

help screens <strong>and</strong> cursor positioning code! I just want an EDitor!!<br />

Not a "viitor". Not a "emacsitor". Those aren’t even WORDS!!!! ED!<br />

ED! ED IS THE STANDARD!!!<br />

TEXT EDITOR.<br />

When IBM, in its ever-present omnipotence, needed to base their<br />

"edlin" on a <strong>UNIX</strong> st<strong>and</strong>ard, did they mimic vi? No. Emacs? Surely<br />

you jest. They chose the most karmic editor <strong>of</strong> all. The st<strong>and</strong>ard.<br />

Ed is for those who can *remember* what they are working on. If you<br />

are an idiot, you should use Emacs. If you are an Emacs, you should<br />

not be vi. If you use ED, you are on THE PATH TO REDEMPTION. THE<br />



SO-CALLED "VISUAL" EDITORS HAVE BEEN PLACED HERE BY ED TO TEMPT THE<br />

FAITHLESS. DO NOT GIVE IN!!! THE MIGHTY ED HAS SPOKEN!!!<br />

?<br />



<strong>CSC322</strong><br />

C <strong>Programming</strong> <strong>and</strong> <strong>UNIX</strong><br />

<strong>UNIX</strong> from a User’s Perspective<br />

The Goodies<br />



UNIX User Commands: grep<br />

Usage: grep <regexp> <file> . . .<br />

– grep will scan the input file(s) and print all lines containing a string that matches the regular expression <regexp><br />

– Important options:<br />

∗ -i: Ignore upper <strong>and</strong> lower case in the regular expression<br />

∗ -v: Print all lines not matching the regular expression<br />

– The name comes from an old editor command sequence standing for globally search for regular expression, print matches<br />

– It is one of the most useful UNIX tools!<br />

Regular expressions (much more via man grep):<br />

– A normal character matches itself<br />

– A . matches any single character<br />

– A * after a pattern matches any number of repetitions (including none)<br />

– A range [...] works as for globbing (but use ^ instead of ! for negation)<br />

– Remember that many characters are special for the shell – use quotes!<br />

– Example: grep "Ste.*ulz" <file> will find many versions of my full name in <file><br />



Input <strong>and</strong> Output<br />

Each UNIX process normally is created with 3 input/output streams:<br />

– Standard Input or stdin (file descriptor 0) is used for normal input. Many UNIX programs will read from stdin if no file names are given<br />

– Standard Output or stdout (file descriptor 1) is used for all normal output<br />

– Standard Error or stderr (file descriptor 2) is used for out-of-band output like warnings or error messages<br />

By default, all three are connected to your terminal<br />

It is possible to redirect the output streams into files<br />

It is possible to make stdin read from a file<br />

It is possible to connect one process's stdout to another one's stdin<br />



Simple Output Redirection<br />

You can redirect the normal output of a command by appending > <file> to the command.<br />

– Example 1:<br />

$ man stdin > stdin_man_page<br />

$ more stdin_man_page<br />

STDIN(3) System Library Functions Manual STDIN(3)<br />

NAME<br />

stdin, stdout, stderr - st<strong>and</strong>ard I/O streams<br />

...<br />

– Example 2: On the lab machines, the global password file is served over the NIS (or Yellow Pages) protocol, and is shown by the command ypcat passwd. ypcat passwd > my_passwd gives you a private copy for password “quality checking”<br />

– Example 3: cat > myfile.c can replace a text editor (hit [C-d] on a line of its own to indicate the end of input)<br />



Output Redirection II<br />

By default, stderr is not redirected, so you can still see error messages on the<br />

terminal (<strong>and</strong> discard the normal output if it is useless)<br />

To redirect stderr, use 2> (redirect file descriptor 2):<br />

– $ man bla will print No manual entry for bla<br />

– $ man bla 2> error will save that error message in the file error<br />

Special case: If you are not interested in any output, you can redirect it to<br />

/dev/null (a <strong>UNIX</strong> device file that just accepts data <strong>and</strong> ignores it):<br />

– $ man bla 2> /dev/null will make sure that you do not see the error message<br />

– Alternatively, $ man if bla > /dev/null will give you just the error message<br />

(even though man also prints the man page for the shell-built-in if)<br />



Input Redirection<br />

You can also redirect the stdin file descriptor to read from a file<br />

– Append < <file> to the command<br />

– This is e.g. useful if you always use an interactive program for the same task (i.e. you always type the same data into the program)<br />

– Some UNIX commands only accept input on stdin (e.g. the tr utility)<br />

Example: cat < file is equivalent to cat file! (Why?)<br />



Shell Pipes<br />

Pipes are a general tool for inter-process communication (IPC)<br />

The shell allows us to easily set up pipes connecting stdout of one process to stdin of another<br />

Example: man bash | cat will print the bash man page without using the pager<br />

– This can be chained: man bash | grep -i redir | grep -i input will print just the lines containing information about input redirection<br />

– ypcat passwd | grep schulz will give you just my entry in the password file<br />



Basic Process Control<br />

You can start processes in the foreground or in the background<br />

– Foreground processes are started by just typing a normal comm<strong>and</strong><br />

– Background processes are started by appending an ampersand (&) to the command. This is particularly useful for graphical applications: emacs &<br />

– While a foreground process is running, the shell is blocked because the process<br />

is using the terminal as its stdin (i.e. you can have at most one non-suspended<br />

foreground process)<br />

– (Most) foreground processes can be terminated by hitting [C-c] (often written as ^C).<br />

– (Most) foreground processes can be suspended by hitting [C-z]<br />

– A suspended process can be continued by typing fg (to continue it as a<br />

foreground process) or bg (to let it run in the background)<br />

– A background process will be suspended automatically, if it needs to read data<br />

from stdin<br />

– jobs gives a numbered list of all processes started by the shell<br />

– You can use fg %<n> to take a particular process into the foreground (bg %<n> works on the same principle)<br />

– You can use kill %<n> to terminate the named job<br />



UNIX User Commands: yes<br />

Usage: yes [arg]<br />

If no argument is given, yes will print an infinite sequence of lines containing just the character y<br />

If an argument is given, yes will print an infinite sequence of lines containing that argument<br />

Very little more is available by typing man yes<br />



Job Control Example<br />

$ emacs & (Start emacs in the background – it opens its own window)<br />

$ yes Hello (Start yes in the foreground)<br />

Hello<br />

Hello<br />

Hello<br />

...<br />

^C (Enough <strong>of</strong> that)<br />

$ jobs<br />

[1] Running emacs (Just my emacs)<br />

$ yes Hi (I can never get enough)<br />

Hi<br />

Hi<br />

...<br />

^Z (Suspend it)<br />

Suspended (Indeed!)<br />

$ jobs<br />

[1] Running emacs<br />

[2] + Suspended yes<br />

$ kill %1 (Ooops! Emacs window closes)<br />



Notice: Lab Hours<br />

At the moment, a TA for <strong>CSC322</strong> is in the lab Friday 4-6pm <strong>and</strong> Sunday 2-6pm.<br />



<strong>CSC322</strong><br />

C <strong>Programming</strong> <strong>and</strong> <strong>UNIX</strong><br />

<strong>Programming</strong> in C - Basics<br />



A Bird's Eye View of C<br />

A C program is a collection of<br />

– Declarations<br />

– Definitions<br />

for<br />

– Functions<br />

– Variables<br />

– Datatypes<br />

A program may be spread over multiple files<br />

A program file may contain preprocessor directives that<br />

– Include other files<br />

– Introduce and expand macro definitions<br />

– Conditionally select certain parts <strong>of</strong> the source code for compilation<br />



A First C Program<br />

Consider the following C program<br />

#include <stdio.h><br />
#include <stdlib.h><br />
int main(void)<br />
{<br />
    printf("Hello World!\n");<br />
    return EXIT_SUCCESS;<br />
}<br />

Assume that it is stored in a file called hello.c in the current working directory.<br />

Then:<br />

$ gcc -o hello hello.c<br />

(Note: Compiles without warning or error)<br />

$ ./hello<br />

Hello World!<br />




A Closer Look (1)<br />

We are including two header files from the standard library<br />

– stdio.h contains declarations for buffered, stream-based input and output (we include it for the declaration of printf)<br />

– stdlib.h contains declarations for many odds and ends from the standard library (it gives us EXIT_SUCCESS)<br />

– In general, preprocessor directives start with a hash #<br />




A Closer Look (2)<br />

The program consists of one function named main()<br />

– main() returns an int (integer value) to its calling environment<br />

– In this case, it takes no arguments (its argument list is void)<br />

– In general, any C program is started by a call to its main() function, and terminates if main() returns<br />




A Closer Look (3)<br />

The function body contains two statements:<br />

– A call to the standard library function printf() with the argument "Hello World!\n" (a string ending with a newline character)<br />

– A return statement, returning the value of the symbol EXIT_SUCCESS to the caller of main()<br />



A Closer Look (4)<br />

gcc is the GNU C compiler, the standard compiler on most free UNIX systems (and often the preferred compiler on many other systems)<br />

– On traditional systems, the compiler is normally called cc<br />

gcc takes care of all stages of compiling:<br />

– Preprocessing<br />

– Compiling<br />

– Linking<br />

It automagically recognizes what to do (by looking at the file name suffix)<br />

Important options:<br />

– -o <file>: Give the name of the output file<br />

– -ansi: Compile strict ANSI-89 C only<br />

– -Wall: Warn about all dubious lines<br />

– -c: Don't perform linking, just generate a (linkable) object file<br />

– -O to -O6: Use increasing levels of optimization to generate faster executables<br />



A More Advanced Example<br />

/* A program that prints a Fahrenheit -> Celsius conversion table */<br />

#include <stdio.h><br />

#include <stdlib.h><br />

int main(void)<br />

{<br />

int fahrenheit, celsius;<br />

}<br />

printf("Fahrenheit -> Celsius\n\n");<br />

fahrenheit = 0;<br />

while(fahrenheit


The Fahrenheit-Celsius Example<br />

Compilation:<br />

$ gcc -ansi -Wall -W -o celsius_fahrenheit celsius_fahrenheit.c<br />

Running it:<br />

$ ./celsius_fahrenheit | more<br />

Fahrenheit -> Celsius<br />

0 -17<br />

10 -12<br />

20 -6<br />

30 -1<br />

40 4<br />

50 10<br />

60 15<br />

70 21<br />

80 26<br />

90 32<br />

100 37<br />

--More--<br />



Comments<br />

Comments in C are enclosed in /* and */<br />

Comments can contain any sequence of characters except for */ (although your compiler may complain if it hits a second occurrence of /* in a comment)<br />

Comments can span multiple lines<br />

In assignments (and in life) use comments wisely<br />

– Do explain important ideas, e.g. what a function or program does<br />

– Do explain clever tricks (if needed)<br />

– Do not repeat things that are obvious from the program code anyway<br />

Bad commenting will affect grading!<br />



Variables<br />

“int fahrenheit, celsius;” declares two variables <strong>of</strong> type int that can store<br />

a signed integer value from a finite range<br />

– By intention, int is the fastest datatype available on any given C implementation<br />

– On most modern UNIX systems, int is a 32 bit type and interpreted in two's complement, giving a range from -2 147 483 648 to 2 147 483 647. This is not part of the C language definition, though!<br />

In general, a variable in a program corresponds to a memory location and can store a value of a specific type<br />

– All variables must be declared before they can be used<br />

– Variables can be local to a function (like the variables we have used so far), local to a single source file, or global to the whole program<br />

A variable's value is changed by an assignment, an expression of the form “var = expression;”<br />



(Arithmetic) Expressions<br />

C supports various arithmetic operators, including the usual +, - ,* , /<br />

– Subexpressions can be grouped using parentheses<br />

– Normal arithmetic operations can be used on both integer and floating point values, with the type of the arguments determining the type of the result<br />

– Example: (fahrenheit-32)*5/9 is an arithmetic expression in C, implementing the well-known formula $C = \frac{5}{9}(F - 32)$ for converting Fahrenheit to Celsius<br />

∗ Since all arguments are integers, all intermediate results are also integers (as well as the final result)<br />

∗ Therefore we have to multiply by 5 first, then divide by 9 (multiplying by (5/9) would effectively multiply by 0)<br />

Bit-wise, logical, and comparison operators also normally return numeric values<br />

Possible operands include variables, numerical (and other) constants, and function calls<br />

Note: In C, any normal statement is an expression and has a value, including the assignment!<br />



Simple Loops<br />

A while-loop has the form<br />

while(<expr>)<br />

<body><br />

where <body> can be either a single statement, terminated by a semicolon ';', or a statement block, enclosed in curly braces '{}'<br />

It operates as follows:<br />

– At the beginning <strong>of</strong> the loop, the controlling expression is evaluated<br />

– If it evaluates to a non-zero value, the loop body is executed once, <strong>and</strong> control<br />

returns to the while<br />

– If it evaluates to 0, the body is skipped, <strong>and</strong> the program continues on the<br />

next statement after the loop<br />

Note: The body can also be empty (but this is usually a programming bug)<br />



Formatted Output<br />

printf() is a function for formatted output<br />

It has at least one argument (the format string), but may have an arbitrary number of additional arguments<br />

– The format string may contain various placeholders, beginning with the character %, followed by (optional) formatting instructions, and a letter (or letter combination) indicating the desired output format<br />

– Each placeholder corresponds to exactly one of the additional arguments (modern compilers will complain if the arguments and the format string do not match)<br />

– In particular, %3d requests the output of a normal int in decimal representation, with a width of at least 3 characters<br />

Note: printf() is not part of the C language proper, but of the (standardized) C library<br />



UNIX User Commands: cp and mv<br />

cp will either copy one file to another, or it will copy any number of files into a directory<br />

– Usage: cp <file1> <file2><br />

Copy <file1> to <file2><br />

– Usage: cp <file1> . . . <filen> <directory><br />

Copy the named files into <directory><br />

mv will likewise move files<br />

– Usage: mv <file1> <file2><br />

Move <file1> to <file2><br />

– Usage: mv <file1> . . . <filen> <directory><br />

Move the named files into <directory><br />

Warning: Unless used with option -i, both commands will happily overwrite existing files!<br />

Again, a more complete description is available via man!<br />



Assignment (also see Website)<br />

Write the following two C programs:<br />

– celsius2fahrenheit should print a conversion table from Celsius to Fahrenheit, from -50 to +150 degrees Celsius, in steps of 5 degrees<br />

– imp_metric should print two tables side by side (equivalent to a 4-column table), one for the conversion from yards into meters, the other one for the conversion from km into miles. The output should use int values, but you can use the floating point conversion factors of 0.9144 (from yards to meters) and 1.609344 (from miles to km). Try to make the program round correctly!<br />

Sample Output:<br />

Yards Meters Km Miles<br />

0 0 1 1<br />

10 9 2 1<br />

20 18 3 2<br />

30 27 4 2<br />

40 37 5 3<br />

...<br />

100 91 11 7<br />



<strong>CSC322</strong><br />

C <strong>Programming</strong> <strong>and</strong> <strong>UNIX</strong><br />

<strong>Programming</strong> in C<br />

Extended Introduction<br />



Statements, Blocks, <strong>and</strong> Expressions<br />

C programs are mainly composed of statements<br />

In C, a statement is either:<br />

– An expression, followed by a semicolon ';' (as a special case, an empty expression is also allowed, i.e. a single semicolon represents the empty statement)<br />

– A flow-control statement (if, while, goto, break . . . )<br />

– A block of statements (or compound statement), enclosed in curly braces '{}'. A compound statement can also include new variable declarations.<br />

Note: The following is actually legal C (although a good compiler will warn you that some of your statements have no effect):<br />

{<br />
    int a;<br />
    10+20;<br />
    10*(a=printf("Hello\n"));<br />
}<br />



Flow-Control: if<br />

The primary means for conditional execution in C is the if statement:<br />

if(<expr>)<br />

<statement><br />

– If the expression evaluates to a non-zero (“true”) value, then the statement will be executed<br />

– <statement> can also be a block of statements – in fact, it quite often is good style to always use a block, even if it contains only a single statement<br />

An if statement can also have a branch that is taken if the expression is zero (“false”):<br />

if(<expr>)<br />

<statement1><br />

else<br />

<statement2><br />



Flow-Control: while<br />

C supports different structured loop constructs, including a standard while-loop (see also example from last lesson):<br />

while(<expr>)<br />

<body><br />

When control reaches the while at the top of the loop, the expression is tested<br />

– If it evaluates to true (non-zero), the body of the loop is executed and control returns to the while<br />

– If it evaluates to false (i.e. zero), control directly goes to the statement following the body of the loop<br />

Note: An empty loop body is possible (and sometimes useful)<br />

Again: In many cases it is recommended to use a block even if it contains only one statement (or even no statements)<br />



Flow-Control: for<br />

The for-loop in C is a construct that combines initialization, test, and update of loop variables in one place:<br />

for(<init>; <test>; <update>)<br />

<body><br />

– Before the loop is entered, <init> is evaluated<br />

– Before each loop iteration, <test> is evaluated<br />

∗ If it is true, the body is executed, then <update> is evaluated and control returns to the top of the loop<br />

∗ If it is false, control goes to the first statement after the body<br />

∗ In the typical case, both <init> and <update> are assignments to the same variable, while <test> tests some property depending on that variable<br />



Example<br />

Here is the Fahrenheit/Celsius conversion using for:<br />

/* A program that prints a Fahrenheit -> Celsius conversion table */<br />

#include <stdio.h><br />

#include <stdlib.h><br />

int main(void)<br />

{<br />

int fahrenheit, celsius;<br />

}<br />

printf("Fahrenheit -> Celsius\n\n");<br />

for(fahrenheit=0; fahrenheit


for vs. while<br />

Note that for is more general than while:<br />

while(<expr>) <body><br />

and<br />

for(; <expr>; ) <body><br />

are equivalent.<br />

Alternatively, you can achieve the functionality of for using just while (how?)<br />

The preference for one or the other often is a matter of personal choice<br />

– If there are clear initialization and update steps, for is often more convenient<br />

– In other cases, while is more natural<br />



Variable Declarations<br />

Variable names:<br />

– A valid variable name starts with a letter or underscore (_), and may contain any sequence of letters, underscores, and digits<br />

– Capitalization is significant – a_variable is different from A_Variable<br />

– In addition to the language keywords, certain other names are reserved (by the standard library or by the implementation). In particular, avoid using names that start with an underscore!<br />

Variable declarations:<br />

– A (simple) variable declaration has the form <type> <varlist>;, where <type> is a type identifier (e.g. int), and <varlist> is a comma-separated list of variable names<br />

– In ANSI-89 C, variables can only be declared outside any blocks or directly after an open curly brace. The newer C99 standard relaxes this requirement<br />

– A variable declared in a block is (normally) visible just inside that block<br />



Types: Integers <strong>and</strong> Characters<br />

C has a large number of integer data types:<br />

– It offers char, short, int, long and (since the last language revision) long long types, all of which may represent integers from different ranges<br />

– Note that in particular char is an integer data type, i.e. characters are represented by their numerical encoding in the character set (normally ASCII)<br />

– Any of those can be signed or unsigned, i.e. capable of representing both negative and positive numbers, or non-negative numbers only<br />

– All types can be freely mixed in expressions<br />

– Unsigned types always follow the rules of arithmetic modulo $2^n$, where $n$ is the width (in bits) of their representation (i.e. values greater than $2^n - 1$ are reduced by subtracting $2^n$ until the result is in the range from 0 to $2^n - 1$)<br />

Integer constants are normally type int if they can be represented by that data<br />

type<br />

– 123 is int<br />

– 316L is long<br />

– 922U is unsigned int<br />



Integer Type Table<br />

Type | Alias | Signed/Unsigned? | Width(*)<br />

char | - | Implementation-defined | 8 bit<br />

signed char | - | Signed | 8 bit<br />

unsigned char | - | Unsigned | 8 bit<br />

short int | short | Signed | 16 bit<br />

signed short int | short | Signed | 16 bit<br />

unsigned short int | unsigned short | Unsigned | 16 bit<br />

int | - | Signed | 32 bit<br />

signed int | int | Signed | 32 bit<br />

unsigned int | unsigned | Unsigned | 32 bit<br />

long int | long | Signed | 32 bit<br />

signed long int | long | Signed | 32 bit<br />

unsigned long int | unsigned long | Unsigned | 32 bit<br />

long long int | long long | Signed | 64 bit<br />

signed long long int | long long | Signed | 64 bit<br />

unsigned long long int | unsigned long long | Unsigned | 64 bit<br />

Note (*): Width is not defined by the language st<strong>and</strong>ard but reflects currently<br />

common implementation choices!<br />
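Because the widths are implementation choices, it can be useful to ask the compiler instead of trusting the table. A small sketch, assuming only the standard headers; the macro BITS_OF and the two functions are hypothetical names chosen for this example.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical macro: number of bits of storage used for a type.
 * sizeof counts in units of char, CHAR_BIT is the number of bits per char. */
#define BITS_OF(type) (sizeof(type) * (size_t)CHAR_BIT)

size_t bits_of_int(void)
{
    return BITS_OF(int);
}

size_t bits_of_long(void)
{
    return BITS_OF(long);
}
```

The standard only guarantees minimum ranges (at least 16 bits for short and int, 32 for long, 64 for long long), so bits_of_long() may well return 64 on a current 64-bit UNIX system rather than the 32 listed above.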

Stephan Schulz 98


Simple Character I/O<br />

The C library defines the three I/O streams stdin, stdout, and stderr, and<br />

guarantees that they are open for reading or writing, respectively<br />

Reading characters from stdin: int getchar(void)<br />

– getchar() returns the numerical (ASCII) value of the next character in the<br />

stdin input stream<br />

– If there are no more characters available, getchar() returns the special value<br />

EOF that is guaranteed different from any normal character (that is why it<br />

returns int rather than char)<br />

Printing characters to stdout: int putchar(int)<br />

– putchar(c) prints the character c on the stdout stream<br />

– (It returns that character, or EOF on failure)<br />

getchar(), putchar(), and EOF are declared in <stdio.h><br />

Stephan Schulz 99


Example: File Copying<br />

#include <stdio.h><br />
#include <stdlib.h><br />

int main(void)<br />
{<br />
    int c;<br />

    c=getchar();<br />
    while(c!=EOF)<br />
    {<br />
        putchar(c);<br />
        c=getchar();<br />
    }<br />
    return EXIT_SUCCESS;<br />
}<br />

Copies stdin to stdout – to make a file copy, use<br />

cat file | ourcopy > newfile<br />

Introduces != (“not equal”) relational operator<br />

Stephan Schulz 100


Example: File Copying (idiomatic)<br />

#include <stdio.h><br />
#include <stdlib.h><br />

int main(void)<br />
{<br />
    int c;<br />

    while((c=getchar())!=EOF)<br />
    {<br />
        putchar(c);<br />
    }<br />
    return EXIT_SUCCESS;<br />
}<br />

Remember: Variable assignments have a value!<br />

Improvement: No duplication of call to getchar()<br />

Stephan Schulz 101


Example: Character Counting<br />

#include <stdio.h><br />
#include <stdlib.h><br />

int main(void)<br />
{<br />
    int c;<br />
    long count = 0;<br />

    while((c=getchar())!=EOF)<br />
    {<br />
        count++;<br />
    }<br />
    printf("Number of characters: %ld\n", count);<br />
    return EXIT_SUCCESS;<br />
}<br />

New idiom: ++ operator (increases value of variable by 1)<br />

Test: $ man cat | charcount<br />

1091<br />

Stephan Schulz 102


Exercises<br />

Write a program that continually increases the value of a short and an<br />

unsigned short variable and prints both (you can use printf("%6d %6u",<br />

var1, var2); to print them). What happens if you run the program for some<br />

time? You can pipe the output into less and search for interesting things (man<br />

less to learn how!). Remember that [C-c] will terminate most programs under<br />

UNIX!<br />

Write a program that counts lines in the input. Hint: The standard library makes<br />

sure that any line in the input ends with a newline ('\n')<br />

Write a program that computes the factorial of a number (given as a constant in<br />

the program). Test it for values of 3, 18, and 55. Any observations?<br />

Stephan Schulz 103


<strong>CSC322</strong><br />

C <strong>Programming</strong> <strong>and</strong> <strong>UNIX</strong><br />

<strong>Programming</strong> in C<br />

More on Expressions<br />



Nomination: Most Useless Use of cat Award<br />

If ourcopy is a program that just copies stdin to stdout, then<br />

– cat file | ourcopy > newfile will indeed copy file to newfile<br />

– So will ourcopy < file > newfile<br />

– (So will cat < file > newfile)<br />

Stephan Schulz 105


UNIX User Commands: wc<br />

Usage: wc ...<br />

wc counts the bytes, words and lines in each file specified (or in stdin if none is<br />

given) and prints the results, including the total for all input files.<br />

Important options:<br />

– -c: Print just the byte count<br />

– -w: Print just the word count<br />

– -l: Print just the line count<br />

Example:<br />

$ wc < wordcount.c<br />

24 53 369<br />

Notice: The program does not print unnecessary headers or footers. The output<br />

can easily be interpreted by other programs!<br />

More: man wc<br />

Stephan Schulz 106


Example: Word Counting<br />

/* Count words */<br />

#include <stdio.h><br />

#include <stdlib.h><br />

int main(void)<br />

{<br />

int c, in_word=0;<br />

long words = 0;<br />

while((c=getchar())!=EOF)<br />

{<br />

if(c == ' ' || c == '\n' || c == '\t')<br />

{<br />

in_word = 0;<br />

}<br />

else if(!in_word)<br />

{<br />

in_word = 1;<br />

words++;<br />

}<br />

}<br />

printf("%ld words counted\n", words);<br />

return EXIT_SUCCESS;<br />

}<br />

Stephan Schulz 107


Character Constants<br />

In C, characters are just small integers<br />

We can write character constants symbolically, by enclosing them in single quotes:<br />

– 'a' is the numerical value of a lower case a in the character encoding (97 for<br />

ASCII)<br />

– 'A' is upper case A (65 for ASCII)<br />

– These values can be assigned to any integer variable!<br />

You can use escape sequences (starting with \) for non-printable characters:<br />

– \t is the tabulator character (HT), ASCII 9<br />

– \n is the newline character (LF), ASCII 10 (and used by C to mark the end of<br />

the current line)<br />

– \a is the BEL character, printing it will normally make the terminal beep<br />

– \0 is the NUL character, ASCII 0, and used by C to mark the end of a string<br />

– \\ is the backslash itself, ASCII 92<br />

You can get all C escape sequences (and more) via man ascii<br />
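Since character constants are plain integers, ordinary arithmetic works on them. A minimal sketch; both helper names are invented for this example, and to_upper_ascii assumes an encoding such as ASCII in which the letters form contiguous ranges.

```c
/* Hypothetical helper: convert a lower case letter to upper case by
 * integer arithmetic. Assumes that 'a'..'z' and 'A'..'Z' are each
 * contiguous, which holds for ASCII but is not guaranteed by C. */
int to_upper_ascii(int c)
{
    if (c >= 'a' && c <= 'z')
    {
        return c - 'a' + 'A';
    }
    return c;
}

/* Numerical value of a decimal digit character. The C standard does
 * guarantee that '0'..'9' have consecutive values. */
int digit_value(int c)
{
    return c - '0';
}
```

In real programs the functions from ctype.h (such as toupper()) are the portable choice; the sketch only shows what happens at the integer level.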

Stephan Schulz 108


Another View at Expressions<br />

Expressions are built from operators and operands, with parentheses for grouping<br />

– Most operators are unary (taking one operand) or binary (taking two)<br />

– Operands can be<br />

∗ (Sub-)Expressions<br />

∗ Constants<br />

∗ Function calls<br />

– In C, binary operators are written in infix, i.e. between the operands: 10+15<br />

– Unary operators are written either in prefix or postfix (some can even be<br />

written either way, with slightly different effects)<br />

All operators have a precedence, defining how to interpret operations with<br />

multiple operators<br />

– In the absence of parentheses, operators with a higher precedence bind tighter<br />

than those with a lower precedence: 10+3*4 == 22 is true, 10+4*3==42 is<br />

false<br />

– In doubt, we can always fully parenthesize expressions: 10+3*4 == 10+(3*4),<br />

but (10+4)*3==42<br />

Stephan Schulz 109



Expressions: Associativity of Binary Operators<br />

Binary operators have an additional property: Associativity<br />

– Example: 25+12+11 can be interpreted as (25+12)+11 or as 25+(12+11)<br />

– Worse: What about 25-12-11?<br />

By convention, standard arithmetic operators are left-associative, i.e. they bind to<br />

the left<br />

– Thus: 25-12-11 == (25-12)-11 has the value 2<br />

We will note associativity for many operators specifically, but unless otherwise<br />

noted, it’s probably left-associative!<br />
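The precedence and associativity rules can be checked with a few one-line functions. This is only an illustrative sketch; the function names are made up for this example.

```c
/* * binds tighter than +, so this is 10 + (3 * 4) */
int mul_before_add(void)
{
    return 10 + 3 * 4;
}

/* Parentheses override precedence */
int parens_first(void)
{
    return (10 + 4) * 3;
}

/* - is left-associative, so this is (25 - 12) - 11 */
int left_assoc_sub(void)
{
    return 25 - 12 - 11;
}

/* Grouping to the right gives a different value */
int right_grouped_sub(void)
{
    return 25 - (12 - 11);
}
```

The four functions return 22, 42, 2 and 24, respectively.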

Stephan Schulz 111


Expressions: Relational Operators<br />

Relational operators take two arguments and return a truth value (0 or 1)<br />

We already have seen the equational operators. They apply to all basic data<br />

types and pointers:<br />

– a == b (equal) evaluates to 1 if the two arguments have the same value,<br />

otherwise it evaluates to 0<br />

– a != b evaluates to 1 if the two arguments have different values<br />

– Note: a == b == c is evaluated as (a == b) == c, i.e. it compares c to<br />

either 0 or 1!<br />
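The chained comparison pitfall is easy to demonstrate. A minimal sketch with hypothetical function names; chained_equal spells out the grouping that the compiler applies to a == b == c.

```c
/* a == b == c is parsed as (a == b) == c:
 * first a and b are compared, then the 0 or 1 result is compared to c. */
int chained_equal(int a, int b, int c)
{
    return (a == b) == c;
}

/* What is usually intended: all three values are the same */
int all_equal(int a, int b, int c)
{
    return a == b && b == c;
}
```

For example, chained_equal(2, 2, 2) is 0 (because 1 is not equal to 2), while all_equal(2, 2, 2) is 1.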

We can also compare the same types using the greater/lesser relations:<br />

– > evaluates to 1, if the first argument is greater than the second one<br />

– < evaluates to 1, if the second argument is greater than the first one<br />

– a >= b evaluates to 1, if either (a > b) == 1 or (a == b) == 1<br />

– a <= b evaluates to 1, if either (a < b) == 1 or (a == b) == 1<br />


Expressions: Logical Operators<br />

Logical operators operate on truth values, i.e. all non-zero values are treated the<br />

same way (representing true)<br />

The binary logical operators are || and &&<br />

– a||b evaluates to 1, if at least one of a or b is non-zero (otherwise it evaluates<br />

to 0)<br />

– a&&b evaluates to 1, if both a and b are non-zero<br />

– Both || and && are evaluated left-to-right, and the evaluation stops as soon<br />

as we can be sure of the result (short-circuit evaluation)<br />

∗ Example: If a!=b, then (a==b)&&c will not evaluate c<br />

∗ Similarly: (a==0 || 10/a >= 1) will never divide by zero!<br />

! is the (unary) logical negation operator, !a evaluates to 1, if a has the value<br />

0, it evaluates to 0 in all other cases<br />

Precedence rules:<br />

– The binary logical operators have lower precedence than the relational ones<br />

– || has lower precedence than &&<br />

– ! has a higher precedence than even arithmetic operators<br />
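Short-circuit evaluation can be made visible with a function that records when it is called. A small sketch; rhs_calls, noisy_true and the other names are hypothetical and exist only for this illustration.

```c
/* Counts how often the right operand of && was actually evaluated */
int rhs_calls = 0;

int noisy_true(void)
{
    rhs_calls++;
    return 1;
}

/* If a is 0, the left operand of || is already true and 10/a is
 * never evaluated, so no division by zero can happen. */
int ratio_at_least_one(int a)
{
    return a == 0 || 10 / a >= 1;
}

/* If a != b, the left operand of && is 0 and noisy_true() is skipped */
int and_skips_right(int a, int b)
{
    return (a == b) && noisy_true();
}
```

After and_skips_right(1, 2) the counter rhs_calls is still 0; only a call with equal arguments increments it.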

Stephan Schulz 113


Expressions: Assignments<br />

The assignment operator is = (a single equal sign)<br />

– a = b is an expression with the value of b<br />

– As a side effect, it will change the value of a to that same value<br />

The expression on the left hand side of an assignment (a) has to be an lvalue,<br />

i.e. something we can assign to. Legal lvalues are<br />

– Variables<br />

– Dereferenced pointers (“memory locations”)<br />

– Elements in a struct, union, or array<br />

The assignment operator is right-associative (so you can write<br />

a = b = c = d = 0; to set all four variables to zero)<br />

The assignment operator has extremely low precedence (lower than all other<br />

operators we have covered up to now)<br />
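Both properties, the value of an assignment and its right-associativity, show up in the following sketch. The function names are invented for this example.

```c
/* An assignment is an expression: (a = 5) has the value 5 */
int assignment_value(void)
{
    int a;
    return (a = 5) + 1;
}

/* Right-associative: parsed as a = (b = (c = (d = 7))) */
int chained_assignment(void)
{
    int a, b, c, d;
    a = b = c = d = 7;
    return a + b + c + d;
}
```

assignment_value() returns 6 and chained_assignment() returns 28. The parentheses in (a = 5) + 1 are needed because = has lower precedence than +.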

Stephan Schulz 114


Floating Point Numbers<br />

C supports three types of floating point numbers, float, double, and long<br />

double<br />

– float is the most memory-efficient representation (typically 32 bits), but has<br />

limited range and precision<br />

– double is the most commonly used floating point type. In particular, most<br />

numerical library functions accept and return double arguments. Doubles<br />

normally take up 64 bits<br />

– long double offers extended range and precision (sometimes using 128 bits)<br />

and is a recent addition<br />

Floating point constants are written using a decimal point, or exponential notation<br />

(or both):<br />

– 1.0 is a floating point constant<br />

– 1 is an integer constant. . .<br />

– . . . but 1e0 and 1.0E0 are both floating point constants<br />

If we mix integer and floating point numbers in an expression, a value of a<br />

“smaller” type is converted to that of the bigger one transparently:<br />

– 9/2 == 4, but 9/2.0 == 4.5 and 9.0/2 == 4.5<br />
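The conversion happens per operator, not per statement, which the following sketch tries to make visible. The function names are hypothetical.

```c
/* Both operands are int: integer division truncates */
int int_division(void)
{
    return 9 / 2;
}

/* One operand is double: the other is converted before dividing */
double mixed_division(void)
{
    return 9 / 2.0;
}

/* The division is still done on ints; only the result 4 is
 * converted to double when it is returned. */
double late_conversion(void)
{
    return 9 / 2;
}
```

So int_division() is 4, mixed_division() is 4.5, and late_conversion() is 4.0, not 4.5.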

Stephan Schulz 115


Fahrenheit to Celsius – More Exactly<br />

/* A program that prints a Fahrenheit -> Celsius conversion table */<br />

#include <stdio.h><br />

#include <stdlib.h><br />

int main(void)<br />

{<br />

int fahrenheit;<br />

double celsius;<br />

}<br />

printf("Fahrenheit -> Celsius\n\n");<br />

for(fahrenheit=0; fahrenheit


Administrative Notes<br />

Please ssh to lee.cs.miami.edu to use the lab machines over the net.<br />

To change your password on the lab machines, use yppasswd. Also check<br />

http://www.cs.miami.edu/~irina/password.html for the password policy<br />

To submit programming assignments, create a subdirectory whose name is<br />

ASSIGNMENT_ followed by the number of the current assignment, and<br />

copy the relevant files to it<br />

Example: To submit the current assignment, do e.g.<br />

$ cd ~ (go home)<br />

$ mkdir ASSIGNMENT_2<br />

$ cp mystuff/celsius2fahrenheit* ASSIGNMENT_2<br />

$ cp mystuff/imp_metric* ASSIGNMENT_2<br />

Stephan Schulz 117


Exercises<br />

Expand the word count program to count characters, words, and lines (of stdin)<br />

as wc does<br />

Write a program that prints useful imperial to metric (and back) conversion<br />

tables to a reasonable precision<br />

Stephan Schulz 118


<strong>CSC322</strong><br />

C <strong>Programming</strong> <strong>and</strong> <strong>UNIX</strong><br />

<strong>Programming</strong> in C<br />

Simple Arrays <strong>and</strong> Functions<br />



Arrays<br />

An array is a data structure that holds elements of one type so that each element<br />

can be (efficiently) accessed using an index<br />

In C, arrays are always indexed by integer values<br />

Indices always run from 0 to some fixed, predetermined value<br />

<type> <name>[<size>]; defines a variable of an array type:<br />

– <type> can be any valid C type, including user-defined types<br />

– <name> is the name of the variable defined<br />

– <size> is the number of elements in the array (Note: Indices run from 0<br />

to <size>-1)<br />

Example: char x[10]; defines the variable x to hold 10 elements of type char,<br />

x[5] accesses the element with index 5 (i.e. the sixth element) of that array<br />

Stephan Schulz 120


#include <br />

#include <br />

#include <br />

int main(void)<br />

{<br />

int freq_count[128];<br />

int i, c;<br />

Example: Counting Character Frequencies<br />

for(i=0; i


}<br />

Example: Counting Character Frequencies (Contd.)<br />

for(i=0; i


Initializing Arrays<br />

In the example, we used an explicit loop to initialize the array<br />

For short arrays we can also list the initial values in the definition of the array:<br />

– int days_per_month[12] = {31,28,31,30,31,30,31,31,30,31,30,31};<br />

– The number of values has to be smaller than or equal to the number of<br />

elements in the array<br />

– Unspecified elements are initialized to all bits zero, (i.e. 0 for all basic data<br />

types)<br />

If we give an explicit initializer, we can omit the size of the array:<br />

– int days_per_month[] = {31,28,31,30,31,30,31,31,30,31,30,31};<br />

– The compiler will automatically allocate an array of sufficient size to hold all<br />

the values in the initializer<br />
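When the size is left to the compiler, the element count can be recovered with sizeof. A minimal sketch building on days_per_month from above; the array partial and the two functions are hypothetical additions for illustration.

```c
#include <stddef.h>

int days_per_month[] = {31,28,31,30,31,30,31,31,30,31,30,31};

/* Hypothetical example of a partial initializer: only the first two
 * elements are listed, the remaining three are set to 0. */
int partial[5] = {1, 2};

/* Number of elements = size of the whole array / size of one element */
size_t month_count(void)
{
    return sizeof(days_per_month) / sizeof(days_per_month[0]);
}

/* Sum over all elements, using the computed element count */
int days_per_year(void)
{
    size_t i;
    int sum = 0;

    for (i = 0; i < month_count(); i++)
    {
        sum += days_per_month[i];
    }
    return sum;
}
```

Note that the sizeof idiom only works where the array definition itself is visible, not on an array that was passed to a function as a parameter.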

Stephan Schulz 123


Array Layout<br />

C arrays are implemented as a sequence of consecutive memory locations of the<br />

right size to hold the element<br />

Example:<br />

Address    Array Element         Content<br />
0<br />
. . .<br />
112        Other data<br />
120        days_per_month[0]     31<br />
124        days_per_month[1]     28<br />
128        days_per_month[2]     31<br />
132        days_per_month[3]     30<br />
136        days_per_month[4]     31<br />
140        days_per_month[5]     30<br />
144        days_per_month[6]     31<br />
148        days_per_month[7]     31<br />
152        days_per_month[8]     30<br />
156        days_per_month[9]     31<br />
160        days_per_month[10]    30<br />
164        days_per_month[11]    31<br />
168        Other data<br />
. . .<br />

Stephan Schulz 124


No Safety Belts and No Air Bag!<br />

C does not check if the index is in the valid range!<br />

– If you access days_per_month[13] you might change some critical other data<br />

– (The operating system may catch some of these wrong accesses, but do not<br />

rely on it!)<br />

This is the source of many of the buffer-overflow errors exploited by crackers and<br />

viruses to hack into systems!<br />
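Since the language does not check indices, the program has to. One possible pattern, shown only as a sketch, is a small accessor that validates the index first; checked_get and its parameters are hypothetical names, not part of the lecture code or the C library.

```c
#include <stddef.h>

/* Hypothetical helper: copy arr[idx] into *out and return 1 if idx is
 * a valid index for an array of len elements; otherwise leave *out
 * untouched and return 0. */
int checked_get(const int *arr, size_t len, size_t idx, int *out)
{
    if (idx >= len)
    {
        return 0;   /* out of range: refuse the access */
    }
    *out = arr[idx];
    return 1;
}
```

The caller has to pass the correct length itself; C keeps no record of an array's size once it is handed to a function.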

Stephan Schulz 125


Character Arrays<br />

Character arrays are the most frequent kind of arrays used in C<br />

– They are used for I/O operations<br />

– They are used for implementing string operations in C<br />

To make the use of character arrays easier, we can use string constants to<br />

initialize them. The following definitions are equivalent:<br />

– char hello[] = {'H','e','l','l','o','\0'};<br />

– char hello[] = "Hello";<br />

– char hello[6] = "Hello";<br />

Notice that the string constant is automatically terminated by a NUL character!<br />
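The terminating NUL character takes up one array element, which is why the array is one larger than the visible text. A short sketch; the two function names are made up for this example.

```c
#include <stddef.h>
#include <string.h>

char hello[] = "Hello";

/* Size of the array in chars: five letters plus the terminating NUL */
size_t hello_array_size(void)
{
    return sizeof(hello);
}

/* strlen() counts the characters before the NUL */
size_t hello_length(void)
{
    return strlen(hello);
}
```

hello_array_size() is 6, hello_length() is 5, and hello[5] is the NUL character '\0'.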

Stephan Schulz 126


Functions<br />

Functions are the primary means of structuring programs in C<br />

A function is a named subroutine<br />

– It accepts a number of arguments, processes them, and (optionally) returns a<br />

result<br />

– Functions also may have side effects, like I/O or changes to global data<br />

structures<br />

– In C, any subroutine is called a function, whether it actually returns a result or<br />

is only called for its side effect<br />

Note: A function hides its implementation<br />

– To use a function, we only need to know its interface, i.e. its name, parameters,<br />

and return type<br />

– We can improve the implementation of a function without affecting the rest of<br />

the program<br />

Functions can be reused in the same program or even different programs, allowing<br />

people to build on existing code<br />

Stephan Schulz 127


Function Definitions<br />

A function definition consists of the following elements:<br />

– Return type (or void if the function does not return a value)<br />

– Name of the function<br />

– Parameter list<br />

– Function body<br />

The name follows the same rules as variable names<br />

The parameter list is a list of comma-separated pairs of the form <type> <name><br />

The body is a sequence of statements included in curly braces<br />

Example:<br />

int timesX(int number, int x)<br />

{<br />

return x*number;<br />

}<br />

Stephan Schulz 128


Function Calling<br />

A function is called from another part of the program by writing its name,<br />

followed by a parenthesized list of arguments (where each argument has to have<br />

a type matching that of the corresponding parameter of the function)<br />

If a function is called, control passes from the call of the function to the function<br />

itself<br />

– The parameters are treated as local variables with the values of the arguments<br />

to the call<br />

– The function is executed normally<br />

– If control reaches the end of the function body, or a return statement is<br />

executed, control returns to the caller<br />

– A return statement may have a single argument of the same type as the<br />

return type of the function. If the statement is executed, the argument of<br />

return becomes the value returned to the caller<br />

We can only call functions that have already been declared or defined at that<br />

point in the program!<br />
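Two of the points above, declaration before use and parameters as local copies, can be shown with the timesX function from the previous slide. This is a sketch only: the body of timesX is rewritten here to modify its parameter, and triple is a hypothetical caller.

```c
/* Declaration (prototype): makes timesX known before it is defined */
int timesX(int number, int x);

/* Hypothetical caller; legal because timesX has been declared above */
int triple(int number)
{
    return timesX(number, 3);
}

/* Definition. The parameter number is a local variable initialized
 * with the argument value, so changing it does not affect the caller. */
int timesX(int number, int x)
{
    number = number * x;
    return number;
}
```

If n holds 14, then triple(n) returns 42 and n is still 14 afterwards.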

Stephan Schulz 129


Example: Printing Character Frequencies<br />

int print_freq(char c, int freq)<br />

{<br />

int i;<br />

}<br />

printf("%c :", c);<br />

if(freq < 75)<br />

{<br />

for(i=0; i


Example: Printing Character Frequencies (contd.)<br />

Assume that the previous function definition is inserted into the frequency<br />

counting program just in front <strong>of</strong> the int main(void) line<br />

We can then modify main as follows:<br />

...<br />

for(i=0; i


Exercises<br />

Rewrite the Fahrenheit→Celsius Program to use a function for the actual conversion<br />

Stephan Schulz 132


Assignment<br />

A prime number is a (positive integer) number greater than 1 that is evenly<br />

divisible only by 1 and itself<br />

1. Write a function isprime() that determines if an integer number is prime.<br />

You can use the % modulus operator (remainder of integer division) or work with<br />

plain division. Use your function to implement a program primes_simple<br />

that prints all primes between 0 and 10000.<br />

2. The Sieve of Eratosthenes is a more efficient (and ancient) algorithm for<br />

finding all primes up to a given number. It starts with a list of all numbers<br />

from 2 to the desired limit. It traverses this list, starting at two. Whenever<br />

it encounters a new number, it strikes all multiples of it from the list. What<br />

remains at the end is a list of prime numbers.<br />

Example:<br />

Initial list: 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16<br />

Striking multiples of 2: 2 3 5 7 9 11 13 15<br />

Striking multiples of 3: 2 3 5 7 11 13<br />

(There are no multiples of any remaining number, so we skip the<br />

Use the Sieve algorithm in a second program, primes_sieve, that prints all<br />

primes between 0 and 10000. Hint: Use an array!<br />

Stephan Schulz 133


<strong>CSC322</strong><br />

C <strong>Programming</strong> <strong>and</strong> <strong>UNIX</strong><br />

<strong>Programming</strong> in C<br />

More on Functions<br />



Review of Function Properties<br />

A function is a named subroutine<br />

It accepts a number of arguments of a predetermined type and returns a value of<br />

a given type<br />

It can have its own local variables<br />

A function can be called from other places in the program, including other<br />

functions<br />

Functions have to be known (either defined or declared) before they can be called<br />

Stephan Schulz 135


Example: Reading Integers<br />

We want to write a function that reads a positive integer number from stdin,<br />

using only getchar()<br />

A number is defined as a sequence of decimal digits (characters from the range<br />

'0' to '9')<br />

– We can use the function isdigit(c) from ctype.h to test if a character is a<br />

(decimal) digit<br />

– The C standard guarantees that '0' to '9' have consecutive numerical values.<br />

We can thus get the value of a single character c that represents a digit by<br />

the expression c-'0'<br />

Idea: We read the most significant digits first. So whenever we read a new digit,<br />

the value of what we have read so far increases 10-fold:<br />

Read Value<br />

1 1<br />

13 10*1+3 = 13<br />

137 10*13+7 = 137<br />

1375 10*137+5 = 1375<br />

Stephan Schulz 136


Example: int read_int10(void)<br />

/* We assume that stdio and ctype have been included */<br />

/* A function that reads a positive integer number in base 10 from<br />
 * stdin. Return number or -1 on failure. Will read one character<br />
 * ahead! */<br />
int read_int10(void)<br />
{<br />
    int res = 0, c, count=0;<br />

    while(isdigit(c=getchar()))<br />
    {<br />
        res = (res*10)+c-'0';<br />
        count++;<br />
    }<br />
    if(count > 0) /* We read something */<br />
    {<br />
        return res;<br />
    }<br />
    return -1;<br />
}<br />

Stephan Schulz 137


Improving the Function<br />

read_int10(void) works fine, but can only read numbers in decimal notation<br />

We want to have a function that can read numbers in any base between 2 and<br />

10 now<br />

Examples:<br />

– 142 in base 8 has the value $1 \cdot 8^2 + 4 \cdot 8^1 + 2 \cdot 8^0 = 1 \cdot 64 + 4 \cdot 8 + 2 = 98$<br />

– 101010 in base two has the value $1 \cdot 2^5 + 0 \cdot 2^4 + 1 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2^1 + 0 \cdot 2^0 = 32 + 8 + 2 = 42$<br />

– 1873 is not a valid number in base 6! All digits have to be smaller than the<br />

base<br />

The principle is the same, we just use a parameter base instead of the hardwired<br />

value 10!<br />

Stephan Schulz 138


Do we have a Valid Digit?<br />

/* Is a character a valid digit in base b? */<br />
int is_base_digit(int c, int base)<br />
{<br />
    if(c - '0' < 0)<br />
    {<br />
        return 0;<br />
    }<br />
    if(c - '0' >= base)<br />
    {<br />
        return 0;<br />
    }<br />
    return 1;<br />
}<br />

Stephan Schulz 139


Reading a Number in any Base ( 0) /* We read something */<br />

{<br />

return res;<br />

}<br />

return -1;<br />
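Only the tail of this function is visible above. The following is a hedged sketch of how a complete version might look, modelled on read_int10 and is_base_digit; it is not the lecture's original code. The name read_int_b comes from the later slides, while read_int_b_from and its FILE * parameter are assumptions added here so that the function can also be exercised on streams other than stdin.

```c
#include <stdio.h>

/* Is a character a valid digit in the given base (2..10)?
 * Condensed version of is_base_digit from the previous slide. */
int is_base_digit(int c, int base)
{
    return c - '0' >= 0 && c - '0' < base;
}

/* Hypothetical variant with an explicit input stream. Reads a positive
 * integer in the given base; returns the number or -1 on failure.
 * Reads one character ahead, like read_int10. No overflow check. */
int read_int_b_from(FILE *in, int base)
{
    int res = 0, c, count = 0;

    while (is_base_digit(c = getc(in), base))
    {
        res = (res * base) + c - '0';
        count++;
    }
    if (count > 0) /* We read something */
    {
        return res;
    }
    return -1;
}

/* Reads from stdin, matching the interface used on the later slides */
int read_int_b(int base)
{
    return read_int_b_from(stdin, base);
}
```

EOF is handled by the same test as any other non-digit: its value is negative, so is_base_digit() rejects it and the loop ends.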

Stephan Schulz 140


Build General Functions!<br />

Good programs are built by breaking the task into many functions that are:<br />

– Small – at most one screen page (in your favourite editor)<br />

– Simple – they only do one thing, and they do that well<br />

– General – so that they can be reused at other parts in the program<br />

Going from general to specific is (generally) easy:<br />

/* Alternative to read_int10 */<br />

int read_int10b(void)<br />

{<br />

return read_int_b(10);<br />

}<br />

Stephan Schulz 141


Recursive Functions<br />

As we stated above, functions can call other functions. They can also call<br />

themselves recursively<br />

A recursive function always has to handle at least two cases:<br />

– The base case handles a simple situation without further calls to the same<br />

function<br />

– The recursive cases may do some work, and in between make recursive calls to<br />

the function for smaller (in some sense) subtasks<br />

Recursion is one of the most important programming principles!<br />

Stephan Schulz 142


Example: Printing Integers<br />

We now want to print positive integer numbers to stdout, using only putchar()<br />

Consider a number in base 10: 421 = 42 ∗ 10 + 1<br />

We can split the task into two subtasks:<br />

– Print everything but the last digit (recursively)<br />

– Print the last digit<br />

Base case: There are no digits to print any more<br />

Basic operations:<br />

– To get the last digit, we use the modulus operator %<br />

– To get rid of the last digit, we divide the number by the desired base (remember,<br />

integer division truncates)<br />

Stephan Schulz 143


Example: Decimal Representation of 421<br />

Let’s do an example: We want to print the number 421 in base 10<br />

– Step 1: 421%10 = 1 and 421/10 = 42. Hence the last digit to print is 1<br />

and the rest we still have to print is 42<br />

– Step 2: 42%10 = 2 and 42/10 = 4. The second last digit is 2, the rest is 4<br />

– Step 3: 4%10 = 4 and 4/10 = 0. The next digit is 4<br />

– Step 4: Our rest is 0, hence there is nothing to do but printing the digits in<br />

the right order<br />

The same principle applies for other bases (just replace 10 by your base)<br />
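The steps above can be sketched as a recursive function. This is an illustrative reconstruction, not necessarily the lecture's code: the name write_int_b_rekursive is taken from the wrapper shown later, while digits_to_buffer is a hypothetical variant that stores the digits in a caller-supplied buffer so the result can be inspected.

```c
#include <stdio.h>

/* Print the digits of a positive value in the given base (2..10).
 * Base case: value 0 means there are no digits left, print nothing. */
void write_int_b_rekursive(int value, int base)
{
    if (value == 0)
    {
        return;
    }
    write_int_b_rekursive(value / base, base); /* everything but the last digit */
    putchar('0' + value % base);               /* then the last digit */
}

/* Hypothetical buffer variant of the same recursion: writes the digits
 * to out and returns a pointer just past the last digit written. */
char *digits_to_buffer(int value, int base, char *out)
{
    if (value == 0)
    {
        return out;
    }
    out = digits_to_buffer(value / base, base, out);
    *out = (char)('0' + value % base);
    return out + 1;
}
```

Because the recursive call comes before the output of the last digit, the digits appear most significant first even though they are computed least significant first. With this base case the value 0 produces no output at all.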

Stephan Schulz 144


Writing a Number in any Base (


Writing Integers (Contd.)<br />

We can wrap the simple recursive function to handle the abnormal case (but, as<br />

we saw on the last slide, we don’t need to):<br />

/* Write positive integer in any base to stdout */<br />
void write_int_b(int value, int base)<br />
{<br />
    if(value == 0)<br />
    {<br />
        putchar('0');<br />
    }<br />
    write_int_b_rekursive(value, base);<br />
}<br />

Stephan Schulz 146


Putting Things Together: A Base Converter<br />

We now use the defined function to write a program that reads pairs number<br />

base and prints them back in the new base:<br />

– number is considered to be a decimal number<br />

– base should be a decimal number between 2 and 10 (inclusive)<br />

– Numbers and pairs are separated by a single, arbitrary character (including<br />

space and newline)<br />

– The program terminates, if one of the numbers is invalid<br />

Stephan Schulz 147


The Base Converter<br />

int main(void)<br />
{<br />
    int num, base;<br />

    while(1)<br />
    {<br />
        printf("Input decimal value and desired base!\n");<br />
        num = read_int10();<br />
        if(num == -1)<br />
        {<br />
            return EXIT_SUCCESS;<br />
        }<br />
        base = read_int10();<br />
        if(base == -1 || base < 2 || base > 10)<br />
        {<br />
            printf("Error: No valid base!\n");<br />
            return EXIT_FAILURE;<br />
        }<br />
        write_int_b(num, base);<br />
        putchar('\n');<br />
    }<br />
}<br />

Stephan Schulz 148


Usage Example<br />

$ ./base_converter<br />

Input decimal value and desired base!<br />

123123 3<br />

20020220010<br />

Input decimal value and desired base!<br />

42 10<br />

42<br />

Input decimal value and desired base!<br />

42 Hallo!<br />

Error: No valid base!<br />

$<br />

Stephan Schulz 149


Exercise<br />

Extend the base converter to work with base 16, using 0-9 and A-F as digits<br />

(allow both upper and lower case!)<br />

Extend the base converter to accept triplets input-base,value,output-base,<br />

where value is interpreted in input-base (and input-base is a single hexadecimal<br />

digit >=2). Add reasonably robust error handling!<br />

The complete base converter code from the lecture is available from the CSC322<br />

web page or directly at http://www.cs.miami.edu/~schulz/CSC322/base_converter.c<br />

Stephan Schulz 150


<strong>CSC322</strong><br />

C <strong>Programming</strong> <strong>and</strong> <strong>UNIX</strong><br />

<strong>Programming</strong> in C<br />

Program Structure <strong>and</strong> the C Preprocessor<br />



Simple Program Structure<br />

[Diagram. Boxes: Headers (Declarations), Sources (Definitions), C Preprocessor, Compiler, Executable. Arrow legend: Is combined into]<br />

Stephan Schulz 152


Program Structure In Detail<br />

[Diagram. Boxes: Headers (Declarations), Sources (Definitions), C Preprocessor, Compiler, Object Files, System Library, Linker, Executable, RTE (Shared libs). Arrow legend: Includes, Translates into, Is combined into]<br />

Stephan Schulz 153


Program Structure for Multi-File Programs<br />
[Diagram. Boxes: Headers (Declarations), several; Sources (Definitions), three; C Preprocessor; Compiler; Object Files, three; System Library (twice); Linker; Executable; RTE (Shared libs). Arrow legend: Includes; Translates into; Is combined into.]<br />


The C Preprocessor<br />
The C preprocessor performs a textual rewriting of the program text before it is ever seen by the compiler proper<br />
– It includes the contents of other files<br />
– It expands macro definitions<br />
– It conditionally processes or removes segments of the program text<br />
Preprocessor directives start with a hash # and traditionally are written starting in the very first column of the program text<br />
After preprocessing, the program text is free of all preprocessor directives<br />
Normally, gcc will transparently run the preprocessor. Run gcc -E if you want to see the preprocessor output<br />


Including Other Files: #include<br />
The #include directive is used to include other files (the contents of the named file replace the #include directive)<br />
Form 1: #include "file"<br />
– The preprocessor will search for file in the current directory<br />
– What happens if file is not found in the current directory is implementation-defined<br />
∗ UNIX compilers will typically treat file as a pathname (that may be either absolute or relative)<br />
∗ If the file is not found, the compiler prints an error message and aborts<br />
Form 2: #include <file><br />
– file will be searched for in an implementation-defined way<br />
– UNIX compilers will typically treat file as a file name relative to the system include directory, /usr/include on the lab machines<br />
– You can add to the list of directories that will be searched using gcc -I<directory><br />


Example: Include<br />
myfile.c:<br />
A Poem<br />
#include "mary"<br />
mary:<br />
Mary had a little lamb,<br />
Its fleece was white as snow;<br />
And everywhere that Mary went<br />
The lamb was sure to go.<br />
$ gcc -E myfile.c<br />
# 1 "myfile.c"<br />
A Poem<br />
# 1 "mary" 1<br />
Mary had a little lamb,<br />
Its fleece was white as snow;<br />
And everywhere that Mary went<br />
The lamb was sure to go.<br />
# 4 "myfile.c" 2<br />


Include Discussion<br />
Include directives are typically used for sharing common declarations between different program parts<br />
Libraries (including the standard library) come with header files that define their interface by<br />
– Defining data types and constants<br />
– Declaring functions (and defining macros)<br />
– Declaring variables<br />
Note that included files can contain further #include statements (that will be automatically expanded by the preprocessor)<br />
– This is frequent in system files, where the standard-prescribed include files often include system-specific files actually describing the features<br />


Simple Macro Definitions: #define<br />
The #define directive is used to define macros<br />
Simple Form: #define <name> <replacement text><br />
– This will define a macro for <name>, which has to follow the common rules for C identifiers (alphanumeric characters and underscore, must not start with a digit)<br />
– Any normal occurrence of <name> after the definition will be replaced by <replacement text><br />
– Replacement will not take place in strings!<br />
– The macro definition normally ends at the end of the line; however, it can be extended to the next line by appending \ as the very last character of the line<br />
Note that macro expansion even takes place within further macro definitions!<br />
Most common use: Symbolic constants (e.g. EOF)<br />


Simple #define Example<br />
reality.c:<br />
#define true 1<br />
#define false 0<br />
void reality_check(void)<br />
{<br />
    if(true == false)<br />
    {<br />
        printf("Reality is broken!\n");<br />
    }<br />
}<br />
$ gcc -E reality.c<br />
# 4 "reality.c"<br />
void reality_check(void)<br />
{<br />
    if(1 == 0)<br />
    {<br />
        printf("Reality is broken!\n");<br />
    }<br />
}<br />


Macros with Arguments<br />
Macro definitions can also contain formal arguments<br />
#define <name>(arg1,...,argn) <replacement text><br />
If a macro with arguments is expanded, any occurrence of a formal argument in the replacement text is replaced with the actual value of the arguments in the macro call<br />
This allows a more efficient way of implementing small “functions”<br />
– But: Macros cannot do recursion<br />
– Macro calls have slightly different semantics from function calls<br />
– Therefore macros are usually only used for very simple tasks<br />
By convention, preprocessor defined constants and many macros are written in ALL CAPS (using underscores for structure)<br />


#define Examples<br />
macrotest.c:<br />
#define XOR(x,y) ((!(x)&&(y))||((x)&&!(y))) /* Exclusive or */<br />
#define EQUIV(x,y) (!XOR(x,y))<br />
void test_macro(void)<br />
{<br />
    printf("XOR(1,0) : %d\n", XOR(1,0));<br />
    printf("EQUIV(1,0): %d\n", EQUIV(1,0));<br />
}<br />
$ gcc -E macrotest.c<br />
# 4 "macrotest.c"<br />
void test_macro(void)<br />
{<br />
    printf("XOR(1,0) : %d\n", ((!(1)&&(0))||((1)&&!(0))));<br />
    printf("EQUIV(1,0): %d\n", (!((!(1)&&(0))||((1)&&!(0)))));<br />
}<br />


#define Caveats<br />
Since macros work by textual replacement, there are some unexpected effects:<br />
– Consider #define FUN(x,y) x*y + 2*x<br />
∗ Looks innocent enough, but: FUN(2+3,4) expands into 2+3*4 + 2*2+3 (not (2+3)*4+2*(2+3))<br />
∗ To avoid this, always enclose each formal parameter in parentheses (unless you know what you are doing)<br />
– Now consider FUN(var++,1)<br />
∗ It expands into var++*1 + 2*var++<br />
∗ Macro arguments may be evaluated more than once!<br />
∗ Thus, avoid using macros with expressions that have side effects<br />
Other frequent problems:<br />
– Semicolons at the end of a macro definition (wrong!)<br />
– “Invisible” syntax errors (run gcc -E and check the output if you cannot locate an error)<br />


Conditional Compilation: #if/#else/#endif<br />
We can use preprocessor directives to conditionally include or exclude parts of the program:<br />
– Program parts may be enclosed in #if <expr>/#endif pairs<br />
– <expr> has to be a constant integer expression<br />
– If it evaluates to 0, the text in the #if <expr>/#endif bracket is ignored, otherwise it is included<br />
– There also is an optional #else “branch”<br />
Most frequent use: Test for the definition of macros<br />
– defined(<name>) evaluates to 1 if <name> is defined (even as the empty string), 0 otherwise<br />
– Short form: #if defined(<name>) is equivalent to #ifdef <name>, #if !defined(<name>) is equivalent to #ifndef <name><br />
– E.g.: #ifndef EOF<br />
#define EOF -1<br />
#endif<br />


Example: #ifdef<br />
cond_preproc.c:<br />
#define hallo<br />
#define fred barney<br />
#define test 2+2<br />
#if defined(hallo)<br />
"Hallo"<br />
#else<br />
#ifdef fred<br />
"Fred"<br />
#endif<br />
#endif<br />
#if test<br />
"test"<br />
#endif<br />
$ gcc -E cond_preproc.c<br />
# 5 "cond_preproc.c"<br />
"Hallo"<br />
"test"<br />


Exercises<br />
Search the /usr/include directory (use grep for faster progress) and find out where the following functions/macros are defined, and, for the macros, what their value is<br />
– LONG_MAX<br />
– ULONG_MAX<br />
– getchar()<br />
– getc()<br />
– EOF<br />
– EXIT_FAILURE<br />
– EXIT_SUCCESS<br />
– NULL<br />


CSC322<br />
C Programming and UNIX<br />
Programming in C<br />
C Preprocessor/Declarations and Scoping<br />





More on Preprocessor Definitions<br />
You can use #undef to get rid of a definition<br />
– This is most often used to start from a clean slate:<br />
#undef true<br />
#undef false<br />
#define true 1<br />
#define false 0<br />
– It is, however, forbidden to undefine implementation-defined names<br />
You can use the -D option to gcc to cause certain names to be defined throughout the process<br />
– This is often used to select one of many alternatives for compilation<br />
∗ With or without internal consistency checks<br />
∗ With or without certain features (e.g. Demo version vs. commercial version)<br />
∗ . . .<br />
Certain names may be predefined by the implementation (most starting with two underscores: __FILE__, __STDC__ . . . )<br />


Combinations of #ifdef and #include<br />
#ifdef/#endif can also be used to conditionally include or exclude files<br />
Usage: Compile for different operating systems:<br />
#ifdef __LINUX__<br />
#include "linux.h"<br />
#elif defined(__BSD__)<br />
#include "bsd.h"<br />
#else<br />
#include "default.h"<br />
#endif<br />
Usage: Guarding against multiple inclusions<br />
#ifndef THIS_HEADER<br />
#define THIS_HEADER<br />
/* contents of the header file */<br />
#endif<br />


Separate Compilation<br />
C supports the separate compilation of multiple source files<br />
– Each source file is translated into an object file<br />
– A linker combines different object files into the final executable<br />
gcc by default tries to create an executable program by performing operations as follows:<br />
1. Preprocessing<br />
2. Compilation (and assembly)<br />
3. Linking<br />
For multi-file programs, we have to perform separate compilation:<br />
– gcc -c file.c -o file.o will compile file.c into file.o without linking<br />
– gcc -o progname file1.o file2.o file3.o will link the three precompiled object files into an executable<br />


Definitions and Declarations<br />
Definitions cause the defined objects to be created<br />
– Variable definitions allocate an appropriate amount of memory (and associate it with the variable name)<br />
– Function definitions cause code to be generated<br />
Declarations only state information about an object<br />
– For variables, they state the type<br />
– For functions, they state the return type and argument types<br />
There can be any number of compatible declarations for an object<br />
There can be only one definition for the object<br />
A function or variable can only be used inside the scope of a matching declaration<br />
Any definition also implicitly declares an object<br />


Explicit Declarations<br />
Variables can be declared by adding the extern keyword to the syntax of a definition:<br />
– extern int counter;<br />
– extern char filename[MAXPATHLEN];<br />
Function declarations just consist of the function header, terminated by a semicolon:<br />
– int isdigit(int c);<br />
– int putchar(int c);<br />
– bool TermComputeRWSequence(PStack_p stack, Term_p from, Term_p to);<br />
Alternatively, the names of the formal parameters can be omitted<br />
– int isdigit(int);<br />
– int putchar(int);<br />
– bool TermComputeRWSequence(PStack_p, Term_p, Term_p);<br />
– However, the first form is often preferred because the parameter names may document the purpose of the parameter<br />


Scoping Rules<br />
There are two kinds of declarations in C<br />
– Declarations written inside a block are called local declarations<br />
– Declarations outside any block are global declarations<br />
The scope of a local declaration begins at the declaration and ends at the end of the innermost enclosing block<br />
The scope of a global declaration begins at the declaration and continues to the end of the source file<br />
– Note that this refers to files after preprocessing, i.e. a declaration in a header file also is visible in the including file (from the point of the #include statement)<br />


Scope Example<br />

| extern int global_count;<br />

|<br />

| | | double abs_val (double number)<br />
| | | {<br />
| | | | double help = number;<br />
| | | |<br />
| | | | if(number < 0)<br />
| | | | {<br />
| | | | help = -1 * help;<br />
| | | | global_count++;<br />
| | | | }<br />
| | | | return help;<br />
| | | | }<br />
| |<br />
| | | int main()<br />
| | | {<br />
| | | printf("%7f\n", abs_val(-1.0));<br />
| | | }<br />

| | |<br />

| | | int global_count;<br />



Limiting Potential Scope<br />
By default, all declared variables and functions are accessible from any source file in the program<br />
– Of course, they may have to be declared to be visible<br />
Problems: We have no control over the use of these objects in other source files<br />
– Reuse of libraries may fail because of namespace pollution<br />
– Unintentional or malicious misuse of internal functions may lead to program misbehaviour<br />
The static keyword, applied to a global definition (or declaration), limits the accessibility of the declared object to the source file it is defined in<br />
– static int internal_help_fun(int a1, int a2);<br />
In general, it is a good idea to declare everything not expected to be used by other program parts static<br />


Lifetime and Initialization of Variables<br />
Global variables have unlimited lifetime<br />
– They are created and initialized when the program starts<br />
– The expression used in the initialization has to be constant, i.e. it has to be fully evaluable at compile time<br />
– If not explicitly initialized, they are guaranteed to be initialized to 0<br />
– They keep their values until the program terminates (unless explicitly changed, of course)<br />
Most local variables (and function parameters) only have limited lifetime<br />
– They are also called automatic variables and are typically allocated on the stack<br />
– They are created when the variable comes into scope and are destroyed when the variable goes out of scope – in particular, each recursive call gets a fresh copy of the variable<br />
– The initializing expression can use all variables and functions currently in scope<br />
– They are reinitialized every time they come into scope; if not initialized explicitly, they contain undefined values (“junk”)<br />


Persistent Local Variables: static again<br />
static local variables have unlimited lifetime<br />
– They are initialized the very first time they come into scope<br />
– They are shared between different calls to the same function<br />
– They keep their values in between calls<br />
– However, they can only be accessed from inside their corresponding block<br />


Example: Static <strong>and</strong> Automatic Variables<br />

#include <stdio.h><br />

#include <stdlib.h><br />

static int global_count = 0;<br />

void counter_fun(void)<br />

{<br />

static int static_count = 0;<br />

int auto_count = 0;<br />

int pseudo_count = global_count;<br />

global_count++; auto_count++; static_count++; pseudo_count++;<br />

printf("Global: %3d Auto: %3d Static: %d Pseudo: %d\n",<br />

global_count,auto_count, static_count, pseudo_count);<br />

}<br />

int main(void)<br />

{<br />

counter_fun();<br />

counter_fun();<br />

global_count = 0;<br />

counter_fun();<br />

counter_fun();<br />

return EXIT_SUCCESS;<br />

}<br />



Example: Static and Automatic Variables (Contd.)<br />

$ gcc -o vartest vartest.c<br />

$ ./vartest<br />

Global: 1 Auto: 1 Static: 1 Pseudo: 1<br />

Global: 2 Auto: 1 Static: 2 Pseudo: 2<br />

Global: 1 Auto: 1 Static: 3 Pseudo: 1<br />

Global: 2 Auto: 1 Static: 4 Pseudo: 2<br />

$<br />



Assignment<br />
Write a data safe library offering the following functionality:<br />
– Calling data_safe(ds_register, 0, 0) will return a unique random key (a positive integer). Use rand() to generate random numbers (and man rand to find out how).<br />
– Calling data_safe(ds_store, key, value) will store the value (a positive integer) in the data safe (under the key). It should return the value if everything worked, -1 otherwise (e.g. if there is no space left)<br />
– Calling data_safe(ds_retrieve, key, n) will retrieve the nth value stored under the key, or -1 if fewer than n values have been stored under the key<br />
– Calling data_safe(ds_delete, key, 0) will delete all entries stored under key (you may then reuse key for future register calls, as long as you still generate a random key)<br />
– Make sure that at least 100 keys can be in use in parallel, and that at least 10000 data items can be stored in total<br />
Make sure that the data is not accessible in any other way (using legal C)<br />
Implement the library in its own source file, with a header file data_safe.h that contains all necessary declarations<br />
Write a main program ds_test.c that uses the library, storing 10 values under 3 different keys, retrieving them and deleting them. Use a reasonably varied sequence of storage, retrieval, and registration<br />
Hints:<br />
– Use static local variables to store the necessary data in the data_safe() function<br />
– Use preprocessor #define statements to define the symbolic constants ds_register, ds_store,. . .<br />
– Be careful to avoid handing out a key that is already in use on registration. Carefully design your data structures first; the operations will then be simple to implement<br />


CSC322<br />
C Programming and UNIX<br />
Programming in C<br />
rpn_calc: An Extended Example<br />



Project: An RPN Calculator<br />
Aim: A calculator program that can do simple arithmetic<br />
– Conversion between different bases<br />
– Addition, subtraction, multiplication...<br />
We’ll use Reverse Polish Notation (RPN)<br />
– Operator is written after arguments: 7 5 + = 7+5<br />
– More complicated: 12 2 5 2 * + - = (12-(2+(5*2)))<br />
Advantages of RPN<br />
– Easy to understand<br />
– Easy to implement<br />
– No hassle with recursive parsing of parentheses and precedences<br />
– Can easily and consistently handle operators of any arity (number of arguments)<br />


Some Suggested Operators<br />
Arithmetic operators (others may be added):<br />
+ Pop two numbers, add them<br />
- Pop two numbers, subtract first from second<br />
* Pop two numbers, multiply them<br />
/ Pop two numbers, divide second by first<br />
% Pop two numbers, divide second by first, giving the remainder of the division<br />
Non-arithmetic operators (non-exclusive):<br />
p Print the topmost number on the stack<br />
o Pop topmost number on the stack, use it as new output base<br />
i Pop topmost number on the stack, use it as new input base<br />
S Print the whole stack (mainly for debugging)<br />
P Print input and output bases (in decimal)<br />


Usage Example<br />
$ ./rpn_calc<br />
10 8 +<br />
S<br />
18<br />
3 / p<br />
6<br />
3 / p<br />
2<br />
o<br />
p<br />
Stack underflow error<br />
P<br />
Input base (decimal): 10 Output Base (decimal): 2<br />
255<br />
p<br />
11111111<br />
16 p<br />
10000<br />
10 o<br />
S<br />
255 16<br />


Implementation<br />
Basic idea:<br />
– Input is a sequence of numbers and operators<br />
– If a number is read, it is pushed onto a stack<br />
– If an operator is read, the necessary number of arguments is popped off the stack, the operation is performed, and the result is pushed onto the stack<br />
Input and output can happen in any representation from base 2-16<br />
– There is a strong convention for representing these numbers:<br />
∗ Digits are 0-9 with nominal value, A-F (or a-f) with values 10-15<br />
– Input and output use independent bases (base conversion made easy)<br />
Recognizing numbers and operators<br />
– Any string of valid digits in the current input base is a number<br />
– Any string starting with - and directly followed by valid digits in the current input base is a number<br />
– Everything else is treated as an operator<br />


Subtasks<br />
From the above, we can identify a number of subtasks:<br />
– Reading numbers and operators<br />
– Printing numbers<br />
– Handling the stack<br />
– Executing the actual operations<br />
Input handling is the hardest task!<br />
– We need to read up to 2 characters to decide if we read a number or an operator ('-+' represents two operators, '-1' a number)<br />
– Rather than handling explicit lookahead variables throughout the program, we can build a general character I/O-library that allows us to read ahead, but to maintain (or restore) the status of the input queue<br />


Program Organization<br />
[Diagram. System headers: ctype.h, stdio.h, stdlib.h. Headers: chario.h, integerio.h. Sources: chario.c, integerio.c, rpn_calc.c. Object files: chario.o, integerio.o, rpn_calc.o, plus (libc). Executable: rpn_calc. Arrow legend: #include; Compile (gcc -c); Link (gcc).]<br />


The Character I/O Library: Ideas<br />
Main interface similar to getchar()<br />
Read characters can be “pushed back” into the input queue<br />
Implementation:<br />
– Internal buffer of characters<br />
– Pushed characters go into the buffer<br />
– Reading first tries the buffer, and only reads stdin if the buffer is empty<br />
Additional help-functions<br />
– Look at a character, but don’t read it<br />
– Skip white space<br />


The Character I/O Library: chario.h<br />
#ifndef UNGETCHAR<br />
#define UNGETCHAR<br />
#include <stdio.h><br />
#include <ctype.h><br />
/* Maximal number of characters that can be pushed back */<br />
#define MAX_BUFFERED_CHARS 1024<br />
/* As getchar(), but with unget capability (provided by PushChar()) */<br />
int GetChar(void);<br />
/* Push back a character into the read queue. Return c or EOF if the<br />
queue is full. */<br />
int PushChar(int c);<br />
/* Return the next character, but do _not_ read it */<br />
int LookChar(void);<br />
/* Skip over white space characters. Return the first non-white<br />
character (but it is not removed from the queue), or EOF if the<br />
pushback queue is full. */<br />
int SkipSpace(void);<br />
#endif<br />


The Character I/O Library: Global Variables <strong>and</strong> Includes<br />

#include "chario.h"<br />

static int char_buff[MAX_BUFFERED_CHARS];<br />

static int buff_pos = 0;<br />



The Character I/O Library: Reading <strong>and</strong> Unreading<br />

int GetChar(void)<br />

{<br />

if(buff_pos)<br />

{<br />

buff_pos--;<br />

return char_buff[buff_pos];<br />

}<br />

return getchar();<br />

}<br />

int PushChar(int c)<br />

{<br />

if(buff_pos < MAX_BUFFERED_CHARS)<br />

{<br />

char_buff[buff_pos] = c;<br />

buff_pos++;<br />

return c;<br />

}<br />

return EOF;<br />

}<br />



The Character I/O Library: Help Functions<br />
int LookChar(void)<br />
{<br />
    int c = GetChar();<br />
    PushChar(c);<br />
    return c;<br />
}<br />
int SkipSpace(void)<br />
{<br />
    int c;<br />
    while(isspace((c=GetChar())))<br />
    { /* Empty body */ }<br />
    return PushChar(c);<br />
}<br />


The Integer I/O Library: Ideas<br />
We use the same algorithms as discussed before<br />
However, because we allow bases up to 16, we add some additional helper functions for<br />
– Recognizing valid digits<br />
– Converting numerical values to character representation of digits<br />
– Giving the numerical value of digits<br />
Second difference: We allow negative numbers<br />
– We cannot use -1 to signal failure<br />
– Instead: We write a separate function that predicts the presence (or absence) of a number in the input stream<br />
– The calling functions have to make sure that the integer reading function is only called if there is valid input (i.e. success is guaranteed)<br />


The Integer I/O Library: integerio.h<br />
#include "chario.h"<br />
/* Read an integer in base base. */<br />
int read_int_base(int base);<br />
/* Check if there is an integer to be read, i.e. a digit or '-'<br />
directly followed by a digit */<br />
int int_available(int base);<br />
/* Write integer in any base to stdout */<br />
void write_int_base(int value, int base);<br />


The Integer I/O Library: Includes<br />
#include "integerio.h"<br />


The Integer I/O Library: Helper functions 1<br />
/* Consider c as a hexadecimal digit (0..9, a..f, A..F) and return its<br />
numerical value. If not a valid digit, return -1 */<br />
static int hex_digit_value(int c)<br />
{<br />
    if(c >= '0' && c <= '9') { return c - '0'; }<br />
    if(c >= 'a' && c <= 'f') { return c - 'a' + 10; }<br />
    if(c >= 'A' && c <= 'F') { return c - 'A' + 10; }<br />
    return -1;<br />
}<br />


The Integer I/O Library: Helper functions 2<br />
/* Check if a character c is a valid digit in base. */<br />
static int is_base_digit(int c, int base)<br />
{<br />
    int value = hex_digit_value(c);<br />
    if(value < 0 || value >= base)<br />
    {<br />
        return 0;<br />
    }<br />
    return 1;<br />
}<br />


The Integer I/O Library: Helper functions 3<br />

/* Given an int 0


The Integer I/O Library: Reading Integers<br />
/* Read an integer in base base. */<br />
int read_int_base(int base)<br />
{<br />
    int res = 0, c, sign = 1;<br />
    if((c=GetChar())=='-')<br />
    {<br />
        sign = -1;<br />
    }<br />
    else<br />
    {<br />
        PushChar(c); /* Unread Character */<br />
    }<br />
    while(is_base_digit((c=GetChar()),base))<br />
    {<br />
        res = (res*base)+hex_digit_value(c);<br />
    }<br />
    PushChar(c);<br />
    return res*sign;<br />
}<br />


The Integer I/O Library: Checking for Integer Presence<br />
/* Check if there is an integer to be read, i.e. a digit or '-'<br />
directly followed by a digit */<br />
int int_available(int base)<br />
{<br />
    int save_char, res = 0;<br />
    if(is_base_digit(LookChar(), base))<br />
    {<br />
        res = 1;<br />
    }<br />
    else if(LookChar() == '-')<br />
    {<br />
        save_char = GetChar();<br />
        if(is_base_digit(LookChar(), base))<br />
        {<br />
            res = 1;<br />
        }<br />
        PushChar(save_char);<br />
    }<br />
    return res;<br />
}<br />


The Integer I/O Library: Writing Integers<br />

/* Write integer in any base (2


Exercises<br />
Download the program from http://www.cs.miami.edu/~schulz/CSC322.html, compile it, and read the source code. You may want to add more operators (e.g. t to duplicate the top of the stack, s to switch the two topmost numbers,. . . )<br />


CSC322<br />
C Programming and UNIX<br />
Programming in C<br />
rpn_calc: An Extended Example (Part 2)<br />



Recapitulation: Some of our Library Functions<br />

The integerio library offers functions for reading and printing integers. All<br />

functions have a parameter base for selecting the number system (2–16, i.e.<br />

binary to hexadecimal)<br />

int read_int_base(int base);<br />

– Reads an integer from the standard input (using our GetChar()/PushChar()<br />

interface), returning its value<br />

– If no valid integer can be found, behavior is undefined!<br />

int int_available(int base);<br />

– Returns 1 (true) if a valid integer can be read from standard input, 0 otherwise<br />

– Does not consume any characters from the input stream!<br />

void write_int_base(int value, int base);<br />

– Prints an integer number to stdout, using the number system selected by<br />

base<br />

Additional function from chario.c: int SkipSpace(void)<br />
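The behaviour described above can be sketched without the GetChar()/PushChar() interface.<br />
The following is not the integerio source: parse_int_base(), format_int_base() and<br />
digit_value() are hypothetical, string-based stand-ins, written under the assumption<br />
that digits above 9 are the letters a to f.<br />

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: value of digit c in the given base, or -1
   if c is not a digit of that base. */
static int digit_value(int c, int base)
{
    const char *digits = "0123456789abcdef";
    const char *pos;

    if(c == '\0')
    {
        return -1;   /* strchr() would match the terminator */
    }
    pos = strchr(digits, tolower(c));
    if(pos == NULL || (pos - digits) >= base)
    {
        return -1;
    }
    return (int)(pos - digits);
}

/* Parse an optionally negative integer in base 2-16 from s.
   Parsing stops at the first character that is not a digit. */
int parse_int_base(const char *s, int base)
{
    int res = 0, sign = 1, d;

    if(*s == '-')
    {
        sign = -1;
        s++;
    }
    while((d = digit_value((unsigned char)*s, base)) >= 0)
    {
        res = res*base + d;
        s++;
    }
    return res*sign;
}

/* Write value in base 2-16 into out (assumed to be large enough).
   Assumes int has at most 64 bits. */
void format_int_base(int value, int base, char *out)
{
    const char *digits = "0123456789abcdef";
    char tmp[72];
    unsigned int mag = (value < 0) ? 0u - (unsigned int)value
                                   : (unsigned int)value;
    int i = 0;

    do   /* Collect digits, least significant first */
    {
        tmp[i++] = digits[mag % (unsigned int)base];
        mag /= (unsigned int)base;
    }while(mag > 0);
    if(value < 0)
    {
        *out++ = '-';
    }
    while(i > 0)   /* Emit them in reverse order */
    {
        *out++ = tmp[--i];
    }
    *out = '\0';
}
```

With these definitions, format_int_base(10, 2, buf) leaves the string 1010 in buf.<br />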



The Main Calculator Program<br />

Aim: RPN (Postfix) calculator program<br />

– Input: Operators and Numbers (operands)<br />

– Numbers are pushed on a stack<br />

– Operators pop operands and push the result of the operation<br />

while(there is input)<br />

{<br />

if(input is a number)<br />

{<br />

num = read_number();<br />

push(num);<br />

}<br />

else if(input is a valid operator)<br />

{<br />

pop operands, apply operator, push result;<br />

}<br />

else<br />

{<br />

print error message;<br />

}<br />

}<br />



Case Distinctions<br />

Note: The operator determines which actions we have to perform<br />

– This is a case distinction: Based on a single (integer) value, we have to select<br />

one alternative<br />

– Possible implementation:<br />

if(value == val1)<br />

{<br />

action1;<br />

}<br />

else if(value == val2)<br />

{<br />

action2;<br />

}<br />

...<br />

else<br />

{<br />

default_action;<br />

}<br />



C Alternative: switch<br />

switch(E)<br />

{<br />

case val1: action1;<br />

break; /* Otherwise we fall through! */<br />

case val2: action2;<br />

break;<br />

...<br />

default: default_action;<br />

break;<br />

}<br />

E has to be an integer-valued expression<br />

val1, val2,. . . have to be constant integer expressions<br />

E is evaluated and the result is compared to each of the constants after the case<br />

labels. Execution starts with the first statement after the matching case. If no<br />

case matches, execution starts with the (optional) default case.<br />

Note: Execution does not stop at the next case label! Use break; to break out<br />

of the switch<br />
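Fall-through can be used on purpose or happen by accident. The sketch below shows both;<br />
is_arith_operator() and missing_break() are hypothetical names chosen for illustration.<br />

```c
/* Intentional fall-through: empty cases share one action. */
int is_arith_operator(int c)
{
    switch(c)
    {
    case '+':   /* Empty cases fall through to the next label... */
    case '-':
    case '*':
    case '/':
    case '%':
        return 1;   /* ...so all five operators end up here */
    default:
        return 0;
    }
}

/* Accidental fall-through: for n == 1 both additions are executed. */
int missing_break(int n)
{
    int res = 0;

    switch(n)
    {
    case 1:
        res += 10;  /* No break: execution continues into case 2 */
    case 2:
        res += 20;
        break;
    default:
        res = -1;
        break;
    }
    return res;
}
```

Here missing_break(1) yields 30 rather than 10, because the statements of case 2 run as well.<br />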



The Stack Abstract Datatype<br />

A stack is a last-in first-out (LIFO) data structure<br />

– It can store values of a given type<br />

– Values can be pushed onto a stack<br />

– The topmost element can be retrieved by popping it off the stack<br />

– Typically, only the top element is accessed (enforced either by convention or<br />

by design)<br />

– Stacks can have a predetermined size (maximal number of elements) or grow<br />

as needed<br />

Stack implementation in C:<br />

– Values are stored in an array of the correct type<br />

– A stack pointer contains the index of the next unused cell<br />
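The array-plus-index scheme can be sketched with plain functions. This is only an<br />
illustration: stack_push(), stack_pop() and stack_empty() are hypothetical names, and the<br />
stack size is deliberately tiny so that the overflow case is easy to reach.<br />

```c
#define STACKSIZE 4    /* Deliberately small for the example */

static int stack[STACKSIZE];
static int sp = 0;     /* Index of the next unused cell */

/* Return 1 if the stack holds no values, 0 otherwise. */
int stack_empty(void)
{
    return sp == 0;
}

/* Push value; return 1 on success, 0 if the stack is full. */
int stack_push(int value)
{
    if(sp >= STACKSIZE)
    {
        return 0;
    }
    stack[sp] = value;
    sp++;
    return 1;
}

/* Pop and return the top value. The caller has to make sure that
   the stack is not empty (see stack_empty()). */
int stack_pop(void)
{
    sp--;
    return stack[sp];
}
```

Values come back in reverse order: after pushing 1 and then 2, the first pop returns 2.<br />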



Stack Implementation in rpn_calc.c<br />

We use a fixed maximal stack size:<br />

#define STACKSIZE 1024<br />

– Using a symbolic constant avoids mistyping and misreading, and allows us to<br />

easily change the stack size later!<br />

Our stack data structure is realized by two variables:<br />

– int stack[STACKSIZE]; stores the values<br />

– int sp = 0; is the stack pointer, and initially points to the first element of<br />

stack<br />

Stack operations are implemented as specialized macros<br />



Pushing things onto the stack: PUSH()<br />

/* If stack is full, print an error message,<br />

otherwise push the value onto the stack */<br />

#define PUSH(value) \<br />

if(sp < STACKSIZE) \<br />

{ \<br />

stack[sp] = (value);\<br />

sp++;\<br />

}\<br />

else\<br />

{\<br />

printf("Stack overflow error\n");\<br />

}<br />



Popping values: POP_OR_FAIL()<br />

/* If stack is empty, print an error message and "break;",<br />

otherwise pop the top value into varname */<br />

#define POP_OR_FAIL(varname) \<br />

if(sp > 0)\<br />

{\<br />

sp--;\<br />

(varname) = stack[sp];\<br />

}\<br />

else\<br />

{\<br />

printf("Stack underflow error\n");\<br />

break;\<br />

}<br />

Note that the macro contains a break; statement in the error case<br />

– Limits general usability but. . .<br />

– . . . exits the case it is used in early!<br />



The Main Program: Preliminaries and Declarations<br />

int main(void)<br />

{<br />

int num, arg1, arg2, i;<br />

int stack[STACKSIZE];<br />

int sp = 0, in_base = 10, out_base = 10;<br />

SkipSpace();<br />

The number systems to be used for input and output are determined by in_base<br />

and out_base<br />

– Both are initialized to 10 (decimal)<br />

Note that the next character to be read is meaningful (not white space) now<br />

– This will be a loop invariant of the main loop<br />



The Main Loop: Overall Structure<br />

    while(LookChar()!=EOF)<br />

    {<br />

        if(int_available(in_base))<br />

        {<br />

            num = read_int_base(in_base);<br />

            PUSH(num);<br />

        }<br />

        else<br />

        { /* Operator! */<br />

            switch(GetChar())<br />

            {<br />

            case 'o':<br />

                ... /* Handle different cases */<br />

            default:<br />

                printf("Unknown operand\n");<br />

                break;<br />

            }<br />

        }<br />

        SkipSpace();<br />

    }<br />

    return EXIT_SUCCESS;<br />

}<br />



The Main Loop: Arithmetic operators<br />

switch(GetChar())<br />

{<br />

...<br />

case '+':<br />

POP_OR_FAIL(arg2);<br />

POP_OR_FAIL(arg1);<br />

num = arg1+arg2;<br />

PUSH(num);<br />

break;<br />

case '-':<br />

POP_OR_FAIL(arg2);<br />

POP_OR_FAIL(arg1);<br />

num = arg1-arg2;<br />

PUSH(num);<br />

break;<br />

case '*':<br />

POP_OR_FAIL(arg2);<br />

POP_OR_FAIL(arg1);<br />

num = arg1*arg2;<br />

PUSH(num);<br />

break;<br />



case '/':<br />

POP_OR_FAIL(arg2);<br />

POP_OR_FAIL(arg1);<br />

num = arg1/arg2;<br />

PUSH(num);<br />

break;<br />

case '%':<br />

POP_OR_FAIL(arg2);<br />

POP_OR_FAIL(arg1);<br />

num = arg1%arg2;<br />

PUSH(num);<br />

break;<br />

...<br />

}<br />
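The '/' and '%' cases above divide by whatever value is on top of the stack. In C, integer<br />
division or remainder by zero is undefined behaviour, so a guard may be worth adding. The<br />
helper below is a hypothetical sketch and not part of rpn_calc.c.<br />

```c
#include <limits.h>

/* Hypothetical helper: return 1 if a/b and a%b can be computed
   safely for int operands, 0 otherwise. INT_MIN / -1 is rejected
   as well, since it overflows on two's complement machines. */
int div_is_safe(int a, int b)
{
    if(b == 0)
    {
        return 0;
    }
    if(a == INT_MIN && b == -1)
    {
        return 0;
    }
    return 1;
}
```

In the calculator, the '/' case could test div_is_safe(arg1, arg2) before dividing and print<br />
an error message otherwise.<br />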



The Main Loop: I/O operators<br />

switch(GetChar())<br />

{<br />

...<br />

case 'p':<br />

POP_OR_FAIL(num);<br />

write_int_base(num,out_base);<br />

putchar('\n');<br />

PUSH(num);<br />

break;<br />

case 'o':<br />

POP_OR_FAIL(num);<br />

if(num < 2 || num >16)<br />

{<br />

printf("Only bases 2-16 (decimal) supported\n");<br />

}<br />

else<br />

{<br />

out_base = num;<br />

}<br />

break;<br />



case 'i':<br />

POP_OR_FAIL(num);<br />

if(num < 2 || num >16)<br />

{<br />

printf("Only bases 2-16 (decimal) supported\n");<br />

}<br />

else<br />

{<br />

in_base = num;<br />

}<br />

break;<br />



case 'S':<br />

for(i=0; i


Manual Compilation<br />

First, we compile all of the source files individually:<br />

$ gcc -ansi -Wall -c -o chario.o chario.c<br />

$ gcc -ansi -Wall -c -o integerio.o integerio.c<br />

$ gcc -ansi -Wall -c -o rpn_calc.o rpn_calc.c<br />

Then we perform the linking step:<br />

$ gcc -ansi -Wall -o rpn_calc chario.o integerio.o rpn_calc.o<br />

Now the program is ready to run:<br />

$ ./rpn_calc<br />

2 o 10 p<br />

1010<br />



UNIX User Commands: dc<br />

dc is an arbitrary precision RPN calculator<br />

– It handles floating point numbers (to any preselected precision)<br />

– It handles bignums, i.e. integers that do not fit into any standard data type<br />

– It has a lot of built-in functionality and can be extended by user-defined macros<br />

Usage is quite similar to our rpn_calc<br />

For more: man dc or (particularly) info dc (or read info in emacs: [C-h i])<br />



Exercises<br />

Read the man and info pages for dc<br />

Play with the program<br />

Enjoy the weekend and be merry<br />

Note: I’ve updated the rpn_calc sources on the web page to the latest version<br />

(changes only comments and style)<br />



CSC322<br />

C Programming and UNIX<br />

Programming in C<br />

More on Operators and Expressions<br />



Increment and Decrement Operators<br />

C supports the unary operators ++ and -- for incrementing and decrementing<br />

variables<br />

– ++ increments a variable by 1<br />

– -- decrements a variable by 1<br />

Both can be used as prefix and postfix operators: x++ or ++x<br />

– In both cases, x is incremented by 1<br />

– The difference is in the value of the expression:<br />

∗ The expression x++ has the value of x before incrementing<br />

∗ ++x has the value of x after incrementing, i.e. it is equivalent to the<br />

assignment x=x+1<br />

Both forms are used, but the postfix form is more common<br />



Example<br />

#include <stdio.h><br />

#include <stdlib.h><br />

int main(void)<br />

{<br />

    int x,y;<br />

    x=5; y=5;<br />

    printf("x = %d y = %d\n", x, y);<br />

    printf("x++ = %d ++y = %d\n", x++, ++y);<br />

    printf("x = %d y = %d\n", x, y);<br />

    printf("x-- = %d --y = %d\n", x--, --y);<br />

    printf("x = %d y = %d\n", x, y);<br />

    return EXIT_SUCCESS;<br />

}<br />

Output:<br />

x = 5 y = 5<br />

x++ = 5 ++y = 6<br />

x = 6 y = 6<br />

x-- = 6 --y = 5<br />

x = 5 y = 5<br />



Binary Number Representation<br />

C guarantees a base 2 representation for all unsigned integer types:<br />

– Example: 16 bit representation (short on many implementations) of 42<br />

0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0<br />

(bit weights, left to right: 2^15, 2^14, ..., 2^1, 2^0)<br />

42 = 2^5 + 2^3 + 2^1 = 32 + 8 + 2<br />

– If the result of an arithmetic operation is a value not representable by the<br />

result type, it is reduced modulo 2^n, where n is the width of the result type<br />

An unsigned number of a narrower type is converted to a wider type by adding<br />

an appropriate number of leading zeroes:<br />

– The 8 bit representation (char on many implementations) of 42 is:<br />

0 0 1 0 1 0 1 0<br />

(bit weights, left to right: 2^7, 2^6, ..., 2^1, 2^0)<br />

The exact representation for signed integers is not fixed; however, positive signed<br />

integers are guaranteed to have the same representation in signed and unsigned<br />

types<br />
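The reduction modulo 2^n for unsigned types can be observed directly. The functions below<br />
are hypothetical illustrations; they rely only on the guaranteed wrap-around behaviour of<br />
unsigned arithmetic.<br />

```c
#include <limits.h>

/* UINT_MAX + 1 is not representable, so the result wraps to 0. */
unsigned int overflow_demo(void)
{
    unsigned int x = UINT_MAX;

    return x + 1u;
}

/* 0 - 1 wraps around to the largest unsigned value. */
unsigned int underflow_demo(void)
{
    unsigned int x = 0u;

    return x - 1u;
}

/* Explicit reduction modulo 2^8, i.e. what remains of a value
   when only the lowest 8 bits are kept. */
unsigned int mod_256(unsigned int x)
{
    return x % 256u;
}
```

For example, mod_256(300) is 44, since 300 = 256 + 44.<br />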


Bitwise Operators<br />

Bitwise operators operate on the binary representation of numbers<br />

The binary bitwise operators include<br />

– Bitwise and (&) sets a bit in the result if it is set in both operands:<br />

(6 & 3) == 2<br />

– | is the bitwise or, i.e. the result bit is set if at least one of the corresponding<br />

bits in the input is set:<br />

(6 | 3) == 7<br />

– ^ is the bitwise exclusive or (or xor) (the result bit is set if and only if the two<br />

operands differ at that position):<br />

(6 ^ 3) == 5<br />

The bitwise not ~ (or one’s complement) toggles all bits<br />

– The result value depends on the number format<br />

– For 16 bit unsigned short, ~42 == 65493<br />
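The values quoted above can be checked with a few one-line functions. The names are<br />
hypothetical; not16() masks the result to 16 bits so that it does not depend on the width<br />
of unsigned int. Note that in real C code the comparisons need parentheses, as in<br />
(6 & 3) == 2, because == binds more tightly than &, | and ^.<br />

```c
/* The binary bitwise operators, wrapped as functions. */
unsigned int bit_and(unsigned int a, unsigned int b)
{
    return a & b;
}

unsigned int bit_or(unsigned int a, unsigned int b)
{
    return a | b;
}

unsigned int bit_xor(unsigned int a, unsigned int b)
{
    return a ^ b;
}

/* Bitwise not, restricted to the lowest 16 bits. */
unsigned int not16(unsigned int x)
{
    return ~x & 0xFFFFu;
}
```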



Bitwise Shifting<br />

C also supports the shifting of binary numbers<br />

The binary operator << shifts an integer value left; the new bits become zero<br />

The binary operator >> shifts an integer value right<br />

– For unsigned values, the new bits become zero<br />

– For signed values, either zeroes are shifted in (logical shift), or the first (sign)<br />

bit is replicated (arithmetic shift, equivalent to division by 2^n)<br />

Note: The shift operators are seldom used<br />

– C++ has even recycled them for I/O operations<br />

– Binary and, or, and not, on the other hand, are used frequently to manipulate<br />

binary flags packed into a single integer value<br />



Example<br />

These macros can be used to set and query properties in a variable, where each<br />

property is encoded in a single bit<br />

#define SetProp(var, prop) ((var) = (var) | (prop))<br />

#define DelProp(var, prop) ((var) = (var) & ~(prop))<br />

#define FlipProp(var, prop) ((var) = (var) ^ (prop))<br />

/* Absolutely assign properties masked by sel */<br />

#define AssignProp(var, sel, prop) (DelProp((var),(sel)),\<br />

SetProp((var),(sel)&(prop)))<br />

/* Are _all_ properties in prop set in var? */<br />

#define QueryProp(var, prop) (((var) & (prop)) == (prop))<br />

/* Are any properties in prop set in var? */<br />

#define IsAnyPropSet(var, prop) ((var) & (prop))<br />
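A short usage sketch for these macros follows. The property names PROP_READ, PROP_WRITE<br />
and PROP_EXEC are hypothetical; any set of distinct single-bit constants would do. The<br />
macro definitions are repeated so that the example is self-contained.<br />

```c
#define SetProp(var, prop)      ((var) = (var) | (prop))
#define DelProp(var, prop)      ((var) = (var) & ~(prop))
#define QueryProp(var, prop)    (((var) & (prop)) == (prop))
#define IsAnyPropSet(var, prop) ((var) & (prop))

/* Hypothetical properties, one bit each */
#define PROP_READ  1u
#define PROP_WRITE 2u
#define PROP_EXEC  4u

/* Set and clear a few properties; returns the final flag word. */
unsigned int demo_props(void)
{
    unsigned int flags = 0;

    SetProp(flags, PROP_READ | PROP_WRITE);  /* flags is now 3 */
    DelProp(flags, PROP_WRITE);              /* flags is now 1 */
    SetProp(flags, PROP_EXEC);               /* flags is now 5 */
    return flags;
}
```

QueryProp() is true only if every requested bit is set, whereas IsAnyPropSet() is non-zero as<br />
soon as one of them is.<br />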



Assignment Operators<br />

Very frequently, programming tasks require updating a variable based on<br />

its old value<br />

– Frequent example: i=i+1;<br />

In addition to the general assignment operator, C offers operators combining<br />

update and assignment<br />

– If op is a binary operator, then op= is the corresponding assignment<br />

operator<br />

– x op= y is equivalent to x = x op y<br />

– This is supported for op ∈ { +, -, *, /, %, <<, >>, &, ^, | }<br />

Most frequently used<br />

– += (as in fahrenheit += 10)<br />

– -= (e.g. in the update part of a for loop)<br />



Conditional Expressions<br />

Similarly to conditional statements (if/else), C has conditional expressions:<br />

– If E1, E2, E3 are expressions, then E1 ? E2 : E3 is<br />

a conditional expression<br />

∗ If E1 evaluates to true (non-zero), then E2 is evaluated and its value<br />

returned<br />

∗ Otherwise, E3 is evaluated and returned<br />

Example 1:<br />

#define MAX(a,b) (((a)>(b))?(a):(b))<br />

Example 2:<br />

printf("There %s %d item%s left\n",<br />

(count==1)?"is":"are",<br />

count,<br />

(count==1)?"":"s");<br />
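A macro built on a conditional expression evaluates one of its arguments twice, which matters<br />
when that argument has side effects. The sketch below uses a fully parenthesized MAX and two<br />
hypothetical functions, max_int() and macro_side_effects(), to show the difference.<br />

```c
#define MAX(a,b) (((a)>(b))?(a):(b))

/* Function version: each argument is evaluated exactly once. */
int max_int(int a, int b)
{
    return (a > b) ? a : b;
}

/* Returns how often x was incremented by a single use of MAX. */
int macro_side_effects(void)
{
    int x = 5, y = 3, m;

    /* Expands to (((x++)>(y))?(x++):(y)): the comparison sees 5,
       the selected branch increments x again and yields 6. */
    m = MAX(x++, y);
    return (m == 6) ? (x - 5) : -1;
}
```

macro_side_effects() returns 2, because x++ was evaluated once in the condition and once in<br />
the selected branch.<br />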



Expression Sequences<br />

The comma operator separates two expressions: E1, E2<br />

– Expressions are evaluated left to right<br />

– The value of a comma-separated sequence is the value of the last expression in<br />

it<br />

– Don’t confuse it with the comma separating function call arguments!<br />

Nearly the only legitimate use: Initialize and update in for loops:<br />

for(cels=0, fahr=-32; cels


Type Conversion (Casting)<br />

As already stated, C performs type conversion in many situations automatically<br />

– If different numeric types are used in an expression, all values are promoted to<br />

the “largest” type<br />

– If a value of an unsigned integer type is assigned to a variable of<br />

smaller type, excess bits are dropped<br />

– For signed types, conversion is only partially specified<br />

In addition, values can be coerced to a different type<br />

– A cast expression has the syntax (type) expression<br />

Example:<br />

printf("Int: %d Float: %d\n",<br />

(1/2)*2,<br />

(int) (((float)1/2)*2));<br />

Int: 0 Float: 1<br />



Exercises<br />

Write a function that counts the number of bits that are one in an unsigned<br />

long number (Footnote: Allegedly the NSA sponsors the inclusion of hardware to<br />

make this operation fast in many chips because they need it for speeding up the<br />

cracking of encrypted documents)<br />

Rewrite imp metric to use comma-separated expressions to build the tables<br />



CSC322<br />

C Programming and UNIX<br />

Programming in C<br />

Expressions and the Type System<br />



Getting the Size of Objects and Types<br />

A final operator is sizeof<br />

– sizeof can be applied to an expression or to a parenthesized type name<br />

– Applying it to an expression is equivalent to applying it to the type of the<br />

expression<br />

sizeof returns the number of character-sized memory units necessary to store<br />

an object of the type<br />

– By definition, sizeof (char) == 1<br />

Example:<br />

printf("sizeof 1: %lu sizeof (short)1: %lu\n",<br />

(unsigned long)sizeof 1, (unsigned long)sizeof ((short)1));<br />

Typical output (the sizes are implementation-defined):<br />

sizeof 1: 4 sizeof (short)1: 2<br />

Note: sizeof will be useful for dynamic memory handling<br />
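One common use of sizeof, independent of dynamic memory, is computing the number of<br />
elements in an array. The macro ARRAY_LEN and the array primes are hypothetical names<br />
used only for this sketch; the macro works for real arrays, not for pointers.<br />

```c
#include <stddef.h>

/* Number of elements: total size divided by the size of one element. */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

static int primes[] = {2, 3, 5, 7, 11};

/* Returns the number of entries in primes without hard-coding it. */
size_t prime_count(void)
{
    return ARRAY_LEN(primes);
}
```

If more values are added to the initializer, prime_count() follows automatically.<br />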



Order Of Execution<br />

In general, the order of evaluation of subexpressions is not defined!<br />

Exceptions:<br />

– &&, ||, ?:, and ,<br />

If you need a particular order of execution, you must force it<br />

– Since statements are executed sequentially, compute subexpressions in separate<br />

statements (assigning them to different variables)<br />

– Other sequence points are set by the operators listed above<br />

The example on the next page may print One Two One Two or Two One One<br />

Two<br />



Example<br />

#include <stdio.h><br />

#include <stdlib.h><br />

int one(void)<br />

{<br />

    printf("One ");<br />

    return 1;<br />

}<br />

int two(void)<br />

{<br />

    printf("Two ");<br />

    return 1;<br />

}<br />

int main(void)<br />

{<br />

    one()+two(); /* Order of the two calls is unspecified */<br />

    one()&&two(); /* one() is always called first */<br />

    printf("\n");<br />

    return EXIT_SUCCESS;<br />

}<br />



Types in C<br />

C offers a set of basic types built into the language<br />

We can define new, quasi-basic types as enumerations<br />

We can construct new types using type construction:<br />

– Arrays over a base type<br />

– Structures, combining different base types in one object<br />

– Unions (can store different type values alternatively)<br />

– Pointer to a base type<br />

This generates a recursive type hierarchy!<br />

– We can use new types to build further on them<br />

– E.g. Arrays of Pointers, Structures combining unions and enumerations, . . .<br />



Basic Types<br />

Basic types in C:<br />

– char (typically used to represent characters)<br />

– short<br />

– int<br />

– long<br />

– long long<br />

– float<br />

– double<br />

All integer types come in a signed and an unsigned variety<br />



Defining New Types with typedef<br />

The typedef keyword is used to define new names for types in C<br />

General syntax: If we add typedef to a variable definition, it turns into a type<br />

definition<br />

Examples:<br />

unsigned long ulong; /* Define variable */<br />

typedef unsigned long ulong_t; /* Define a new type ulong_t */<br />

ulong_t ulong1; /* Define variable of new type */<br />

char string[80]; /* Define an array variable */<br />

typedef char string_t[80]; /* Define a string type */<br />

string_t string1; /* Define a variable of that type -- we can use<br />

string1[32] now */<br />



Symbolic Names in the Data Safe Assignment<br />

The data safe assignment calls for a function data_safe() with three arguments<br />

– The first argument is a symbolic method: ds_register, ds_store,<br />

ds_retrieve, ds_delete<br />

– We can implement this using an int argument and #define:<br />

#define ds_register 1<br />

#define ds_store 2<br />

#define ds_retrieve 3<br />

#define ds_delete 4<br />

int data_safe(int method, int key, int value_or_index);<br />

Problems:<br />

– Nothing in the declaration of data_safe() tells us that the int is anything<br />

but a number<br />

– The #define statements are independent<br />

Wouldn’t it be nice to create a new type to reflect the intended use?<br />



Enumerations in C<br />

Enumeration data types can represent values from a finite domain using symbolic<br />

names<br />

– The possible values are explicitly listed in the definition of the data type<br />

– Typically, each value can be used in only one enumeration<br />

In C, enumerations are created using the enum keyword<br />

In C, enumeration types are integer types<br />

– A definition of an enumeration type just assigns numerical values to the<br />

symbolic names<br />

– Unless explicitly chosen otherwise, the symbolic names are numbered starting<br />

at 0, and increasing by one for each name<br />

– However, any int value can be assigned to a variable of an enumeration type<br />

– Likewise, we can assign any enumeration constant to any integer type variable<br />

C enumerations have only mnemonic value, they do not enable the compiler to<br />

catch bugs resulting from mixing up different types<br />



Enumeration Syntax<br />

An enumeration type is defined by the enum keyword, followed by a list of<br />

identifiers (enumeration constants) in curly brackets<br />

The following code describes an enumeration data type for the data safe methods:<br />

enum{ds_register, ds_store, ds_retrieve, ds_delete}<br />

It can be used like any other type specifier:<br />

int data_safe(enum{ds_register, ds_store, ds_retrieve, ds_delete}method,<br />

int key, int value_or_index);<br />

...<br />

key = data_safe(ds_register, 0, 0);<br />



enum and typedef<br />

Typically, enumeration data types are used to define new types<br />

– The enum keyword describes the new type<br />

– The typedef keyword assigns a name to the type<br />

– The new type can then be used consistently throughout the program<br />

Example:<br />

typedef enum{ds_register, ds_store, ds_retrieve, ds_delete}DS_operation;<br />

int data_safe(DS_operation method, int key, int value_or_index);<br />

...<br />

key = data_safe(ds_register, 0, 0);<br />

Typically, enumerations (and other new data types) are declared in header files<br />

(.h files), and form part of the interface of a module<br />



More on Enumerations<br />

Since enumerations are actually integer types, we can assign specific values to the<br />

constants<br />

– We can even assign the same value to different constants!<br />

Example (also note preferred form of formatting for enums):<br />

typedef enum<br />

{<br />

ds_register = 1,<br />

ds_store = 2,<br />

ds_retrieve = 3,<br />

ds_delete = 4,<br />

ds_forget = 4<br />

}DS_operation;<br />
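One consequence of giving two constants the same value shows up in switch statements. The<br />
sketch below repeats the type definition and adds a hypothetical helper, ds_name(), that<br />
maps an operation to a printable name.<br />

```c
typedef enum
{
    ds_register = 1,
    ds_store    = 2,
    ds_retrieve = 3,
    ds_delete   = 4,
    ds_forget   = 4     /* Same value as ds_delete */
}DS_operation;

/* Hypothetical helper: printable name of an operation. */
const char *ds_name(DS_operation op)
{
    switch(op)
    {
    case ds_register:
        return "register";
    case ds_store:
        return "store";
    case ds_retrieve:
        return "retrieve";
    case ds_delete:     /* Also reached for ds_forget */
        return "delete";
    /* A separate "case ds_forget:" would be rejected by the
       compiler as a duplicate case value. */
    default:
        return "unknown";
    }
}
```

Because enumeration types are integer types, ds_name() can also be called with a value that<br />
matches none of the constants; the default case covers that situation.<br />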



Aggregating Data Types<br />

Let’s again look at the data safe assignment<br />

– We somehow have to associate a key and a value (or multiple values)<br />

– Simple approach: Use two arrays, one for keys, one for values<br />

– If keys[i] == key, then values[i] holds a value associated with key<br />

However, the association between those two elements is not reflected by this<br />

construction<br />

– The two arrays are independent<br />

– They can be manipulated independently<br />

– There is not even a guarantee that both arrays have the same size!<br />

– If we pass key and value to a function, we have to pass them as individual<br />

elements (what if we have 132 different elements?)<br />

Solution: Creating structures that combine different elements into one<br />



struct<br />

A structure is a datatype that may have any number of members<br />

– Members can have different types<br />

– Members can have any other type (including arrays or other structures)<br />

– Members are referred to by their name in the structure<br />

Java analogy: A structure type is a class, but:<br />

– No member functions<br />

– All members are public<br />

Structures are defined using the struct keyword, followed by an optional name<br />

and a list of member definitions in curly braces<br />

– Each member definition is a normal variable definition, giving type and name<br />

of the member<br />



Structure Example<br />

Consider the following definition:<br />

struct key_assoc {int key; int value;} key_pair;<br />

– It creates a variable key_pair with two members<br />

– They can be referred to by name:<br />

key_pair.value = 10;<br />

...<br />

if(key_pair.key == user_key)<br />

{<br />

count++;<br />

}<br />



struct and typedef<br />

As with enumerations, structures are usually used with typedef:<br />

typedef struct key_assoc<br />

{<br />

int key;<br />

int value;<br />

} key_pair_t;<br />

static key_pair_t key_value_array[10000];<br />

– The first definition defines a new type, key_pair_t<br />

– The second one creates an array of 10000 of these pairs<br />

Using the name (struct key_assoc), we can refer to the structure type even before we<br />

have seen the full definition<br />

– Important for self-referential data types using pointers<br />
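An array of such pairs keeps each key next to its value, so a lookup only has to walk one<br />
array. The following is a minimal sketch under assumptions of its own: the table size<br />
MAX_PAIRS and the functions store_pair() and lookup_value() are hypothetical and not taken<br />
from the data safe assignment.<br />

```c
typedef struct key_assoc
{
    int key;
    int value;
} key_pair_t;

#define MAX_PAIRS 8     /* Hypothetical small table size */

static key_pair_t table[MAX_PAIRS];
static int used = 0;    /* Number of entries in use */

/* Store a pair; return 1 on success, 0 if the table is full. */
int store_pair(int key, int value)
{
    if(used >= MAX_PAIRS)
    {
        return 0;
    }
    table[used].key = key;
    table[used].value = value;
    used++;
    return 1;
}

/* Linear search: return the value stored for key, or not_found
   if no entry has that key. */
int lookup_value(int key, int not_found)
{
    int i;

    for(i = 0; i < used; i++)
    {
        if(table[i].key == key)
        {
            return table[i].value;
        }
    }
    return not_found;
}
```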



Exercises<br />

Create a function that has two primary colours (red, blue, yellow) as input, and<br />

returns the colour that results from mixing them<br />

– Use an enumeration type for the colours<br />

– Use a struct to hold triples (colour1, colour2, mix) and an array to store all<br />

associations<br />

– You can use linear search to find matching patterns for your input<br />



CSC322<br />

C Programming and UNIX<br />

Programming in C<br />

Data Structures and Pointers<br />



Representing Related Objects<br />

Assume the following problem:<br />

– In a drawing program, we need to represent geometrical shapes (circles, squares,<br />

rectangles, triangles...)<br />

– There is some common information for all shapes:<br />

∗ Border colour<br />

∗ Line width<br />

∗ Fill colour (if any)<br />

– However, the coordinates are different for each shape:<br />

∗ For a circle, we need center point and radius<br />

∗ For a square or rectangle we need two corners<br />

∗ For a triangle we need three corners<br />

Object-oriented languages allow a base class shape, and derived classes for the<br />

different shapes<br />

– In C, we have to program this explicitly, using unions!<br />



Unions<br />

Unions of base types allow the new type to store one value of any of its base<br />

types (but only one at a time)<br />

The syntax is analogous to that of structures:<br />

– The keyword union is followed by a list of member definitions in curly braces<br />

Example<br />

– union {int i; float f; char *str;} numval;<br />

– numval can store either an integer or a floating point number, or a pointer to<br />

a character (normally a string)<br />

– Access is as for structures: numval.i is the integer value<br />

Note: Unions weaken the type system:<br />

– numval.f=1.0; printf("%d\n",numval.i);<br />

– Situations like that are, in general, impossible to detect at compile time<br />
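Since the compiler cannot tell which member was written last, a common convention is to<br />
store a tag next to the union and to check it before reading. The sketch below is one way<br />
to do that; NumTag, TaggedNum, make_int(), make_float() and as_double() are hypothetical<br />
names.<br />

```c
typedef enum
{
    nv_int,
    nv_float
}NumTag;

typedef struct
{
    NumTag tag;                     /* Which member of val is valid */
    union {int i; float f;} val;
}TaggedNum;

/* Build a tagged value holding an int. */
TaggedNum make_int(int i)
{
    TaggedNum n;

    n.tag = nv_int;
    n.val.i = i;
    return n;
}

/* Build a tagged value holding a float. */
TaggedNum make_float(float f)
{
    TaggedNum n;

    n.tag = nv_float;
    n.val.f = f;
    return n;
}

/* Convert to double, reading only the member that was stored. */
double as_double(TaggedNum n)
{
    return (n.tag == nv_int) ? (double)n.val.i : (double)n.val.f;
}
```

The tag is still only a convention: nothing stops code from ignoring it and reading the<br />
other member.<br />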



Shape Example Continued (1)<br />

typedef enum<br />

{<br />

    circle,<br />

    square,<br />

    rectangle,<br />

    triangle<br />

}ShapeType;<br />

typedef enum<br />

{<br />

    red,<br />

    green,<br />

    blue,<br />

    black,<br />

    white<br />

}ColourType;<br />

typedef struct<br />

{<br />

    int center_x;<br />

    int center_y;<br />

    int radius;<br />

}CircleCoord;<br />



Shape Example Continued (2)<br />

typedef struct<br />

{<br />

    int lower_left_x;<br />

    int lower_left_y;<br />

    int upper_right_x;<br />

    int upper_right_y;<br />

}RectangleCoord;<br />

typedef RectangleCoord SquareCoord;<br />

typedef struct<br />

{<br />

    int point1_x;<br />

    int point1_y;<br />

    int point2_x;<br />

    int point2_y;<br />

    int point3_x;<br />

    int point3_y;<br />

}TriangleCoord;<br />



Shape Example Continued (3)<br />

typedef union<br />

{<br />

    CircleCoord circle_coord;<br />

    RectangleCoord rect_coord;<br />

    SquareCoord square_coord;<br />

    TriangleCoord tria_coord;<br />

}ShapeCoord;<br />

typedef struct<br />

{<br />

    ShapeType type;<br />

    int border_width;<br />

    ColourType border_colour;<br />

    ColourType fill_colour;<br />

    ShapeCoord coords;<br />

}Shape;<br />



Shape Example Continued (4)<br />

void draw_shape(Shape draw_obj)<br />

{<br />

switch(draw_obj.type)<br />

{<br />

case circle:<br />

draw_circle(draw_obj.coords.circle_coord.center_x,<br />

draw_obj.coords.circle_coord.center_y,<br />

draw_obj.coords.circle_coord.radius,<br />

draw_obj.border_width,<br />

draw_obj.border_colour,<br />

draw_obj.fill_colour);<br />

break;<br />

case square:<br />

draw_square(draw_obj.coords.square_coord.lower_left_x,<br />

draw_obj.coords.square_coord.lower_left_y,<br />

draw_obj.coords.square_coord.upper_right_x,<br />

draw_obj.coords.square_coord.upper_right_y,<br />

draw_obj.border_width,<br />

draw_obj.border_colour,<br />

draw_obj.fill_colour);<br />

break;<br />

...<br />



Pointers<br />

Pointers are derived types of a base type<br />

– A pointer is the memory address of an object of the base type<br />

– Given a pointer, we can manipulate the object pointed to<br />

Notice that there are two parts to a pointer:<br />

– The actual memory address (a dynamic feature in the running program)<br />

– The type of the pointer (pointer to int, pointer to Shape, ...) telling us how to interpret the data at that address (a static feature that can be determined at compile time)<br />

C uses the unary * to define variables of pointer types:<br />

– int *count; defines the variable count as a pointer to int<br />

– Notice that this pointer does not contain a valid address: there is no object of type int created along with the pointer!<br />

– Pointers can be defined for any valid type in C: struct{double real; double imag;} *complex; defines complex as a pointer to the struct<br />



Basic Pointer Operations in C<br />

The most basic operations on pointers are:<br />

– Given an object, return a pointer to it<br />

– Given a pointer, give the object it points to (dereference the pointer)<br />

C uses the unary * operator for both pointer definition and pointer dereferencing, and & for getting the address of an existing object<br />

– int var; int *p; defines var to be a variable of type int and p to be a variable of type pointer to int<br />

– p = &var; makes p point to var (i.e. p now stores the address of var)<br />

– *p = 17; assigns 17 to the int object that p points to (in our example, it would set var to 17)<br />

– Note that &(*p) == p always is true for a pointer variable pointing to a valid object, as is *(&var) == var for an arbitrary variable!<br />
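The operations above can be checked in a few lines. This is only a sketch; the function name pointer_basics_hold() is made up for illustration.

```c
/* Hypothetical demo: returns 1 if the claims above hold for a local int */
int pointer_basics_hold(void)
{
    int var = 3;
    int *p;

    p = &var;      /* p now stores the address of var */
    *p = 17;       /* Writes to var through the pointer */

    return var == 17          /* The assignment through p changed var */
        && &(*p) == p         /* Address of the dereferenced pointer is p */
        && *(&var) == var;    /* Dereferencing the address of var gives var */
}
```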



Pointers - A simple Example<br />

#include <stdio.h><br />
#include <stdlib.h><br />

/* Exchange the values of the two ints that x and y point to */<br />
void swap(int *x, int *y)<br />
{<br />
    int z;<br />

    z = *x;<br />
    *x = *y;<br />
    *y = z;<br />
}<br />

int main(void)<br />
{<br />
    int var1=7, var2=42;<br />

    printf("var1: %d var2: %d\n", var1, var2);<br />
    swap(&var1, &var2);<br />
    printf("var1: %d var2: %d\n", var1, var2);<br />
    return EXIT_SUCCESS;<br />
}<br />



Output <strong>of</strong> the program:<br />

var1: 7 var2: 42<br />

var1: 42 var2: 7<br />

Example Continued<br />

Note that this technique is an example <strong>of</strong> a frequent way to simulate call by<br />

reference in C<br />

– Instead <strong>of</strong> passing an object, we pass a reference to it<br />

– Allows changes to the object inside the function<br />

– Often cheaper (especially for big objects)<br />
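Simulated call by reference is also a common way to hand back more than one result from a function. The following sketch assumes a hypothetical helper div_mod(); the name and the error convention (0 for success, -1 for failure) are illustrative choices, not part of the lecture code.

```c
/* Hypothetical example: return two results through pointer parameters.
   Returns 0 on success, -1 if divisor is 0 (outputs are left untouched). */
int div_mod(int dividend, int divisor, int *quotient, int *remainder)
{
    if(divisor == 0)
    {
        return -1;
    }
    *quotient  = dividend / divisor;   /* Written into the caller's variable */
    *remainder = dividend % divisor;
    return 0;
}
```

A caller would write int q, r; and then call div_mod(17, 5, &q, &r); the function itself only ever sees copies of the two addresses.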



Why Pointers?<br />

There are two main reasons for using pointers:<br />

– Efficiency<br />

– Dynamically growing data structures<br />

Efficiency Aspects<br />

– Pointers are typically represented by one machine word<br />

– Storing pointers instead of copies of large objects saves memory<br />

– Passing pointers instead <strong>of</strong> large objects is much more efficient<br />

Dynamically growing data structures<br />

– Each data type has a fixed size <strong>and</strong> memory layout<br />

– Pointers allow us to build dynamically growing data structures by adding <strong>and</strong><br />

removing fixed size cells<br />



Pointing at Nothing <strong>and</strong> Pointing Nowhere<br />

Pointers <strong>of</strong> type void* are a special case:<br />

– A void* pointer is a generic pointer, without associated base type<br />

– void* pointers can be assigned to variables <strong>of</strong> any other pointer type (<strong>and</strong><br />

vice versa)<br />

– Such pointers are used primarily for dynamic memory handling<br />

C has a special, reserved NULL pointer <strong>of</strong> type void*<br />

– The NULL pointer is guaranteed to be different from all pointers pointing to legitimate objects<br />

– It can be written as plain 0 (in a pointer context)<br />

– stdlib.h defines a symbolic name, NULL, for the NULL pointer<br />

– Dereferencing NULL is illegal!<br />

– Notice that NULL is considered to be false if used in logical expressions<br />

– Note: For most current machines, the NULL pointer actually is address 0.<br />

However, this is not guaranteed (<strong>and</strong> is false for some machines with strange<br />

memory models)<br />
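A minimal sketch of both ideas, the generic pointer and the NULL check, might look as follows. The helper int_or_default() is hypothetical; it assumes that the caller really passes either NULL or the address of an int.

```c
#include <stdlib.h>

/* Hypothetical helper: read an int through a generic pointer.
   Returns fallback if the pointer is NULL. */
int int_or_default(void *generic, int fallback)
{
    int *ip;

    if(generic == NULL)   /* Could also be written as if(!generic) */
    {
        return fallback;  /* Never dereference NULL */
    }
    ip = generic;         /* void* converts to int* without a cast */
    return *ip;
}
```

Note that the compiler cannot check that generic really points to an int; with void* that responsibility moves to the programmer.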



Exercises<br />

Write a program that prints the sizes of various built-in and self-defined data types (e.g. the Shape type and its subtypes). Do you see a relation between them?<br />

Write a program that uses swap() to sort an array of integers and print it. If you feel adventurous, use read_int_base() from the rpn calc example (or a similar function) to read integers to fill the array<br />

Notes<br />

Please email the TA, Raghu, at his UMiami address,raghu@lee.cs.miami.edu<br />

from now on<br />

Your grades for the assignments will be placed into your home directories<br />

Solutions to the prime number assignment will be available shortly after noon<br />



<strong>CSC322</strong><br />

C <strong>Programming</strong> <strong>and</strong> <strong>UNIX</strong><br />

<strong>Programming</strong> in C<br />

Dynamic Data Structures<br />



Refresher: Pointers<br />

A pointer type is a derived type <strong>of</strong> a base type<br />

– A pointer is the address <strong>of</strong> an object <strong>of</strong> the base type<br />

– Given a pointer p, *p gives us the object it points to<br />

– Given an object o, &o gives us a pointer to that object in memory<br />

An object <strong>of</strong> type void* is a generic pointer (i.e. a plain address without<br />

associated base type)<br />

– A pointer <strong>of</strong> type void* can be assigned to a variable <strong>of</strong> any other pointer<br />

type<br />

– Similarly, a value <strong>of</strong> any pointer type can be assigned to a void* variable<br />

The special value NULL is a pointer <strong>of</strong> type void*<br />

– It is guaranteed to be different from all pointers to valid objects<br />

– Its logical value is false, while that <strong>of</strong> all other pointers is true<br />



Dynamic Memory H<strong>and</strong>ling<br />

The C library <strong>of</strong>fers functions for dynamic memory h<strong>and</strong>ling<br />

– We can request a block <strong>of</strong> memory <strong>of</strong> a certain size<br />

– If such a block is available, we will get a void* pointer to it<br />

– This block can be used to store any object that fits into it<br />

– If we do not need that object anymore, we can return it to the library<br />

Such blocks can be used to build arbitrarily sized data structures<br />

– . . . e.g. by allocating bigger and bigger arrays if the need arises<br />

– . . . or by using pointers within a structure to point to additional structures<br />

(which may contain further pointers)<br />



The malloc() function<br />

We request a block of memory using malloc() (declared in <stdlib.h>)<br />

– It’s declared as void *malloc(size_t size);, i.e. it returns a generic pointer<br />

– size_t is a new data type from the standard library. It’s guaranteed to be an unsigned integer data type (often unsigned int or unsigned long)<br />

– malloc() allocates a region big enough to hold the requested number of bytes on the heap (a reserved memory region) and returns the address of the first byte (a pointer to that region)<br />

– The sizeof operator is used to get the necessary size for the object datatype<br />

- p = malloc(sizeof(int)); allocates a memory region big enough to store an integer and makes p point to it<br />

- The void* pointer is silently converted to a pointer to int<br />

– If no memory is available on the heap, malloc() will return the NULL pointer (also written as plain 0)<br />
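Put together, a typical allocation might be sketched like this. The helper new_int() is a made-up name; the point is the order of steps: allocate with sizeof, check for NULL, and only then use the memory.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate one int on the heap and initialise it.
   Returns NULL if malloc() fails; the caller must free() the result. */
int *new_int(int value)
{
    int *p = malloc(sizeof(int));  /* void* converts silently to int* */

    if(p == NULL)
    {
        return NULL;               /* Out of memory: report to the caller */
    }
    *p = value;                    /* Safe: p points to a valid block */
    return p;
}
```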



Freeing Allocated Memory<br />

The counterpart to malloc() is free()<br />

– It is declared in <stdlib.h> as<br />

void free(void* ptr);<br />

– free() takes a pointer allocated with malloc() <strong>and</strong> returns the memory to<br />

the heap<br />

Note that it is a bug to call free() with a pointer not obtained by calling malloc() (e.g. a pointer generated by applying & to a variable)<br />

It also is a bug to call free() with the same pointer more than once<br />



More on Dynamic Memory Allocation<br />

Good programming practice always checks if malloc() succeeded (i.e. did not return NULL)<br />

– In multi-tasking systems, even small allocations may fail, because other processes<br />

consume resources<br />

– The OS may limit memory usage to small values<br />

– Failing to implement that check can lead to erratic and non-reproducible failure!<br />

Similarly, each call to malloc() should (eventually) be followed by a call to<br />

free() for the pointer obtained<br />

– If you do not know if you still need a piece <strong>of</strong> memory, or if a pointer still<br />

points somewhere, you are in deep trouble, anyways!<br />

– By consistently freeing all allocated memory, you can easily check whether you return the same number of blocks you allocate!<br />



Dangling pointers<br />

Pointers are a Mixed Blessing!<br />

– A dangling pointer is a pointer not pointing to a valid object<br />

– A call to free() leaves the pointer dangling (the pointer variable still holds the address of a block of memory, but we are no longer allowed to use it)<br />

– Copying a pointer may also lead to additional dangling pointers if we call free() on one of the copies<br />

– Trying to access a dangling pointer typically causes hard to find errors, including crashes<br />

Memory leaks<br />

– A memory leak is a situation where we lose the reference to an allocated piece of memory:<br />

p = malloc(100000 * sizeof(int));<br />

p = NULL; /* We just lost a huge gob of memory! */<br />

– Memory leaks can cause programs to eventually run out of memory<br />

– Periodically occurring leaks are catastrophic for server programs!<br />
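One common defensive habit, shown here only as a sketch, is to reset a pointer variable right after freeing its block, and to free an old block before overwriting the only pointer to it. The helpers free_and_clear() and replace_buffer() are hypothetical names chosen for illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: free the block *pp points to and reset *pp to NULL,
   so that this particular pointer variable cannot dangle.
   Copies of the pointer held elsewhere are NOT reset. */
void free_and_clear(int **pp)
{
    free(*pp);   /* free(NULL) is defined to do nothing */
    *pp = NULL;
}

/* Hypothetical helper: replace a buffer without leaking the old one.
   Returns 0 on success, -1 if the new allocation fails (old block kept). */
int replace_buffer(int **buf, size_t new_count)
{
    int *fresh = malloc(new_count * sizeof(int));

    if(fresh == NULL)
    {
        return -1;
    }
    free(*buf);      /* Release the old block before losing its address */
    *buf = fresh;
    return 0;
}
```

This does not solve the problem of copied pointers: any other variable that still holds the old address is dangling after either call.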



Example: SecureMalloc()<br />

Note: In my programs, there is typically at most a single call to malloc():<br />

void* SecureMalloc(size_t size)<br />
{<br />
    void* res = malloc(size);<br />

    if(!res)<br />
    {<br />
        printf("malloc() failure -- out of memory?");<br />
        exit(EXIT_FAILURE);<br />
    }<br />
    return res;<br />
}<br />



Pointers <strong>and</strong> Structures/Unions<br />

Most interesting data structures use pointers to structures<br />

– Examples: Linear lists (see below), binary trees, terms,. . .<br />

Most frequent operation: Given a pointer, access one <strong>of</strong> the elements <strong>of</strong> the<br />

structure (or union) pointed to<br />

– (*list).value = 0;<br />

– Note that this requires parentheses in C, because . binds more tightly than *<br />

More intuitive alternative:<br />

– The -> operator combines dereferencing <strong>and</strong> selection<br />

– list->value = 0;<br />

– This is the preferred form (<strong>and</strong> seen nearly exclusively in many programs)<br />
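The two notations can be compared side by side. The Complex type below mirrors the struct with real and imag members mentioned earlier; the function names are hypothetical.

```c
typedef struct
{
    double real;
    double imag;
} Complex;

/* Hypothetical helper: conjugate a complex number in place */
void conjugate(Complex *c)
{
    c->imag = -c->imag;        /* Preferred form */
}

/* The same operation with explicit dereferencing and selection */
void conjugate_explicit(Complex *c)
{
    (*c).imag = -(*c).imag;    /* Parentheses needed: . binds tighter than * */
}
```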



Example: Linear Lists (<strong>of</strong> Integers)<br />

A list over a base type can be recursively defined as follows:<br />

– The empty list is a list<br />

– If l is a list <strong>and</strong> e is an element <strong>of</strong> the base type, then e . l is a list<br />

We can represent that in C as follows:<br />

– The empty list is represented by the NULL pointer<br />

– A non-empty list is represented by a pointer to a struct containing the<br />

element <strong>and</strong> a pointer to the rest <strong>of</strong> a list<br />

Some list operations:<br />

– Insert an element as the first element<br />

– Insert an element as the last element<br />

– Print the list elements in order<br />

– Free the memory taken up by a list<br />



Example Continued<br />

Graphical representation of the list structure for (7,9,13):<br />

[Figure: the list anchor points to the cell holding 7, which points to the cell holding 9, which points to the cell holding 13, whose next pointer is NULL]<br />

Notice the anchor of the list<br />


Example – Declarations<br />

#ifndef INT_LISTS<br />
#define INT_LISTS<br />

#include <stdio.h><br />
#include <stdlib.h><br />

typedef struct int_list_cell<br />
{<br />
    int value;<br />
    struct int_list_cell *next;<br />
}IntListCell;<br />

typedef IntListCell *IntList_p;<br />

void* SecureMalloc(size_t size);<br />
void IntListInsertFirst(IntList_p *list, int new_val);<br />
void IntListInsertLast(IntList_p *list, int new_val);<br />
void IntListFree(IntList_p list);<br />
void IntListPrint(IntList_p list);<br />

#endif<br />



Example – Inserting At the Front<br />

/* Insert a new integer as the first element of an integer list */<br />
void IntListInsertFirst(IntList_p *list, int new_val)<br />
{<br />
    IntList_p handle;<br />

    handle = SecureMalloc(sizeof(IntListCell));<br />
    handle->value = new_val;<br />
    handle->next = *list;<br />
    *list = handle;<br />
}<br />



Example – Inserting At the End<br />

/* Insert a new integer as the last element of an integer list */<br />
void IntListInsertLast(IntList_p *list, int new_val)<br />
{<br />
    IntList_p handle, last;<br />

    handle = SecureMalloc(sizeof(IntListCell));<br />
    handle->value = new_val;<br />
    handle->next = NULL;<br />
    if(!*list)<br />
    {<br />
        *list = handle;<br />
    }<br />
    else<br />
    {<br />
        last = find_last_element(*list);<br />
        last->next = handle;<br />
    }<br />
}<br />



Example – Helper Function<br />

/* Helper function: Given a non-empty list, return last element */<br />

IntList_p find_last_element(IntList_p list)<br />

{<br />

if(list->next)<br />

{<br />

return find_last_element(list->next);<br />

}<br />

return list;<br />

}<br />



Example – Freeing Lists<br />

/* Free the memory taken up by a list */<br />

void IntListFree(IntList_p list)<br />

{<br />

if(list)<br />

{<br />

IntListFree(list->next); /* Free rest */<br />

free(list); /* Free this cell */<br />

}<br />

}<br />
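The recursive version above needs one nested function call per list cell. For very long lists, a loop is a possible alternative. The following is a sketch under that assumption, not the course's implementation; IntListFreeIter() is a hypothetical name, and the cell count it returns is an added convenience for checking that every allocated cell was returned.

```c
#include <stdlib.h>

typedef struct int_list_cell
{
    int value;
    struct int_list_cell *next;
} IntListCell;

typedef IntListCell *IntList_p;

/* Hypothetical iterative variant of IntListFree().
   Returns the number of cells freed. */
int IntListFreeIter(IntList_p list)
{
    int count = 0;

    while(list)
    {
        IntList_p next = list->next;  /* Save the link before the cell goes */
        free(list);                   /* list is dangling from here on ...  */
        list = next;                  /* ... so move on to the saved link   */
        count++;
    }
    return count;
}
```

The important detail is the order: list->next must be read before free(list), since the cell may not be accessed afterwards.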



Example – Printing Lists<br />

/* Print a list as a sequence of numbers */<br />
void IntListPrint(IntList_p list)<br />
{<br />
    IntList_p handle;<br />

    for(handle = list; handle; handle = handle->next)<br />
    {<br />
        printf("%d ", handle->value);<br />
    }<br />
    putchar('\n');<br />
}<br />



Example – Main Function<br />

int main(void)<br />
{<br />
    int value;<br />
    IntList_p list1 = NULL, list2 = NULL;<br />

    SkipSpace();<br />
    while(int_available(10))<br />
    {<br />
        value = read_int_base(10);<br />
        IntListInsertFirst(&list1, value);<br />
        IntListInsertLast(&list2, value);<br />
        SkipSpace();<br />
    }<br />
    printf("List1: ");<br />
    IntListPrint(list1);<br />
    printf("List2: ");<br />
    IntListPrint(list2);<br />
    IntListFree(list1);<br />
    IntListFree(list2);<br />
    return EXIT_SUCCESS;<br />
}<br />



Assignment<br />

A binary search tree is either empty, or it consists of a node storing a key (the root of the tree), and a left and right subtree, such that all keys in the left subtree are smaller than the key in the node, and all keys in the right subtree are bigger<br />

– To print a tree in (left-to-right) preorder, you first print the root, then the left<br />

subtree, then the right subtree<br />

– To print a tree in (left-to-right) postorder, you first print the left subtree, then<br />

the right subtree, then the root<br />

– To print a tree in natural order, you first print the left tree, then the root, then<br />

the right tree<br />

Design a data structure for binary search trees with int keys, using dynamic<br />

memory h<strong>and</strong>ling<br />

Implement functions to:<br />

– Insert keys into the tree (ignoring keys already in the tree)<br />

– Print a tree in preorder, natural order, <strong>and</strong> postorder<br />

– Free the memory taken up by the tree<br />



Use this datatype <strong>and</strong> the functions from integerio to write a program that<br />

reads a list <strong>of</strong> integers from stdin into a tree, <strong>and</strong> prints that tree in the three<br />

different orders<br />

You can use the code from the linear list example as a base. The complete code<br />

will be available from the course homepage<br />



<strong>CSC322</strong><br />

C <strong>Programming</strong> <strong>and</strong> <strong>UNIX</strong><br />

<strong>Programming</strong> in C<br />

Pointers <strong>and</strong> Arrays<br />



Midterm Exam<br />

Monday, Oct. 14th, 11:00 – 11:50<br />

Topics: Everything we did so far<br />

– UNIX file system layout<br />

– Simple UNIX utilities<br />

– Job Control<br />

– Basic C<br />

– Compilation and the preprocessor<br />

– C flow control and functions<br />

– Data structures in C<br />

– Pointers<br />

Friday we will refresh some of that stuff (but do reread the lecture notes yourself, and check the example solutions on the web)<br />



Refresher: Pointers<br />

A pointer type is a derived type <strong>of</strong> a base type<br />

– A pointer is the address <strong>of</strong> an object <strong>of</strong> the base type<br />

– Given a pointer p, *p gives us the object it points to<br />

– Given an object o, &o gives us a pointer to that object in memory<br />

An object <strong>of</strong> type void* is a generic pointer (i.e. a plain address without<br />

associated base type)<br />

– A pointer <strong>of</strong> type void* can be assigned to a variable <strong>of</strong> any other pointer<br />

type<br />

– Similarly, a value <strong>of</strong> any pointer type can be assigned to a void* variable<br />

The special value NULL is a pointer <strong>of</strong> type void*<br />

– It is guaranteed to be different from all pointers to valid objects<br />

– Its logical value is false, while that <strong>of</strong> all other pointers is true<br />



Refresher: Dynamic Memory H<strong>and</strong>ling<br />

void* malloc(size_t size); is a function from <stdlib.h><br />

– It will return a pointer to an otherwise unused block <strong>of</strong> memory with at least<br />

size bytes (or NULL if no memory is available)<br />

– Typical use: int *p = malloc(sizeof(int));<br />

void free(void* ptr); is the counterpart to malloc()<br />

– It takes a pointer to a block allocated with malloc() <strong>and</strong> returns the block<br />

to the heap<br />

– It is a (usually fatal) bug to call free() more than once for the same block,<br />

or with a pointer not obtained from malloc()<br />

Very frequent case: Allocation <strong>of</strong> memory for structs<br />

– Accessing elements in a struct: (*list).value = 0;<br />

– More readable alternative: list->value = 0;<br />



Pointers <strong>and</strong> Arrays in C<br />

In C, arrays <strong>and</strong> pointers are strongly related:<br />

– Everywhere except in a definition, on the left hand side of an assignment, and as the operand of sizeof or unary &, an array is equivalent to a pointer to its first element<br />

– In particular, arrays are passed to functions by passing their address!<br />

– More exactly: An array degenerates (decays) to a pointer if passed or used in pointer contexts<br />

Not only can we treat arrays as pointers, we can also apply array operations to pointers:<br />

– If p is a pointer to the first element of an array, we can use p[3] to access the fourth element of that array (indices start at 0)<br />

– In general, if p points to an array element a[j], p[i] refers to a[j+i]<br />
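The relation between p[i] and a[j+i] can be illustrated with a function that only ever sees a pointer. This is a sketch; sum_ints() is a hypothetical helper, and the element count has to be passed separately because the pointer carries no size information.

```c
#include <stddef.h>

/* Hypothetical helper: sum count ints starting at p.
   Works for whole arrays and for pointers into the middle of an array. */
int sum_ints(const int *p, size_t count)
{
    int sum = 0;
    size_t i;

    for(i = 0; i < count; i++)
    {
        sum += p[i];      /* Array indexing applied to a pointer */
    }
    return sum;
}
```

With int a[5] = {1,2,3,4,5}; the call sum_ints(a, 5) adds up the whole array, while sum_ints(&a[2], 3) starts at the third element, so that p[0] inside the function is a[2].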



Graphic Example<br />

int array[10];<br />
int *a, *b;<br />

a = array;<br />
b = &(array[0]);<br />
array[0] = 10;<br />
a[1] = 11;<br />
b[3] = *a;<br />

[Figure: a and b both point to array[0]; after the assignments, array[0] holds 10, array[1] holds 11, and array[3] holds 10; the remaining elements up to array[9] are not initialized]<br />


Example<br />

#include <stdio.h><br />
#include <stdlib.h><br />

int main(void)<br />
{<br />
    char a[] = "CSC322\n";<br />
    char *b;<br />
    int i;<br />

    b = a; /* The array name is used as a pointer to its first element */<br />
    printf("%s", b);<br />
    for(i=0;b[i];i++)<br />
    {<br />
        printf("Character %d: %c\n", i, b[i]);<br />
    }<br />
    return EXIT_SUCCESS;<br />
}<br />



Example Output<br />

Compiling: gcc -o csc322 csc322.c<br />

Running:<br />

CSC322<br />
Character 0: C<br />
Character 1: S<br />
Character 2: C<br />
Character 3: 3<br />
Character 4: 2<br />
Character 5: 2<br />
Character 6:<br />



Parameter Passing in C<br />

In C, parameters to functions are always passed by value<br />

– The formal parameter (in the function) is a local variable<br />

– It is initialized to the value <strong>of</strong> the actual parameter (the expression we used in<br />

the function call)<br />

– Changing the local variable in the function does not change the actual parameter<br />

Arrays degenerate into pointers to the first element, however!<br />

– That pointer is still passed by value, but in effect the array is passed by reference<br />

– We can thus change the array elements from inside the function!<br />

This is frequently used for efficient array manipulation!<br />

– Sorting arrays<br />

– Reading elements into an array from stdin<br />

– Applying a transformation to all elements<br />



Example<br />

#include <stdio.h><br />
#include <stdlib.h><br />
#include <ctype.h><br />

/* Convert the string to upper case in place */<br />
void upcase(char *string)<br />
{<br />
    int i;<br />

    for(i=0; string[i]; i++)<br />
    {<br />
        string[i] = toupper((unsigned char)string[i]);<br />
    }<br />
}<br />

int main(void)<br />
{<br />
    char str[] = "A test string.";<br />

    printf("%s\n", str);<br />
    upcase(str);<br />
    printf("%s\n", str);<br />
    return EXIT_SUCCESS;<br />
}<br />



Example Output<br />

A test string.<br />

A TEST STRING.<br />



<strong>CSC322</strong><br />

C <strong>Programming</strong> <strong>and</strong> <strong>UNIX</strong><br />

Midterm Review<br />



<strong>UNIX</strong> is a multi-user system<br />

<strong>UNIX</strong> Concepts<br />

– Users have a user name, a numerical user id (e.g. 500), and a home directory<br />

– The privileged user root with UID 0 has (essentially) unlimited access<br />

<strong>UNIX</strong> is a multi-tasking system, i.e. it can run multiple programs at once. A<br />

running program (with its data) is called a process. Each process has:<br />

– Owner (a user)<br />

– Working directory (a place in the file system)<br />

– Various resources<br />

A shell is a command interpreter, i.e. a process accepting and executing commands from a user.<br />

– A shell is typically owned by the user using it<br />

– The initial working directory of a shell is typically the user’s home directory (but can be changed by commands)<br />



The File System<br />

[Figure: directory tree. The root directory / contains bin (system programs, e.g. cp, ls, ps), dev (devices, e.g. hda, hdb, kbd), etc (configuration, e.g. passwd, hosts), home (home directories, e.g. joe, jane, schulz, holding private files such as core and Desktop), tmp (temporary files), and usr (user programs). usr contains local (site-installed, with its own lib and bin), lib (vendor), and bin (vendor)]<br />

In UNIX, all files are organized in a single directory tree<br />

There are two main types of files:<br />

– Plain files (containing data)<br />

– Directories, containing both plain files (optionally) and other directories<br />



Globbing<br />

Glob patterns describe sets <strong>of</strong> file names<br />

A string is a wildcard pattern if it contains one <strong>of</strong> ?, * or [<br />

A wildcard pattern expands into all file names matching it<br />

– A normal letter in a pattern matches itself<br />

– A ? in a pattern matches any one letter<br />

– A * in a pattern matches any string<br />

– A pattern [l1. . . ln] matches any one <strong>of</strong> the enclosed letters (exception: ! as<br />

the first letter)<br />

– A pattern [!l1. . . ln] matches any one <strong>of</strong> the characters not in the set<br />

– A leading . in a filename is never matched by anything except an explicit<br />

leading dot<br />

Important: Globbing is performed by the shell, not an application program!<br />
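Although the shell performs the expansion, the matching rules themselves can be tried out from C: POSIX provides fnmatch() (declared in fnmatch.h), which matches a single string against a glob pattern. The sketch below is only meant to let you experiment with the rules listed above; the wrapper glob_matches() is a hypothetical name, and it tests one name at a time instead of expanding a pattern into a list of files.

```c
#include <fnmatch.h>

/* Hypothetical helper: 1 if name matches the glob pattern, 0 otherwise.
   FNM_PERIOD makes a leading . in name match only an explicit dot
   in the pattern, as described above. */
int glob_matches(const char *pattern, const char *name)
{
    return fnmatch(pattern, name, FNM_PERIOD) == 0;
}
```

For example, glob_matches("*.c", "main.c") is true, while glob_matches("*", ".profile") is false because of the leading dot.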



Some Important <strong>UNIX</strong> Comm<strong>and</strong>s (1)<br />

Orientation <strong>and</strong> moving around<br />

– whoami<br />

– pwd – print working directory<br />

– cd – change directory<br />

– ls – list files (Important options: -a, -l)<br />

Operating on files<br />

– cat – concatenate <strong>and</strong> print files<br />

– less <strong>and</strong> more – print files page by page<br />

– touch – change access dates (or create empty files)<br />

– mv – move files<br />

– cp – copy files<br />

– rm – remove files<br />

– wc – count words (<strong>and</strong> lines <strong>and</strong> characters)<br />



Some Important UNIX Commands (2)<br />

Working on Directories:<br />

– mkdir – make a new directory<br />

– rmdir – remove an empty directory<br />

Miscellaneous<br />

– man – read the manual (-k: Search for keywords in the manual)<br />

– info – read info format documentation (also available through emacs)<br />

– echo – Print arguments<br />

– grep – Search lines matching a regular expression<br />



Input <strong>and</strong> Output Redirection, Piping<br />

The three st<strong>and</strong>ard <strong>UNIX</strong> IO channels are<br />

– stdin (St<strong>and</strong>ard Input)<br />

– stdout (St<strong>and</strong>ard Output)<br />

– stderr (Errors)<br />

Normal output redirection redirects stdout into a file<br />

Input redirection makes stdin read from a file<br />

Piping connects one process’s stdout to the stdin of another process<br />

Examples:<br />

cat > newfile # Read stdin, write to newfile<br />

cat < newfile # Read newfile, write to terminal<br />

cat > newfile < oldfile # Poor man’s copy<br />

cat newfile | wc # Count words in newfile<br />



Process Control<br />

Processes started from the shell can be<br />

– Running or Suspended<br />

– In the foreground (accepting keyboard input) or in the background<br />

Simple process control:<br />

– Running a comm<strong>and</strong> followed by & starts it in the background (normally<br />

comm<strong>and</strong>s are executed in the foreground)<br />

– ^Z (Control-Z) will suspend a foreground process<br />

– ^C (Control-C) will terminate it<br />

– fg wakes a suspended process <strong>and</strong> puts it into the foreground<br />

– bg puts it into the background<br />

– kill can be used to terminate it<br />

– jobs prints a list <strong>of</strong> active processes started from a shell<br />



C Compiling with gcc<br />

Programs consisting <strong>of</strong> a single .c file can be compiled in one step<br />

– gcc -o file file.c will compile file.c into an executable program file<br />

Multiple C files must be compiled <strong>and</strong> linked separately!<br />

– gcc -c -o file1.o file1.c compiles the file into an object (.o) file<br />

– gcc -o file file1.o file2.o... links the different object files together to form an<br />

executable<br />

Important gcc options:<br />

– -o : Give the name <strong>of</strong> the output file<br />

– -ansi: Compile strict ANSI-89 C only<br />

– -Wall: Warn about all dubious lines<br />

– -c: Don’t perform linking, just generate a (linkable) object file<br />

– -O – -O6: Use increasing levels <strong>of</strong> optimization to generate faster executables<br />



C Datatypes<br />

The language <strong>of</strong>fers a set <strong>of</strong> basic types built into the language<br />

– char, short, int, long, long long<br />

– float, double<br />

– Integer data types come in signed <strong>and</strong> unsigned variety!<br />

We can define new, quasi-basic types as enumerations (enum)<br />

We can derive new types as follows:<br />

– Arrays over a base type ([])<br />

– Structures combining base types (struct)<br />

– Unions (able to store alternative types) (union)<br />

– Pointer to a base type (*)<br />

typedef is used to define named new types<br />



Flow Control<br />

if...else<br />

– Conditional execution<br />

switch<br />

– Select between many alternatives, based on a single integer type variable<br />

– Remember fall through property and break;!<br />

while<br />

– Loop as long as a condition is true<br />

for<br />

– As while, but includes initialization and update in a single statement<br />
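As a reminder of how fall through and break interact, here is a small sketch. The function days_in_month() is a hypothetical example; it uses several case labels that deliberately share one block of code.

```c
/* Hypothetical example: number of days in a month (1-12), 0 if invalid.
   leap_year is non-zero for leap years. */
int days_in_month(int month, int leap_year)
{
    int days = 0;

    switch(month)
    {
    case 4: case 6: case 9: case 11:   /* Labels fall through to one body */
        days = 30;
        break;                         /* Without break: falls into case 2 */
    case 2:
        days = leap_year ? 29 : 28;
        break;
    default:
        if(month >= 1 && month <= 12)
        {
            days = 31;
        }
        break;
    }
    return days;
}
```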



Functions<br />

Any C program is a collection of functions<br />

– There has to be exactly one function called main() in the program<br />

– Execution starts by a call to main() (executed by the OS)<br />

– A function definition consists of a header and a body<br />

The header consists of:<br />

– The return type of the function<br />

– The name of the function<br />

– A parenthesized list of formal arguments<br />

The body of the function is a sequence of declarations and statements<br />

– Execution of the function ends when a return statement is encountered or the end of the body is reached<br />

– The argument of the return statement is the value returned from the function call<br />



C Preprocessor<br />

The #include directive is used to include other files (the contents of the named file replace the #include directive)<br />

The #define directive is used to define macros<br />

– Macros can simply define a textual constant<br />

– Macros can have formal arguments, which will be instantiated in the replacement text<br />

#if/#else/#endif is used for conditional compilation<br />

– The controlling expression of the #if has to be a constant integer expression<br />

– Special case: #ifdef tests if a macro is defined<br />

– Special case: #ifndef tests if a macro is not defined<br />
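The three kinds of directives can be seen together in a few lines. This is only an illustrative sketch: the names BUF_SIZE, SQUARE, VERBOSE and buffer_area() are assumptions chosen for the example.

```c
#include <stdio.h>

#define BUF_SIZE 16                 /* macro defining a textual constant */
#define SQUARE(x) ((x) * (x))       /* macro with a formal argument */

/* Hypothetical helper: returns BUF_SIZE squared, optionally logging it. */
int buffer_area(void)
{
    int area = SQUARE(BUF_SIZE);    /* expands to ((16) * (16)) */

#ifdef VERBOSE                      /* compiled only if VERBOSE is defined */
    printf("area = %d\n", area);
#endif
    return area;
}
```

The parentheses in the replacement text of SQUARE matter: without them, SQUARE(1+2) would expand to 1+2 * 1+2. Compiling with -DVERBOSE on the gcc command line would enable the printf() call.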



Exercises<br />

Reread the lecture notes<br />

Download the C examples from the Web<br />

– Read the code<br />

– Compile them by hand<br />

– Run them<br />


CSC322<br />

C Programming and UNIX<br />

Programming in C<br />

Dynamic Arrays and Pointer Arithmetic<br />


Dynamically Allocated Arrays<br />

Since pointers and arrays can be used interchangeably in many contexts, we can use malloc() to allocate arrays of whatever size we need!<br />

– The size of an array of n elements of type t is just n*sizeof(t)<br />

Applications:<br />

– We can allocate arrays in a function and return pointers to them (remember that local variables are destroyed when control leaves a function)<br />

– We can determine array size at run time<br />

– We can dynamically increase array size by:<br />

∗ Allocating a bigger array<br />

∗ Copying the old array into the initial part of the new array<br />

∗ Freeing the old array<br />
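The three steps for increasing the array size can be sketched as a small function. The name grow_int_array() and its parameters are hypothetical; the sketch assumes that old is a valid block obtained from malloc().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a new array of new_n ints that starts with
 * the old_n values of old. Returns NULL on failure, in which case old is
 * left untouched and still has to be freed by the caller. */
int *grow_int_array(int *old, size_t old_n, size_t new_n)
{
    int *bigger;

    if(new_n < old_n)
    {
        return NULL;                            /* this sketch only grows */
    }
    bigger = malloc(new_n * sizeof(int));       /* 1. allocate a bigger array */
    if(!bigger)
    {
        return NULL;
    }
    memcpy(bigger, old, old_n * sizeof(int));   /* 2. copy the old contents */
    free(old);                                  /* 3. free the old array */
    return bigger;
}
```

After a successful call the caller must use only the returned pointer, since the old block has been freed.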



Example<br />

#include <stdio.h><br />

#include <stdlib.h><br />

#define BUF_SIZE 1024<br />

int main(void)<br />

{<br />

    int c, count=0;<br />

    char* buffer;<br />

    buffer = malloc(sizeof(char)*BUF_SIZE);/* Missing check! */<br />

    while((c=getchar())!=EOF)<br />

    {<br />

        if(count == BUF_SIZE-1)<br />

        {<br />

            printf("Buffer full\n"); exit(EXIT_FAILURE);<br />

        }<br />

        buffer[count++] = c;<br />

    }<br />

    buffer[count] = '\0';<br />

    printf("%s\n", buffer);<br />

    free(buffer);<br />

    return EXIT_SUCCESS;<br />

}<br />



Changing Allocated Block Size: realloc()<br />

void* realloc(void* ptr, size_t size); is declared in <stdlib.h><br />

– Its first argument is a pointer to a block of memory on the heap (obtained with malloc(), realloc(), or an equivalent function)<br />

– The second argument is the desired new size of the block<br />

– realloc() returns a pointer to a new block of memory of the desired size (if available, otherwise NULL)<br />

– If realloc() is successful, the initial part of the new block (up to the smaller of the two sizes) will be identical to the old block<br />

Special cases:<br />

– If ptr is NULL, realloc() is equivalent to malloc()<br />

– If size is 0 (and ptr is not NULL), C89 realloc() is equivalent to free(); later standards no longer guarantee this<br />

– As with malloc(), we always have to check the return value! If realloc() fails, the old block remains allocated and unchanged<br />

Most common use: Increase the size of some array<br />
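Checking the return value of realloc() is easiest with a temporary pointer, so that the old block is not lost if the call fails. A minimal sketch, assuming a hypothetical helper double_buffer() and a current size greater than 0:

```c
#include <stdlib.h>

/* Hypothetical helper: doubles the capacity of *buf, a malloc()ed block of
 * *size bytes (*size > 0 is assumed). Returns 1 on success and 0 on
 * failure; on failure *buf and *size are unchanged, so the caller can
 * still use or free the old block. */
int double_buffer(char **buf, size_t *size)
{
    size_t new_size = (*size) * 2;
    char *tmp = realloc(*buf, new_size);   /* do not overwrite *buf yet */

    if(!tmp)
    {
        return 0;
    }
    *buf = tmp;
    *size = new_size;
    return 1;
}
```

Writing buf = realloc(buf, n) directly would overwrite the only pointer to the old block with NULL on failure, which leaks the block.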



Example: Growing the Buffer as Needed<br />

#include <stdio.h><br />

#include <stdlib.h><br />

int main(void)<br />

{<br />

    int c, count=0, size = 2;<br />

    char* buffer;<br />

    buffer = malloc(sizeof(char)*size); /* Missing check! */<br />

    while((c=getchar())!=EOF)<br />

    {<br />

        if(count == size - 1)<br />

        {<br />

            size = size * 2;<br />

            buffer = realloc(buffer, size); /* Missing check! */<br />

        }<br />

        buffer[count++] = c;<br />

    }<br />

    buffer[count] = '\0';<br />

    printf("%s\n", buffer);<br />

    free(buffer);<br />

    return EXIT_SUCCESS;<br />

}<br />



Additional Pointer Properties<br />

Pointers of the same type can be compared using <, >, <=, >=<br />

– The result is only defined when both pointers point at elements of the same array or struct, or when both pointers point to addresses within the same malloc()ed block<br />

– Pointers to elements with a smaller index are smaller than pointers to elements with a larger index<br />

Pointer arithmetic allows addition of integers to (non-void) pointers<br />

– If p points to element n in an array, p+k points to element n+k<br />

– As a special case, p[n] and *(p+n) can again be used interchangeably (and often are in practice)<br />

– Most frequent case: Use p++ to advance a pointer to the next element in an array<br />

– Note that pointer arithmetic only works on non-void pointers<br />
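Pointer comparison and p++ combine naturally when walking over part of an array. The following sketch uses a hypothetical function sum_range(); it assumes that both pointers refer to the same array (end may point one element past the last one).

```c
/* Hypothetical helper: sums the ints in the half-open range [begin, end).
 * Both pointers must point into (or one past the end of) the same array. */
long sum_range(const int *begin, const int *end)
{
    long sum = 0;
    const int *p;

    for(p = begin; p < end; p++)    /* pointer comparison, pointer increment */
    {
        sum += *p;                  /* *p is the same as p[0] */
    }
    return sum;
}
```

For an array int arr[5], the call sum_range(arr, arr + 5) adds up all five elements, and sum_range(arr + 1, arr + 3) adds up arr[1] and arr[2] only.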



Pointer Arithmetic<br />

char *cp, *cq;<br />

int *ip, *iq;<br />

[Figure: char arr1[28] holds the characters a to z followed by 0 and \0; the labels cp, cp+1, cp+2 and cq=cp+12 point at elements of arr1. int arr2[7] holds 17, 42, −13, 2, 2147483647, 1024, −1; the labels ip, ip+1, &ip[2], iq=ip+3 and iq+2 point at elements of arr2.]<br />


Pointer Arithmetic Example<br />

#include <stdio.h><br />

#include <stdlib.h><br />

int print_str(char *string)<br />

{<br />

int i = 0;<br />

while(*string)<br />

{<br />

putchar(*string);<br />

string++;<br />

i++;<br />

}<br />

return i;<br />

}<br />

int main(int argc, char* argv[])<br />

{<br />

char message[] = "Hello World!\n";<br />

int count;<br />

count = print_str(message);<br />

printf("Printed %d characters!\n", count);<br />

return EXIT_SUCCESS;<br />

}<br />



Reading the Command Line: argc and argv<br />

The C standard defines a standardized way for a program to access its (command line) arguments: main() can be defined with two additional arguments<br />

– int argc gives the number of arguments (including the program name)<br />

– char *argv[] is an array of pointers to character strings, each corresponding to a command line argument<br />

Since the name under which the program was called is included among its arguments, argc is normally at least one<br />

– argv[0] is the program name<br />

– argv[argc-1] is the last argument<br />

– argv[argc] is guaranteed to be NULL<br />



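Example: Echoing Arguments<br />

#include <stdio.h><br />

#include <stdlib.h><br />

int main(int argc, char* argv[])<br />

{<br />

    int i;<br />

    for(i=1; i<argc; i++)<br />

    {<br />

        printf("%s ", argv[i]);<br />

    }<br />

    putchar('\n');<br />

    return EXIT_SUCCESS;<br />

}<br />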


Example: Echoing Arguments – Idiomatic<br />

#include <stdio.h><br />

#include <stdlib.h><br />

int main(int argc, char* argv[])<br />

{<br />

    char **p;<br />

    for(p=argv+1; *p; p++)<br />

    {<br />

        printf("%s ", *p);<br />

    }<br />

    putchar('\n');<br />

    return EXIT_SUCCESS;<br />

}<br />



Exercises<br />

Write a function that reads a line (terminated by '\n') into an array, and a program that reads files line by line and prints them back. You can assume a reasonable fixed length (e.g. 1024 characters) per line<br />

Write a library that implements a dynamic array type for char arrays.<br />

– Implement functions that can assign and retrieve values from arbitrary positions, e.g. void darrayassign(darray_p array, int index, char newval) and char darrayvalue(darray_p array, int index)<br />

– Write a function darrayalloc() that returns a pointer to a freshly allocated dynamic array<br />

– Write a function darrayfree() that frees such an array<br />

– Hint: Use a struct that contains at least a pointer to the dynamically allocated proper array and the currently allocated array size. If an index beyond the allocated size occurs, use realloc() to increase the size<br />

Put the two together: Write a function that can read a line of any length and returns (a pointer to) it<br />


CSC322<br />

C Programming and UNIX<br />

Making Programming Easier<br />


The rpn_calc Example<br />

[Figure: dependency diagram with three kinds of edges: #include, Compile (gcc -c) and Link (gcc). Headers: ctype.h, stdio.h, stdlib.h, chario.h, integerio.h. Sources: chario.c, integerio.c, rpn_calc.c. Object files: chario.o, integerio.o, rpn_calc.o (plus libc). Executable: rpn_calc.]<br />



The rpn_calc Example (Simplified)<br />

[Figure: the same dependency diagram without the library headers and libc. Headers: chario.h, integerio.h. Sources: chario.c, integerio.c, rpn_calc.c. Object files: chario.o, integerio.o, rpn_calc.o. Executable: rpn_calc.]<br />



Program Dependencies<br />

In the example, changing one file may make many steps necessary to propagate<br />

the change<br />

– If any .h file has been changed, all .c files that include it may have to be<br />

recompiled<br />

– If any .c file has changed, it has to be recompiled<br />

– If any .o file has changed, we need to relink the program<br />

– In more complex programs, even more such situations exist!<br />

Recompiling all files and relinking (in the right order) solves the problem...<br />

– Very expensive for large programs<br />

∗ Mozilla, Windows NT: Many hours<br />

∗ Linux kernel (on modern machine): Many minutes<br />

∗ E theorem prover: 1-2 minutes<br />

– We still need to know the right order!<br />

Recompiling by hand is error-prone (and inconvenient)<br />


UNIX User Utilities: make<br />

make is a UNIX utility that can automatically update large projects with complex dependencies<br />

– Dependencies and build instructions are described in a file called Makefile (preferred form) or makefile<br />

A makefile contains a number of rules for rebuilding the project<br />

A rule consists of a target, a list of prerequisites, and commands for rebuilding<br />

– The target normally is a file that needs rebuilding<br />

– The prerequisites are all files that are needed to rebuild the target<br />

– Finally, the commands describe how to rebuild the target<br />

Semantics:<br />

– Execution begins with the first target (or a target given on the command line)<br />

– First, rules for all prerequisites are activated (if any)<br />

– Then, if the target does not exist, or if any of the prerequisites is newer than the target, the commands are executed<br />



Example: rpn_calc makefile<br />

# Relink rpn_calc if one of the object files changed<br />

rpn_calc: chario.o integerio.o rpn_calc.o<br />

	gcc -ansi -Wall -o rpn_calc chario.o integerio.o rpn_calc.o<br />

# Recompile chario if either the .c or the .h changed<br />

chario.o: chario.h chario.c<br />

	gcc -ansi -Wall -c -o chario.o chario.c<br />

#...<br />

integerio.o: chario.h integerio.h integerio.c<br />

	gcc -ansi -Wall -c -o integerio.o integerio.c<br />

#...<br />

rpn_calc.o: integerio.h chario.h rpn_calc.c<br />

	gcc -Wall -ansi -c -o rpn_calc.o rpn_calc.c<br />

# General format:<br />

#<br />

# TARGET: PREREQUISITES<br />

# [TAB] command1<br />

# [TAB] command2 ...<br />



Built-In Rules and Makefile Variables<br />

make knows how to remake many types of files!<br />

– In particular, make knows how to run the C compiler to build object (.o) files from .c files<br />

We could have omitted the compiler command, e.g. from the rule for chario.o:<br />

chario.o: chario.h chario.c<br />

make allows the use of variables, both for customization and for more compact makefiles<br />

– Variables are set using the assignment operator:<br />

RPN=chario.o integerio.o rpn_calc.o<br />

– Variables are referenced using a $: $(RPN)<br />

Important predefined variables:<br />

– CC: Name of the C compiler<br />

– CFLAGS: Additional flags for the C compiler<br />



Example: rpn_calc makefile revisited<br />

CC=gcc<br />

CFLAGS=-Wall -ansi -O6<br />

RPN=chario.o integerio.o rpn_calc.o<br />

# Relink rpn_calc if one of the object files changed<br />

rpn_calc: chario.o integerio.o rpn_calc.o<br />

	gcc -ansi -Wall -o rpn_calc $(RPN)<br />

chario.o: chario.h chario.c<br />

integerio.o: chario.h integerio.h integerio.c<br />

rpn_calc.o: integerio.h chario.h rpn_calc.c<br />

Rebuilding from scratch:<br />

$ rm *.o<br />

$ make<br />

gcc -Wall -ansi -O6 -c -o chario.o chario.c<br />

gcc -Wall -ansi -O6 -c -o integerio.o integerio.c<br />

gcc -Wall -ansi -O6 -c -o rpn_calc.o rpn_calc.c<br />

gcc -ansi -Wall -o rpn_calc chario.o integerio.o rpn_calc.o<br />



Phony Targets<br />

Not all targets need to correspond to files<br />

– Targets not corresponding to a file are called phony<br />

– Since no corresponding file exists, commands in rules with phony targets are always executed<br />

Frequent use: Cleanup commands<br />

clean:<br />

rm *.o<br />

rm rpn_calc<br />



Assignment<br />

Write a program sort_csc322 that reads an arbitrary length file line by line (allowing for arbitrary line length), sorts the lines in ASCIIbetical order, and prints them back<br />

– Order: A letter that has a smaller numerical value is smaller than a letter that has a bigger numerical value. To compare strings, find the first character that differs (including the terminating '\0')<br />

– Hints:<br />

∗ If you are lazy, reuse the binary tree code for sorting!<br />

∗ Define a data type for the lines, using struct and char*<br />

Include a Makefile for building your final program from the sources!<br />

– More hint: If you are lazy, read man makedepend<br />


CSC322<br />

C Programming and UNIX<br />

Odds And Ends<br />


Errors, Bugs, and Other Unpleasant Animals<br />

Most hard-to-handle errors are not syntax errors<br />

– Most syntax errors go away with experience<br />

– Even if not, they are usually easy to find and fix!<br />

Most serious problems are runtime errors, resulting from faulty program logic<br />

– Finding logic errors is hard<br />

– Not finding them is worse!<br />

Examples:<br />

– Spacecraft may crash (Mars Climate Orbiter) or explode (Ariane-5)<br />

– Medical devices may actually kill patients (Therac-25 cancer treatment device)<br />

– The IRS may decide you are a tax evader, and have you arrested!<br />

Ways to (more) correct software:<br />

– Formal methods and a controlled development process<br />

– Testing<br />

– Internal consistency checks<br />



Assertions<br />

Internal consistency checks are used to verify that assumptions about the state of the program are true<br />

– Very frequent use: Check if parameters to functions have valid values<br />

– Check loop invariants<br />

– Check array boundaries<br />

Problems<br />

– Checks are inconvenient to program<br />

– The checks may cause unacceptable slowdowns (E theorem prover: Factor of 2–3, depending on input data)<br />

C solution: The header file <assert.h> and its assert() macro<br />

– Convenient way to add simple consistency checks<br />

– Checks can be disabled at compile time (no slow-down for final product)<br />


<assert.h> and assert()<br />

The assert() macro is defined in <assert.h><br />

It is used with a single argument<br />

If that argument has the truth value “true”, nothing happens<br />

Otherwise, assert() prints an error message and aborts the program<br />

– Error message contains text of the assertion, name of source file, line in file<br />

If the preprocessor macro NDEBUG is defined when <assert.h> is included, assert() is ignored (defined as the empty macro)<br />

Careful use of assert() while testing makes your programs much more robust and helps you weed out errors earlier!<br />



Example<br />

#include <stdio.h><br />

#include <stdlib.h><br />

#include <assert.h><br />

int gcd(int a, int b)<br />

{<br />

    assert(a>0);assert(b>0);<br />

    if(a==b)<br />

    {<br />

        return a;<br />

    }<br />

    if(a > b)<br />

    {<br />

        return gcd(a-b,b);<br />

    }<br />

    return gcd(b-a,a);<br />

}<br />

int main(void)<br />

{<br />

    printf("Result: %d\n", gcd(15,3));<br />

    printf("Result: %d\n", gcd(0,2));<br />

    return EXIT_SUCCESS;<br />

}<br />



Example (Continued)<br />

$ gcc -ansi -Wall -o gcd_assert gcd_assert.c<br />

$ ./gcd_assert<br />

Result: 3<br />

gcd_assert: gcd_assert.c:7: gcd: Assertion ‘a>0’ failed.<br />

Abort<br />

$ gcc -ansi -Wall -o gcd_assert gcd_assert.c -DNDEBUG<br />

$ ./gcd_assert<br />

Result: 3<br />

Segmentation fault<br />



Search in Loops<br />

A frequent use of loops is to search for something in a sequence (list or array) of elements<br />

First attempt: Search for an element with property P in array<br />

for(i=0; (i< array_size) && !P(array[i]); i=i+1)<br />

{ /* Empty Body */ }<br />

if(i!=array_size)<br />

{<br />

do_something(array[i]);<br />

}<br />

– Combines property test and loop traversal test (unrelated tests!) in one expression<br />

– Property test is negated<br />

– We still have to check if we found something at the end (in a not very intuitive<br />

test)<br />

Is there a better way?<br />



Early Exit: break<br />

C offers a way to handle early loop exits<br />

The break; statement will always exit the innermost (structured) loop (or<br />

switch) statement<br />

Example revisited:<br />

for(i=0; i< array_size; i=i+1)<br />

{<br />

if(P(array[i]))<br />

{<br />

do_something(array[i]);<br />

break;<br />

}<br />

}<br />

– I find this easier to read<br />

– Note that the loop is still single entry/single exit, although control flow in the<br />

loop is more complex<br />



Selective Operations and Special Cases<br />

Assume we have a sequence of elements, and have to handle them differently, depending on properties:<br />

for(i=0; i< array_size; i=i+1)<br />

{<br />

if(P1(array[i]))<br />

{<br />

/* Nothing to do */<br />

}<br />

else if(P2(array[i]))<br />

{<br />

do_something(array[i]);<br />

}<br />

else<br />

{<br />

do_something_really_complex(array[i]);<br />

}<br />

}<br />

Because of the special cases, all the main stuff is hidden away in an else<br />

Wouldn’t it be nice to just goto the top <strong>of</strong> the loop?<br />



Early Continuation: continue<br />

A continue; statement will immediately start a new iteration <strong>of</strong> the current loop<br />

– For C for loops, the update expression will be evaluated!<br />

Example with continue:<br />

for(i=0; i< array_size; i=i+1)<br />

{<br />

if(P1(array[i]))<br />

{<br />

continue;<br />

}<br />

if(P2(array[i]))<br />

{<br />

do_something2(array[i]);<br />

continue;<br />

}<br />

do_something_really_complex(array[i]);<br />

}<br />



do/while Loops<br />

Both while and for loops in C are controlled at the top<br />

– If the controlling expression is false, the loop is not entered at all<br />

Occasionally, we can express some algorithms more conveniently if we have a controlling expression at the end of the loop<br />

– Loop body is always executed at least once!<br />

C language construct: do/while() loop<br />

do<br />

{<br />

loop body<br />

}while(E);<br />

– If E evaluates to true at the end of the loop, control is transferred back to the do<br />



Example<br />

#include <stdio.h><br />

#include <stdlib.h><br />

int main(int argc, char* argv[])<br />

{<br />

    int c;<br />

    do<br />

    {<br />

        printf("Please choose 1 for half of a bad joke or 2 for a cool number!\n");<br />

        c=getchar();<br />

    }while(!(c=='1' || c=='2'));<br />

    if(c=='1')<br />

    {<br />

        printf("Why did the chicken cross the road? ...\n");<br />

    }<br />

    else<br />

    {<br />

        printf("42\n");<br />

    }<br />

    return EXIT_SUCCESS;<br />

}<br />



Some Loop Statistics<br />

E theorem prover<br />

– State of the art automated theorem prover<br />

– About 100000 lines of C code (20000 statements, the rest is comments, white space, definitions....)<br />

– Total of 942 structured loop statements in code base<br />

521 for() loops<br />

– Most iterate over integer values (for i=0; i


Exercises<br />

Go back over your exercises and assignments, and think about good places to insert assert() statements<br />

Write a non-recursive function that searches for a value in a binary search tree. Use break to leave the loop if you found it!<br />

Think about uses for do/while ;-)<br />


CSC322<br />

C Programming and UNIX<br />

Function Pointers<br />

C Standard Library (1)<br />


Functions as Arguments<br />

Occasionally, you want to be able to pass around functions just like data<br />

Example:<br />

– Configure an event handler (“call this function if the UPS signals power-down”)<br />

– Simulate some object-oriented techniques (virtual functions), e.g. to implement destructors<br />

– Most importantly: Parameterize algorithms<br />

Functional languages have functions as first class objects<br />

C is less flexible, but gives us function pointers to pass as arguments and store in variables<br />

– Idea: Pointers are addresses in memory<br />

– Functions are pieces of code in memory<br />



Function Pointers<br />

We can use the address of a function to call it!<br />

– As with normal pointers, we need to know the type of the function (in this case, the return type and the types of the arguments it takes)<br />

Syntax: Same principle as for other types!<br />

– To declare a function pointer, use a function declaration, but add parentheses and add a * to denote that it is a pointer:<br />

int (*add)(int x1, int x2);<br />

– This declares add to be a pointer to a function accepting two integer arguments and returning an integer<br />

To use a function pointer: Just dereference the pointer<br />

a = (*add)(10,20);<br />

To assign a value to the pointer, just get the address of a function:<br />

add = &some_function_name;<br />



Function Pointers (2)<br />

To confuse students (and for convenience), it is possible to omit both the dereferencing in calling and the ampersand in assigning:<br />

add = some_function_name;<br />

a = add(10,20);<br />

– Since there is nothing else you can do with functions in C, these simplifications do not create an ambiguity<br />

– They tend to make code easier to read, though, especially with functions that return pointers<br />

Note: Since declarations quickly become hard to read, it is wise to always use typedef to define a suitable function pointer type!<br />
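The typedef advice can be illustrated with a short sketch. The type name BinaryOp and the functions add(), multiply() and fold() are assumptions made up for this example.

```c
/* BinaryOp is an illustrative name for the type "pointer to a function
 * taking two ints and returning an int". */
typedef int (*BinaryOp)(int x1, int x2);

static int add(int x1, int x2)      { return x1 + x2; }
static int multiply(int x1, int x2) { return x1 * x2; }

/* Hypothetical helper: combines the n values in arr with op,
 * starting from the value start. */
int fold(BinaryOp op, int start, const int *arr, int n)
{
    int i, result = start;

    for(i = 0; i < n; i++)
    {
        result = op(result, arr[i]);    /* call through the pointer */
    }
    return result;
}
```

With the typedef, the parameter list of fold() reads like any other; both fold(add, 0, v, 3) and fold(&multiply, 1, v, 3) are valid calls, since the ampersand is optional.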



Example<br />

#include <stdio.h><br />

#include <stdlib.h><br />

int add(int x1, int x2)<br />

{ return x1+x2; }<br />

int subtract(int x1, int x2)<br />

{ return x1-x2; }<br />

void use_fun(int limit, int (*fun)(int x1, int x2))<br />

{<br />

int i;<br />

for(i=0; i


Example Output<br />

Result: 20<br />

Result: 21<br />

Result: 22<br />

Result: 23<br />

Result: 24<br />

--------<br />

Result: 20<br />

Result: 19<br />

Result: 18<br />

Result: 17<br />

Result: 16<br />
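A version of use_fun() that is consistent with the output on this slide might look as follows. The loop body is an assumption: the first argument 20 and the use of the loop counter as second argument are inferred from the printed results, not taken from the original code.

```c
#include <stdio.h>

int add(int x1, int x2)
{ return x1+x2; }

int subtract(int x1, int x2)
{ return x1-x2; }

/* Assumed loop body: apply fun to 20 and i for i = 0 .. limit-1. */
void use_fun(int limit, int (*fun)(int x1, int x2))
{
    int i;

    for(i=0; i<limit; i++)
    {
        printf("Result: %d\n", fun(20, i));
    }
}
```

Under these assumptions, calling use_fun(5, add), printing a line of dashes, and then calling use_fun(5, subtract) from main() would produce the ten result lines of the slide.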



C Library Functions: qsort()<br />

qsort() is a very useful C library function (declared in <stdlib.h>) that is able to sort any kind of array (and normally does so very efficiently)!<br />

qsort is declared as follows:<br />

void qsort(void *base, size_t nmemb, size_t size,<br />

int(*compar)(const void *, const void *));<br />

– The first argument points to the array to be sorted (i.e. to its first element)<br />

– The second argument is the number of elements in the array<br />

– The third argument gives the size of a single element<br />

– Finally, the last argument is a pointer to a function taking two pointer arguments and returning an integer value<br />



C Library Functions: qsort() (2)<br />

qsort definition (repeated):<br />

void qsort(void *base, size_t nmemb, size_t size,<br />

int(*compar)(const void *, const void *));<br />

Purpose of compar: Let the caller define an order on elements<br />

– (*compar)() is called by qsort() to compare two arguments<br />

– It gets pointers to two array elements as arguments<br />

– It should compare these elements and return<br />

∗ 0, if the two elements are equal (under the order)<br />

∗ A negative integer, if the first element is smaller<br />

∗ A positive integer, if the first element is greater<br />
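A comparison function that matches the compar signature exactly, so that no cast of the function pointer is needed, can be sketched like this. The wrapper sort_ints() is a hypothetical convenience function, not part of the library.

```c
#include <stdlib.h>

/* Comparison function with exactly the signature qsort() expects.
 * The void pointers are converted back to int pointers inside. */
int compare_ints(const void *arg1, const void *arg2)
{
    int a = *(const int *)arg1;
    int b = *(const int *)arg2;

    if(a < b)
    {
        return -1;
    }
    if(a > b)
    {
        return 1;
    }
    return 0;
}

/* Hypothetical wrapper: sorts n ints in ascending order. */
void sort_ints(int *array, size_t n)
{
    qsort(array, n, sizeof(int), compare_ints);
}
```

Returning a-b instead of comparing explicitly looks shorter, but it can overflow for values of large magnitude, so the explicit comparisons are the safer choice.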



Example<br />

#include <stdio.h><br />

#include <stdlib.h><br />

typedef int (*CompareFun)(const void* arg1, const void* arg2);<br />

int compare_ints(int *arg1, int* arg2)<br />

{<br />

if(*arg1 < *arg2)<br />

{<br />

return -1;<br />

}<br />

if(*arg1 > *arg2)<br />

{<br />

return 1;<br />

}<br />

return 0;<br />

}<br />



int main(int argc, char* argv[])<br />

{<br />

int array[10], i;<br />

}<br />

Example (continued)<br />

for(i=0; i


Example (Output)<br />

Unsorted:<br />

103<br />

70<br />

105<br />

115<br />

81<br />

127<br />

74<br />

108<br />

41<br />

77<br />

Sorted:<br />

41<br />

70<br />

74<br />

77<br />

81<br />

103<br />

105<br />

108<br />

115<br />

127<br />



The C Standard Library<br />

The C Standard Library contains a large number of functions, some data types and system dependent constants<br />

– Covers many things that other languages handle in the main language<br />

– Also contains primitives for extending some parts of the language<br />

– Notably missing: Any functionality for graphics (only stream-based I/O)<br />

Most parts of the library are automatically linked with C programs (exception: Floating point math functions)<br />

The standard library is part of the C standard, and has to be supported on any standards-compliant full C implementation<br />

– Code written using only the standard library should be highly portable<br />

The library has 15 parts with corresponding header files<br />

– Some declarations are repeated in different headers<br />



C Standard Library Organisation<br />

– assert.h: Assertions (*)<br />

– ctype.h: Character classes (+)<br />

– errno.h: Error reporting for library functions<br />

– float.h: Implementation limits for floating point numbers<br />

– limits.h: Limits for other things<br />

– locale.h: Localization support<br />

– math.h: Mathematical functions<br />

– setjmp.h: Non-local function exits<br />

– signal.h: Signal handling<br />

– stdarg.h: Support for functions with a variable number of arguments (as e.g. printf())<br />

– stddef.h: Standard macros and typedefs<br />

– stdio.h: Input and output (+)<br />

– stdlib.h: Miscellaneous library functions (+)<br />

– string.h: String (character array) handling<br />

– time.h: Functions about time and date<br />



Error Handling: errno.h<br />

Library functions typically signal an error by returning an out of range value, i.e. a value that cannot possibly be correct<br />

– For many functions that is -1 or NULL<br />

They communicate the cause of the error by setting the global int variable errno to a specific value<br />

– At the program start, errno is guaranteed to have the value 0<br />

– No library function will ever set errno to 0, but failed library functions will set it to an implementation-defined value encoding the cause of the error<br />

Error codes have symbolic names (with #define):<br />

– EDOM: (Required by the standard) Domain error for some math functions<br />

– ERANGE: (Required by the standard) Range error for some math functions<br />

– EAGAIN: (UNIX) Temporary problem, try again<br />

– ENOMEM: (UNIX) Out of memory<br />

– EBUSY: (UNIX) Some necessary resource is already in use<br />

– EINVAL: (UNIX) Invalid argument to some function<br />

Example

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

int main(int argc, char* argv[])
{
   char *res;

   printf("errno: %d\n", errno);
   res = strdup("Hallo"); /* Allocate space, copy the string to it */
   if(!res)
   {
      printf("Could not copy string, errno: %d = %d\n", errno, ENOMEM);
   }
   else
   {
      printf("All is fine, errno: %d\n", errno);
      free(res);
   }
   return EXIT_SUCCESS;
}


Exercises

Write a program that sorts an arbitrarily sized array of double values

Think about a program that sorts pointers to char, based on the characters (or character arrays) the pointers point to (yes, this is a hint for your assignment)

Check out /usr/include/errno.h and /usr/include/asm/errno.h


C Standard Library

Characters and Strings



Character Classes and ctype.h

The C standard defines functions for several character classes in a portable way
– We can use these functions regardless of the underlying character set of the implementation
– Most of these functions can be (and are) implemented in a very efficient manner for ASCII characters

C characters are integer values, typically 8 bits wide
– On most implementations, char is an 8-bit extension of ASCII (in recent times, ISO Latin-1 or its variants have become popular)
– There is limited support for bigger character sets using wchar_t

Character handling functions are declared in ctype.h


Some C Character Classes

All character class functions accept and return int values
– Behaviour is only defined if the input is from the range of unsigned char or EOF
– Each function returns true (non-0) if the character is in the class, 0 otherwise

Character class test functions
– isdigit(c): Digits, i.e. {0-9}
– isalpha(c): Upper and lower case letters ({a-z,A-Z}; in some locales additional characters, e.g. umlauts like ä, Ö, ...)
– isalnum(c): Equivalent to (isdigit(c)||isalpha(c))
– iscntrl(c): Control characters, i.e. non-printable characters (in ASCII, those are characters with codes 0 to 31 and 127)
– isxdigit(c): Hexadecimal digits, {0-9,a-f,A-F}
– islower(c): Lower case letters
– isupper(c): Upper case letters
– ispunct(c): Printing characters that are neither letters, digits, nor space
– isprint(c): Normal, printable characters (including space)


Character Class Conversion Functions

There are two functions for converting characters from one class to another:
– tolower(c) converts upper case characters to lower case characters
– toupper(c) converts lower case characters to upper case characters

Both functions return the character unchanged if it is not an upper or lower case character, respectively

Example

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
   int c;

   while((c=getchar())!=EOF)
   {
      if(iscntrl(c))
      {
         printf("
   ...
}


Example Output

$ man man | ctypedemo

MAN(1) MANUAL PAGER UTILS MAN(1)<br />

<br />

<br />

<br />

N<br />

NA<br />

AM<br />

ME<br />

E<br />

MAN - AN INTERFACE TO THE ON-LINE REFERENCE MANUALS<br />

<br />

S<br />

SY<br />

YN<br />

NO<br />

OP<br />

PS<br />

SI<br />



Strings

Strings are not part of the C language proper
– String literals are supported
– Limited support by functions in the C standard library

String-handling functions operate on char* (pointer to char) values
– It is the responsibility of the program to make sure that there is sufficient space available for the operations!

Convention for strings:
– Strings are \0 terminated arrays of characters
– Important: Size of the array is not taken into account!

char excess[10000] = "a"; /* String length 1, takes up two characters, a and \0 */
char tooshort[2];
tooshort[0] = 'a';
tooshort[1] = 'b'; /* tooshort is not a valid string, if treated as one, behaviour is undefined */


String Functions from string.h (1)

char *strcpy(char *s, const char *ct)
– Copy a '\0'-terminated string from ct to s
– Returns s
– s must point to a sufficiently large area of memory!
– Note: For all string functions that copy strings, source and target areas must not overlap (otherwise, behaviour is undefined)

char *strncpy(char *s, const char *ct, size_t n)
– As strcpy(), but copies at most n characters
– Note: If ct has n or more characters, s will not be '\0'-terminated
– If ct is shorter than n, then the result will be padded with additional '\0' characters (i.e. s must always have space for n characters, even if ct is shorter than n characters)

size_t strlen(const char *cs)
– Return the length of the string at cs
– Does not count the trailing '\0'

Example: Duplicating Strings

Several UNIX standards define a function strdup() that allocates enough memory for a string, and then copies it, returning the pointer to the newly allocated memory

Our version also makes sure that there is memory available:

char* SecureStrdup(char* str)
{
   char *newstr = SecureMalloc(strlen(str)+1);

   return strcpy(newstr,str);
}


String Functions from string.h (2)

char *strcat(char *s, const char *ct)
– Concatenates ct at the end of s
– Returns s
– Result is always '\0' terminated

char *strncat(char *s, const char *ct, size_t n)
– As strcat(), but copies at most n characters from ct
– Result is always '\0' terminated (even if ct is longer than n)

Examples:

char *t="World";
char s[10] = "Hello";
strncat(s,t,3); /* Ok, s now contains "HelloWor" */
strcat(s,t); /* Error: s overflows; even "HelloWorld" alone would require 11 characters ('\0'!) */



String Functions from string.h (3)

int strcmp(const char *cs, const char *ct)
– Compare two strings in the lexical extension of the natural order on characters
– First differing character decides which string is bigger (including the terminating '\0', i.e. a proper prefix is always smaller than the longer string)
– Return value: Integer less than 0 if cs is smaller, greater than 0 if ct is smaller, or 0 if both are equal

int strncmp(const char *cs, const char *ct, size_t n)
– As strcmp(), but compare at most n characters

char *strchr(const char *cs, int c)
– Return pointer to the first occurrence of c in cs (or NULL, if c is not present in cs)

char *strrchr(const char *cs, int c)
– Return pointer to the last occurrence of c in cs (or NULL)



String Functions from string.h (4)

char *strpbrk(const char *cs, const char *ct)
– Returns pointer to the first character in cs that also occurs in ct (or NULL), i.e. ct is treated as a set of characters
– Example:
strpbrk("Hello", "eul"); /* Returns pointer to the "e" in "Hello" */

char *strstr(const char *cs, const char *ct)
– Return pointer to first occurrence of ct in cs, or NULL if ct is not a substring of cs

char *strerror(int n)
– Return a pointer to a string description of the library error with error code n (as defined in errno.h)
– If n is not a known error code, a pointer to a generic “unknown error code” message is returned

Generic Memory Access Functions

Pre-ANSI C used char* as a generic pointer, hence generic memory handling functions are lumped in with strings
– Character is just another name for byte in C, anyway
– However, ANSI C has the generic void* pointer type

The following functions are generally very similar to the string functions, but do not use a delimiter like '\0'
– All operations take a length parameter n, and handle exactly n characters

These functions basically treat the virtual memory as one big character array!
– Used to implement many basic operations
– Typically implemented very efficiently (often by processor-specific assembler subroutines)


Memory Functions from string.h (1)

void *memcpy(void *s, const void *ct, size_t n)
– Copy a sequence of n bytes from ct to s
– The regions must not overlap!

void *memmove(void *s, const void *ct, size_t n)
– Copy a sequence of n bytes from ct to s
– There are no additional constraints (i.e. memmove() has to handle cases where the regions overlap)

int memcmp(const void *cs, const void *ct, size_t n)
– Compare the first n characters found at cs and ct
– Return value: As strcmp() (less than 0, greater than 0, or equal to 0)


Memory Functions from string.h (2)

void *memchr(const void *cs, int c, size_t n)
– Search for character c in the first n bytes at cs, return pointer to it (or NULL)

void *memset(void *s, int c, size_t n)
– Place character c into the first n characters at s, returning s

Exercises

Write a simple version of grep (looking for plain strings in stdin only)

Write a version of memmove() (the hard part is handling overlapping arrays – remember that you can compare pointers with the relational operators and ==)!


C Standard Library

Memory Handling and IO





Example

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char* argv[])
{
   char carray[10];
   int iarray[10], i;

   memset(&carray[0], 'a', 10*sizeof(char));
   memset(&iarray[0], 'a', 10*sizeof(int));
   for(i=0; i
   ...


Example Output

a : 1633771873
a : 1633771873
a : 1633771873
a : 1633771873
a : 1633771873
a : 1633771873
a : 1633771873
a : 1633771873
a : 1633771873
a : 1633771873
b : 1650614882
b : 1650614882
b : 1633772130
b : 1633771873
b : 1633771873
b : 1633771873
b : 1633771873
b : 1633771873
b : 1633771873
b : 1633771873



Input and Output in the Standard Library

Input and output in C is based on the concept of streams of bytes
– Binary streams are raw, unprocessed bytes (only guarantee: If you write data to a binary stream, and then read it back, it is unchanged)
– Text streams are composed of (possibly empty) lines, separated by a single newline ('\n') character (the library has to make sure other text representations are converted properly)
– In UNIX, text and binary streams are identical
– In Windows, the library has to convert the carriage return/linefeed sequence used to separate lines to a single newline for text streams (but, of course, must not mangle binary streams)

Streams are represented by FILE* objects in C (“file pointers”)
– The FILE type is defined in stdio.h
– A stream normally has to be explicitly opened (connected to an input or output device) and should be closed (made available for reuse)


Standard Streams

By default, each program has three text streams open on startup:
– stdin is the standard input (normally reading from the keyboard)
– stdout is the standard output (normally connected to the terminal)
– stderr is the standard error channel (also connected to the terminal)

The I/O functions we have used so far implicitly use the default streams:
– printf() and putchar() write to stdout
– getchar() reads from stdin


Opening File Streams

In addition to the standard streams, we can create additional streams, normally associated with a file. The most general function is:

FILE* fopen(const char *filename, const char *mode)
– The first argument has to be a valid filename
– The second argument describes the mode in which the file should be opened

The mode is a string of characters
– "r" opens a file for reading in text mode
– "w" opens a file for writing in text mode (will create a new file, overwriting an existing file)
– "a" opens a file for writing in text mode (but will append new output to the end of an existing file)
– Adding a "b" will open the file as a binary file (e.g. "rb": Read binary)

fopen() returns a valid file pointer if successful, or NULL if it fails
– In the case of failure, it sets errno to an appropriate value!

Closing and Reopening File Streams

Once we are done with a certain file, we have to close it
– The number of simultaneously open files is limited for most operating systems. Closing a stream makes it available for other purposes
– Streams may be buffered. Closing a stream flushes the buffer (i.e. writes all remaining characters)

int fclose(FILE *stream) closes the file associated with stream
– It returns 0 if no errors occurred, EOF otherwise

FILE* freopen(const char *filename, const char *mode, FILE *stream)
– This function closes stream and reopens it with a new associated file
– It is useful to e.g. redirect stdout into a file or read stdin from a file (from within the program)


Simple Stream Based I/O Functions (Characters)

int fgetc(FILE *stream)
– Return the next character from the named stream (or EOF if no character is available or an error occurs)
– Note: getchar() is equivalent to fgetc(stdin)

int fputc(int c, FILE *stream)
– Print the character c to the stream, returning c or EOF in case of error
– putchar(c) is equivalent to fputc(c, stdout)

int getc(FILE *stream) is equivalent to fgetc(), except that it may be implemented as a macro (and may hence evaluate stream more than once)

Similarly, int putc(int c, FILE *stream) is equivalent to fputc(), but may be a macro


Simple Stream Based I/O Functions (Strings)

int fputs(const char *s, FILE *stream)
– Writes the string pointed to by the first argument to the denoted stream
– Returns EOF on failure, a non-negative value otherwise

char *fgets(char *s, int n, FILE *stream)
– Read at most n-1 characters into the array pointed to by s, stops early if a newline is encountered (the newline is stored in the array)
– s is always '\0' terminated
– Returns s, or NULL on error or if end-of-file is reached before any character has been read

Note: There also is a function char *gets(char *s) that attempts to read a line of input from stdin
– Never use gets()!
– Since there is no way to specify a maximal number of characters to read, we cannot ensure that gets() will not result in a buffer overflow error!

Example: Simple cat Implementation

#include <stdio.h>
#include <stdlib.h>
...

void print_file(FILE *stream)
{
   int c;

   while((c=fgetc(stream))!=EOF)
   {
      fputc(c, stdout);
   }
}

Example Continued

int main(int argc, char *argv[])
{
   int i;
   FILE *file;

   if(argc == 1)
   {
      print_file(stdin);
   }
   else
   {
      for(i=1; i
   ...
}


Example Output

$ man man | ./mycat1

man(1) man(1)<br />

NAME<br />

...<br />

man - format <strong>and</strong> display the on-line manual pages<br />

manpath - determine user’s search path for man pages<br />

$ ./mycat1 does not exist<br />

./mycat1: No such file or directory<br />

$ ./mycat1 mycat1.c<br />

#include <br />

#include <br />

#include <br />

#include <br />

#include <br />

void print_file(FILE *stream)<br />

Exercises

Write a version of memmove() using pointer assignment (the hard part is handling overlapping arrays – remember that you can compare pointers with the relational operators and ==)! You may need to cast void* to char* to access individual bytes.

Write a version of wc that more closely mimics the behaviour of the UNIX version, i.e. that gives separate counts and a total if called with more than one argument (if called with a single argument, it just gives a count for that file, if called with none, it reads from stdin)


C Standard Library

Input and Output



Remark about fgets()

char *fgets(char *s, int n, FILE *stream)
– Read at most n-1 characters into the array pointed to by s, stops early if a newline is encountered
– s is always '\0' terminated
– Returns s, or NULL on error or if end-of-file is reached before any character has been read

Note:
– It is the responsibility of the caller (i.e. your program) to provide enough memory!
– s already has to point to an array (or malloc()ed area) of sufficient size

This holds for most standard library functions!
– ... including gets() (never use gets()!)




Reminder: Using stdin

You can redirect files into stdin:
– mycat1 < mycat.c

You can type into stdin from your terminal
– Type [C-d] (^d), “Control-D”, to indicate end of input
– Depending on your version of UNIX and your terminal, you may have to type [C-d] on a line of its own


Buffering and Flushing

Both input and output streams can be buffered
– Unbuffered streams will pass on each individual character as soon as possible
– Fully buffered streams will wait until the (arbitrarily sized) buffer is full before they pass on the collected data as one chunk
– Text streams can also be line buffered. A line buffered stream will collect at most one line of data

int fflush(FILE* stream) will flush all buffers associated with an output stream
– Causes data to be actually written (if the writing process dies, the data is safe), although the OS may still have another layer of buffers
– Return value: 0 on success, EOF on failure
– Calling fflush(NULL) flushes all open output streams
– Calling fflush() on an input stream invokes undefined behaviour

Buffering

By default, the standard streams are buffered as follows when connected to a terminal:
– stdin is line buffered
– stdout is line buffered
– stderr is unbuffered

You can change the buffering state with the function
int setvbuf(FILE *stream, char *buff, int mode, size_t size)
– buff points to a buffer of at least size bytes (or it is NULL, in which case a buffer will be malloc()ed)
– Mode can be one of three predefined values:
∗ _IOFBF for full buffering
∗ _IONBF to disable buffering
∗ _IOLBF to enable line buffering

void setbuf(FILE *stream, char *buff) is a simpler interface:
– If buff is NULL, buffering is switched off
– Otherwise, full buffering with a buffer size of BUFSIZ is enabled (and buff has to point to a large enough buffer!)


Example

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{
   char name[80];
   char buffer[BUFSIZ];

   setbuf(stdout, buffer);
   printf("Please enter name: ");
   fgets(name,80,stdin);
   printf("Your name is: %s\n", name);
   setbuf(stdout, NULL);
   printf("Please enter name: ");
   fgets(name,80,stdin);
   printf("Your name is: %s\n", name);
   return EXIT_SUCCESS;
}


Example Behaviour

$ ./bufftest

Stephan<br />

Please enter name: Your name is: Stephan<br />

Please enter name: Schulz<br />

Your name is: Schulz<br />



More Operations on Files

int remove(const char *filename)
– Removes a file (as in rm)
– Return 0 on success, something else on failure

int rename(const char *oldname, const char *newname)
– Rename a file (as in mv)
– Return 0 on success, something else on failure

FILE *tmpfile(void)
– Creates a temporary file with mode wb+ (reading and writing in binary)
– The file will vanish when it is closed or the program terminates normally
– On failure, NULL will be returned

Even More File Operations

char *tmpnam(char *s)
– Creates a file name that is different from any existing name
– If called with argument NULL, will return a pointer to a static buffer containing the name
– Otherwise, s has to point to an array of at least L_tmpnam bytes
– Note: Using tmpnam() in security-critical applications is discouraged, as it creates a race condition (what if another process creates a file with the name in between the call to tmpnam() and fopen()?)


Error Functions

Each FILE data structure stores two pieces of information:
– If end-of-file has been reached during reading
– If an error occurred

int feof(FILE *stream) returns true if the end-of-file indicator has been set

int ferror(FILE *stream) returns true if the error indicator is set

void clearerr(FILE *stream) clears both indicators

void perror(const char *s) prints an error message to stderr as follows:
– First, the supplied string is printed, followed by a colon and a space
– Then the error message for the current value of errno is printed, followed by a newline

Exercises

Write a simple database that keeps given name, family name, and date of birth for a person. Subtasks:
– Create a dialog where people can enter data
– Create an interface for searching for data, based on any criterion
– Create an interface where you can print lists of people, possibly sorted by any of the data fields

You need to think about the database structure (a flat text file should work, see e.g. /etc/passwd for ideas)

You need an architecture for your overall program
– The conventional way is to use one monolithic program with a menu structure (use text menus...)
– The UNIX way would be to write one program for each task


C Standard Library

Formatted Output



Formatted Output

The formatted output functions offer a very convenient way of printing data in a controlled manner
– They are able to print all basic C datatypes (and strings)
– They can print any number of arguments with one command
– For most datatypes, there are multiple useful formats
– Argument output and descriptive strings can be interspersed easily

Output format is determined by a format string argument
– The format string contains ordinary text that is copied directly to the output
– It also contains conversion specifiers that describe how to format additional arguments

Formatted output functions are variadic, i.e. they take a variable number of arguments
– Number of arguments is determined by the number of conversion specifiers
– Modern compilers check this property if the format string is constant


A First Example

printf("%d divided by %d = %f\n",22,7,22/7.0);
– The first argument to printf is the format string
– It contains 3 conversion specifiers:
∗ The first %d specifies an int argument that should be printed in decimal notation and corresponds to the first extra argument, 22
∗ The second %d corresponds to the second extra argument, 7
∗ Finally, the %f specifies a double (floating point) argument that should be printed in pure decimal notation (with fractional part after the decimal point)

The format string also contains additional text
– Text is printed as is
– Note that normal conventions hold, i.e. \n in a string literal is the newline character

Output printed:
22 divided by 7 = 3.142857


The printf() Family of Functions

All functions are declared in stdio.h

int printf(const char *format, ...);
– Print the additional arguments under control of the format string to stdout
– Returns number of characters printed, or a negative number on error

int fprintf(FILE *stream, const char *format, ...);
– As printf(), but print to the designated output stream

int sprintf(char *s, const char *format, ...);
– Instead of actually printing anything, sprintf() will store the output characters in the character array s points to
– The string will be \0 terminated
– It is the responsibility of the programmer to make sure the array s points to is big enough
– The returned count of characters does not include the terminating NUL character (i.e. it is the same value that printf() would return)



Format Specifiers<br />

Format specifiers always start with a % character, and end in a conversion letter<br />

– The conversion letter describes the basic output format<br />

– It normally also describes which kind of argument has to follow<br />

Optional parts of a format specifier include (in order)<br />

– Flags (affect how the result will be printed)<br />

– Minimum field width (if fewer characters are necessary, padding will be used)<br />

– Precision (number of digits or characters; the exact meaning depends on the conversion letter)<br />

– Size modifier (e.g. require short or long instead of int)<br />



Some Conversion Letters (1)<br />

d: Convert an int argument and print it in decimal representation<br />

i: Alias for d<br />

u: Convert an unsigned int argument and print it in decimal representation<br />

o: Convert an unsigned int argument and print it in octal representation<br />

x: Convert an unsigned int argument and print it in hexadecimal representation, using {a, b, c, d, e, f} for the extra hexadecimal digits<br />

X: As x, but use upper case hex digits ({A, B, C, D, E, F})<br />

p: Convert a void* pointer and print it in an implementation-defined manner (for our system, and for many other systems, the argument is printed as a hexadecimal number representing the address)<br />



Some Conversion Letters (2)<br />

f: Print a double argument (float is converted automatically) as a sequence of digits with a decimal point<br />

– Unless otherwise specified via the precision modifier, 6 digits are printed after the decimal point<br />

e: Print a double argument in normalized exponential form, with 1 digit before the decimal dot (and by default 6 digits after the dot). Example: 3.141593e+01 (= 3.141593 ∗ 10^1)<br />

E: As e, but print upper case E before exponent<br />

g: “Human-friendly floating point output”. Print a double number either as with e (for very small or very large numbers) or as with f, cutting off unnecessary trailing zeros<br />

G: As g, but use E instead of e<br />



Some Conversion Letters (3)<br />

c: Print a single int argument by converting it to char and printing the corresponding character (use %i to print the numeric value of a character)<br />

s: Print a C style string, converting a char* argument pointing to a \0-terminated string<br />

%: Convert no arguments, just print a single % character (i.e. %% in the format string generates a single % in the output)<br />

Remarks:<br />

– We have not covered some of the more esoteric conversions<br />

– The 1995 addendum to C89 and the C99 standard add additional conversion characters<br />

– For more details, check man 3 printf<br />



Example<br />

#include <stdio.h><br />
#include <stdlib.h><br />

int main(void)<br />
{<br />
   char *p = "This is a string";<br />

   printf("12 with... %%d: %d, %%u: %u, %%o: %o, %%x: %x, %%X: %X\n",<br />
          12, 12, 12, 12, 12);<br />
   printf("12.5 with... %%f: %f, %%e: %e, %%E: %E, \n%%g: %g, %%G: %G\n",<br />
          12.5,12.5,12.5,12.5,12.5);<br />
   printf("Printing a character with %%c: %c and %%d: %d\n",<br />
          'a', 'a');<br />
   printf("This is a string \"%s\" and its address: %p\n", p, (void*)p);<br />

   return EXIT_SUCCESS;<br />
}<br />



Example Output<br />

12 with... %d: 12, %u: 12, %o: 14, %x: c, %X: C<br />

12.5 with... %f: 12.500000, %e: 1.250000e+01, %E: 1.250000E+01,<br />

%g: 12.5, %G: 12.5<br />

Printing a character with %c: a and %d: 97<br />

This is a string "This is a string" and its address: 0x80485a0<br />



Size Modifiers<br />

Size modifiers are used to change the default argument size:<br />

– The l modifier changes integer arguments to their long variants<br />

– The h modifier indicates that the argument is of type short or unsigned short instead of the default int<br />

The C99 standard introduces additional size modifiers:<br />

– z indicates an argument of type size_t (for integer conversions, e.g. %zu)<br />

– ll indicates long long versions of the integers<br />

– hh indicates char arguments instead of int types<br />

For us, the %ld version (long integer) is probably the most important one<br />



Specifying Minimum Field Width<br />

The minimum field width is an integer literal between the % and the conversion letter (with optional size modifier)<br />

– It may be preceded by any flags<br />

– The precision, if any, follows it<br />

By default, any value is printed right-justified in its field<br />

– Padding is done with spaces:<br />

printf("|%7d|\n",12);<br />

|     12|<br />

If the natural value representation is bigger than the minimum field width, the<br />

specification has no effect<br />

printf("|%7s|\n", "A long string");<br />

|A long string|<br />



The Flags<br />

-: Left-justify output (only useful in connection with a width specification)<br />

0: Use 0 for padding to requested field width (by default, ' ' (space) is used)<br />

+: For numerical values: Always print a sign, either + or -<br />

' ' (space): Always print a character for the sign, - for negative numbers, ' ' for non-negative ones<br />

#: Use a variant of the conversion operation<br />

– For %o, print a leading 0<br />

– For %x, print a leading 0x<br />

– For %X, print a leading 0X<br />

– For floating point numbers, the decimal dot is always printed with the # flag (and %g/%G keep their trailing zeros)<br />



The Precision<br />

The precision field is used for a number of different things:<br />

– For any integer conversion character, it gives a minimum number of digits to print (by adding leading zeros)<br />

– For %e, %E and %f, it gives the number of digits in the fractional part<br />

– For %g and %G, it is the number of significant digits to be printed<br />

– Finally, for strings (%s), it gives the maximal number of characters to be printed from the string<br />



Example<br />

#include <stdio.h><br />
#include <stdlib.h><br />

int main(void)<br />
{<br />
   printf("Floating point example: |%+8.2f|\n", 3.0/7.0);<br />
   printf("Floating point example: |% 8.2f|\n", 3.0/7.0);<br />
   printf("String: %-7.7s\n", "Longish String");<br />
   printf("String: %-7.7s\n", "short");<br />
   printf("String: %7.7s\n", "short");<br />

   return EXIT_SUCCESS;<br />
}<br />

Output:<br />

Floating point example: |   +0.43|<br />
Floating point example: |    0.43|<br />
String: Longish<br />
String: short<br />
String:   short<br />



Assignment<br />

Write an archiver program arch322. Your program should accept any number of arguments (to be treated as filenames). It should write (to stdout) an archive, i.e. a file that contains enough information to recreate the original files with their names. For simplicity, allow only files in the current directory to be archived (check whether an argument contains a / and print an error message if so). Also print useful error messages if one of the named files does not exist, etc.<br />

Write a dearchiver dearch322 that accepts an archive file (in your format) on stdin and recreates the original files in the current directory. Print an error message if the file is not a valid archive.<br />

You are free to design your own archive format, but you may get some ideas from reading the documentation (man/info) on tar/gtar. Please document your format in one or two paragraphs. You may assume UNIX I/O, i.e. no difference between text and binary I/O.<br />

Example:<br />



$ arch322 Makefile sort_csc322.c utilities.c > myarch.arch<br />
$ mkdir NEW<br />
$ cd NEW<br />
$ dearch322 < ../myarch.arch<br />
Recreating Makefile<br />
Recreating sort_csc322.c<br />
Recreating utilities.c<br />
$ ls<br />
Makefile sort_csc322.c utilities.c<br />
$<br />



CSC322<br />

C Programming and UNIX<br />

Asynchronous Events and Signals<br />



Processes<br />

A UNIX process is an instance of a program in execution. It can be described by<br />

– The executable code (stored in the text segment of the virtual memory image of the process)<br />

– The program data (stored in the data segment)<br />

– The state, including stack pointer and stack, program counter, etc. (usually collected in a process control block, or PCB)<br />

A process uses certain resources:<br />

– Processor time on a CPU<br />

– Memory, both virtual and real<br />

– File descriptors<br />

– . . .<br />

Some of its important properties are<br />

– Owner<br />

– Process id (pid), a unique non-negative integer<br />

– Parent (exception: init)<br />



UNIX User Commands: ps<br />

Usage: ps [options]<br />

– ps shows information about currently executing processes<br />

– It is one of the least standardized UNIX tools<br />

Our Linux ps can assume many different personalities<br />

– Different personalities show different behaviour<br />

– . . . and accept different options.<br />

Default behaviour (ps without options):<br />

– Show information about all existing processes of the current user controlled by the same terminal ps was run on<br />

– For each process, list:<br />

∗ Process Id (PID)<br />

∗ Controlling terminal (TTY)<br />

∗ CPU time used by the process<br />

∗ Name of the executable program file<br />



Vanilla ps Example<br />

$ ps<br />
  PID TTY          TIME CMD<br />
 1125 pts/3    00:00:01 tcsh<br />
 7157 pts/3    00:00:00 xevil<br />
 7189 pts/3    00:00:00 gv<br />
 7193 pts/3    00:00:00 gs<br />
 7194 pts/3    00:00:00 ps<br />



Some ps Options<br />

Some simple BSD style options for the default personality (note: BSD style options for ps are not preceded by a dash!)<br />

– a: Print information about all processes that are connected to any terminal<br />

– x: Print information about processes not connected to a terminal<br />

– U user: Print information about processes owned by the named user<br />

– u: User oriented output with more interesting information:<br />

∗ Owner of a process (USER)<br />

∗ Process Id (PID)<br />

∗ Percentage of available CPU used by the process (%CPU)<br />

∗ Percentage of memory used (%MEM) (note that this is computed from the resident set size, so pages shared between processes are counted for each of them)<br />

∗ Virtual memory size of the process in KByte (VSZ)<br />

∗ Size of the resident set, i.e. the recently referenced pages not swapped out (RSS)<br />

∗ Controlling terminal (TTY)<br />

∗ Time or date when the process was started (START)<br />

∗ CPU time used (TIME)<br />

∗ Full command used to start the process (COMMAND)<br />



Interesting ps Example<br />

$ ps aux<br />

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND<br />

root 1 0.0 0.1 1368 432 ? S Oct30 0:04 init<br />

root 2 0.0 0.0 0 0 ? SW Oct30 0:03 [keventd]<br />

root 3 0.0 0.0 0 0 ? SW Oct30 0:00 [kapmd]<br />

...<br />

root 486 0.0 0.1 1372 408 ? S Oct30 0:00 /sbin/dhcpcd -n -h wo<br />

root 551 0.0 0.2 1644 668 ? S Oct30 0:00 syslogd -m 0<br />

...<br />

schulz 1095 0.0 4.8 16112 12268 ? S Oct30 4:40 emacs -geometry 96x77<br />

schulz 1096 0.0 0.8 4944 2216 ? S Oct30 0:05 xterm -geometry 80x40<br />

schulz 1997 0.0 0.5 3072 1476 ? S Oct31 0:12 ssh sherman emacs<br />

root 4073 0.0 1.0 7480 2768 pts/3 S Oct31 0:03 /usr/local/lib/xmcd/b<br />

schulz 22637 0.0 0.5 2940 1444 pts/5 S Nov05 0:03 ssh -X sunbroy2.infor<br />

schulz 22645 4.0 18.7 82248 47832 ? S Nov05 31:04 /usr/local/mozilla/mo<br />

schulz 6722 0.0 0.0 0 0 ? Z Nov05 0:00 [plugger ]<br />

schulz 7189 0.0 0.8 3948 2220 pts/3 S 00:15 0:00 gv CSC322_1.pdf<br />

schulz 7235 0.4 2.2 10060 5668 pts/3 S 00:41 0:00 gs -dNOPLATFONTS -sDE<br />

schulz 7236 76.8 38.0 98072 96896 pts/0 R 00:43 0:33 eprover /home/schulz/<br />

news 7237 0.5 1.0 3704 2796 ? S 00:43 0:00 leafnode<br />

schulz 7258 0.0 0.2 2624 708 pts/3 R 00:43 0:00 ps aux<br />



CSC322<br />

C Programming and UNIX<br />

Signals and Signal Handlers<br />



Signals<br />

Signals are a way to signal unusual events to a process<br />

– Run time errors<br />

– User requests<br />

– Pending communication<br />

In general, signals can arrive asynchronously, i.e. at any time<br />

Signals come in many different types. Depending on the signal type, the process can<br />

– Ignore a signal<br />

– Perform a default action (defined by the implementation)<br />

– Invoke an explicit signal handler<br />



Standard C Signals<br />

Standard C defines a small number of signals, UNIX defines many more<br />

Signal    Meaning                    Default Action (UNIX)<br />
SIGABRT   Abort the process          Terminate with core<br />
SIGFPE    Floating point exception   Terminate with core<br />
SIGILL    Illegal instruction        Terminate with core<br />
SIGINT    Interactive interrupt      Terminate<br />
SIGSEGV   Illegal memory access      Terminate with core<br />
SIGTERM   Termination request        Terminate<br />

Note: SIGINT is generated when you press [CTRL-C]!<br />

– The signal is delivered to the process<br />

– The default action is to terminate the process<br />



Some UNIX Signals<br />

UNIX defines about 60 different signals, including all Standard C signals<br />

Some important UNIX signals:<br />

Signal    Meaning                                                            Default Action (UNIX)<br />
SIGHUP    Terminal connection lost (or controlling process dies)             Terminate<br />
SIGKILL   Kill process, cannot be caught or ignored                          Terminate<br />
SIGBUS    Bus error                                                          Terminate with core<br />
SIGSTOP   Stop a process (does not terminate, cannot be caught or ignored)   Suspends process<br />
SIGCONT   Continue suspended process                                         Ignored (*)<br />
SIGURG    Out of band data arrived on a socket                               Ignore<br />
SIGXCPU   CPU time limit reached                                             Terminate with core<br />

(*) OS will still wake process up<br />

[CTRL-Z] generates SIGTSTP, the terminal variant of SIGSTOP (unlike SIGSTOP, it can be caught or ignored)!<br />



UNIX User Command: kill<br />

Note: kill is often implemented as a shell built-in<br />

– Syntax may differ slightly from the kill program<br />

– Allows use of kill in job control<br />

Usage for our kill: kill [-signal] pid ...<br />

– If no signal is specified, SIGTERM is sent<br />

– Signals can be specified symbolically (for a list of names run kill -l) or numerically (man 7 signal gives a list of signals and their numeric values)<br />

kill accepts a list of pid arguments<br />

– Most common case: pid is a normal process id (a positive integer). The signal is sent to the corresponding process<br />

– If pid is -1, the signal is sent to all processes of the user (kill -KILL -1 is a surefire way to log yourself out)<br />

– Finally, if pid is any other negative number, the signal is sent to the corresponding process group<br />



UNIX User Commands: top<br />

top is an interactive version of ps<br />

– It shows various information about the top processes currently running<br />

– Also shows general system information<br />

– All information is periodically updated<br />

– top seems to be more consistent between different UNIX dialects, and is often preferred for interactive use (or even for scripting)<br />

top also can be used to send signals to processes<br />

– Press [k] and then specify process and signal<br />

Non-interactive use of top (“better ps”):<br />

– top -b -n1 will print a single page in a ps-like manner<br />

For more information: man top or run top and hit [h] for help<br />



top Example<br />

11:09pm up 8 days, 1:15, 7 users, load average: 0.59, 0.21, 0.07<br />

78 processes: 71 sleeping, 4 running, 3 zombie, 0 stopped<br />

CPU states: 95.2% user, 4.7% system, 0.0% nice, 0.0% idle<br />

Mem: 254576K av, 249892K used, 4684K free, 0K shrd, 7428K buff<br />

Swap: 522072K av, 30888K used, 491184K free 68440K cached<br />

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND<br />

12692 schulz 25 0 25548 24M 664 R 89.3 10.0 0:08 eprover<br />

1040 root 15 0 89416 15M 5424 S 5.5 6.4 919:35 X<br />

1097 schulz 15 0 2324 2124 1676 S 3.7 0.8 0:15 xterm<br />

12693 schulz 16 0 924 924 728 R 1.1 0.3 0:00 top<br />

1096 schulz 15 0 2512 2252 1708 R 0.1 0.8 0:07 xterm<br />

1 root 15 0 472 432 416 S 0.0 0.1 0:04 init<br />

2 root 15 0 0 0 0 SW 0.0 0.0 0:04 keventd<br />

3 root 15 0 0 0 0 SW 0.0 0.0 0:00 kapmd<br />

4 root 34 19 0 0 0 SWN 0.0 0.0 0:00 ksoftirqd_CPU0<br />

5 root 15 0 0 0 0 SW 0.0 0.0 0:09 kswapd<br />

6 root 15 0 0 0 0 SW 0.0 0.0 0:00 bdflush<br />

7 root 15 0 0 0 0 SW 0.0 0.0 0:00 kupdated<br />

8 root 25 0 0 0 0 SW 0.0 0.0 0:00 mdrecoveryd<br />

12 root 15 0 0 0 0 SW 0.0 0.0 0:01 kjournald<br />

...<br />



Catching Signals<br />

User programs can set up a signal handler to catch signals<br />

– A signal handler is a normal function<br />

– It has to be explicitly set up for each signal type<br />

– It will be called asynchronously when a signal of the correct type has been caught<br />

– When the signal handler returns, the program will resume execution at the old spot<br />

UNIX implements several different ways of handling signals; we will concentrate on the ANSI C signal handling<br />

– All use the same signals: Signals are small integers<br />

– However, for all existing signals, we use the #defined names shown above (SIGHUP. . . )<br />

Signal handling stuff is defined in <signal.h><br />



ANSI C Signal Handling with signal.h<br />

signal.h defines the signal() function for establishing signal handlers as follows:<br />

void (*signal(int sig, void (*handler)(int)))(int)<br />

Huh?<br />


We can break this definition up as follows:<br />

typedef void (*SigHandler)(int);<br />

SigHandler signal(int sig, SigHandler handler);<br />

– The first argument to signal() is the signal to be caught<br />

– The second argument is a pointer to the new signal handler<br />

– Return value is a pointer to the old signal handler for that signal (or SIG_ERR if no signal handler could be established)<br />

Predefined (pseudo) signal handlers (possible arguments to signal()):<br />

– SIG_DFL: Revert to the default behaviour for that signal<br />

– SIG_IGN: Ignore the signal from now on<br />



ANSI C Signal Handling (Continued)<br />

Additional definitions in signal.h:<br />

sig_atomic_t is an integer type<br />

– We are guaranteed that an assignment to a variable of this type is atomic, i.e. will not be interrupted by e.g. another signal<br />

– That means that its value will always be well-defined<br />

int raise(int sig) sends the signal sig to the running program itself<br />

– Return value: 0 on success, something else otherwise<br />



ANSI C Signal Handlers<br />

A signal handler is a function that returns nothing and gets the signal that was caught as an argument<br />

There are several limitations on signal handlers:<br />

– Since signals can arrive asynchronously, the state of the program is not well-defined!<br />

– Signals may be handled even within a single C statement<br />

– Therefore a signal handler cannot make many assumptions about the state of the program<br />

– For maximum portability, a signal handler should only<br />

∗ Reestablish itself by calling signal()<br />

∗ Assign a value to a variable of type volatile sig_atomic_t<br />

∗ Return or terminate the program (e.g. calling exit())<br />

Once a signal has been caught, the signal handler for that signal may be reset to default behaviour (whether this happens depends on the implementation)<br />

– If you want to catch multiple signals portably, the signal handler has to reestablish itself<br />



Common UNIX functions: sleep()<br />

Often, a program has to perform a task only occasionally, or it has to wait for a certain event to happen. ANSI C has no way of delaying a program<br />

– Old-style home computer programmers use busy delay loops<br />

– However, those are unacceptable on multi-user systems<br />

– Moreover, they can usually be optimized away by a good compiler<br />

All UNIX versions address this problem with the sleep() function (normally defined in <unistd.h>):<br />

unsigned int sleep(unsigned int seconds);<br />

sleep() makes the current process sleep (do nothing ;-) until either<br />

– (At least) seconds seconds have elapsed or<br />

– A non-ignored signal arrives<br />

Return value:<br />

– 0 if sleep terminated because of elapsed time<br />

– Number of seconds left when the process was awakened by a signal<br />



Example: Counting Signals (Fluff)<br />

#include <stdio.h><br />
#include <stdlib.h><br />
#include <signal.h><br />
#include <assert.h><br />
#include <unistd.h><br />

typedef void (*SigHandler)(int);<br />

volatile sig_atomic_t sig_int_flag = 0;<br />
volatile sig_atomic_t sig_term_flag = 0;<br />

void EstablishSignal(int sig, SigHandler handler)<br />
{<br />
   SigHandler res;<br />

   res = signal(sig, handler);<br />
   if(res == SIG_ERR)<br />
   {<br />
      perror("Could not establish signal handler");<br />
      exit(EXIT_FAILURE);<br />
   }<br />
}<br />



Example: Counting Signals (The Signal Handlers)<br />

void sig_int_handler(int sig)<br />
{<br />
   EstablishSignal(SIGINT, sig_int_handler);<br />
   assert(sig == SIGINT);<br />
   printf("Caught SIGINT!\n"); /* Risky */<br />
   sig_int_flag = 1;<br />
}<br />

void sig_term_handler(int sig)<br />
{<br />
   EstablishSignal(SIGTERM, sig_term_handler);<br />
   assert(sig == SIGTERM);<br />
   printf("Caught SIGTERM!\n"); /* Risky! */<br />
   sig_term_flag = 1;<br />
}<br />



Example: Counting Signals (Main)<br />

int main(int argc, char* argv[])<br />
{<br />
   int i;<br />
   int int_counter = 0;<br />

   EstablishSignal(SIGTERM, sig_term_handler);<br />
   EstablishSignal(SIGINT, sig_int_handler);<br />

   for(i=0; i ...<br />
}<br />


Example Session: Live<br />

(Change to Terminal!)<br />



Exercises<br />

Start a long running process (e.g. top) in one xterm<br />

Send it various signals and see how it behaves<br />

Read man 7 signal, man kill and man 2 kill<br />

Download the source to the signal handler example and play with it<br />

– Send different signals to it<br />

– Add your own signal handler<br />

– Write a generic signal handler function that catches more than one signal (and works correctly for multiple signals)<br />



CSC322<br />

C Programming and UNIX<br />

The UNIX File System<br />



UNIX File System<br />

UNIX philosophy: Everything is a file<br />

– Plain files<br />

– Hardware devices (Keyboard, mouse, hard drives)<br />

– Network connections<br />

Consequently, UNIX specifies a lot more properties and has more ways of manipulating a file than ANSI C<br />

– Low-level IO<br />

– File access rights<br />

– Different file types<br />

Note: These are not ANSI C features<br />

– We have to call gcc without the -ansi option to use most of these features (otherwise, most UNIX extensions are disabled)<br />



UNIX File Types (1)<br />

Regular files<br />

– Boring old data file (most common type of file)<br />

– UNIX does not care what is inside that file<br />

Directories<br />

– Stores names and pointers to more information<br />

– Write access is limited to kernel file system functions to assure the integrity of the file system<br />

Character special files<br />

– Represent hardware devices that generate individual characters (/dev/kbd, /dev/mouse)<br />

Block special files<br />

– Represent hardware where data is available in fixed-size blocks (e.g. hard drives, /dev/hda in Linux)<br />



UNIX File Types (2)<br />

FIFOs (named pipes)<br />

– Special files used for interprocess communication<br />

Sockets<br />

– Special files used for network communication (or local interprocess communication)<br />

– Not available in all UNIX versions (some don’t represent network connections as files in the file system)<br />

Symbolic links<br />

– A symbolic link is a file containing just a file name<br />

– The kernel normally automatically redirects any access to the link to the named file<br />



The stat() Functions<br />

The three functions in the stat family all allow us to extract information about a file<br />

– Who owns it<br />

– How big is it<br />

– What kind of file is it<br />

– . . .<br />

They are specified as follows:<br />

#include <sys/types.h><br />
#include <sys/stat.h><br />

int stat(const char *pathname, struct stat *buf);<br />

int fstat(int filedes, struct stat *buf);<br />

int lstat(const char *pathname, struct stat *buf);<br />



The stat() Functions (2)<br />

All three functions perform the same basic function:<br />

– Write information about a file into the structure buf points to (and which we have to provide)<br />

– Return 0 if the operation was possible, -1 otherwise (in which case they also set errno)<br />

Differences:<br />

– fstat() accepts a low level file descriptor referring to an open file<br />

– lstat() will not follow symbolic links, but give information about the link itself (stat() gives information about the file pointed to)<br />

How exactly struct stat is defined may differ<br />

– It always contains certain standard members<br />



The stat() Functions (3)<br />

struct stat {<br />
   dev_t st_dev; /* device number */<br />
   dev_t st_rdev; /* device type (if inode device) */<br />
   ino_t st_ino; /* inode number */<br />
   mode_t st_mode; /* access rights and file type */<br />
   nlink_t st_nlink; /* number of hard links */<br />
   uid_t st_uid; /* user ID of owner */<br />
   gid_t st_gid; /* group ID of owner */<br />
   off_t st_size; /* total size, in bytes */<br />
   unsigned long st_blksize; /* blocksize for filesystem I/O */<br />
   unsigned long st_blocks; /* number of blocks allocated */<br />
   time_t st_atime; /* time of last access */<br />
   time_t st_mtime; /* time of last modification */<br />
   time_t st_ctime; /* time of last change */<br />
};<br />

Interpretation of some fields is supported by predefined macros<br />

– E.g. st_mode<br />



Example: Simple ls -l Version<br />

#include <stdio.h><br />
#include <stdlib.h><br />
#include <sys/types.h><br />
#include <sys/stat.h><br />
#include <unistd.h><br />

void err_sys(char* message)<br />
{<br />
   perror(message);<br />
   exit(EXIT_FAILURE);<br />
}<br />

void stat_file(char *fname)<br />
{<br />
   struct stat buff;<br />
   char* type = "Unknown";<br />

   if(lstat(fname, &buff) < 0)<br />
   {<br />
      err_sys("lstat");<br />
   }<br />



Example Continued<br />

   if(S_ISREG(buff.st_mode))<br />
   {<br />
      type = "Regular file";<br />
   }<br />
   else if(S_ISDIR(buff.st_mode))<br />
   {<br />
      type = "Directory";<br />
   }<br />
   else if(S_ISCHR(buff.st_mode))<br />
   {<br />
      type = "Character special file";<br />
   }<br />
   else if(S_ISBLK(buff.st_mode))<br />
   {<br />
      type = "Block special file";<br />
   }<br />
   else if(S_ISFIFO(buff.st_mode))<br />
   {<br />
      type = "Pipe or FIFO";<br />
   }<br />

Stephan Schulz 462


}<br />

Example Continued<br />

else if(S_ISLNK(buff.st_mode))<br />

{<br />

type = "Symbolic link";<br />

}<br />

else if(S_ISSOCK(buff.st_mode))<br />

{<br />

type = "Socket";<br />

}<br />

printf("%-30s %10ld Bytes %s\n", fname, buff.st_size,type);<br />

int main(int argc, char *argv[])<br />

{<br />

int i;<br />

}<br />

for(i=1; i


Example Output<br />

$ /SOURCES/CSC 322/myls *<br />

BINTREE 533 Bytes Directory<br />

LIST_DEMO 549 Bytes Directory<br />

Makefile 1322 Bytes Regular file<br />

Makefile~ 1277 Bytes Regular file<br />

RPN_CALC 630 Bytes Directory<br />

RPN_CALC.tgz 10197 Bytes Regular file<br />

SORT 373 Bytes Directory<br />

a.out 13756 Bytes Regular file<br />

base_converter 14634 Bytes Regular file<br />

base_converter.c 1918 Bytes Regular file<br />

base_converter.c~ 430 Bytes Regular file<br />

celsius2fahrenheit 13633 Bytes Regular file<br />

celsius2fahrenheit.c 395 Bytes Regular file<br />

charcount 13639 Bytes Regular file<br />

charcount.c 216 Bytes Regular file<br />

charcount.c~ 114 Bytes Regular file<br />

charuniq 13643 Bytes Regular file<br />

charuniq.c 571 Bytes Regular file<br />

...<br />



Example Output (of device directory /dev/)<br />

$ /SOURCES/CSC 322/myls *<br />

...<br />

cdrom 8 Bytes Symbolic link<br />

cdu535 0 Bytes Block special file<br />

cfs0 0 Bytes Character special file<br />

cm205cd 0 Bytes Block special file<br />

cm206cd 0 Bytes Block special file<br />

console 0 Bytes Character special file<br />

core 11 Bytes Symbolic link<br />

cpu 196 Bytes Directory<br />

cua0 0 Bytes Character special file<br />

cua1 0 Bytes Character special file<br />

...<br />

ham 0 Bytes Character special file<br />

hda 0 Bytes Block special file<br />

hda1 0 Bytes Block special file<br />

hda10 0 Bytes Block special file<br />

hda11 0 Bytes Block special file<br />

hda12 0 Bytes Block special file<br />

...<br />



Links<br />

Links form a connection between a file name and the actual file<br />

There are two kinds of links:<br />

– Hard links<br />

– Symbolic (or soft) links<br />

A hard link links a name and a file<br />

– Each file can have multiple hard links<br />

– All are equivalent (no concept of “original link”), access is equally efficient for all hard links<br />

– rm actually only removes a link (if the number of links becomes 0, the file is finally removed)<br />

– Typically, it is only possible to have hard links to a file on the same physical partition or medium<br />



Links (2)<br />

Soft links create indirect aliases for a file<br />

– They are just files that contain another file name<br />

– Following a soft link incurs a small performance penalty<br />

– Symbolic links can point anywhere in the file system (no limitations as to physical medium, networked file system, . . . )<br />

– Symbolic links do not influence the file pointed to at all!<br />

– If the file does not exist any more, the link still exists, but is broken<br />

Most user-created links are soft links nowadays<br />

– Used to share files<br />

– Used to hide file system reorganization<br />
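The behaviour of hard and soft links described above can be observed from C. The following is a minimal sketch, not part of the original slides: the helper names make_links(), link_count() and is_symlink() are made up for illustration, and the POSIX calls link() and symlink() are used to create the two kinds of links.<br />

```c
#define _XOPEN_SOURCE 700   /* request POSIX declarations such as lstat() and symlink() */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a small file `target`, then a hard link and
   a symbolic link to it. Returns 0 on success, -1 on failure. */
int make_links(const char *target, const char *hard, const char *soft)
{
    int fd = open(target, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    if (write(fd, "hello\n", 6) != 6) {
        close(fd);
        return -1;
    }
    if (close(fd) == -1)
        return -1;
    if (link(target, hard) == -1)      /* a second name for the same file */
        return -1;
    if (symlink(target, soft) == -1)   /* a new file that stores the name `target` */
        return -1;
    return 0;
}

/* Number of hard links of the file that `path` leads to (symbolic links
   are followed), or -1 if stat() fails, e.g. for a broken link. */
long link_count(const char *path)
{
    struct stat sb;
    if (stat(path, &sb) == -1)
        return -1;
    return (long)sb.st_nlink;
}

/* 1 if `path` itself is a symbolic link, 0 if not, -1 if lstat() fails. */
int is_symlink(const char *path)
{
    struct stat sb;
    if (lstat(path, &sb) == -1)
        return -1;
    return S_ISLNK(sb.st_mode) ? 1 : 0;
}
```

After make_links("a", "b", "c"), link_count("a") should report 2. Once "a" is removed with unlink(), the data is still reachable through "b", while "c" still exists but is broken (stat() on it fails).<br />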



UNIX User Commands: ln<br />

ln is used to create both hard and symbolic links<br />

Usage is similar to mv and cp:<br />

– ln target : Create a link to target in the current directory (under the same file name)<br />

– ln target linkname : Make a link named linkname to target<br />

– ln target1 ... targetN directory : Create links to all targets in the given directory<br />

Important option: -s (create symbolic links)<br />

More: man ln<br />



Exercises<br />

Read man stat and extend the ls example to show more information (e.g. everything ls -l shows)<br />

Explain the difference between mv filea fileb, cp filea fileb and ln filea fileb<br />




The UNIX File System<br />

File modes<br />



File Ownership<br />

All files have an owner (a user)<br />

– ls -l displays the user name (if available) or the numerical user id (e.g. for files of a user that no longer exists)<br />

Similarly, each file has a group associated with it<br />

– This will be similarly displayed by ls -l<br />

Owner and group of a file determine who has what kind of access to that file. Access types are<br />

– Read access (open a file for reading, reading data)<br />

– Write access (change a file)<br />

– Execute access (run a file as a program, or, for directories, access file names in that directory)<br />



ls -l Output Explained<br />

-rw-r--r-- 1 schulz schulz 1283190 Nov 11 10:35 CSC322.pdf<br />

The fields, from left to right:<br />

– File type (encoded in st_mode)<br />

– File access rights (encoded in st_mode)<br />

– Number of hard links (st_nlink)<br />

– User that owns the file (st_uid)<br />

– Group that owns the file (st_gid)<br />

– File size in bytes (st_size)<br />

– Modification time (st_mtime)<br />

– Filename<br />

Note: All information (except for the file name) is available by calling one of the stat() functions!<br />
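stat() only delivers numeric ids in st_uid and st_gid. A minimal sketch of how they might be turned into the names that ls -l prints, assuming the standard lookup functions getpwuid() and getgrgid(); the helper names owner_label() and group_label() are made up for illustration.<br />

```c
#define _XOPEN_SOURCE 700   /* request POSIX declarations */
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: write the user name for uid into buf. If there is
   no passwd entry, fall back to the numeric id, as ls -l does. */
void owner_label(uid_t uid, char *buf, size_t size)
{
    struct passwd *pw = getpwuid(uid);
    if (pw != NULL)
        snprintf(buf, size, "%s", pw->pw_name);
    else
        snprintf(buf, size, "%lu", (unsigned long)uid);
}

/* Same idea for the group id, using the group database. */
void group_label(gid_t gid, char *buf, size_t size)
{
    struct group *gr = getgrgid(gid);
    if (gr != NULL)
        snprintf(buf, size, "%s", gr->gr_name);
    else
        snprintf(buf, size, "%lu", (unsigned long)gid);
}
```

A caller would pass buff.st_uid and buff.st_gid from a struct stat filled in by one of the stat() functions.<br />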



User Groups<br />

Groups are used in UNIX to give a group of users the ability to access a common resource<br />

– Most obvious use: Share files on the disk<br />

– In practice more important: Allow access to a hardware device (Note: A<br />

modem is a file, e.g. /dev/modem!)<br />

Every user belongs to a primary group<br />

– The primary group for a user is listed in the passwd file (as a numerical group<br />

id or gid):<br />

schulz:x:500:500:Stephan Schulz:/home/schulz:/bin/tcsh<br />

∗ For normal UNIX systems, /etc/passwd<br />

∗ For systems running NIS, see the file with ypcat passwd<br />

– After logging in, the user's primary group is active (the gid of the shell has the value for the primary group)<br />

∗ Processes started by another process (including the shell) inherit the gid<br />



Groups (Continued)<br />

Additional group information is in /etc/group:<br />

– For each group, a symbolic name (displayed by ls -l) and a list of users belonging to that group:<br />

daemon:x:2:root,bin,daemon<br />

schulz:x:500:<br />

Secondary groups are additional groups which list the user as a member<br />

– A user can explicitly change to such a group using the newgrp command (man newgrp)<br />



UNIX User Utilities: chown and chgrp<br />

chown is used to change the owner of a file<br />

– Usage: chown user file ...<br />

– On most systems, only root is allowed to use chown (there are security issues even with giving away files!)<br />

chgrp changes the group of a file<br />

– Usage: chgrp group file ...<br />

– On most systems, you can only change the group of a file to a group in which you are a member (see above)<br />

Important option for both: -R<br />

– Recursively apply the operation to subdirectories and files in them<br />



File Mode Bits<br />

The status word of a file (the st_mode field in struct stat) also contains 9 bits describing file access rights<br />

– Note: These rights exist for all files, including special files and directories!<br />

There are three different groups with potentially different access rights:<br />

– The user who owns the file<br />

– Members of the group associated with the file<br />

– Other users<br />

There are also three different types of access:<br />

– Read access<br />

– Write access<br />

– Execute access<br />

There are three more bits describing special properties<br />

– The setuid bit: If set, the program will run with the effective user id of the file's owner (not that of the user who started it)<br />

– The setgid bit: Same thing for the group id<br />

– The sticky bit with complex semantics and interesting history<br />



Symbolic Encoding<br />

ls -l prints a string of 9 letters to represent the 9 to 12 file mode bits<br />

Normal case: The setuid, setgid and sticky bit are all cleared (0):<br />

– The mode has the form uuugggooo to encode user, group, and other access rights<br />

– Each letter may be - to denote that that bit is clear<br />

– Or it may have the mnemonic value of that right:<br />

∗ r for read (first letter)<br />

∗ w for write (second letter)<br />

∗ x for execute (third letter)<br />

If one of the special bits is set, this is denoted by changing the last letter of the corresponding group (x) to another letter. Common cases (more: info ls):<br />

– s in the user executable position: The file is user executable and the setuid bit is set<br />

– s in the group executable position: The file is group executable, and the setgid bit is set<br />
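The symbolic encoding can be produced from st_mode with the permission constants from sys/stat.h. The following is a sketch only: mode_string() is a made-up helper, and the letters S, t and T for the remaining special cases follow the convention of GNU ls rather than anything stated on the slides.<br />

```c
#define _XOPEN_SOURCE 700   /* request POSIX/XSI declarations such as S_ISVTX */
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: fill out[0..8] with the nine permission letters
   (uuugggooo) and terminate the string with '\0'. */
void mode_string(mode_t mode, char out[10])
{
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,
        S_IRGRP, S_IWGRP, S_IXGRP,
        S_IROTH, S_IWOTH, S_IXOTH
    };
    static const char letters[] = "rwxrwxrwx";
    int i;

    for (i = 0; i < 9; i++)
        out[i] = (mode & bits[i]) ? letters[i] : '-';
    out[9] = '\0';

    /* Special bits replace the execute letter of their group; upper case
       means the special bit is set but the execute bit is not. */
    if (mode & S_ISUID)
        out[2] = (mode & S_IXUSR) ? 's' : 'S';
    if (mode & S_ISGID)
        out[5] = (mode & S_IXGRP) ? 's' : 'S';
    if (mode & S_ISVTX)
        out[8] = (mode & S_IXOTH) ? 't' : 'T';
}
```

Called with buff.st_mode, this yields the permission part of the first ls -l column; the file type letter would still have to be added in front.<br />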



Numerical Encoding<br />

The 12 permission bits are normally represented by 4 octal digits (each digit<br />

represents 3 bits):<br />

– 0001 represents execute access for others<br />

– 0002 represents write access for others<br />

– 0004 represents read access for others<br />

– 0010 represents execute access for group<br />

– 0020 represents write access for group<br />

– 0040 represents read access for group<br />

– 0100 represents execute access for user<br />

– 0200 represents write access for user<br />

– 0400 represents read access for user<br />

– 1000 is the sticky bit<br />

– 2000 is the setgid bit<br />

– 4000 is the setuid bit<br />

To generate a composite mode, just add up the individual modes<br />

Leading zeroes (especially the first one) are often omitted<br />



Examples<br />

rw-r--r-- is the most common mode for a regular file on a conventional UNIX system:<br />

– The user is allowed to read and write the file<br />

– Everyone else is allowed to read the file (no secrets ;-)<br />

– Corresponding numerical value:<br />

0004 Other read<br />

0040 Group read<br />

0400 User read<br />

0200 User write<br />

---------------<br />

0644<br />

Numeric mode 666 (the number of the beast) gives full read and write access for everyone (rw-rw-rw-)<br />

– Some people claim that this is not coincidence. . .<br />
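In C, the same composite modes can be built by ORing the symbolic constants from sys/stat.h instead of adding octal numbers by hand. A small sketch; the function names common_file_mode() and octal_mode() are invented for this example.<br />

```c
#define _XOPEN_SOURCE 700   /* request POSIX declarations */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* rw-r--r-- composed from symbolic constants instead of the literal 0644. */
mode_t common_file_mode(void)
{
    return S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;
}

/* Format the 12 mode bits as four octal digits, e.g. "0644".
   buf must provide room for at least 5 characters. */
void octal_mode(mode_t mode, char buf[5])
{
    snprintf(buf, 5, "%04o", (unsigned int)(mode & 07777));
}
```

Since each constant occupies its own bit, ORing them gives the same result as the addition shown above.<br />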



UNIX User Utilities: chmod<br />

chmod is used to change the file access bits<br />

Usage 1: chmod octal-mode files<br />

– Sets the file mode of the named files to the octal mode absolutely<br />

Usage 2: chmod symbolic-mode files<br />

– The symbolic mode command can add or remove privileges for the different groups<br />

– Format: who operator rights (written without spaces, e.g. go-rwx)<br />

∗ who can be any sequence of letters from ugo or a (equivalent to ugo)<br />

∗ operator can be<br />

· + to add rights<br />

· - to remove rights<br />

· = to absolutely assign rights<br />

∗ rights can be any combination of letters from rwx<br />

Important option: -R<br />

– Recursively modify files and subdirectories<br />



chmod Examples<br />

chmod ugo+rwx myfile # Grant full access rights to everybody<br />

chmod 777 myfile # Grant full access rights to everybody<br />

chmod -R go-rwx . # Paranoid: Remove read, write, and execute<br />

# rights for all other people on the current<br />

# directory and all files and subdirectories<br />

chmod -R 644 . # Trying to fix things, but removed all<br />

# execute rights from programs _and_<br />

# directories (makes things hard to fix ;-)<br />



File Mode Creation Mask<br />

Each process maintains a file mode creation mask<br />

– This mask determines which access rights are granted for newly created files and directories<br />

– The colloquial name is umask<br />

– The umask is inherited by new processes (i.e. your files will be created with rights based on the umask of your shell)<br />

The umask contains 9 bits, corresponding to the rwxrwxrwx access rights<br />

– Rights whose bits are set in the mask are always cleared for new files<br />

– All other rights are granted by default (with the x bits only set for executables and directories)<br />

The shell maintains a umask that can be set with the umask command (which is normally in a user configuration file)<br />

– Example: umask 022<br />

– Removes write permissions for everybody but the owner<br />
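The effect of the mask can be checked from a program with the umask() system call. This is a minimal sketch under the assumption that the file does not exist yet; create_with_umask() is a hypothetical helper that requests mode 0666 and reports which bits the new file actually received.<br />

```c
#define _XOPEN_SOURCE 700   /* request POSIX declarations */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` requesting rw-rw-rw- (0666) while
   `mask` is the file mode creation mask. Returns the permission bits of
   the new file, or (mode_t)-1 on error. The old mask is restored. */
mode_t create_with_umask(const char *path, mode_t mask)
{
    struct stat sb;
    int fd, rc;
    mode_t old = umask(mask);          /* umask() returns the previous mask */

    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
    umask(old);
    if (fd == -1)
        return (mode_t)-1;
    rc = fstat(fd, &sb);
    close(fd);
    if (rc == -1)
        return (mode_t)-1;
    return sb.st_mode & 0777;          /* keep only the rwxrwxrwx bits */
}
```

With a mask of 022 the requested 0666 becomes 0644, with 077 it becomes 0600: every bit set in the mask is removed from the requested mode.<br />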



Exercises<br />

Read the man and info pages on chmod, chown and chgrp<br />

The UNIX commands chmod and chown correspond to system calls of the same name. To find out how they work, read:<br />

– man 2 chmod<br />

– man 2 chown<br />

Use this information to implement a rudimentary version of chmod<br />




The UNIX File System<br />

File Descriptors<br />



File Descriptors<br />

Open files are identified to the kernel by file descriptors<br />

– A file descriptor is a small, non-negative integer<br />

– It’s used as an index into the file descriptor table of a process to obtain more information<br />

For many purposes, file descriptors are quite similar to file pointers (FILE*) from the C standard I/O library<br />

However, file descriptor I/O is much more low-level<br />

– No formatted I/O<br />

– No buffering – each I/O operation directly causes a system call to actually perform the data transfer<br />

Notes:<br />

– UNIX’s standard I/O library is implemented using file descriptors<br />

– Network communication also works via file descriptors<br />



Opening Files: open()<br />

The open() system call opens a named file and returns a file descriptor (or -1 on failure)<br />

It is defined as follows:<br />

#include <sys/types.h><br />

#include <sys/stat.h><br />

#include <fcntl.h><br />

int open(const char *pathname, int oflag, mode_t mode);<br />

Arguments:<br />

– pathname is a standard UNIX file name as for fopen()<br />

– oflag contains the options. The value is created by bitwise ORing one of the following values with a number of option flags:<br />

∗ O_RDONLY: Open the file for reading<br />

∗ O_WRONLY: Open the file for writing<br />

∗ O_RDWR: Open for reading and writing<br />

– The third argument is only interpreted if open() is used for file creation (and can be omitted otherwise)<br />



Option flags for open()<br />

Note: The following flags are combined with the main access mode (O_RDONLY, O_WRONLY, O_RDWR) using the bitwise or operator |<br />

Options:<br />

– O_APPEND: All output on this file descriptor is appended at the end of the file<br />

– O_CREAT: If the file does not exist, create it<br />

– O_EXCL: Only used with O_CREAT – give an error if the named file already exists<br />

– O_TRUNC: If the file exists and is opened for writing or r/w, truncate it to length 0<br />

– O_SYNC: Only return from writes to that file when the physical output is complete<br />

There are some more flags that we only discuss when necessary<br />

Example: fd = open("/tmp/testfile", O_WRONLY|O_APPEND|O_SYNC)<br />



Using open() to create files<br />

If the option O_CREAT is given, open() will create a file if no file with the given name exists<br />

This also requires the third argument to open() (which otherwise is ignored or can be omitted)<br />

– This argument describes the access rights set for the new file<br />

– It is created by bitwise ORing of the following constants:<br />

S_IRUSR Read permission for the user<br />

S_IWUSR Write permission for the user<br />

S_IXUSR Execute permission for the user<br />

S_IRGRP Read permission for the group<br />

S_IWGRP Write permission for the group<br />

S_IXGRP Execute permission for the group<br />

S_IROTH Read permission for the others<br />

S_IWOTH Write permission for the others<br />

S_IXOTH Execute permission for the others<br />

– Note: These are the same values used by st_mode in struct stat<br />
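A short sketch of file creation with open(), combining O_CREAT with O_EXCL so that an existing file is never overwritten. The helper name create_if_absent() and its return convention are assumptions made for this example.<br />

```c
#define _XOPEN_SOURCE 700   /* request POSIX declarations */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` with mode rw-r--r-- unless it
   already exists. Returns 1 if the file was created, 0 if it already
   existed, -1 on any other error. */
int create_if_absent(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL,
                  S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
    if (fd == -1)
        return (errno == EEXIST) ? 0 : -1;   /* O_EXCL reports EEXIST */
    close(fd);
    return 1;
}
```

Calling the function twice with the same name should return 1 the first time and 0 the second time. The mode actually given to the new file is still filtered through the umask of the process.<br />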



Notes on open() and close()<br />

The mode given to open() is modified by the umask of the process<br />

The file mode is only set if the file is actually created (not even if it exists but is truncated with O_TRUNC)<br />

A file is closed using the close() function:<br />

#include <unistd.h><br />

int close(int fd);<br />

– Return value: 0 on success, -1 on failure<br />

There are three predefined file descriptors that are open by default, corresponding to the 3 standard I/O channels:<br />

– STDIN_FILENO (traditionally 0)<br />

– STDOUT_FILENO (traditionally 1)<br />

– STDERR_FILENO (traditionally 2)<br />



File Descriptor I/O: read() and write()<br />

The functions read() and write() perform unbuffered input and output:<br />

#include <unistd.h><br />

ssize_t read(int fd, void *buf, size_t count);<br />

ssize_t write(int fd, const void *buf, size_t count);<br />

– ssize_t is a signed integer type defined in <sys/types.h><br />

– fd is the file descriptor for input or output<br />

– buf is a pointer to an area of memory<br />

∗ write() reads the data to write from this buffer<br />

∗ read() stores the read data in the buffer<br />

– count is the number of bytes to transfer (and should not be bigger than the size of *buf!)<br />

Both functions return the number of bytes transmitted<br />

– For write(), -1 signals an error; a smaller number than requested is treated as an error here (strictly, it means that only part of the data was written, which can happen on pipes or network connections)<br />

– For read():<br />

∗ 0 indicates end of file<br />

∗ -1 signals error<br />

∗ Everything else is normal (there may be fewer characters than requested currently available)<br />
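Because write() may transfer fewer bytes than requested, robust programs often wrap it in a loop. The following is one possible sketch, not the approach used in the slides' examples; write_all() is a made-up name, and retrying on EINTR (a call interrupted by a signal) is a design choice of this sketch.<br />

```c
#define _XOPEN_SOURCE 700   /* request POSIX declarations */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all `count` bytes are
   written. Returns count on success and -1 on error (errno is set by
   the failing write() call). */
ssize_t write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    size_t left = count;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n == -1) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: try again */
            return -1;
        }
        p += n;                    /* skip the part that was written */
        left -= (size_t)n;
    }
    return (ssize_t)count;
}
```

A caller can use write_all(STDOUT_FILENO, buf, count) in place of a single write() and only has to check for -1.<br />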



Example: Simple cat<br />

#include <stdio.h><br />
#include <stdlib.h><br />
#include <sys/types.h><br />
#include <sys/stat.h><br />
#include <fcntl.h><br />
#include <unistd.h><br />

#define BUF_SIZE 1024<br />

/* Print the error message for the failing call and terminate */<br />
void err_sys(char* message)<br />
{<br />
    perror(message);<br />
    exit(EXIT_FAILURE);<br />
}<br />

int main(int argc, char* argv[])<br />
{<br />
    int fd;<br />
    char buf[BUF_SIZE];<br />
    ssize_t count, check;<br />

    if(argc!=2)<br />
    {<br />
        fprintf(stderr, "USAGE: mycat2 file\n");<br />
        exit(EXIT_FAILURE);<br />
    }<br />
    fd = open(argv[1], O_RDONLY);<br />
    if(fd == -1)<br />
    {<br />
        err_sys("open");<br />
    }<br />
    /* read() returns 0 at end of file, which ends the loop */<br />
    while((count = read(fd, buf, BUF_SIZE)))<br />
    {<br />
        if(count==-1)<br />
        {<br />
            err_sys("read");<br />
        }<br />
        check = write(STDOUT_FILENO, buf, count);<br />
        if(check!=count)<br />
        {<br />
            err_sys("write");<br />
        }<br />
    }<br />
    if(close(fd) == -1)<br />
    {<br />
        err_sys("close");<br />
    }<br />
    return EXIT_SUCCESS;<br />
}<br />



The Standard I/O Library and File Descriptors<br />

Remember that a file pointer is actually of type FILE*<br />

It typically points to a structure in an array<br />

– stdin points to element number 0<br />

– stdout points to element number 1<br />

– stderr points to element number 2<br />

– More elements are filled in for each use of fopen()<br />

Each of the structures contains:<br />

– A buffer<br />

– Some counters and positions to manage the buffer<br />

– A file descriptor<br />

– Flags for the access mode (read or write)<br />

Consider the case of writing:<br />

– All write commands just write into the buffer space<br />

– If the buffer is full or a fflush() command is issued (or the stream is closed), all of the buffer is written using a single write() command<br />

Reading similarly reads a large block and hands it out piecewise<br />



Cheating with fdopen()<br />

Formatted, buffered I/O is very convenient and quite efficient for many small I/O operations (getchar(), fprintf(), . . . )<br />

– Normally much better than read() and write()<br />

– But some I/O methods only give us file descriptors (dammit!)<br />

Solution: The function fdopen() will generate an entry in the FILE array from a file descriptor and return the pointer to it<br />

#include <stdio.h><br />

FILE *fdopen(int fildes, const char *mode);<br />

fildes has to be an open file descriptor<br />

mode is a string as for fopen() ("r", "w". . . ) and must be compatible with the flags of the file descriptor<br />
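A minimal sketch of the fdopen() idea: a descriptor obtained from open() is wrapped in a FILE* so that fprintf() can be used on it. The helper name write_value() and the output format "value=N" are chosen only for this illustration.<br />

```c
#define _XOPEN_SOURCE 700   /* request POSIX declarations such as fdopen() */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: open `path` with open(), wrap the descriptor in a
   FILE* and write one formatted line. Returns 0 on success, -1 on error. */
int write_value(const char *path, int number)
{
    FILE *fp;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd == -1)
        return -1;
    fp = fdopen(fd, "w");          /* "w" is compatible with O_WRONLY */
    if (fp == NULL) {
        close(fd);
        return -1;
    }
    if (fprintf(fp, "value=%d\n", number) < 0) {
        fclose(fp);
        return -1;
    }
    /* fclose() flushes the buffer and also closes the underlying fd */
    return (fclose(fp) == 0) ? 0 : -1;
}
```

Note that after a successful fdopen() the descriptor belongs to the stream: it should be released with fclose(), not with a separate close().<br />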



Exercises<br />

Write a simple version of cp using open(), read() and write(). Use a default buffer size, but support an option -b that allows you to set the buffer size from the command line. Measure the speed of copying a large file for different sizes<br />

Examples:<br />

– mycp file1 file2 copies file1 to file2, using the default buffer size<br />

– mycp -b3 file1 file2 copies the file using a buffer of 3 bytes<br />

Use the fstat() function on both files to get the native block size of the file systems for both files (the st_blksize field in struct stat). What do you notice? Can you write a better cp now?<br />




More on File Descriptors<br />



More on the UNIX I/O System<br />

The file descriptor typically is an index into a table that contains information about all open files of the process<br />

– That table contains just the flags (read/write) for that file descriptor and a pointer to the kernel's global file table<br />

The file table is global and shared by all processes. It has one entry per opened file (more precisely, per call to open()), containing:<br />

– File status flags (read, write, append, sync . . . , the things we passed to open())<br />

– Current offset into the file: The position where the next read or write will start<br />

– A pointer to the vnode of the file<br />

∗ The vnode contains the file type and information about how to actually access the file, as well as the current real file size<br />

∗ It also gives us a way to access the inode that contains all the information we get with stat()<br />

∗ There is only one vnode per file, i.e. the vnode is the same for all file descriptors and all processes that access the same file<br />
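Since the current offset lives in the file table entry, two descriptors share it only if they refer to the same entry. The sketch below illustrates this with two calls that are not covered in this section: dup(), which creates a second descriptor for the same file table entry, and lseek(), used here only to query the current offset. The helper second_offset() is a made-up name.<br />

```c
#define _XOPEN_SOURCE 700   /* request POSIX declarations */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open `path`, obtain a second descriptor (with
   dup() if use_dup is nonzero, with a second open() otherwise), read up
   to n bytes through the first descriptor and return the offset seen
   through the second one. Returns -1 on error. */
long second_offset(const char *path, size_t n, int use_dup)
{
    char buf[64];
    ssize_t got;
    off_t pos;
    int fd1, fd2;

    fd1 = open(path, O_RDONLY);
    if (fd1 == -1)
        return -1;
    fd2 = use_dup ? dup(fd1) : open(path, O_RDONLY);
    if (fd2 == -1) {
        close(fd1);
        return -1;
    }
    if (n > sizeof buf)
        n = sizeof buf;
    got = read(fd1, buf, n);           /* advances the offset of fd1's entry */
    pos = lseek(fd2, 0, SEEK_CUR);     /* offset of fd2's file table entry */
    close(fd1);
    close(fd2);
    return (got == -1 || pos == (off_t)-1) ? -1 : (long)pos;
}
```

For a file of at least 4 bytes, second_offset(path, 4, 1) should return 4 (shared entry, shared offset), while second_offset(path, 4, 0) should return 0 (two entries that point to the same vnode but keep separate offsets).<br />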



The UNIX File I/O System<br />

[Figure: a FILE* (myfile) of the standard I/O library refers to a FILE array entry holding a buffer and a file descriptor. The file descriptor selects an entry in the per-process table (fd n: flags), which points to a global, shared file table entry (status flags, offset), which in turn points to a vnode table entry (actual current file size, ...).]<br />



Blocking vs. Nonblocking I/O<br />

All I/O we have seen so far is blocking<br />

– read() waits (blocks) until some input becomes available<br />

– It then returns the read data<br />

– Similarly, if write() temporarily cannot write the data, it blocks until it can<br />

Non-blocking I/O always returns immediately from the I/O function<br />

– If the I/O failed temporarily, the functions return -1<br />

– errno is set to EAGAIN or EWOULDBLOCK (many systems define both names with the same value)<br />

Question: How do we achieve non-blocking I/O?<br />

Answer: By manipulating the file descriptor<br />

– Each file descriptor has a number of associated flags<br />

– One of these selects blocking vs. non-blocking behaviour<br />



Manipulating File Descriptors: fcntl()<br />

fcntl() is a catch-all function for manipulating file descriptors<br />

#include <unistd.h><br />

#include <fcntl.h><br />

int fcntl(int fd, int cmd);<br />

int fcntl(int fd, int cmd, long arg);<br />

int fcntl(int fd, int cmd, struct flock *lock);<br />

We are only interested in the use of fcntl() for getting and changing the file status flags:<br />

– O_RDONLY, O_WRONLY, O_RDWR<br />

– O_APPEND<br />

– O_NONBLOCK<br />

– O_SYNC<br />

– . . . (depending on UNIX version)<br />

fcntl() may return various values, depending on cmd<br />

– On error, it always returns -1 and sets errno<br />



fcntl() Continued<br />

Using fcntl() to get the file status flags:<br />

flags = fcntl(fd, F_GETFL);<br />

– To interpret the result, we need to bitwise AND it with the flag we are interested in (see example)<br />

– To get the read/write status, AND the result with O_ACCMODE<br />

To set the file status flags:<br />

fcntl(fd, F_SETFL, newflags);<br />

– If we only want to change a single flag, we have to get the old value and use bitwise operations to change just that flag!<br />

– Example:<br />

int flags = fcntl(STDIN_FILENO, F_GETFL);<br />

flags = flags | O_NONBLOCK;<br />

fcntl(STDIN_FILENO, F_SETFL, flags);<br />
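The three lines above omit error checking. A slightly more defensive sketch follows, together with a small read wrapper that shows how the "would block" case can be told apart from end of file and from real errors. Both helper names (set_nonblocking(), try_read_byte()) and the return convention are assumptions of this example.<br />

```c
#define _XOPEN_SOURCE 700   /* request POSIX declarations */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to the file status flags of fd,
   leaving all other flags untouched. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) ? -1 : 0;
}

/* Hypothetical helper: try to read one byte from fd.
   Returns 1 if a byte was read, 0 at end of file, -2 if no data is
   available right now (the read would have blocked), -1 on other errors. */
int try_read_byte(int fd, char *c)
{
    ssize_t n = read(fd, c, 1);
    if (n == 1)
        return 1;
    if (n == 0)
        return 0;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return -2;
    return -1;
}
```

On an empty pipe whose read end was switched with set_nonblocking(), try_read_byte() should return -2 immediately instead of waiting for data.<br />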



Example: Printing Flags for a File Descriptor<br />

#include <stdio.h><br />
#include <stdlib.h><br />
#include <unistd.h><br />
#include <fcntl.h><br />

void err_sys(char* message)<br />

{<br />

perror(message);<br />

exit(EXIT_FAILURE);<br />

}<br />


void print_fd_file_status(int fd)<br />

{<br />

int flags = fcntl(fd, F_GETFL);<br />

if(flags == -1)<br />

{<br />

err_sys("fcntl");<br />

}<br />

printf("Flags for file descriptor %d\n", fd);<br />

switch(flags & O_ACCMODE)<br />

{<br />

case O_RDONLY:<br />

printf("Read only\n");<br />

break;<br />

case O_WRONLY:<br />

printf("Write only\n");<br />

break;<br />

case O_RDWR:<br />

printf("Read/Write\n");<br />

break;<br />

default:<br />

printf("Strange\n");<br />

}<br />



if(flags & O_APPEND)<br />

{<br />

printf("Append is set\n");<br />

}<br />

if(flags & O_NONBLOCK)<br />

{<br />

printf("Non-blocking\n");<br />

}<br />

if(flags & O_SYNC)<br />

{<br />

printf("Synchronous writes\n");<br />

}<br />

}<br />

int main(int argc, char* argv[])<br />

{<br />

print_fd_file_status(STDIN_FILENO);<br />

print_fd_file_status(STDOUT_FILENO);<br />

print_fd_file_status(STDERR_FILENO);<br />

print_fd_file_status(42);<br />

return EXIT_SUCCESS;<br />

}<br />



Example Output<br />

$ ./fcntl_example<br />
Flags for file descriptor 0<br />
Read/Write<br />
Flags for file descriptor 1<br />
Read/Write<br />
Flags for file descriptor 2<br />
Read/Write<br />
fcntl: Bad file descriptor<br />

$ ./fcntl_example < signal_test.c<br />
Flags for file descriptor 0<br />
Read only<br />
Flags for file descriptor 1<br />
Read/Write<br />
Flags for file descriptor 2<br />
Read/Write<br />
fcntl: Bad file descriptor<br />



Multiplexing I/O<br />

Often, a program has to be able to read data from multiple sources<br />

– Data from the user<br />

– Data from the network<br />

– Data from a file that is in the process of being written<br />

Bad solution: Polling<br />

– Switch all file descriptors to non-blocking<br />

– Test them one after the other, until one of them has data<br />

– Uses too many system resources!<br />

Minimally better: Polling with a short waiting time between I/O attempts<br />

– But: Lousy reaction time<br />

Right solution: Use the right tool (select())<br />



Multiplexing I/O: select()<br />

select() is used to watch a set of file descriptors for one of three conditions:<br />

– A file descriptor is ready for reading<br />

– A file descriptor is ready for writing<br />

– A file descriptor has an exceptional condition pending<br />

We can tell the function to either<br />

– Return immediately, telling us the current status<br />

– Wait until at least one of the conditions becomes true<br />

– Wait until at least one of the conditions becomes true, but at most a fixed amount of time<br />

Specification:<br />

#include <sys/time.h><br />

#include <sys/types.h><br />

#include <unistd.h><br />

int select(int max_fdp1, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *tvptr);<br />



select() Arguments<br />

fd_set is defined in sys/types.h<br />

– It is a data type that can store a set of file descriptors<br />

– We only know how to manipulate it:<br />

∗ FD_ZERO(fd_set *set) removes all file descriptors from the set<br />

∗ FD_SET(int fd, fd_set *set) inserts fd into the set<br />

∗ FD_CLR(int fd, fd_set *set) removes fd from the set<br />

∗ FD_ISSET(int fd, fd_set *set) returns true if fd is contained in *set<br />

The three fd_set* arguments are used for both input and output of select()<br />

– The fd_set structures the arguments point to describe which file descriptors<br />

we are interested in<br />

– If the pointer is NULL, we are not interested in any file descriptor for the<br />

corresponding property<br />

– When select() returns, the sets have been modified to contain just the descriptors<br />

for which the property is true<br />

max_fdp1 has to be at least one bigger than the biggest file descriptor in<br />

any one of the three sets<br />

– It is used to speed up things in the UNIX kernel<br />
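The set macros can be tried out without doing any I/O. The following is a minimal sketch, assuming a hypothetical helper fill_fd_set() (not a library function) that builds a set from an array of descriptors and computes the value to pass as the first argument of select(). It includes sys/select.h, which is where current POSIX systems declare fd_set and the macros.

```c
#include <sys/select.h>

/* Hypothetical helper: put up to n descriptors into *set and return the
 * value to pass as select()'s first argument (largest descriptor + 1).
 * Descriptors outside 0..FD_SETSIZE-1 are skipped, because FD_SET must
 * not be used with them. Returns 0 if no descriptor was inserted. */
int fill_fd_set(fd_set *set, const int *fds, int n)
{
    int i;
    int max_fd = -1;

    FD_ZERO(set);                    /* start with the empty set */
    for (i = 0; i < n; i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE) {
            continue;
        }
        FD_SET(fds[i], set);         /* insert the descriptor */
        if (fds[i] > max_fd) {
            max_fd = fds[i];
        }
    }
    return max_fd + 1;
}
```

For the descriptors 0, 5 and 2 the helper returns 6, which is the smallest legal value for max_fdp1 in that case.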



select() Arguments and Return Value<br />

The last argument to select() is a pointer to a struct timeval<br />

This struct has two fields:<br />

– long tv_sec; /* Seconds */<br />

– long tv_usec; /* Microseconds */<br />

There are two possible cases:<br />

– tvptr is NULL: In this case, select() waits until one of the file descriptors is<br />

ready (or a signal is caught)<br />

– tvptr points to a valid struct timeval: In this case, select() waits at<br />

most the specified time (if both fields are 0, it returns immediately)<br />

Return value:<br />

– -1 on error or if select() returned because of a signal (errno will be set!)<br />

– Otherwise, the number of file descriptors for which the specified condition is<br />

true is returned (0 if the time ran out first)<br />



Example<br />

#include <stdio.h><br />

#include <stdlib.h><br />

#include <unistd.h><br />

#include <sys/types.h><br />

#include <sys/time.h><br />

int main(int argc, char* argv[])<br />

{<br />

fd_set readfds;<br />

fd_set writefds;<br />

int res;<br />

FD_ZERO(&readfds);<br />

FD_ZERO(&writefds);<br />

FD_SET(STDIN_FILENO, &readfds);<br />

FD_SET(STDOUT_FILENO, &writefds);<br />

FD_SET(STDERR_FILENO, &writefds);<br />

res = select(3, &readfds, &writefds, NULL, NULL);<br />

printf("%d file descriptors are ready\n", res);<br />

Example (2)<br />

if(FD_ISSET(STDIN_FILENO, &readfds))<br />

{<br />

printf("STDIN is ready for reading\n");<br />

}<br />

if(FD_ISSET(STDOUT_FILENO, &writefds))<br />

{<br />

printf("STDOUT is ready for writing\n");<br />

}<br />

if(FD_ISSET(STDERR_FILENO, &writefds))<br />

{<br />

printf("STDERR is ready for writing\n");<br />

}<br />

return EXIT_SUCCESS;<br />

}<br />


Example Output<br />

$ ./select_example<br />

2 file descriptors are ready<br />

STDOUT is ready for writing<br />

STDERR is ready for writing<br />

$ ./select_example < select_example.c<br />

3 file descriptors are ready<br />

STDIN is ready for reading<br />

STDOUT is ready for writing<br />

STDERR is ready for writing<br />



Internet Assignment (I)<br />

On the assignment home page you will find links to two binary programs, a chat<br />

server and a chat client. In the end, you should turn in a program that has the<br />

same functionality as the client<br />

Step 1:<br />

– Download the programs and understand what they do<br />

– To start the server, type ./chat_server followed by a port number (an integer<br />

greater than 1024)<br />

– To connect to the server, type ./chat_client followed by three arguments:<br />

∗ The IP address of the server host (use 127.0.0.1 if the server<br />

runs on the same host, use nslookup for other hosts)<br />

∗ The same port number as used for the server<br />

∗ The nickname under which you will chat<br />

Caveats:<br />

– Due to firewalling, do not expect to be able to reach a server from outside the<br />

lab<br />

– The binaries only run under Linux<br />

I will try to keep a server running on lee on port 6666 for all of you to share<br />



Basic UNIX Network Programming<br />

Introduction<br />


Networking<br />

The Internet<br />


Networking Concepts<br />

Is communication occurring between two partners, or is it a broadcast communication?<br />

– How are partners identified (addressed)?<br />

Is traffic stream oriented or packet oriented?<br />

– Stream-oriented: Messages arrive as a stream of bytes (similar to reading from<br />

a file)<br />

– Packet-oriented: Traffic arrives in the form of distinct packets of a fixed (or<br />

fixed maximal) size<br />

Is the communication reliable or unreliable?<br />

– Can messages disappear?<br />

– Can the order of messages change?<br />

– Can messages be duplicated?<br />



Network Layers<br />

Level 0: Physical or Hardware layer<br />

– Copper wires or optical fiber<br />

– Radio waves or laser beams for wireless protocols<br />

Level 1: Data Link Layer<br />

– How is data transported?<br />

– Examples: Ethernet, Token ring, ATM, ISDN<br />

Level 2: Network layer<br />

– How are individual hosts or networks assembled into a network?<br />

– Examples: Internet protocol (IP)<br />

Level 3: Transport layer<br />

– Converts from standard user packets to network layer packets<br />

– May include error checking and correcting<br />

– Examples: TCP and UDP<br />

Higher layers. . .<br />

– Take care of data representation at various levels<br />



The Internet Protocol (IP)<br />

Level 2 protocol (Hardware-Agnostic)<br />

Prevalent protocol today: IPv4<br />

– Unreliable (“best effort”)<br />

– Packet-oriented (IP-Datagrams)<br />

– Can be addressed to individual hosts or broadcast addresses<br />

– Addresses are 32 bit numbers (4 octets), normally written as dotted<br />

decimal numbers: 127.0.0.1<br />

– Addresses denote individual hosts!<br />

Currently being deployed: IPv6<br />

– Shares many properties<br />

– But: 128 bit addresses (eight 4-digit hexadecimal numbers, separated<br />

by colons: 21DA:00D3:0000:2A3B:02AA:00BF:FE28:9C5A)<br />



The User Datagram Protocol (UDP)<br />

Based on IP<br />

Still. . .<br />

– packet-oriented<br />

– unreliable<br />

Adds: Service multiplexing<br />

– The same host can have many different communications<br />

– Each communication uses a different port<br />

Supported by UNIX with sockets with socket type SOCK_DGRAM<br />

Used for:<br />

– DNS (Domain name service)<br />

– NFS (Network file system)<br />



The Transmission Control Protocol (TCP)<br />

Based on IP, but. . .<br />

– Connection-based<br />

– Stream-oriented<br />

– Reliable<br />

– Service multiplexing (with ports)<br />

Supported by UNIX with sockets with socket type SOCK_STREAM<br />

Addresses for TCP (and UDP) have two parts:<br />

– The IP number for specifying the host<br />

– The port number for specifying the port<br />

Most services are associated with a fixed port number:<br />

– HTTP (WWW): Port 80<br />

– SMTP (Email transport): Port 25<br />

– FTP (File Transfer): Port 21<br />

– For a semi-complete list: more /etc/services<br />

– Server port numbers below 1024 are normally reserved for root<br />



UNIX Sockets<br />

Sockets are special file descriptors used for many different kinds of inter-process<br />

communication<br />

– Local<br />

– Networked<br />

We can create sockets for different communication styles<br />

– Stream oriented<br />

– Datagram<br />

Sockets are used on both sides of a communication<br />

– The receiver creates a socket and associates it with a port<br />

– The sender creates a socket and connects it to the receiver<br />



Client/Server Model<br />

A server offers a certain service<br />

– It is ready to accept connections on a certain port<br />

A client initiates communication by trying to connect to that port<br />



Basic UNIX Network Programming<br />

Simple Connections<br />


The Client Side for TCP Connections<br />

The client has to perform the following steps:<br />

– Create a socket for stream-oriented communication over IP<br />

– Create an address structure for the server address<br />

– Connect the socket to the server port<br />

– Use the connection<br />



Creating Sockets with socket() (1)<br />

To create a socket, we call socket()<br />

#include <sys/types.h><br />

#include <sys/socket.h><br />

int socket(int domain, int type, int protocol);<br />

On success, socket() returns a valid file descriptor just like open()<br />

– After enough black magic, we can use it with read() and write()<br />

On failure, the function returns -1 and sets errno<br />



Creating Sockets with socket() (2)<br />

int socket(int domain, int type, int protocol);<br />

The domain argument describes the protocol family that will be used. Interesting<br />

values:<br />

– PF_INET: Internet with IPv4<br />

– PF_INET6: Internet with IPv6<br />

– PF_LOCAL: Local communication<br />

The type describes the communication style:<br />

– SOCK_STREAM for connection based streams<br />

– SOCK_DGRAM for datagrams<br />

The last argument specifies the protocol<br />

– There normally is only a single protocol for each domain/type pair, use 0 to<br />

select this (the default)<br />

– PF_INET/SOCK_STREAM gives us TCP/IPv4<br />

– PF_INET/SOCK_DGRAM gives us UDP/IPv4<br />
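The domain/type combinations can be checked with a few lines of code. This is a minimal sketch; the three function names are hypothetical helpers made up for the illustration, while getsockopt() with SO_TYPE is the standard way to ask an existing socket for its type.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helpers: return a new socket descriptor, or -1 with
 * errno set. Protocol 0 selects the default for the domain/type pair. */
int make_tcp_socket(void)
{
    return socket(PF_INET, SOCK_STREAM, 0);   /* TCP/IPv4 */
}

int make_udp_socket(void)
{
    return socket(PF_INET, SOCK_DGRAM, 0);    /* UDP/IPv4 */
}

/* Ask the system which type a socket has (SOCK_STREAM, SOCK_DGRAM, ...).
 * Returns -1 on error. */
int socket_type(int fd)
{
    int type;
    socklen_t len = sizeof type;

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) == -1) {
        return -1;
    }
    return type;
}
```

A freshly created socket is not connected to anything yet; it only becomes useful after connect() (client) or bind() and listen() (server).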



Socket Addresses<br />

This is a reasonably ugly topic!<br />

Because sockets are used for so many things, there is no single data type for<br />

socket addresses<br />

– Instead, each address family has its own format<br />

– We have to pass this by address (cast to the generic type struct sockaddr*)<br />

– Additionally, we have to tell the system the size of our address format<br />

Because different computer models use different data formats (Big Endian vs.<br />

Little Endian), we have to convert values to network order using:<br />

#include <arpa/inet.h><br />

uint32_t htonl(uint32_t hostlong); /* Host to Network conversion for long */<br />

uint16_t htons(uint16_t hostshort);<br />

uint32_t ntohl(uint32_t netlong);<br />

uint16_t ntohs(uint16_t netshort); /* Network to host conversion for short */<br />
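What the conversion functions do becomes visible when the converted value is inspected byte by byte. The sketch below assumes two hypothetical helpers, port_to_wire() and port_from_wire(), invented for this illustration; only htons() and ntohs() are real library functions.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a port number in network byte order
 * (most significant byte first) into a 2-byte buffer. */
void port_to_wire(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port);      /* host order -> network order */

    memcpy(out, &net, sizeof net);
}

/* Hypothetical helper: read a port number back from such a buffer. */
uint16_t port_from_wire(const unsigned char in[2])
{
    uint16_t net;

    memcpy(&net, in, sizeof net);
    return ntohs(net);               /* network order -> host order */
}
```

On every host the buffer for port 0x1234 holds the bytes 0x12 0x34 in that order. On a big-endian machine htons() simply returns its argument, on a little-endian machine it swaps the two bytes.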



Socket Addresses for IPv4<br />

For IPv4 addresses, we use the data type struct sockaddr_in<br />

It contains the following fields we have to fill:<br />

u_char sin_family; /*----Internet address family */<br />

u_short sin_port; /*----Port number */<br />

struct in_addr sin_addr; /*----Holds the IP address */<br />

For sin_family, we use a predefined constant AF_INET<br />

For the port, we use the port number, converted with htons()<br />

sin_addr is filled in by the function inet_pton():<br />

#include <sys/types.h><br />

#include <sys/socket.h><br />

#include <arpa/inet.h><br />

int inet_pton(int af, const char *src, void *dst);<br />



inet_pton()<br />

int inet_pton(int af, const char *src, void *dst);<br />

inet_pton() converts an internet address in string form to a network address<br />

structure<br />

First argument: Address family<br />

– AF_INET for IPv4 addresses<br />

– AF_INET6 for IPv6 addresses<br />

Second argument: Pointer to string containing address<br />

– For IPv4: IP-Numbers (4 numbers with dots)<br />

– IPv6: Hex representation (8 4-digit hex numbers separated by colons)<br />

Third argument: Pointer to the destination<br />

– Normally a pointer to the sin_addr field in a struct sockaddr_in<br />
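Filling in a complete IPv4 address takes only a few lines. The following is a minimal sketch, assuming a hypothetical helper make_ipv4_addr() (the name is made up here) that combines the steps described so far and passes on the return value of inet_pton().

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

/* Hypothetical helper: fill *addr for a dotted-decimal IPv4 string and
 * a port number given in host byte order.
 * Returns what inet_pton() returns: 1 on success, 0 if ip is not a
 * valid IPv4 address string, -1 on error. */
int make_ipv4_addr(struct sockaddr_in *addr, const char *ip,
                   unsigned short port)
{
    memset(addr, 0, sizeof *addr);      /* clear all fields first */
    addr->sin_family = AF_INET;
    addr->sin_port   = htons(port);     /* port in network byte order */

    /* inet_pton() stores the address in network byte order already. */
    return inet_pton(AF_INET, ip, &addr->sin_addr);
}
```

Note that inet_pton() only parses numeric addresses. Host names such as www.example.org have to be resolved first, which is a separate topic.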



Connecting to a Remote Port: connect()<br />

After we have prepared an address in a struct sockaddr_in, we can connect<br />

an existing socket to a remote port:<br />

#include <sys/types.h><br />

#include <sys/socket.h><br />

int connect(int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen);<br />

– sockfd: Socket you want to connect<br />

– serv_addr: Pointer to the carefully prepared address you want to connect to,<br />

cast to struct sockaddr*<br />

– addrlen: Size of your actual structure, i.e. sizeof(struct sockaddr_in)<br />

∗ Remember that by casting the second argument, we actually lie to the compiler about<br />

the data structure we are pointing to<br />

∗ That’s ok – the socket library knows that we are probably lying<br />

∗ Passing the length helps the library to straighten things out<br />

Return value: 0 on success, -1 on failure<br />
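The client-side steps (create a socket, build the address, connect) fit into one small function. This is a sketch only; connect_ipv4() is a hypothetical helper name, and real programs will usually want more detailed error reporting than a plain -1.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: connect to a TCP server given as dotted-decimal
 * IPv4 string and port. Returns the connected socket, or -1 on failure. */
int connect_ipv4(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    int sock, saved_errno;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        return -1;                      /* not a valid IPv4 address */
    }

    sock = socket(PF_INET, SOCK_STREAM, 0);
    if (sock == -1) {
        return -1;
    }

    /* The cast and the explicit size are what connect() expects. */
    if (connect(sock, (struct sockaddr *) &addr, sizeof addr) == -1) {
        saved_errno = errno;            /* close() may change errno */
        close(sock);
        errno = saved_errno;
        return -1;
    }
    return sock;
}
```

The returned descriptor can be used with read() and write() and has to be released with close() when the conversation is over.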



Example: Getting an Insult<br />

#include <stdio.h><br />

#include <stdlib.h><br />

#include <string.h><br />

#include <unistd.h><br />

#include <sys/types.h><br />

#include <sys/socket.h><br />

#include <netinet/in.h><br />

#include <arpa/inet.h><br />

void err_sys(char* message)<br />

{<br />

perror(message);<br />

exit(EXIT_FAILURE);<br />

}<br />

int main(int argc, char* argv[])<br />

{<br />

int sock;<br />

struct sockaddr_in server_addr;<br />

char buf[80];<br />

int msg_len,res;<br />



Example (2)<br />

sock = socket(PF_INET, SOCK_STREAM, 0); /* Check against -1 omitted! */<br />

memset(&server_addr, 0, sizeof(server_addr));<br />

server_addr.sin_family = AF_INET;<br />

server_addr.sin_port = htons(1695);<br />

res = inet_pton(AF_INET, "128.138.196.16", &server_addr.sin_addr);<br />

if(res < 0)<br />

{<br />

err_sys("inet_pton (no valid address family)");<br />

}<br />

if(res == 0)<br />

{<br />

fprintf(stderr, "Not a valid IP address\n");<br />

exit(EXIT_FAILURE);<br />

}<br />

res = connect(sock, (struct sockaddr *) &server_addr,<br />

sizeof(server_addr));<br />

if(res == -1)<br />

{<br />

err_sys("connect");<br />

}<br />



Example (3)<br />

while(1)<br />

{<br />

msg_len = read(sock, buf, 80);<br />

if(msg_len <= 0) /* 0: end of data, -1: read error */<br />

{<br />

break;<br />

}<br />

write(STDOUT_FILENO, buf,msg_len);<br />

}<br />

close(sock);<br />

return EXIT_SUCCESS;<br />

}<br />



The Server Side<br />

A server has a more complex task than a client<br />

General steps:<br />

– Create a socket (we know how to do this)<br />

– Create an address (on its own machine)<br />

– Bind the socket to the address<br />

– Listen for incoming connections on that socket<br />

For each client:<br />

– Accept the connection (on a new socket)<br />

– Use the connection<br />

– Close the connection<br />



Server Side Addresses<br />

We need to specify a local address for the listening port<br />

– It contains the address family, IP address, and port<br />

Instead of actually digging out the server's IP address (which may be complex),<br />

we use the special address 0.0.0.0 or INADDR_ANY<br />

Given this address, the server will accept connections on any IP address which<br />

refers to it<br />

Example:<br />

struct sockaddr_in sock_name;<br />

int sock;<br />

unsigned short port;<br />

...Get socket, set port to some value...<br />

memset(&sock_name, 0, sizeof(sock_name)); /* Clear address */<br />

sock_name.sin_family = AF_INET; /* Set address family */<br />

sock_name.sin_port = htons(port); /* Set port */<br />

sock_name.sin_addr.s_addr = htonl(INADDR_ANY);/* Set address */<br />



Naming a Socket (Binding a Socket to an Address)<br />

Once we have created a socket and a local address, we need to bind the socket<br />

to that address<br />

– All future operations will make use of that address<br />

#include <sys/types.h><br />

#include <sys/socket.h><br />

int bind(int sockfd, struct sockaddr *my_addr, socklen_t addrlen);<br />

– sockfd: Socket we want to bind<br />

– my_addr: Pointer to the address<br />

– addrlen: Length of that address<br />

– See remarks for connect()!<br />

Return value:<br />

– 0 on success<br />

– -1 on failure<br />



Listening for Incoming Connections<br />

We use the listen() function call to make a socket listen for incoming connections:<br />

#include <sys/socket.h><br />

int listen(int sock, int backlog);<br />

– sock is the file descriptor we want to set to listening state<br />

– backlog is the number of pending connections allowed at any one time<br />

∗ If more unanswered connection requests are received, they will be refused or<br />

ignored<br />

∗ If we accept a connection, that slot becomes available again<br />

∗ A good value is 5 ;-)<br />

Return value: 0 on success, -1 on failure<br />



Accepting Connections<br />

To finally establish a connection, we have to accept it:<br />

#include <sys/types.h><br />

#include <sys/socket.h><br />

int accept(int sock, struct sockaddr *addr, socklen_t *addrlen);<br />

– sock: The socket we expect connections on<br />

– addr: A pointer to an address structure (or NULL)<br />

– addrlen: A pointer to a variable of type socklen_t that initially has<br />

to contain the size of *addr<br />

If accept() returns. . .<br />

– The return value is the file descriptor of a new socket (or -1)<br />

– If addr is not NULL, the address of the remote socket is written into it<br />

– *addrlen is changed to the actual size of the address written<br />

By default, accept() blocks until a connection request is received<br />

– If we set the socket to non-blocking (using fcntl()), it will return with -1<br />

and set errno to EWOULDBLOCK if there are no pending requests<br />
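The server-side calls can be combined into two small functions. This is a sketch under stated assumptions: make_listener() and accept_with_peer() are hypothetical helper names, and inet_ntop() (the counterpart of inet_pton()) is used to turn the peer address back into a string.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket that listens on the given
 * port on all local addresses. Port 0 lets the kernel pick a free port.
 * Returns the listening socket, or -1 on failure. */
int make_listener(unsigned short port, int backlog)
{
    struct sockaddr_in name;
    int sock = socket(PF_INET, SOCK_STREAM, 0);

    if (sock == -1) {
        return -1;
    }
    memset(&name, 0, sizeof name);
    name.sin_family      = AF_INET;
    name.sin_port        = htons(port);
    name.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(sock, (struct sockaddr *) &name, sizeof name) == -1 ||
        listen(sock, backlog) == -1) {
        close(sock);
        return -1;
    }
    return sock;
}

/* Hypothetical helper: accept one connection and write the peer's
 * dotted-decimal address into buf (at least INET_ADDRSTRLEN bytes).
 * Returns the new connection socket, or -1 on failure. */
int accept_with_peer(int sock, char *buf, socklen_t buflen)
{
    struct sockaddr_in peer;
    socklen_t len = sizeof peer;   /* in: buffer size, out: address size */
    int con = accept(sock, (struct sockaddr *) &peer, &len);

    if (con == -1) {
        return -1;
    }
    if (inet_ntop(AF_INET, &peer.sin_addr, buf, buflen) == NULL &&
        buflen > 0) {
        buf[0] = '\0';             /* address could not be converted */
    }
    return con;
}
```

The listening socket stays open and can accept further clients; only the socket returned by accept_with_peer() belongs to the individual connection.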



Example: Greeting the World<br />

#include <stdio.h><br />

#include <stdlib.h><br />

#include <string.h><br />

#include <unistd.h><br />

#include <sys/types.h><br />

#include <sys/socket.h><br />

#include <netinet/in.h><br />

#include <arpa/inet.h><br />

void err_sys(char* message)<br />

{<br />

perror(message);<br />

exit(EXIT_FAILURE);<br />

}<br />

int main(int argc, char* argv[])<br />

{<br />

int sock, con_sock;<br />

struct sockaddr_in sock_name;<br />



Example (2)<br />

if(argc!=2)<br />

{<br />

fprintf(stderr, "Usage: simple_server \n");<br />

exit(EXIT_FAILURE);<br />

}<br />

sock = socket(PF_INET, SOCK_STREAM, 0);<br />

if(sock == -1)<br />

{<br />

err_sys("socket");<br />

}<br />

memset(&sock_name, 0, sizeof(sock_name)); /* Clear address first */<br />

sock_name.sin_family = AF_INET;<br />

sock_name.sin_port = htons(atoi(argv[1]));<br />

sock_name.sin_addr.s_addr = htonl(INADDR_ANY);<br />

if (bind(sock, (struct sockaddr *) &sock_name, sizeof(sock_name)) < 0)<br />

{<br />

err_sys("bind");<br />

}<br />

if(listen(sock, 1) == -1)<br />

{<br />

err_sys("listen");<br />

}<br />



Example (3)<br />

while(1)<br />

{<br />

con_sock = accept(sock, NULL, NULL);<br />

if(con_sock == -1)<br />

{<br />

err_sys("accept");<br />

}<br />

write(con_sock, "Hiho and welcome!\n", strlen("Hiho and welcome!\n"));<br />

if(close(con_sock) == -1)<br />

{<br />

err_sys("close(con_sock)");<br />

}<br />

}<br />

/* sock closed automatically when we exit via ^C */<br />

}<br />



More Information<br />

man pages:<br />

– man socket<br />

– man 2 bind<br />

– man accept<br />

The GNU C library documentation on sockets<br />

– Available by doing info libc<br />

– In emacs: [C-h i]<br />

– On the internet, e.g. at:<br />

∗ http://www.gnu.org/manual/glibc-2.2.3/html_chapter/libc_16.html<br />

∗ http://www.gnuenterprise.org/doc/glibc-doc/html/chapters_16.html<br />



Internet Assignment (II)<br />

Step 2: Write a simple client that<br />

– Reads IP address, port and nickname from the command line<br />

– Connects to the specified server<br />

– Uses select() or non-blocking read() to read everything the server transmits<br />

– Closes the connection<br />

Step 3: Modify the client to keep on reading. Be sure to use select() now!<br />

Step 4: Write a second client that connects, reads input from the terminal, and<br />

sends it to the server (prepended with the nickname and a colon).<br />

– You should be able to see what you send if you simultaneously connect with<br />

the client from step 3.<br />

– If the user types [C-D] to signal end of input, close the connection to the<br />

server and terminate<br />

Step 5: Put everything together, using select() on the network connection and<br />

standard input.<br />

– Send input from the terminal to the server<br />

– Send input from the network to standard output<br />



Process Creation and Termination (I)<br />


Subprocesses<br />

UNIX is a multi-process operating system<br />

– Many processes run at the same time<br />

– Processes can be created and can terminate<br />

Processes form a hierarchy<br />

– All processes have a unique parent<br />

– In the end, all (real) processes descend from the init process<br />

Parent and child share a special relationship<br />

– The parent has to retrieve the termination status of a child process<br />

– The child can get its parent's process id<br />

– If a parent dies, its special role is taken over by the init process<br />



Process Properties<br />

For each process, we can get various identifiers:<br />

– The process id<br />

– The process id of the parent<br />

– The real user id of the process (i.e. the user id of the owner)<br />

– The effective user id of the process (i.e. the user id that is used to check access<br />

rights). It can differ e.g. for programs with the setuid bit set<br />

– The real group id<br />

– The effective group id<br />

#include <sys/types.h><br />

#include <unistd.h><br />

pid_t getpid(void); /* Get process id */<br />

pid_t getppid(void); /* Get parent process id */<br />

uid_t getuid(void); /* Get real user id */<br />

uid_t geteuid(void); /* Get effective user id */<br />

gid_t getgid(void); /* Get real group id */<br />

gid_t getegid(void); /* Get effective group id */<br />
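A short sketch that prints all six identifiers, assuming a hypothetical helper print_process_ids() (the name and the output format are made up for this illustration). The values are cast to long because the exact integer types behind pid_t, uid_t and gid_t differ between systems.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: print the identifiers of the calling process.
 * Returns 1 if an effective id differs from the real id (as happens in
 * setuid or setgid programs), 0 otherwise. */
int print_process_ids(FILE *out)
{
    fprintf(out, "pid=%ld ppid=%ld\n", (long) getpid(), (long) getppid());
    fprintf(out, "uid=%ld euid=%ld\n", (long) getuid(), (long) geteuid());
    fprintf(out, "gid=%ld egid=%ld\n", (long) getgid(), (long) getegid());

    return getuid() != geteuid() || getgid() != getegid();
}
```

None of these functions can fail, so no error checking is needed.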



Standard Execution of a UNIX Program<br />

Creation of the process<br />

– Can only happen via the fork() system call<br />

Execution of a program<br />

– Via the kernel system call exec()<br />

– Comes in various handy library variants<br />

Running<br />

– Process runs in its own process space (virtual memory)<br />

Termination<br />

– Normal exit<br />

– Call to abort()<br />

– Receiving a signal for which the default action is to abort the process<br />



Exiting<br />

There are three normal ways of terminating a program<br />

Calling return st; from main() (ANSI C)<br />

– In that case the exit status of the program is st<br />

– Interpretation of the exit status is implementation-defined for ANSI C (but<br />

defined for UNIX)<br />

Calling exit(st); from anywhere in the program (ANSI C)<br />

– Exit status is st<br />

– In main(), exit() and return are equivalent<br />

– In both cases, some cleanup actions are performed<br />

∗ Exit handlers are called<br />

∗ All open files are flushed and closed<br />

Calling _exit(st) (UNIX) or _Exit(st) (new in ANSI-C 99, may not be widely<br />

supported)<br />

– Program is immediately terminated, without the cleanup actions above<br />

– Exit status is st<br />



Exit Formalities<br />

#include <stdlib.h><br />

void exit(int status);<br />

void _Exit(int status); /* New in C99 */<br />

#include <unistd.h><br />

void _exit(int status);<br />

ANSI C defines three different exit statuses:<br />

– EXIT_SUCCESS (in stdlib.h)<br />

– EXIT_FAILURE (in stdlib.h)<br />

– 0 (equivalent to EXIT_SUCCESS)<br />

In practice, EXIT_SUCCESS is nearly always just #defined as 0<br />



Cleaning up: atexit()<br />

ANSI C allows us to register at least 32 functions that will be called whenever the<br />

program terminates normally:<br />

#include <stdlib.h><br />

int atexit(void (*func)(void));<br />

– Argument is a pointer to a function that neither takes an argument nor returns<br />

a value<br />

– Return value for atexit() is 0 on success, nonzero on error<br />

Each call to atexit() results in a single call to the registered function<br />

– Registered functions are called in reverse order of registration<br />

– We can register the same function more than once<br />

Note: Exit handlers should only access global variables<br />



Example<br />

#include <stdio.h><br />

#include <stdlib.h><br />

#include <unistd.h><br />

int handler_counter=0;<br />

void err_sys(char* message)<br />

{<br />

perror(message);<br />

exit(EXIT_FAILURE);<br />

}<br />

void handler1(void)<br />

{<br />

printf("Handler1, counter = %d\n", handler_counter);<br />

handler_counter++;<br />

}<br />

void handler2(void)<br />

{<br />

printf("Handler2, counter = %d\n", handler_counter);<br />

handler_counter++;<br />

}<br />



Example (2)<br />

int main(void)<br />

{<br />

if(atexit(handler1) != 0)<br />

{<br />

err_sys("atexit");<br />

}<br />

if(atexit(handler2) != 0)<br />

{<br />

err_sys("atexit");<br />

}<br />

if(atexit(handler1) != 0)<br />

{<br />

err_sys("atexit");<br />

}<br />

if(atexit(handler1) != 0)<br />

{<br />

err_sys("atexit");<br />

}<br />

printf("My PID is %d and my parents PID is %d\n", getpid(), getppid());<br />

return EXIT_SUCCESS;<br />

}<br />



Example Output<br />

My PID is 2019 and my parents PID is 746<br />

Handler1, counter = 0<br />

Handler1, counter = 1<br />

Handler2, counter = 2<br />

Handler1, counter = 3<br />



Running other Programs: system()<br />

The system() function is defined by ANSI C<br />

#include <stdlib.h><br />

int system(const char *command);<br />

system() hands the string pointed to by command to the system's command<br />

processor for execution<br />

– system() returns when the command returns<br />

– Return value of system() in this case is implementation-defined<br />

If command is NULL, system() checks if the implementation has a command<br />

processor<br />

– It returns 0 if not<br />

– Anything else otherwise<br />



system() in UNIX<br />

On UNIX, there always is a command processor<br />

– The command is handed to the standard shell, /bin/sh<br />

– It can make use of all shell facilities, including I/O redirection<br />

The return value of the system() function normally is an encoding of the exit<br />

status of the executed command<br />

– If for some reason no new process for the shell can be created, -1 is returned<br />

(and errno is set to specify what went wrong)<br />

– If the shell cannot be executed, it is treated as if the shell returned 127<br />

– Otherwise, the return value is an encoding of the exit status of the shell (which<br />

always returns the exit status of the command, if it could be executed)<br />



Termination Status Interpretation<br />

Termination status can come from multiple sources<br />

– system() (which nicely packs up all the work for us)<br />

– Functions that retrieve the exit status of a child process: wait() and<br />

waitpid() (more later)<br />

Interpretation depends on the cause of the termination of the child process.<br />

Assume that status is the termination status<br />

– If WIFEXITED(status) is true, the process terminated normally (i.e. via<br />

exit(), _exit() or return from main)<br />

∗ WEXITSTATUS(status) returns the lower 8 bits of the value that was<br />

passed to exit()<br />

– If WIFSIGNALED(status) is true, the process was terminated because of an<br />

uncaught signal with default action abort<br />

∗ WTERMSIG(status) gives the number of the signal<br />

If WIFSTOPPED(status) is true, the process is currently stopped (via SIGSTOP or<br />

SIGTSTP)<br />

– WSTOPSIG(status) returns the number of the stop signal<br />



Example: Executing Commands<br />

#include <stdio.h><br />

#include <stdlib.h><br />

#include <sys/wait.h><br />

int handler_counter=0;<br />

void err_sys(char* message)<br />

{<br />

perror(message);<br />

exit(EXIT_FAILURE);<br />

}<br />



Example: Executing Commands<br />

int main(int argc, char* argv[])<br />

{<br />

int i, status;<br />

for(i=1; i
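A minimal sketch of what the body of such a loop could do for each command, assuming that every command-line argument is passed to system() and the result is decoded with the WIF... macros. The helper name run_and_report() and the message for the signal case are hypothetical; only the "Exited normally" message is taken from the example output.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run cmd via system() and report how it ended.
 * Returns the exit status (0..255) if the command exited normally,
 * -1 otherwise. */
int run_and_report(const char *cmd)
{
    int status = system(cmd);

    if (status == -1) {
        perror("system");            /* no shell process could be created */
        return -1;
    }
    if (WIFEXITED(status)) {
        printf("Exited normally, returning %d\n", WEXITSTATUS(status));
        return WEXITSTATUS(status);
    }
    if (WIFSIGNALED(status)) {
        printf("Terminated by signal %d\n", WTERMSIG(status));
    }
    return -1;
}
```

Inside main(), a loop of the form for(i=1; i<argc; i++) could then call run_and_report(argv[i]) once per argument.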


Example Output<br />

$ ./system_example "date" "man does_not_exist" "whoami -q"<br />

Tue Nov 26 04:59:29 CET 2002<br />

Exited normally, returning 0<br />

No manual entry for does_not_exist<br />

Exited normally, returning 16<br />

whoami: invalid option -- q<br />

Try ‘whoami --help’ for more information.<br />

Exited normally, returning 1<br />



Exercises<br />

Write a program that prints its parent's PID and modify the last example to print<br />

its PID. Run the program via the example code. What do you notice? Why?<br />

Extend the example to a shell that reads commands from the user and executes<br />

them<br />

– Handle all cases of why a process can terminate, and print a useful message<br />

for all cases<br />



CSC322<br />

C Programming and UNIX<br />

Process Creation and Termination (II)<br />



Creating new Processes: fork()<br />

The only way of creating a new process under UNIX is via the fork() function<br />

#include <sys/types.h><br />
#include <unistd.h><br />

pid_t fork(void);<br />

fork() creates a new child process that is in nearly all ways an exact copy of the parent<br />

Execution continues in both parent and child<br />

Only (major) differences:<br />
– New PID and new parent PID<br />
– Return value of fork()<br />

Return value of fork()<br />
– On failure: -1, errno will be set<br />
– On success:<br />
∗ In the child, 0 will be returned<br />
∗ In the parent, the PID of the child (a value >0) will be returned<br />
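The three possible return values of fork() map directly onto three branches. The sketch below assumes a made-up helper run_in_child() and a made-up task change_some_var(); it also shows that a change made by the child does not reach the parent's copy of a variable.<br />

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int some_var = 42;

/* Hypothetical helper: run task() in a forked child and return the
   child's exit status to the parent (-1 on error). */
int run_in_child(int (*task)(void))
{
   int status;
   pid_t child_pid = fork();

   if(child_pid == -1)
   {
      return -1;        /* fork() failed, errno is set */
   }
   if(child_pid == 0)
   {
      _exit(task());    /* fork() returned 0: this is the child */
   }
   /* fork() returned a value >0: this is the parent */
   if(waitpid(child_pid, &status, 0) == -1 || !WIFEXITED(status))
   {
      return -1;
   }
   return WEXITSTATUS(status);
}

/* Example task (also hypothetical): change some_var, report new value */
int change_some_var(void)
{
   some_var = 7;
   return some_var;
}
```

After run_in_child(change_some_var) returns 7, some_var in the parent should still be 42. The child uses _exit() rather than exit() so that it does not flush stdio buffers it inherited from the parent.<br />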



Example<br />

#include <stdio.h><br />
#include <stdlib.h><br />
#include <sys/types.h><br />
#include <unistd.h><br />

void err_sys(char* message)<br />
{<br />
   perror(message);<br />
   exit(EXIT_FAILURE);<br />
}<br />

int main(int argc, char* argv[])<br />
{<br />
   pid_t pid, ppid, child_pid;<br />
   int some_var = 42;<br />

   pid = getpid();<br />
   printf("Parent. My PID is %d and I am about to procreate\n", pid);<br />
   child_pid = fork();<br />
   /* The error check was cut off after "if(child_pid" in extraction; header names and this check are reconstructed */<br />
   if(child_pid < 0)<br />
   {<br />
      err_sys("fork");<br />
   }<br />
   if(child_pid == 0)<br />
   {<br />
      pid = getpid();<br />
      ppid = getppid();<br />
      printf("Child. My PID is %d, my parent is %d\n", pid, ppid);<br />
      printf("Child: some_var=%d - Changing it now!\n", some_var);<br />
      some_var=7;<br />
      printf("Child: some_var=%d\n", some_var);<br />
   }<br />
   else<br />
   {<br />
      printf("Parent. My PID is %d, my child is %d\n", pid, child_pid);<br />
      printf("Parent: some_var=%d\n", some_var);<br />
      printf("Going to sleep now, waiting for my child to die...\n");<br />
      sleep(5);<br />
      printf("I'm awake again. some_var is still %d\n", some_var);<br />
   }<br />
   return EXIT_SUCCESS;<br />
}<br />


Example Output<br />

Parent. My PID is 12625 and I am about to procreate<br />

Parent. My PID is 12625, my child is 12626<br />

Parent: some_var=42<br />

Going to sleep now, waiting for my child to die...<br />

Child. My PID is 12626, my parent is 12625<br />

Child: some_var=42 - Changing it now!<br />

Child: some_var=7<br />

I’m awake again. some_var is still 42<br />

Here is a snapshot of the processes, taken with top while the parent was sleeping:<br />

PID USER PRI ... SHARE STAT %CPU %MEM TIME COMMAND<br />
12625 schulz 16 ... 280 S 0.0 0.1 0:00 fork_example<br />
12626 schulz 16 ... 0 Z 0.0 0.0 0:00 fork_example <defunct><br />

- As long as the parent lives (and does not call wait()), the child remains around as a zombie<br />
- When the parent dies, the init process collects the child’s termination status, and the child is delivered from its undead state<br />



Comments on fork()<br />

Order of execution for parent and child is unpredictable!<br />

Forked processes behave as if an actual copy has been made<br />
– All of the process’s memory is accessible in both parent and child<br />
– Changing it in one does not affect the other<br />

On modern UNIX versions, fork() is implemented with copy on write<br />
– Both processes actually share the same pages in memory<br />
– Only when a process actually tries to change a value in memory is a private copy of the affected page created<br />
– Consequence: Forking is very cheap – it only has to copy basic process structures<br />

UNIX programmers use forking a lot!<br />
– Servers may fork one process for each connection!<br />
– Shells fork for executing commands<br />





Don’t Do This!<br />

#include <unistd.h><br />

int main(int argc, char* argv[])<br />
{<br />
   while(1)<br />
   {<br />
      fork();<br />
   }<br />
}<br />

It is the simplest version of a fork bomb<br />
– Will create an exponentially growing number of processes<br />
– Quickly consumes all system resources<br />
– Makes system essentially unusable<br />



Forking and I/O<br />

As the example showed, both parent and child were able to write to stdout<br />
– In general, parent and child share the file descriptors that were open at the time of fork()<br />
– This can be problematic, as the order in which output is written is undefined<br />
– Even worse for input, or for output to files or sockets (on the screen, we can usually figure things out)<br />

If responsibility for file descriptors is clear, the parent can delegate communication to the child<br />
– Example: Parent just accept()s connections<br />
– Child actually performs communication on the file descriptor<br />
– Both parent and child need to close an open file descriptor!<br />

Parent and child share file descriptors, but not standard I/O library buffers<br />
– Can have unexpected effects!<br />



I/O Setup before Forking<br />

[Diagram: FILE* myfile points to a FILE array entry of the standard I/O library, which holds the buffer and the file descriptor. The file descriptor selects an entry (fd n: flags) in the process table. Both are per-process data structures. The process table entry refers to a file table entry (status flags, offset), which refers to a vnode table entry (actual current file size, ...). File table and vnode table are global, shared data structures.]<br />


I/O Setup after Forking<br />

[Diagram: parent and child each have their own per-process data structures, i.e. their own FILE* myfile, FILE array entry (buffer, file descriptor) and process table entry (fd n: flags). Both process table entries refer to the same file table entry (status flags, offset) and, through it, to the same vnode table entry (actual current file size, ...) in the global, shared data structures.]<br />


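Example: Buffered I/O and Forking<br />

/* Usual includes and stuff omitted */<br />

int main(int argc, char* argv[])<br />
{<br />
   pid_t child_pid;<br />

   printf("Hiho "); /* No newline, so the text stays in the stdout buffer */<br />
   /* The rest of this slide was lost in extraction; the lines below are reconstructed from the example output */<br />
   child_pid = fork();<br />
   printf("from the %s!\n", child_pid == 0 ? "child" : "parent");<br />
   return EXIT_SUCCESS;<br />
}<br />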


Example Output<br />

$ fork_example2<br />
Hiho from the parent!<br />
Hiho from the child!<br />

stdout is line buffered (when it is connected to a terminal)<br />
– Since we did not print a full line (and did not call fflush()), the string was not written out before the fork<br />
– Calling fork() duplicated the buffer contents<br />
– Then, both parent and child caused a flush<br />
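The effect, and the usual cure of calling fflush() before fork(), can be checked without looking at the terminal. The sketch below is an illustration only: count_hiho() is a made-up helper that writes through stdio to a temporary file instead of stdout and counts how often "Hiho" ends up in it.<br />

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: write "Hiho " to a temporary file via stdio,
   fork, let child and parent each complete the line, and return how
   often "Hiho" appears in the file (-1 on error).
   If flush_first is non-zero, fflush() is called before fork(). */
int count_hiho(int flush_first)
{
   char content[256];
   char *pos;
   size_t len;
   int count = 0;
   pid_t child_pid;
   FILE *fp = tmpfile();

   if(!fp)
   {
      return -1;
   }
   fprintf(fp, "Hiho ");      /* Stays in the stdio buffer of fp */
   if(flush_first)
   {
      fflush(fp);             /* Buffer is empty when we fork */
   }
   child_pid = fork();
   if(child_pid == -1)
   {
      fclose(fp);
      return -1;
   }
   if(child_pid == 0)
   {
      fprintf(fp, "from the child!\n");
      fclose(fp);             /* Flushes the child's copy of the buffer */
      _exit(EXIT_SUCCESS);
   }
   if(waitpid(child_pid, NULL, 0) == -1)
   {
      fclose(fp);
      return -1;
   }
   fprintf(fp, "from the parent!\n");
   rewind(fp);                /* Flushes, then seeks to the start */
   len = fread(content, 1, sizeof(content)-1, fp);
   content[len] = '\0';
   fclose(fp);
   for(pos = strstr(content, "Hiho"); pos; pos = strstr(pos+1, "Hiho"))
   {
      count++;
   }
   return count;
}
```

Without the flush, the buffered "Hiho " exists in both processes and is written twice; with the flush it is written once. Both processes append correctly because the file offset lives in the shared file table entry.<br />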



Waiting for Children to Die<br />

As stated above, parents need to get the termination status of their children (otherwise those children become zombies)<br />

They can do so by calling wait()<br />

#include <sys/types.h><br />
#include <sys/wait.h><br />

pid_t wait(int *status);<br />

– wait() waits until a child terminates<br />
– It returns the PID of the terminated child<br />
– If status is not equal to NULL, it writes the termination status of the child into the variable it points to<br />
– Note: If some children have already terminated, wait() picks one of those and returns its data<br />
– If there are no children, wait() returns -1 and sets errno (to ECHILD)<br />
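A common pattern that follows from these rules is to call wait() in a loop until it reports that no children are left. The following is a minimal sketch; spawn_children() and reap_all() are names made up for this illustration.<br />

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork n children; child number i exits with
   status i. Returns the number of children actually created. */
int spawn_children(int n)
{
   int i;
   pid_t child_pid;

   for(i=0; i<n; i++)
   {
      child_pid = fork();
      if(child_pid == -1)
      {
         break;
      }
      if(child_pid == 0)
      {
         _exit(i);
      }
   }
   return i;
}

/* Hypothetical helper: call wait() until no children are left.
   Stores the number of reaped children in *reaped and returns the sum
   of the exit statuses of all normally terminated children, or -1 if
   wait() failed for a reason other than "no more children". */
int reap_all(int *reaped)
{
   int status, sum = 0;

   *reaped = 0;
   while(wait(&status) != -1)
   {
      (*reaped)++;
      if(WIFEXITED(status))
      {
         sum += WEXITSTATUS(status);
      }
   }
   return errno == ECHILD ? sum : -1;
}
```

With four children created by spawn_children(4), reap_all() should report four reaped children and a status sum of 0+1+2+3 = 6, in whatever order the children happen to terminate.<br />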



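Example<br />

#include <stdio.h><br />
#include <stdlib.h><br />
#include <unistd.h><br />
#include <sys/wait.h><br />

void err_sys(char* message)<br />
{<br />
   perror(message);<br />
   exit(EXIT_FAILURE);<br />
}<br />

int main(int argc, char* argv[])<br />
{<br />
   pid_t pid, ppid, child_pid;<br />
   int i, status;<br />

   pid = getpid();<br />
   printf("Parent. My PID is %d and I am about to procreate\n", pid);<br />
   fflush(stdout);<br />
   /* The loop was cut off after "for(i=0; i" in extraction. Header names and loop body are reconstructed from the example output (three children, child i exits with status i) and may differ from the original. */<br />
   for(i=0; i<3; i++)<br />
   {<br />
      child_pid = fork();<br />
      if(child_pid < 0)<br />
      {<br />
         err_sys("fork");<br />
      }<br />
      if(child_pid == 0)<br />
      {<br />
         pid = getpid();<br />
         ppid = getppid();<br />
         printf("Child. My PID is %d, my parent is %d\n", pid, ppid);<br />
         exit(i);<br />
      }<br />
   }<br />
   printf("Parent: Waiting for my children\n");<br />
   while((child_pid = wait(&status))!=-1)<br />
   {<br />
      printf("Child %d terminated with termination status %d\n", child_pid, status);<br />
      if(WIFEXITED(status))<br />
      {<br />
         printf("Termination normal, exit status %d\n", WEXITSTATUS(status));<br />
      }<br />
   }<br />
   return EXIT_SUCCESS;<br />
}<br />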


Example Output<br />

Parent. My PID is 13565 and I am about to procreate<br />

Child. My PID is 13567, my parent is 13565<br />

Child. My PID is 13568, my parent is 13565<br />

Child. My PID is 13569, my parent is 13565<br />

Parent: Waiting for my children<br />

Child 13569 terminated with termination status 512<br />

Termination normal, exit status 2<br />

Child 13568 terminated with termination status 256<br />

Termination normal, exit status 1<br />

Child 13567 terminated with termination status 0<br />

Termination normal, exit status 0<br />



Exercises<br />

Here is a function that computes the rollercoaster numbers<br />

long rollercoaster(long i)<br />

{<br />

printf("%ld\n", i);<br />

if(i==1)<br />

{<br />

return 0;<br />

}<br />

if(i%2==0)<br />

{<br />

return 1+rollercoaster(i/2);<br />

}<br />

return 1+rollercoaster(3*i+1);<br />

}<br />

Write a program that forks off 10 processes, each of which computes the rollercoaster numbers for one of the numbers from 11 to 20 and prints them<br />

Make the parent wait for all children and print the PID and the exit status of each in the order in which the children terminate, then terminate the parent<br />



CSC322<br />

C Programming and UNIX<br />

Process Control (System Calls)<br />



Process Groups<br />

UNIX processes are organized in process groups<br />
– A process group has a group leader<br />
– All processes in the group have the same process group id (which is the process id of the group leader)<br />

Some operations can be done not just for single processes, but for a whole group:<br />
– Delivering signals with kill()<br />
– Waiting for process termination with waitpid() (later)<br />

By default, a process inherits the process group id from its parent<br />
– Processes can change their own process group id<br />
∗ . . . to become process group leaders in a new process group, or<br />
∗ . . . to join an existing process group<br />
– Parents can change the process group id of their children (unless the children already called exec())<br />

Note: Don’t confuse the pgid (process group) with the gid (user/owner group)<br />



Getting and Changing Process Groups<br />

#include <sys/types.h><br />
#include <unistd.h><br />

pid_t getpgrp(void);<br />
int setpgid(pid_t pid, pid_t pgrp);<br />

getpgrp() always returns the process group id of the current process<br />
– No error condition!<br />

setpgid(pid_t pid, pid_t pgrp) sets the process group id of the process with the PID pid to pgrp<br />
– Return value: 0 on success, -1 on error (errno set)<br />
– Special values:<br />
∗ If pid is 0, the PID of the calling process is assumed<br />
∗ If pgrp is 0, the process id denoted by the first argument is assumed (i.e. that process is made into the process group leader of a new process group)<br />
– Note that this means that setpgid(0,0) makes the current process into a process group leader<br />
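The setpgid(0,0) special case can be checked from inside a child process. This is a sketch under the assumption that the child is not a session leader (true for a freshly forked child); child_becomes_leader() is a made-up helper name.<br />

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that makes itself the leader of a
   new process group with setpgid(0,0). Returns 1 if the child then saw
   getpgrp() == getpid(), 0 if not, and -1 on error. */
int child_becomes_leader(void)
{
   int status;
   pid_t child_pid = fork();

   if(child_pid == -1)
   {
      return -1;
   }
   if(child_pid == 0)
   {
      if(setpgid(0, 0) == -1)
      {
         _exit(2);
      }
      _exit(getpgrp() == getpid() ? 0 : 1);
   }
   if(waitpid(child_pid, &status, 0) == -1 || !WIFEXITED(status))
   {
      return -1;
   }
   switch(WEXITSTATUS(status))
   {
   case 0:
      return 1;
   case 1:
      return 0;
   default:
      return -1;
   }
}
```

The process group of the parent is not affected by the child's call, so getpgrp() in the parent should return the same value before and after.<br />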



Example<br />

#include <stdio.h><br />
#include <stdlib.h><br />
#include <sys/types.h><br />
#include <unistd.h><br />

void err_sys(char* message)<br />
{<br />
   perror(message);<br />
   exit(EXIT_FAILURE);<br />
}<br />

int main(int argc, char* argv[])<br />
{<br />
   pid_t pid, pgid, child_pid;<br />
   int i, res;<br />

   pid = getpid();<br />
   pgid = getpgrp();<br />
   printf("Parent. My PID is %d and my process group is %d\n",pid,pgid);<br />
   res = setpgid(0,0);<br />
   if(res==-1)<br />
   {<br />
      err_sys("setpgid");<br />
   }<br />
   printf("Parent. I'm now the process group leader.\n");<br />
   /* The loop was cut off after "for(i=0; i" in extraction. Header names and loop body are reconstructed from the example output (three children) and may differ from the original. */<br />
   for(i=0; i<3; i++)<br />
   {<br />
      child_pid = fork();<br />
      if(child_pid < 0)<br />
      {<br />
         err_sys("fork");<br />
      }<br />
      if(child_pid == 0)<br />
      {<br />
         break; /* Child leaves the loop with its number in i */<br />
      }<br />
   }<br />
   if(child_pid == 0)<br />
   {<br />
      pid = getpid();<br />
      pgid = getpgrp();<br />
      printf("Child %d. My PID is %d, my process group is %d.\n", i, pid, pgid);<br />
      sleep(1);<br />
      res = setpgid(0,0);<br />
      if(res==-1)<br />
      {<br />
         err_sys("setpgid");<br />
      }<br />
      pid = getpid();<br />
      pgid = getpgrp();<br />
      printf("Child %d. I'm now independent, pid %d and pgid %d\n",i, pid,pgid);<br />
      printf("Child %d exiting\n", i);<br />
      exit(EXIT_SUCCESS);<br />
   }<br />
   printf("Parent, sleeping.\n");<br />
   sleep(3);<br />
   printf("Parent, exiting.\n");<br />
   return EXIT_SUCCESS;<br />
}<br />


Example Output<br />

$ ./pg_example<br />
Parent. My PID is 1946 and my process group is 1946<br />
Parent. I’m now the process group leader.<br />
Parent, sleeping.<br />
Child 0. My PID is 1947, my process group is 1946.<br />
Child 1. My PID is 1948, my process group is 1946.<br />
Child 2. My PID is 1949, my process group is 1946.<br />
Child 0. I’m now independent, pid 1947 and pgid 1947<br />
Child 0 exiting<br />
Child 1. I’m now independent, pid 1948 and pgid 1948<br />
Child 1 exiting<br />
Child 2. I’m now independent, pid 1949 and pgid 1949<br />
Child 2 exiting<br />
Parent, exiting.<br />

Note that the parent starts out as a process group leader!<br />
– Most shells with built-in job control will always execute commands in their own process group<br />



UNIX System Call: kill<br />

#include <signal.h><br />

int kill(pid_t pid, int sig);<br />

kill() sends the signal sig to the process or processes specified by pid<br />
– pid > 0: Signal is sent to the process with PID pid<br />
– pid == 0: Signal is sent to all processes in the same process group as the caller (if the caller has permission to send it)<br />
– pid < 0: Signal is sent to all processes with process group id |pid|<br />
– Special case: pid == -1: Most UNIX versions send the signal to all processes with the same user id (real or effective) as the caller<br />

Possible signals: As for the kill command (defined in <signal.h>)<br />
– Also see man signal<br />

Note: kill() is the function used to implement the kill command<br />
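The simplest case, pid > 0, can be combined with the status macros from earlier to confirm which signal ended a child. The helper kill_child_with() below is made up for this sketch and is only meaningful for signals whose default action terminates the process.<br />

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that only waits for signals, send
   it sig with kill(), and return the number of the signal that
   terminated it (0 if it exited normally, -1 on error). */
int kill_child_with(int sig)
{
   int status;
   pid_t child_pid = fork();

   if(child_pid == -1)
   {
      return -1;
   }
   if(child_pid == 0)
   {
      while(1)
      {
         pause();   /* Sleep until a signal arrives */
      }
   }
   if(kill(child_pid, sig) == -1)
   {
      kill(child_pid, SIGKILL);   /* Clean up after a failed kill() */
      waitpid(child_pid, &status, 0);
      return -1;
   }
   if(waitpid(child_pid, &status, 0) == -1)
   {
      return -1;
   }
   return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

For example, kill_child_with(SIGTERM) should return SIGTERM, since the child installs no handler and the default action of SIGTERM is to terminate the process.<br />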



Is this good for Something?<br />

There are many possible situations where an application consists of a set of processes:<br />
– Server may have one process that accept()s connections and multiple workers that serve individual connections<br />
– Competitive theorem prover runs many search strategies in parallel<br />

If we make the top level control program into a process group leader, termination becomes a lot easier<br />
– We can kill the whole process group with one command<br />
– The leader can be made to automatically kill all processes<br />
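Killing a whole group with one call might look like the sketch below. It differs from the lecture's setup in one labeled respect: to keep the caller alive, the workers are placed in a separate group led by the first worker, rather than in the group of the controlling program. kill_worker_group() is a made-up helper name.<br />

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start n workers in a process group of their
   own (led by the first worker), terminate the whole group with one
   kill(), and return how many workers died from SIGTERM (-1 on error). */
int kill_worker_group(int n)
{
   int i, status, started = 0, killed = 0;
   pid_t child_pid, pgid = 0;

   for(i=0; i<n; i++)
   {
      child_pid = fork();
      if(child_pid == -1)
      {
         break;
      }
      if(child_pid == 0)
      {
         setpgid(0, pgid);   /* pgid is 0 for the first worker: new group */
         while(1)
         {
            pause();
         }
      }
      if(pgid == 0)
      {
         pgid = child_pid;   /* First worker becomes the group leader */
      }
      setpgid(child_pid, pgid);   /* Also set in the parent to avoid a race */
      started++;
   }
   if(started == 0)
   {
      return -1;
   }
   if(kill(-pgid, SIGTERM) == -1)   /* Negative pid: whole process group */
   {
      return -1;
   }
   for(i=0; i<started; i++)
   {
      if(waitpid(-pgid, &status, 0) == -1)
      {
         break;
      }
      if(WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM)
      {
         killed++;
      }
   }
   return killed;
}
```

setpgid() is called in both parent and child because either may run first after fork(); whichever call comes second simply confirms the group that is already set.<br />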



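Example<br />

#include <stdio.h><br />
#include <stdlib.h><br />
#include <sys/types.h><br />
#include <unistd.h><br />
#include <signal.h><br />

void err_sys(char* message)<br />
{<br />
   perror(message);<br />
   exit(EXIT_FAILURE);<br />
}<br />

int main(int argc, char* argv[])<br />
{<br />
   pid_t pid, pgid, child_pid;<br />
   int i, res;<br />

   res = setpgid(0,0);<br />
   if(res==-1)<br />
   {<br />
      err_sys("setpgid");<br />
   }<br />
   pid = getpid();<br />
   pgid = getpgrp();<br />
   printf("Queen bee:PID is %d process group is %d\n",pid,pgid);<br />
   /* Both loops were cut off after "for(i=0; i" in extraction. Header names, loop bodies and everything after the second loop are reconstructed from the example output and may differ from the original. */<br />
   for(i=0; i<3; i++)<br />
   {<br />
      child_pid = fork();<br />
      if(child_pid < 0)<br />
      {<br />
         err_sys("fork");<br />
      }<br />
      if(child_pid == 0)<br />
      {<br />
         break;<br />
      }<br />
   }<br />
   if(child_pid == 0)<br />
   {<br />
      while(1)<br />
      {<br />
         printf("Worker bee %d gathering honey\n", i);<br />
         sleep(1);<br />
      }<br />
   }<br />
   for(i=0; i<3; i++)<br />
   {<br />
      printf("Queen bee sleeping\n");<br />
      sleep(1);<br />
   }<br />
   printf("Queen bee terminates\n");<br />
   fflush(stdout);<br />
   kill(0, SIGTERM); /* Terminate the whole process group, queen included */<br />
   return EXIT_SUCCESS;<br />
}<br />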


Example Output with kill<br />

Queen bee:PID is 2412 process group is 2412<br />

Queen bee sleeping<br />

Worker bee 0 gathering honey<br />

Worker bee 1 gathering honey<br />

Worker bee 2 gathering honey<br />

Queen bee sleeping<br />

Worker bee 0 gathering honey<br />

Worker bee 1 gathering honey<br />

Worker bee 2 gathering honey<br />

Queen bee sleeping<br />

Worker bee 0 gathering honey<br />

Worker bee 1 gathering honey<br />

Worker bee 2 gathering honey<br />

Queen bee terminates<br />



Example Output without kill<br />

schulz@leonardo 4:31am [CSC_322] ./pgkill_example<br />

Queen bee:PID is 2460 process group is 2460<br />

Queen bee sleeping<br />

Worker bee 0 gathering honey<br />

Worker bee 1 gathering honey<br />

Worker bee 2 gathering honey<br />

Queen bee sleeping<br />

Worker bee 0 gathering honey<br />

Worker bee 1 gathering honey<br />

Worker bee 2 gathering honey<br />

Queen bee sleeping<br />

Worker bee 0 gathering honey<br />

Worker bee 1 gathering honey<br />

Worker bee 2 gathering honey<br />

Queen bee terminates<br />

schulz@leonardo 4:32am [CSC_322] Worker bee 0 gathering honey<br />

Worker bee 1 gathering honey<br />

Worker bee 2 gathering honey<br />

Worker bee 0 gathering honey<br />

Worker bee 1 gathering honey<br />

Worker bee 2 gathering honey<br />

Worker bee 0 gathering honey<br />

Worker bee 1 gathering honey<br />

Worker bee 2 gathering honey<br />

Worker bee 1 gathering honey<br />

Worker bee 2 gathering honey<br />

Worker bee 0 gathering honey<br />

Worker bee 1 gathering honey<br />

Worker bee 2 gathering honey<br />

Worker bee 0 gathering honey<br />

...<br />



Waiting for Termination: waitpid()<br />

The wait() function waits for termination of any child of a process<br />
– It blocks until a child terminates<br />
– It cannot check the status of a specific child<br />

POSIX introduced waitpid() as a more general interface solving this problem:<br />

#include <sys/types.h><br />
#include <sys/wait.h><br />

pid_t waitpid(pid_t wpid, int *status, int options);<br />



waitpid() continued<br />

Return value: PID of terminated child (or 0 if WNOHANG was given and no child has terminated yet, or -1 on error)<br />

wpid: Process id describing processes we are waiting for<br />
– wpid == -1: Wait for any child process<br />
– wpid > 0: Wait for the child with PID wpid<br />
– wpid < -1: Wait for any child in the process group with PGID |wpid|<br />
– wpid == 0: Wait for any child with the PGID of the caller<br />

status: As for wait(), if !=NULL, termination status is written into it<br />

options: (Can be combined with |)<br />
– 0: Normal blocking wait<br />
– WNOHANG: Return immediately with 0 if no child is available<br />
– WUNTRACED: Also report children that have stopped (used for job control)<br />
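The difference between a polling and a blocking wait can be made deterministic with a pipe that keeps the child alive until the parent is ready. This is a sketch only; poll_running_child() is a made-up helper name.<br />

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that blocks until the parent
   closes a pipe. Poll it once with WNOHANG while it is still running,
   then let it finish and collect it with a blocking waitpid().
   Returns the result of the non-blocking poll, or -1 on error. */
int poll_running_child(void)
{
   int fds[2], status;
   char dummy;
   pid_t child_pid, polled;

   if(pipe(fds) == -1)
   {
      return -1;
   }
   child_pid = fork();
   if(child_pid == -1)
   {
      close(fds[0]);
      close(fds[1]);
      return -1;
   }
   if(child_pid == 0)
   {
      close(fds[1]);
      /* Blocks until the parent closes its write end (end of file) */
      if(read(fds[0], &dummy, 1) == -1)
      {
         _exit(1);
      }
      _exit(0);
   }
   close(fds[0]);
   polled = waitpid(child_pid, &status, WNOHANG);   /* Child still running */
   close(fds[1]);                                   /* Let the child finish */
   if(waitpid(child_pid, &status, 0) != child_pid)
   {
      return -1;
   }
   return (int)polled;
}
```

The first waitpid() call should return 0 because the child cannot terminate before the pipe is closed; the second call blocks until the child has exited and returns its PID.<br />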



Exercises<br />

Write a program that keeps a network server alive (or reanimates it):<br />
– The server accepts connections<br />
– For each connection, it forks a child that reads input from the net and appends it to a log file<br />
– All those processes should be in the same process group<br />

The monitor program just starts the main server process, makes it the group leader, and waits for the server to terminate<br />
– In that case, it kills all of the server processes and restarts the server<br />



CSC322<br />

C Programming and UNIX<br />

Program Execution<br />



Process Environment<br />

Each UNIX process has an environment<br />
– The environment consists of a list of strings<br />
– Normally, those strings have the form name=value (and most functions for manipulating the environment assume this form)<br />
– The name is called an environment variable<br />
– Since most environment variables are created and maintained by the shell, they are often also called shell variables<br />

Children inherit the environment of their parents<br />
– Note that children get a copy of the environment<br />
– Each process can change its own environment, but not that of its parent<br />

Environment variables are used for a large number of things<br />
– Where to look for executable programs<br />
– Which editor to use (in well-written applications)<br />
– What is the user’s username?<br />
– Some mandated by standards (POSIX, SUSv2), others just customary<br />



Environment and the Shell<br />

You can print the environment using the printenv program<br />
– Just printenv prints all environment variables (and their values)<br />
– printenv <name> prints the value of the variable with name <name><br />

Since no process can modify its parent’s environment, you need to use a built-in command to change a shell’s environment<br />
– tcsh: setenv VAR VALUE and unsetenv VAR<br />
– bash: export VAR=VALUE and unset VAR<br />



Example: Part of my Environment<br />

$ printenv<br />

PWD=/home/schulz/SOURCES/CSC_322<br />

VENDOR=intel<br />

HOSTNAME=wombat<br />

QTDIR=/usr/lib/qt3-gcc2.96<br />

LESSOPEN=|/usr/bin/lesspipe.sh %s<br />

USER=schulz<br />

LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:<br />

MACHTYPE=i386<br />

XDM_MANAGED=/var/run/xdmctl/xdmctl-:0,maysd,mayfn,sched<br />

XMODIFIERS=@im=none<br />

EDITOR=emacsclient<br />

LANG=C<br />

HOST=wombat<br />

DISPLAY=:0.0<br />

FROM=Stephan Schulz <br />

LOGNAME=schulz<br />

SHLVL=3<br />

GROUP=schulz<br />

TEXINPUTS=:~/TEXT/TEXLIB/<br />

SUPPORTED=en_US.iso885915:en_US:en:de_DE@euro:de_DE:de<br />

SHELL=/bin/tcsh<br />

HOSTTYPE=i386-linux<br />

CVSROOT=stephan@gw.safelogic.se:/CVS<br />

OSTYPE=linux<br />

HOME=/home/schulz<br />

SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass<br />

PATH=/home/schulz/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/local/bin:/usr/X11R6/bin:.<br />

_=/usr/X11R6/bin/xterm<br />

TERM=xterm<br />

WINDOWID=18874382<br />



Some Important Environment Variables<br />

PATH (POSIX) determines where the shell looks for executable programs<br />
– List of directory names, separated by colons<br />
– Can contain . to include the working directory (dangerous on multi-user systems)<br />

EDITOR (traditional) is used by good UNIX programs to determine which editor to run if you have to edit text<br />

LOGNAME (POSIX) is your user name<br />

TERM (POSIX) is your (text) terminal type<br />
– If you have trouble with remote logins, set it to vt100<br />

HOME (POSIX) is your home directory<br />

DISPLAY (X11 Window System) is the name of your display<br />
– UNIX can run programs on one host, and display them on another<br />
– DISPLAY tells it where to show output for X programs<br />



Accessing the Environment from a Program<br />

There are two ways to access the environment of a process:<br />
– Via the environ variable<br />
– Via getenv() and putenv()<br />

If we want to go through all of the environment, we need to declare the environ variable:<br />

extern char **environ;<br />

– It points to a NULL-terminated array of pointers<br />
– Each array element points to a \0-terminated C string of the form <name>=<value><br />



Example<br />

#include <stdio.h><br />
#include <stdlib.h><br />

extern char **environ;<br />

int main(int argc, char* argv[])<br />
{<br />
   char **handle;<br />

   for(handle=environ; *handle; handle++)<br />
   {<br />
      printf("%s\n", *handle);<br />
   }<br />
   return EXIT_SUCCESS;<br />
}<br />


The POSIX Interface to the Environment<br />

#include <stdlib.h><br />

char *getenv(const char *name);<br />
int putenv(char *string);<br />

getenv() takes a pointer to an environment variable name and returns its value (or NULL if the variable does not exist)<br />
– It’s even part of ANSI C (but ANSI C says nothing about the contents of the environment)<br />

putenv() takes a single string of the form <name>=<value><br />
– Adds the string (i.e. the <name>=<value> pair) to the environment<br />
– If <name> already exists, the old definition is replaced<br />
– Note that some versions of UNIX include just the pointer in the environment, while others create a copy of the string<br />

Additional functions of interest:<br />
– clearenv(): Clears the environment (available in glibc and some other systems, but not part of POSIX.1)<br />
– unsetenv(): Remove a single variable (traditional, standardized in POSIX.1-2001)<br />
– setenv(): More flexible version of putenv() (traditional, standardized in POSIX.1-2001)<br />
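A short sketch of these calls in use follows. The helpers env_or_default() and child_env_is_private() and the variable name CSC322_DEMO are invented for illustration. The second helper also demonstrates that a child only changes its own copy of the environment.<br />

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: value of the environment variable name, or
   fallback if the variable is not set or is empty. */
const char *env_or_default(const char *name, const char *fallback)
{
   const char *value = getenv(name);

   return (value && *value) ? value : fallback;
}

/* Hypothetical helper: returns 1 if a change made to the environment
   by a child is invisible in the parent, 0 if not, -1 on error. */
int child_env_is_private(void)
{
   pid_t child_pid;
   const char *value;

   if(setenv("CSC322_DEMO", "parent", 1) == -1)
   {
      return -1;
   }
   child_pid = fork();
   if(child_pid == -1)
   {
      return -1;
   }
   if(child_pid == 0)
   {
      setenv("CSC322_DEMO", "child", 1);   /* Changes the child's copy only */
      _exit(EXIT_SUCCESS);
   }
   if(waitpid(child_pid, NULL, 0) == -1)
   {
      return -1;
   }
   value = getenv("CSC322_DEMO");
   return value && strcmp(value, "parent") == 0;
}
```

A program that needs an editor could, for instance, call env_or_default("EDITOR", "vi") to honour the user's choice and fall back to a default otherwise.<br />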



Executing New Programs<br />

A process can cause the execution of a new program via one of the exec functions<br />
– Causes this same process to replace its own program, data, and stack with new data<br />
– Program code is loaded from disk<br />
– Heap and stack are reinitialized<br />
– New program starts running at its main() function<br />

There are 6 different exec functions that differ in:<br />
– How they look for the program to run (via PATH or via an explicit path name)<br />
– How they accept arguments for the new program (as additional arguments to the exec function or via an array of pointers)<br />
– How they handle the environment (inheritance or completely new environment)<br />



The 6 exec Functions<br />

#include <unistd.h><br />

int execl(const char *path, const char *arg, ...);<br />

int execlp(const char *file, const char *arg, ...);<br />

int execle(const char *path, const char *arg , ..., char *const envp[]);<br />

int execv(const char *path, char *const argv[]);<br />

int execvp(const char *file, char *const argv[]);<br />

int execve(const char *filename, char *const argv [], char *const envp[]);<br />

All return -1 on error, and not at all on success<br />

execlp() and execvp() take a filename and search the PATH directories for the program<br />

execl(), execlp() and execle() take arguments for the new program as additional arguments<br />
– The list has to end with an additional NULL argument<br />
– The others take a pre-created argv vector<br />

Finally, execle() and execve() take an explicit environment pointer<br />
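The list form and the vector form can be compared side by side. The sketch below assumes that sh can be found via PATH and as /bin/sh; run_and_wait(), run_sh_command() and wait_for_exit_status() are made-up helper names. The exit status 127 for a failed exec is borrowed from shell convention.<br />

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: wait for child_pid and return its exit status
   (-1 on error or if it did not exit normally). */
static int wait_for_exit_status(pid_t child_pid)
{
   int status;

   if(waitpid(child_pid, &status, 0) == -1 || !WIFEXITED(status))
   {
      return -1;
   }
   return WEXITSTATUS(status);
}

/* Vector form: argv is a NULL-terminated array, the program argv[0]
   is searched in PATH. Returns 127 if it could not be executed. */
int run_and_wait(char *const argv[])
{
   pid_t child_pid = fork();

   if(child_pid == -1)
   {
      return -1;
   }
   if(child_pid == 0)
   {
      execvp(argv[0], argv);
      _exit(127);   /* Only reached if execvp() failed */
   }
   return wait_for_exit_status(child_pid);
}

/* List form: arguments are passed one by one, ending with NULL,
   and the program is given by an explicit path name. */
int run_sh_command(const char *command)
{
   pid_t child_pid = fork();

   if(child_pid == -1)
   {
      return -1;
   }
   if(child_pid == 0)
   {
      execl("/bin/sh", "sh", "-c", command, (char*)NULL);
      _exit(127);   /* Only reached if execl() failed */
   }
   return wait_for_exit_status(child_pid);
}
```

The cast in (char*)NULL matters for the list form: the arguments are passed through a variable argument list, where a bare NULL is not guaranteed to have pointer type.<br />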



The execvp() function<br />

execvp(const char *file, char *const argv[]) is reasonably easy to use:<br />
– First argument is a file name (not containing any /)<br />
– The program to be executed is found as by the shell, by looking through all the directories in PATH<br />

Second argument is a pointer to an array of argument pointers<br />
– Same format and conventions as argv in main<br />
– First argument should be program name<br />
– Array should be NULL terminated<br />

Upon execution, the new program runs<br />
– Keeps old PID, GID, PGID, working directory, . . .<br />
– Normal file descriptors stay open (unless the flag FD_CLOEXEC is set using fcntl())<br />
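Setting the close-on-exec flag takes two fcntl() calls, so that other descriptor flags are preserved. A minimal sketch, with set_cloexec() as a made-up helper name:<br />

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: mark fd as close-on-exec while keeping its
   other descriptor flags. Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
   int flags = fcntl(fd, F_GETFD);   /* Read the current descriptor flags */

   if(flags == -1)
   {
      return -1;
   }
   if(fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1)
   {
      return -1;
   }
   return 0;
}
```

A descriptor marked this way is closed automatically when one of the exec functions succeeds, so the new program does not inherit it.<br />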



Example: A mini-Shell<br />

/* The header names were lost in extraction; these six are the ones the code needs */<br />
#include <stdio.h><br />
#include <stdlib.h><br />
#include <string.h><br />
#include <ctype.h><br />
#include <unistd.h><br />
#include <sys/wait.h><br />

#define MAX_LINE 1024<br />

void err_sys(char* message)<br />
{<br />
   perror(message);<br />
   exit(EXIT_FAILURE);<br />
}<br />


Example (2)<br />

void* secure_malloc(int size)<br />
{<br />
   void* res = malloc(size);<br />

   if(!res)<br />
   {<br />
      fprintf(stderr, "malloc() failure -- out of memory?");<br />
      exit(EXIT_FAILURE);<br />
   }<br />
   return res;<br />
}<br />

char* secure_strdup(char* source)<br />
{<br />
   void* res = secure_malloc(strlen(source)+1);<br />

   strcpy(res, source);<br />
   return res;<br />
}<br />


Example (3)<br />

int count_words(char* line)<br />
{<br />
   int words=0, in_word=0;<br />

   while(*line)<br />
   {<br />
      if(isspace((unsigned char)*line)) /* Cast avoids undefined behaviour for negative char values */<br />
      {<br />
         in_word = 0;<br />
      }<br />
      else<br />
      {<br />
         if(in_word == 0)<br />
         {<br />
            words++;<br />
            in_word = 1;<br />
         }<br />
      }<br />
      line++;<br />
   }<br />
   return words;<br />
}<br />


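Example (4)<br />

char **build_argv(char* line)<br />
{<br />
   int argc = count_words(line);<br />
   int i;<br />
   char *new;<br />
   char **argv;<br />

   if(argc == 0)<br />
   {<br />
      return NULL;<br />
   }<br />
   argv = secure_malloc(sizeof(char*)*(argc+1));<br />
   /* The loop was cut off after "for(i=0; i" in extraction. The body below is a reconstruction using strtok() and may differ from the original. */<br />
   for(i=0; i<argc; i++)<br />
   {<br />
      new = strtok(i==0 ? line : NULL, " \t\n\v\f\r");<br />
      argv[i] = secure_strdup(new);<br />
   }<br />
   argv[argc] = NULL;<br />
   return argv;<br />
}<br />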


Example (6)<br />

void print_argv(char **argv)<br />
{<br />
   int i;<br />

   printf("Command: %s\n", argv[0]);<br />
   printf("Arguments:\n");<br />
   for(i=0; argv[i]; i++)<br />
   {<br />
      printf("%s\n", argv[i]);<br />
   }<br />
   printf("=======\n");<br />
}<br />


Example (7)<br />

int main(void)<br />
{<br />
   pid_t child_pid;<br />
   char line[MAX_LINE];<br />
   char *line_res;<br />
   char **argv;<br />

   while(1)<br />
   {<br />
      printf("# ");fflush(NULL);<br />
      line_res = fgets(line, MAX_LINE, stdin);<br />
      if(!line_res)<br />
      {<br />
         break;<br />
      }<br />
      argv = build_argv(line);<br />
      if(!argv)<br />
      {<br />
         continue;<br />
      }<br />
      print_argv(argv);<br />


Example (8)<br />

child_pid = fork();<br />

if(child_pid == -1)<br />

{<br />

err_sys("fork");<br />

}<br />

if(child_pid == 0) /* Child! */<br />

{<br />

setpgid(0,0);<br />

if(execvp(argv[0], argv) == -1)<br />

{<br />

err_sys("execvp");<br />

}<br />

}<br />

else<br />

Stephan Schulz 622


Example (9)<br />

        { /* Parent */<br />
            setpgid(child_pid, child_pid);<br />
            if(wait(NULL) == -1)<br />
            {<br />
                err_sys("wait");<br />
            }<br />
            free(argv);<br />
        }<br />
    }<br />
    return EXIT_SUCCESS;<br />
}<br />

Stephan Schulz 623


Example Usage<br />

schulz@wombat 2:01am [CSC_322] ./shell_example<br />

# echo Hallo<br />

Command: echo<br />

Arguments:<br />

echo<br />

Hallo<br />

=======<br />

Hallo<br />

# ls -l macrotest.c wordcount env_example.c<br />

Command: ls<br />

Arguments:<br />

ls<br />

-l<br />

macrotest.c<br />

wordcount<br />

env_example.c<br />

=======<br />

-rw-rw-r-- 1 schulz schulz 233 Dec 3 21:31 env_example.c<br />

-rw-rw-r-- 1 schulz schulz 206 Nov 26 23:41 macrotest.c<br />

-rwxrwxr-x 1 schulz schulz 13715 Nov 26 23:47 wordcount<br />

Stephan Schulz 624


Example Usage (2)<br />

# ls *<br />
Command: ls<br />
Arguments:<br />
ls<br />
*<br />
=======<br />
ls: *: No such file or directory<br />
# hallo<br />
Command: hallo<br />
Arguments:<br />
hallo<br />
=======<br />
execvp: No such file or directory<br />

Stephan Schulz 625


Exercises<br />

Extend the shell example (code is on the web page) to<br />

– Have better error handling<br />

– Do background processing (with &)<br />

– Support job control<br />

– Offer I/O redirection with > and <<br />

Read the man pages on popen() and pipe() to see how we could achieve piping<br />

If you are adventurous, implement:<br />

– Piping<br />

– Globbing (read man glob)<br />

Stephan Schulz 626


<strong>CSC322</strong><br />

C <strong>Programming</strong> <strong>and</strong> <strong>UNIX</strong><br />

Final Review<br />



Final Exam<br />

Place and Time:<br />

– Room LC 192 (the normal room)<br />

– Monday, Dec. 16th, 11:00 a.m. – 1:30 p.m.<br />

Topics:<br />

– Everything we covered in this class<br />

– Emphasis will be on second half<br />

You may bring:<br />

– Lecture notes, your own notes, books, printouts of your (or my) solutions to the exercises . . .<br />

– . . . but no computers, PDAs, mobile phones (switch them off and stow them away) or similar items<br />

Note: I’ll only review material from the second half of the semester today<br />

– Check lecture notes, pages 299–312 for overview of first half<br />

Stephan Schulz 628


Pointers and Dynamic Arrays<br />

Arrays are passed as pointers to the first element<br />

– Arrays and pointers (to an allocated memory region) can be used in the same way (i.e. we can index a pointer: p[5])<br />

– We can use realloc() to dynamically enlarge dynamically allocated arrays<br />

Pointer arithmetic: We can add integers to and subtract integers from pointers to step through an array<br />

– p[5] is equivalent to *(p+5)<br />

The following two program snippets are equivalent (assume some initialization in both versions):<br />

int a[SIZE], i;<br />
for(i=0; a[i]; i++)<br />
{<br />
    printf("%d\n", a[i]);<br />
}<br />

int a[SIZE], *handle;<br />
for(handle = a; *handle; handle++)<br />
{<br />
    printf("%d\n", *handle);<br />
}<br />

Stephan Schulz 629


Make<br />

Make is a tool for automating multi-program builds<br />

– Rule-based (rules are stored in Makefile)<br />

– Performs just the necessary operations to update all program parts<br />

– You specify dependencies and actions<br />

Example:<br />

PROGS=hello fahrenheit2celsius fahrenheit2celsius2 fahrenheit2celsius3 \<br />
      charcount ourcopy wordcount escape base_converter inc_example<br />

all: $(PROGS)<br />

clean:<br />
	rm $(PROGS)<br />

hello: hello.c<br />
	gcc -ansi -Wall -o hello hello.c<br />

fahrenheit2celsius: fahrenheit2celsius.c<br />
	gcc -ansi -Wall -o fahrenheit2celsius fahrenheit2celsius.c<br />

...<br />

Stephan Schulz 630


New Flow Control Constructs<br />

break is used to break out of loops (and switch statements)<br />

– Immediately transfers control to the first statement after the loop<br />

continue allows early continuation of a loop<br />

– Skips the rest of the loop body and starts the next iteration<br />

– In case of for loops, the update expression will be evaluated first<br />

do/while loops test the condition at the end of the loop<br />

– Loop body always gets executed at least once<br />

– Otherwise similar to plain while loop<br />

Stephan Schulz 631


Function Pointers and qsort()<br />

We can use pointers to functions (of a specific type) to<br />

– Implement generic functions and data types<br />

– Emulate object-oriented constructs (virtual functions)<br />

– Implement callbacks and signal handlers<br />

Using function pointers:<br />

– Getting a pointer: Just use the function name or use the address operator (&fun)<br />

– Calling the function: Either use pointer as is, fun(arg1), or dereference: (*fun)(arg1)<br />

Example for function pointer usage: qsort() from stdlib.h<br />

void qsort(void *base, size_t nmemb, size_t size,<br />

int(*compar)(const void *, const void *));<br />

Stephan Schulz 632


Standard Library: Characters and Strings<br />

ctype.h contains character classification functions:<br />

– isspace(c)<br />

– isprint(c)<br />

– isdigit(c)<br />

– isalpha(c)<br />

– isalnum(c) . . .<br />

– Also: toupper(c), tolower(c)<br />

String (\0-terminated sequence of characters) functions are declared in string.h<br />

– strcpy(to,from) copies a \0-terminated string to existing memory<br />

– strcat(to,from) appends a string at the end of an existing string<br />

– strcmp(s1,s2) compares two strings, returns a value less than, equal to, or greater than 0<br />

– strncpy(), strncat(), strncmp() limit operation to a given number of characters<br />

– strpbrk() searches for any of a set of characters in a string<br />

– strstr() searches for a substring<br />

Stephan Schulz 633


Standard Library: Memory Accesses<br />

Memory access functions treat memory as a large array of characters<br />

– Important difference to string functions: Not \0-terminated, you always have to give a length<br />

Functions:<br />

– memcpy(to, from, n) copies n bytes (the regions must not overlap)<br />

– memmove(to, from, n) does the same even for overlapping regions of memory<br />

– memcmp(s1,s2,n) compares two memory regions<br />

– memchr(s, c, n) searches for character c in the n bytes starting at s<br />

– memset(s,c,n) writes n copies of character c into memory (used e.g. to zero out socket address data structures)<br />

Stephan Schulz 634


Standard Library: Buffered I/O<br />

Standard library supports buffered I/O via streams<br />

– Stream creation: fopen(filename, mode)<br />

– Stream destruction: fclose(stream)<br />

– Predefined streams: stdin, stdout, stderr<br />

– Text streams: Lines separated by \n<br />

– Binary streams: Raw data (under UNIX, no difference)<br />

Basic I/O functions:<br />

– fgetc(stream) reads a single character and returns it as an int (and EOF on end of file)<br />

– fputc(c, stream) writes a single character to a stream<br />

– fgets(s,n,stream) reads a single line or n-1 characters (whichever is less) into the preallocated memory at s and adds a terminating \0<br />

– fputs(s, stream) writes a \0-terminated string to the stream<br />

Streams can be fflush()ed, and we can change buffering with setvbuf() and setbuf()<br />

Stephan Schulz 635


Standard Library: Formatted Output<br />

printf(format,...) and fprintf(stream, format, ...) write an arbitrary number of arguments under the control of a format string<br />

– Format string contains plain characters and conversion specifiers starting with a %<br />

– Each conversion specifier must have a matching argument<br />

– Conversion specifiers specify in which form the argument is printed<br />

Conversion specifier format:<br />

– %, followed by optional flags, field width, precision, size modifier<br />

– Ends in a conversion letter<br />

Example: printf("%-5ld\n", i)<br />

– Prints a long integer, at least 5 characters wide, left-justified (fills up with spaces), followed by a newline<br />

Important conversion letters: d (int), s (string), c (character), g (floating point number)<br />

Stephan Schulz 636


Processes and Signals<br />

Processes are running programs and have a number of properties<br />

– Owner, PID, GID, PGID, Parent<br />

– Each process has its own virtual memory and cannot (directly) access other processes’ data<br />

– Multiple processes can run “at the same time”<br />

We can use a number of tools to work with running processes:<br />

– ps lists running processes<br />

– top gives an interactive view of running processes<br />

kill can be used to send signals to a process (identified by its PID)<br />

– By default sends SIGTERM<br />

– You can also send other signals, e.g. kill -HUP pid<br />

Signals can also be generated by other events, e.g.<br />

– Floating point exception<br />

– Illegal memory access<br />

Stephan Schulz 637


Signal Handling<br />

Each signal has a default action (either abort, abort with core dump, or ignore)<br />

– Action can be changed!<br />

The signal(sig, handler) function can be used to change the behaviour of a process in response to a signal<br />

– sig is the signal to respond to<br />

– handler is a pointer to a function that returns void and takes an int (the signal) as an argument<br />

– Predefined pseudo-handlers: SIG_DFL (re-establish default behaviour), SIG_IGN (ignore signal)<br />

Established signal handlers catch a single signal!<br />

– Must reestablish handler from within the handler<br />

Signals can occur at any time, state of the program may be undefined<br />

– It’s dangerous to do much beyond exiting, manipulating variables of type volatile sig_atomic_t, and calling signal() again<br />

Stephan Schulz 638


UNIX File System (I)<br />

UNIX: Everything is a file<br />

File types:<br />

– Regular file<br />

– Directory<br />

– Character special file<br />

– Block special file<br />

– Socket<br />

– Symbolic link<br />

stat() functions give us information about files<br />

– Owner<br />

– Mode<br />

– Size<br />

– Access and modification times<br />

Stephan Schulz 639


UNIX File System (II)<br />

Important concepts:<br />

– File ownership and group ownership<br />

– Access rights (read, write, execute for user/group/others)<br />

– Links: Connect a name to a file<br />

∗ Hard links: Directory entries<br />

∗ Soft links: Files with the name of another file as data<br />

Important utilities:<br />

– ln: Creates links (both symbolic and hard)<br />

– ls: Shows files and file information<br />

– chmod: Allows us to change the mode of a file<br />

– chgrp: Changes group<br />

– chown: Changes owner<br />

Stephan Schulz 640


File Descriptors and select()<br />

File descriptors are used by the UNIX kernel to represent open files<br />

– File descriptors are small integers (indices into the process file table)<br />

– Can be associated with a number of flags we can manipulate with fcntl() or set when we open the file: O_NONBLOCK, O_APPEND, . . .<br />

– Predefined: STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO<br />

– Opening files: open()<br />

– Using files: read(fd, buf, n) and write(fd, buf, n)<br />

– Closing: close()<br />

select(maxfd, readfds, writefds, exceptfds, time) waits for certain things to become true for sets of file descriptors (maxfd is the highest descriptor in any set plus one)<br />

– Any of the file descriptors in readfds is ready for reading<br />

– Any of the file descriptors in writefds is ready for writing<br />

– An exceptional circumstance happens for one of the file descriptors in exceptfds<br />

– Return value: Number of file descriptors for which condition is true<br />

– Also removes all file descriptors from sets for which condition is not true<br />

Stephan Schulz 641


Networking Concepts<br />

Communication can be<br />

– Broadcast vs. dedicated partners<br />

– Stream-oriented vs. packet-oriented<br />

– Reliable vs. unreliable<br />

Communication partners need to be uniquely identified<br />

– For IP: IP addresses (denote hosts) (four 8-bit numbers, e.g. 127.0.0.1)<br />

– For TCP/IP: IP address and port (16-bit integer)<br />

UNIX uses sockets (a special kind of file descriptor) for communication<br />

– Bi-directional streams<br />

– Use with read() and write()<br />

Stephan Schulz 642


TCP/IP (v4) Connections<br />

Reliable, stream-oriented, between two partners<br />

Client:<br />

– Create a socket: socket(PF_INET, SOCK_STREAM, 0)<br />

– Fill in struct sockaddr_in address structure<br />

∗ sin_family = AF_INET<br />

∗ sin_port = htons(port)<br />

∗ sin_addr filled in with inet_pton()<br />

– Connect socket to address: connect(sock, addr, addr_len)<br />

– Use socket and close() it<br />

Server:<br />

– Create socket<br />

– Create its own address (normally with INADDR_ANY)<br />

– bind() the socket to the address<br />

– listen() on the socket<br />

– accept() the connection (giving a new socket)<br />

– Use and close the socket<br />

Stephan Schulz 643


Creating and Ending Processes<br />

fork() creates a new process<br />

– Both parent and child execute the same program<br />

Parent has to wait() or waitpid() to pick up the child’s termination status<br />

– Otherwise child becomes a zombie<br />

– But orphans are inherited by init<br />

Process termination<br />

– exit()<br />

– return from main()<br />

– Abort (from a signal)<br />

Stephan Schulz 644


Process Environment and Program Execution<br />

Processes have access to environment variables<br />

– Inherited from (or set up by) parent<br />

– Can be modified<br />

To start a new program:<br />

– fork() to create a new process<br />

– Call one of the exec functions with:<br />

∗ Executable name (filename or path name)<br />

∗ Arguments (individual or as array)<br />

∗ For some functions, environment pointer<br />

Stephan Schulz 645


Exercises<br />

Learn hard ;-)<br />

Stephan Schulz 646
