CSC322
C Programming and UNIX
Stephan Schulz
Department of Computer Science
University of Miami
schulz@cs.miami.edu
http://www.cs.miami.edu/~schulz/CSC322.html
Prerequisites: CSC220 or EEN218
Hackers!
Hacker [originally, someone who makes furniture with an axe] 1. A person who
enjoys exploring the details of programmable systems and how to stretch their
capabilities, as opposed to most users, who prefer to learn only the minimum
necessary. 2. One who programs enthusiastically (even obsessively) or who
enjoys programming rather than just theorizing about programming. 3. A person
capable of appreciating hack value. 4. A person who is good at programming
quickly. 5. An expert at a particular program, or one who frequently does work
using it or on it; as in ‘a Unix hacker’. (Definitions 1 through 5 are correlated, and
people who fit them congregate.) 6. An expert or enthusiast of any kind. One
might be an astronomy hacker, for example. 7. One who enjoys the intellectual
challenge of creatively overcoming or circumventing limitations. 8. [deprecated]
A malicious meddler who tries to discover sensitive information by poking around.
Hence ‘password hacker’, ‘network hacker’. The correct term for this sense is
cracker.
The New Hacker’s Dictionary (aka the Jargon File)
Our Aim
[Figure: diagram built up over several slides, relating “UNIX and You”, the UNIX operating system, C, and the Internet. Labels include a directory tree (/ with etc, home, usr, dev; joe, jack, jill; bin; ls, man, cat; hda, mouse, mta) and two processes (PID 182 and PID 512).]
The Myth
UNIX is a big-iron operating system
UNIX is complicated
UNIX is hard to use
UNIX has been created by SUN, IBM, HP, and other large companies
UNIX is monolithic
Counterpoint
UNIX was developed on small machines and became popular on the “killer
micros”. UNIX dialects now run on everything from a PDA to CRAY supercomputers
UNIX is based on simple and elegant principles (but has added some cruft over
the years)
UNIX is not particularly hard to use (compared to the power it gives to the
user), but has a reasonably steep learning curve. It’s not a “show-me” operating
system, but a “tell-me” operating system
UNIX has been created in a research environment, and much of it has been
developed in informal settings by hackers. Much of the impetus for UNIX comes
from free versions (Linux, Net-, Open-, FreeBSD), although many companies
contribute to its development
Many UNIX kernels are monolithic, but the UNIX system is extremely modular.
UNIX
First portable operating system (NetBSD: 18 processor architectures, ≈ 50 computer
architectures)
Written in a “high-level” language (C)
Small (for what it does):
– Recent Linux kernel: 2.4 million LOC (1.4 million for drivers, 0.4 million for
architecture-dependent stuff (16 ports))
– Windows 2000: Estimates range from 29 million to 65 million LOC, supports
just 1.5 architectures
Modular (though often on a monolithic kernel)
– Separate windowing system (X) and window managers
– Various desktop solutions (CDE, KDE, GNOME)
– Toolbox philosophy: Combine (lots of) simple tools
– Underneath: Strong and simple abstraction (“Everything is a file”)
C
“Pragmatic” high-level language:
– Handles characters, numbers, addresses as implemented by most computers
– Small core language, much functionality provided by libraries (mostly in C!)
– Compilers are easy to write
– Compilers are easy to port
– Even naive compilers produce reasonably efficient code
Hacker-friendly
– Straightforward compilation (nothing is hidden)
– Compact source code (fewer keystrokes, fast to read)
– Typed, but not a bondage-and-discipline language
Adequate support for building abstractions
– Structures (composing objects), unions, enumerations
– Arrays and pointers
– Support for defining new types
UNIX history tree (simplified)
For a fuller tree see http://www.levenez.com/unix/
A Short History of UNIX and C
1969 Ken Thompson wrote the first UNIX (in assembler) on a PDP-7 at AT&T Bell
Labs, allegedly to play Space Travel
1970 Brian Kernighan coins the name UNIX. The UNIX project gets a PDP-11 and
a task: Writing a text processing system
1971-72 Creation of C (Dennis Ritchie), UNIX rewritten in C
1972 Pipes arrive, UNIX installed on 10 (!) systems
1975 AT&T UNIX “Version 6” distributed with sources under academic licenses
1976 Ken Thompson in Berkeley, leading to BSD UNIX
1977 1BSD release
1978 UNIX “Version 7”, leading to System V (AT&T)
A Short History of UNIX and C
1978 3BSD, adding virtual memory
1980 Microsoft XENIX brand of UNIX
1982 4.2BSD, adding TCP/IP
1982 SGI IRIX
1983 Bjarne Stroustrup creates C++ (at AT&T Bell Labs)
1983 GNU Project announced (Aim: Free UNIX-like system)
1983-1984 Berkeley Internet Name Demon (BIND) created
1984 SUN introduces NFS (Network File System)
1985 Free Software Foundation (Stallman), GNU manifesto, GNU Emacs
A Short History of UNIX and C
1986 HP-UX, SunOS 3.2 (from BSD Unix), “attack of the killer micros”
1986 MIT Project Athena creates X11 (Network window system)
1986 POSIX.1 (Portable operating system interface standard)
1988 GNU GPL
1988 System VR4 “One UNIX to rule them all” (AT&T+SUN)
1988 NeXTCUBE with NeXTSTEP operating system
1989 ANSI-C Standard “C89” (adds prototypes, standard library)
1989 SunOS 4.0x
1990 Net/1 Release (free BSD UNIX)
1990 IBM AIX
A Short History of UNIX and C
1991 Linux 0.01, “attack of the killer PCs” (continuing till this day)
1991 World Wide Web born
1991–1992 Lawsuits around BSD UNIX Net/1 and Net/2 releases
1992 SunOS 5 aka Solaris-2 (from System VR4)
1993 FreeBSD 1.0
1994 Linux 1.0
1994 NetBSD 1.0, 4.4BSD Lite (unencumbered by AT&T copyrights, becomes new
base for all non-commercial BSD flavours)
1995 “UNIX wars” are over
1996 Tux the Penguin becomes Linux mascot
A Short History of UNIX and C
1998 UNIX-98 branding (Single UNIX specification)
2000 New ANSI “C99”
2001 IBM runs prime time TV ads for Linux
2001 UNIX-based MacOS X
2002 Linux is at version 2.4, Emacs is version 21.2, SunOS is at 5.9 (aka Solaris 9),
BIND is version 9.2.1
Another Opinion
UNIX is not an operating system...
...but is the collected folklore of the hacker community!
Spot the Even Ones
Upshot
You don’t have to grow a beard to become a world-class UNIX hacker...
...but it does seem to help!
CSC322
C Programming and UNIX
UNIX from a User’s Perspective
UNIX Architecture
[Figure: architecture diagram with the labels Application, Shell, Libraries, UNIX Kernel, and Hardware.]
Some Concepts
UNIX is a multi-user system. Each user has:
– User name (mine is schulz on most machines)
– Numerical user id (e.g. 500)
– Home directory: A place where (most of) his or her files are stored
UNIX is a multi-tasking system, i.e. it can run multiple programs at once. A
running program (with its data) is called a process. Each process has:
– Owner (a user)
– Working directory (a place in the file system)
– Various resources
A shell is a command interpreter, i.e. a process accepting and executing commands
from a user.
– A shell is typically owned by the user using it
– The initial working directory of a shell is typically the user’s home directory
(but can be changed by commands)
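The concepts above can be inspected from any shell. The following is a minimal sketch, assuming a POSIX shell and the standard id utility; the variable names are illustrative only.

```sh
# Numerical user id of the user who owns this shell (0 would mean root)
uid=$(id -u)
# Home directory of the current user, as recorded in the environment
home_dir=${HOME:-"(HOME is not set)"}
# Process id of the running shell: the shell is itself a process
shell_pid=$$
# Working directory of that process
work_dir=$(pwd)

echo "user id:           $uid"
echo "home directory:    $home_dir"
echo "shell process id:  $shell_pid"
echo "working directory: $work_dir"
```

The exact values depend on the machine and the account, so no particular output is shown here.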
More on Users
There are two kinds of users:
– Normal users
– Super users (“root”)
Super users:
– Have unlimited access to all files and resources
– Always have numerical user id 0
– Normally have user name “root” (but there can be more than one user name
associated with UID 0)
– Can seriously damage the system!
Normal users:
– Can only access files if they have the appropriate permissions
– Can belong to one or more groups. Users within a group can share files
– Usually cannot damage the system or other users’ files!
The User Interface
UNIX: Provide Tools, Not Policy
– Most tools operate on all (ASCII) file formats
– Extremely configurable environment – different users have different user experiences
– No restrictions ⇔ Little consistency
– We will assume the default environment on the lab machines for examples
X Window System: Provide Mechanisms, Not Policy
– Windowing system offers (networked) drawing primitives
– Different GUIs built on top of this
– GUI conventions may even differ from one application to the other!
– Modern desktop environments (GNOME/KDE) try to change this, but you are
bound to use many legacy applications anyway!
[Figure: My Graphical Desktop (screenshot)]
[Figure: Default KDE Desktop (SuSE Linux) (screenshot)]
Desktop Discussion
My Desktop
– Uses windowing mostly to provide a better text-based interface (compared to
pure text terminals)
– Text editor and shell (command line) windows
– (Can also run graphical applications)
KDE Desktop
– Graphical, mouse-based user experience
– Mostly a launcher for GUI-based programs
∗ Office programs
∗ Graphics programs
∗ Web browser
– Can also run shell windows!
[Figure: KDE Desktop with Terminal Application (screenshot)]
Exploring the Text Interface
Convention: In the examples, user input follows the shell prompt ($), system
output appears on the lines below it, and comments (not to be entered) are given
in parentheses.
whoami will print the user name of the current user (more exactly: It will print
the first user name associated with the effective user id)
[schulz@gettysburg ~]$ whoami
schulz
pwd prints the current working directory (more later):
[schulz@gettysburg ~]$ pwd
/lee/home/graph/schulz (Non-standard setup!)
ls lists the files in the current working directory:
[schulz@gettysburg ~]$ ls
core Desktop (Not much there at the moment)
Text Interface Example (contd.)
Most UNIX programs accept options to modify their behavior. One-letter
(“short”) options start with a single dash, followed by a letter:
[schulz@gettysburg ~]$ ls -a (Show all files, even hidden ones)
.                                   .gnome
..                                  .ICEauthority
.bash_logout                        .kde
.bash_profile                       .mcop
.bashrc                             .MCOP-random-seed
core                                .mcoprc
.DCOPserver_hopewell.cs.miami.edu   .screenrc
.DCOPserver_potomac.cs.miami.edu    .ssh
.DCOPserver_richmond.cs.miami.edu   .tcshrc
Desktop                             .xauth
.emacs                              .Xauthority
.first_start_kde                    .xsession-errors
As you can see, hidden files start with a dot.
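The effect of an option such as -a can be reproduced safely in a scratch directory. This is a small sketch, assuming a POSIX shell with mktemp available; the file names visible and .hidden are made up for the illustration. It also shows that short options of ls can be combined behind one dash, which is not covered on the slide.

```sh
# Work in a scratch directory so the example does not touch real files
scratch=$(mktemp -d)
touch "$scratch/visible" "$scratch/.hidden"

# Without options, ls skips names that start with a dot
plain_listing=$(ls "$scratch")
# -a also lists hidden entries, including . and ..
all_listing=$(ls -a "$scratch")
# Short options can be combined: -la means the same as -l -a
long_listing=$(ls -la "$scratch")

echo "ls:"
echo "$plain_listing"
echo "ls -a:"
echo "$all_listing"

# Clean up the scratch directory
rm -r "$scratch"
```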
The UNIX File System
In UNIX, all files are organized in a single directory tree, regardless of where they
are stored physically
There are two main types of files:
– Plain files (containing data)
– Directories (“folders”), containing both plain files (optionally) and other directories
Each file in a directory is identified by its name and has a number of attributes:
– Name
– Type
– Owner
– Group (each file belongs to one group, even if the owner belongs to multiple
groups)
– Access rights
– Access dates
Typical File System Layout
/ (Root directory)
    bin (System programs): cp, ls, ps
    dev (Devices): hda, hdb, kbd
    etc (Configuration): passwd, hosts
    home (Home directories): joe, jane, schulz
        schulz (Private files): core, Desktop
    tmp (Temporary files)
    usr (User programs): local, lib, bin
        local (Site-installed): lib, bin
        lib (Vendor)
        bin (Vendor)
Files in the directory trees are described by pathnames
– Pathnames consist of file names, separated by slashes (/)
– Absolute pathnames start with a /. /bin/cp denotes cp
– Relative pathnames are interpreted relative to the current working directory. If
/home is the current working directory, then schulz/core denotes core
Moving Through the File System
We can use the command cd to change our working directory:
[schulz@gettysburg ~]$ pwd
/lee/home/graph/schulz
[schulz@gettysburg ~]$ cd /
[schulz@gettysburg /]$ pwd
/
[schulz@gettysburg /]$ cd bin
[schulz@gettysburg /bin]$ pwd
/bin
[schulz@gettysburg /bin]$ cd /lee/home/graph/schulz
[schulz@gettysburg ~]$ pwd
/lee/home/graph/schulz
Each directory contains two special entries: . and ..
– . represents the directory itself. cd . is a NOP
– .. normally represents the parent directory. cd .. moves the working directory
up one level. In /, .. points to / itself
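Absolute and relative pathnames, as well as the special entries . and .., can be tried out without touching real data. The sketch below assumes a POSIX shell with mktemp; the scratch tree home/schulz/core only mimics the layout used on the slides.

```sh
# Build a small tree in a scratch directory: <scratch>/home/schulz/core
scratch=$(cd "$(mktemp -d)" && pwd -P)   # physical path without symlinks
mkdir -p "$scratch/home/schulz"
touch "$scratch/home/schulz/core"
start_dir=$(pwd)

# Relative pathname: interpreted relative to the working directory
cd "$scratch/home"
[ -f schulz/core ] && echo "relative pathname schulz/core found"
# Absolute pathname: starts with / and works from any working directory
[ -f "$scratch/home/schulz/core" ] && echo "absolute pathname found"

cd .                      # . is the directory itself, so nothing changes
after_dot=$(pwd -P)
cd ..                     # .. is the parent directory
after_dotdot=$(pwd -P)
cd / && cd ..             # in /, .. points back to / itself
at_root=$(pwd -P)

# Return to where we started and clean up
cd "$start_dir"
rm -r "$scratch"
```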
More about files
We can use the -l (“long format”) option to ls to show us all attributes
[schulz@gettysburg ~]$ ls -l
-rw------- 1 schulz users 1531904 Aug 29 10:55 core
drwxr-xr-x 3 schulz users 4096 Aug 29 10:55 Desktop
The long format of ls shows us more about the files:
– The first letter tells us the file type. d is a directory, - means a plain file
– The next nine letters describe access rights, i.e. who is allowed to read, write,
and execute the file. More on those later!
– The next number is the number of (hard) links to a file. More on that much
later!
– Next is the user that owns the file
– After that, the group that owns the file
– Next comes the file size in bytes
– Then the date the file was changed for the last time
– Finally, the name of the file
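The fields of the long format can be picked apart with the shell itself. This is only a sketch, assuming a POSIX shell, mktemp, and an ls that prints the usual field order described above; the names core and Desktop are reused from the slide, and -d (list a directory itself rather than its contents) is an option not introduced in the text.

```sh
scratch=$(mktemp -d)
mkdir "$scratch/Desktop"
printf 'hello\n' > "$scratch/core"     # a plain file of 6 bytes

# ls -ld lists the named entries themselves, not a directory's contents
file_line=$(ls -ld "$scratch/core")
dir_line=$(ls -ld "$scratch/Desktop")

# Split the line for the plain file into whitespace-separated fields:
# mode, link count, owner, group, size (date and name follow)
set -- $file_line
file_mode=$1
file_links=$2
file_owner=$3
file_group=$4
file_size=$5

# The first character of the mode field is the file type
file_type=$(printf '%.1s' "$file_mode")
dir_type=$(printf '%.1s' "$dir_line")

echo "core:    type '$file_type', links $file_links, size $file_size bytes"
echo "Desktop: type '$dir_type'"

rm -r "$scratch"
```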
UNIX Online Documentation 1
The UNIX Programmer’s Manual (“man pages”)
– Traditionally available on every UNIX system, quite terse
– Usage: man [section] <name>
– Sections (may differ by UNIX flavour):
1. User commands
2. System calls
3. C library routines
4. Device drivers and network interfaces
5. File formats
6. Games and demos
7. Misc. (ASCII, macro packages, tables, etc)
8. Commands for system administration
9. Locally installed manual pages (e.g. X11)
– man -k <keyword> gives you a list of pages relevant to <keyword>
– To leave the man program (or rather the pager it uses), hit q
UNIX Online Documentation 2
GNU info files
– Available with most Linux systems and most GNU packages
– Usage: info <topic>, then browse interactively
– You can also use the info reader built into GNU Emacs
∗ Enter emacs, then type C-h i, then select topic
∗ If you do not use Emacs, you should ;-)
∗ ...but we will introduce it later on
Exercises
Move through the file system using cd. You can inspect most files using
more if they are ASCII text. Try e.g. /etc/passwd and /etc/hosts.
Try man man and info info
Read the man and info documentation for
– ls
– whoami
– cd
– pwd
Don’t worry if you don’t understand everything!
(Do worry if you understand nothing!)
CSC322
C Programming and UNIX
UNIX from a User’s Perspective II
Command Format
Normal UNIX command format: <command> <arg1> ... <argN>
– The first word is interpreted as a command
– The remaining words (separated by blanks, i.e. spaces or tabs) are arguments
– The implementation of a command is free in how it treats the arguments
– Convention: Arguments starting with a dash - are options
Many characters have special meaning in most shells, including $, (, ), [,
], *, &, |, ;, \, <, >, ', ", ' ' (blank, the argument separator)
– Arguments may be enclosed in single quotes (' ') or in double quotes (" ")
to suppress most special meanings
∗ Single quotes suppress (nearly) all special meanings
∗ Double quotes suppress most special meanings
∗ In particular, both suppress the meaning of blank: A string 'a a' will appear
as a single argument to a command
∗ Quotes are not passed on to the command!
– The backslash \ can be used to suppress the special meaning of individual
characters. \" represents a double quote, \\ a backslash character
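The quoting rules above can be checked by counting the arguments a command actually receives. The following is a minimal sketch for a POSIX shell; count_args is a hypothetical helper defined only for this illustration.

```sh
# Hypothetical helper: prints how many arguments it received
count_args() { echo "$#"; }

unquoted=$(count_args a a)     # the blank separates arguments: 2
single=$(count_args 'a a')     # single quotes: one argument
double=$(count_args "a a")     # double quotes: one argument
escaped=$(count_args a\ a)     # backslash-escaped blank: one argument

echo "a a   -> $unquoted argument(s)"
echo "'a a' -> $single argument(s)"
echo "\"a a\" -> $double argument(s)"
echo "a\\ a  -> $escaped argument(s)"

# Single quotes also suppress $, double quotes do not
greeting=Hello
in_single='value: $greeting'   # stays literally: value: $greeting
in_double="value: $greeting"   # becomes: value: Hello
printf '%s\n' "$in_single" "$in_double"
```

In every case the quotes and the backslash are consumed by the shell; the command only sees the resulting argument strings.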
Command Types
There are different types of commands a shell can execute:
Shell built-in commands are executed directly by the shell
– Examples: cd, pwd, echo, alias
Shell functions are user-defined shell extensions
– Particularly useful in scripting, rare in interactive use
Executable programs (the normal case) are loaded from the disk and executed
– Examples: ls, whoami, man
– If a pathname is given, that file is executed (if possible)
– If just a filename is given, bash searches in all directories specified in the
variable $PATH
– Note that neither . nor ~ are necessarily in $PATH!
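How a shell resolves a command name can be observed with the standard command -v built-in. The sketch below assumes a POSIX shell, mktemp, and a temporary directory that allows executing files; greet and hello_tool are made-up names, and chmod (used to make the file executable) has not been introduced in the text yet.

```sh
# A shell function: a user-defined extension of the shell
greet() { echo "hello from a shell function"; }

# command -v reports how the shell would resolve a command name
builtin_lookup=$(command -v cd)       # built-in: just the name
function_lookup=$(command -v greet)   # function: just the name
program_lookup=$(command -v ls)       # program: typically its full pathname
echo "cd -> $builtin_lookup, greet -> $function_lookup, ls -> $program_lookup"

# Create a tiny executable program in a scratch directory
scratch=$(mktemp -d)
printf '#!/bin/sh\necho "hello from a program"\n' > "$scratch/hello_tool"
chmod +x "$scratch/hello_tool"

# By plain name it is not found: the scratch directory is not in $PATH
if command -v hello_tool >/dev/null 2>&1; then before=found; else before=missing; fi
# Giving a pathname works regardless of $PATH
by_path=$("$scratch/hello_tool")
# After adding the directory to $PATH, the plain name is found as well
old_path=$PATH
PATH=$PATH:$scratch
after=$(hello_tool)

# Restore the original search path and clean up
PATH=$old_path
rm -r "$scratch"
```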
UNIX User Commands: echo and touch
echo <arg1> ... <argN> prints its arguments to the screen
– echo is often a shell built-in command. To guarantee the behavior described
in the man-page, use /bin/echo
– Example:
[schulz@gettysburg ~]$ echo "Hello World"
Hello World (simplest "Hello World" program in UNIX)
[schulz@gettysburg ~]$ echo '$PATH = ' $PATH
$PATH = .:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/java/jdk1.3.1_01/bin:/home/graph/schulz/bin:/usr/X11R6/bin
touch <file1> ... <fileN> sets the access and modification time of the given files to
the current time
– If one of the files does not exist, touch will create an empty file of that name
– Important option:
∗ -c: Do not create non-existing files (long form --no-create is supported by
modern implementations (GNU))
∗ Other options: man touch
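Both commands are easy to try in a scratch directory. This is a small sketch, assuming a POSIX shell with mktemp; the file names new_file and not_created are illustrative.

```sh
scratch=$(mktemp -d)

# echo prints its arguments separated by single blanks;
# the extra blanks below only separate the two arguments
message=$(echo Hello     World)
echo "$message"

# touch creates an empty file if the name does not exist yet
touch "$scratch/new_file"
if [ -f "$scratch/new_file" ]; then created=yes; else created=no; fi
# -s is true only for files with a size greater than zero
if [ -s "$scratch/new_file" ]; then empty=no; else empty=yes; fi

# With -c, touch never creates a file
touch -c "$scratch/not_created"
if [ -e "$scratch/not_created" ]; then with_c=exists; else with_c=absent; fi

echo "created: $created, empty: $empty, after touch -c: $with_c"
rm -r "$scratch"
```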
UNIX User Commands: rm, mkdir, rmdir
rm <file1> ... <fileN> will delete the named files
– Important options:
∗ -f: Force removal, never ask the user (even if the user has withdrawn write
permission for that file)
∗ -i: Interactively ask the user about each file to be deleted
∗ -r: If some of the files are directories, recursively delete their contents first,
then delete them
mkdir <dir1> ... <dirN> will create the directories named (if the user has the permission
to do so)
rmdir <dir1> ... <dirN> will delete the directories named (if the user has the permission
to do so and if they are empty)
Effective Shell Use: History
Modern shells like bash or tcsh keep a history of your previous commands
– Type history to see these commands
– Type !<number> to re-execute the command with the given number
– Type !<string> to re-execute the most recent command starting with the
(partial) word <string>
Example:
[schulz@gettysburg ~]$ history
(... many entries omitted)
194 more CSC322.tex
195 gv CSC322_1.pdf
196 ls
197 ll CSC322_1.pdf
198 history
– !197 will execute ll CSC322_1.pdf
– !g will execute gv CSC322_1.pdf
Effective Shell Use: Editing/Completion
While typing commands, bash offers you many ways to ease your task:
– [Backspace] will delete the character preceding the cursor
– [C-d] (hold down [CTRL], then press [d]) will delete the character under the
cursor (if there is such a character)
– [C-k] will delete all characters under and right of the cursor
– Left arrow and right arrow move the cursor in the command line (alternatively,
try [C-b] and [C-f])
– [C-a] and [C-e] move to the beginning and end of the line, respectively
– Up arrow and down arrow will move you through the history (as will [C-p] and
[C-n])!
– In general, default bash key bindings are inspired by emacs editing commands
One of the more intriguing features: Name completion
– At any time, hit [TAB], and bash will complete the current word as far
as possible. Hitting [C-d] at the end of a non-empty line will list possible
completions
– It is quite smart (configurably smart, in fact) about this
Effective Shell Use: Globbing
Idea: Use simple patterns to describe sets of filenames
A string is a wildcard pattern if it contains one of ?, * or [
A wildcard pattern expands into all file names matching it
– A normal letter in a pattern matches itself
– A ? in a pattern matches any one letter
– A * in a pattern matches any string
– A pattern [l1...ln] matches any one of the enclosed letters (exception: ! as
the first letter)
– A pattern [!l1...ln] matches any one of the characters not in the set
– A leading . in a filename is never matched by anything except an explicit
leading dot
– For more: man 7 glob
Important: Globbing is performed by the shell!
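That globbing is done by the shell, and not by the command, can be made visible by counting arguments. A minimal sketch for a POSIX shell with mktemp; count_args is a hypothetical helper, and the file names are chosen only for the illustration.

```sh
# Hypothetical helper: prints how many arguments it received
count_args() { echo "$#"; }

scratch=$(mktemp -d)
start_dir=$(pwd)
cd "$scratch"
touch a ba bba .hidden

# The shell replaces * by the matching names before the command runs
expanded=$(count_args *)     # sees a, ba, bba: 3 arguments (.hidden is skipped)
# Quoting suppresses globbing: the command sees one literal *
quoted=$(count_args '*')     # 1 argument

# ? matches exactly one character, [!b] any one character except b
two_chars=$(echo ?a)         # only ba has exactly two characters
not_b=$(echo [!b]*)          # only a does not start with b

echo "unquoted *: $expanded arguments, quoted '*': $quoted argument"
echo "?a -> $two_chars, [!b]* -> $not_b"

cd "$start_dir"
rm -r "$scratch"
```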
Example: File Handling and Globbing
$ mkdir TEST_DIR
$ cd TEST_DIR
$ touch a ba bba bbba bbbba bbbbba LongFilename .LongHiddenFile
$ ls -a
. .. a ba bba bbba bbbba bbbbba LongFilename .LongHiddenFile
$ echo *a* (Everything with an a anywhere)
a ba bba bbba bbbba bbbbba LongFilename
$ echo *Long*
LongFilename (Note: Does not match .LongHiddenFile)
$ echo .* (all hidden files)
. .. .LongHiddenFile
$ echo [ab]*
a ba bba bbba bbbba bbbbba
$ echo *[ae] (everything that ends in a or e)
a ba bba bbba bbbba bbbbba LongFilename
$ echo ?*[ae] (everything that ends in a or e and has at least one more letter)
ba bba bbba bbbba bbbbba LongFilename
Example: File Handling and Globbing (Contd.)
$ cd ..
$ rmdir TEST_DIR
rmdir: ‘TEST_DIR’: Directory not empty
$ rm TEST_DIR/*
$ rmdir TEST_DIR
rmdir: ‘TEST_DIR’: Directory not empty
$ rm TEST_DIR/.L*
$ rmdir TEST_DIR
Alternative:
$ mkdir TEST_DIR
$ touch TEST_DIR/.HiddenFile
$ rmdir TEST_DIR
rmdir: ‘TEST_DIR’: Directory not empty
$ rm -r TEST_DIR
UNIX User Commands: cat/more/less
cat <file1> ... <fileN> will concatenate the named files and print them to standard
output (by default, your terminal)
– It’s usually just used to display short files ;-)
more and less are pagers
– Each will show you a text (e.g. the contents of a file given on the command
line) by pages, stopping after each page and waiting for a key press (normally
[space])
– Major differences:
∗ more will automatically exit at the end of the data, less requires explicit
termination with [q]
∗ less allows you to scroll backwards (using [b]), more only allows scrolling
forward
– For more (or less): man more, man less
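The name cat comes from concatenation, which is easy to see with two small files. A sketch for a POSIX shell with mktemp; the file names one and two are illustrative. The pagers are interactive and therefore not part of the sketch.

```sh
scratch=$(mktemp -d)
printf 'first file\n'  > "$scratch/one"
printf 'second file\n' > "$scratch/two"

# cat writes the named files to standard output, one after the other
combined=$(cat "$scratch/one" "$scratch/two")
echo "$combined"

rm -r "$scratch"
```

For longer files, replacing cat by more or less in an interactive shell shows the same text page by page.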
Text Editing under UNIX
There are 3 ways to edit text under UNIX:
1. The vi way
2. The emacs way
3. The wrong way
vi (the visual editor) is the text editor written by Bill Joy for BSD UNIX (published
about 1978)
– Screen-oriented WYSIWYG editor (for plain text)
– Available on just about any UNIX system
– About 35% of all serious UNIX hackers still prefer vi (or a derivative)!
– Current version on Lab machines: vim 5.8.7 (Vi Improved)
emacs (editing macros) started in 1976 as a set of TECO macros on ITS
– Currently popular emacs versions (GNU Emacs and XEmacs) go back to 1985
GNU Emacs by Stallman. Both basically are a LISP system with a large text
editing library and an editor-like user interface
– About 35% of all serious UNIX hackers use Emacs. Also widespread use on
other operating systems
– emacs on the lab machines is GNU Emacs 20.7.1
vi flyby
Getting into it: vi <file>
Modal interface: Normally letters denote editing commands, only in insert mode
can actual letters be typed into the file
The editor starts in command mode (see next slide)
Insert mode (shows -- INSERT -- in bottom line):
Key              Effect
[ESC]            Go back to command mode
Any normal key   Insert corresponding letter
[Backspace]      Delete last typed letter
Tutorials e.g. at http://www.cfm.brown.edu/Unixhelp/vi_.html.
vi flyby II
Command mode (commands marked (*) change into insert mode):
Key(s)        Effect
Cursor keys   Move around
:r <file>     Insert file content at cursor position
:w            Write file
:q            Leave vi
:wq           Write file and leave
:q!           Leave vi even if there are unsaved changes
:h            Help!
i             Insert text at the cursor position (*)
a             Insert text after the cursor position (*)
A             Insert text at the end of the current line (*)
o             Open a new line and insert text (*)
J             Join two lines into one
x             Delete character under cursor
dd            Delete current line
.             Repeat last command
:<n>          Goto line number <n>
Emacs for Everyone
Getting into it: emacs <file> or just emacs & (remark: Normally, emacs is only
started once, and you visit different files from within the editor. Emacs can work
on many files at once)
Emacs is extremely configurable and extensible:
– Special modes support nearly all programming languages
∗ Indentation
∗ Compilation/Error correcting
∗ Debugging
– You can read email and USENET news in emacs
– Emacs can be used as a web browser
An Emacs window normally has different sub-regions:
– Menu bar (operate with a mouse, many frequently used commands)
– One or more text windows, each displaying a buffer (a text editing area)
– One mode line for each text window, displaying various pieces of information
– Finally, the mini-buffer for typing complex commands and dialogs
Emacs for Everyone III<br />
Emacs is non-modal: normal keys always insert the corresponding letter<br />
Commands are typed by using [CTRL] or [ALT] in combination with normal<br />
keys. We write e.g. [C-a] or [M-a] to denote [a] pressed with [CTRL] or [ALT]<br />
(M for meta). [C-h t] is [C-h] followed by plain [t].<br />
Key(s) What it does<br />
[C-h t] Enter the emacs tutorial<br />
[C-x C-c] Leave emacs<br />
Cursor keys Move around<br />
[C-x C-f] Open a new file (*)<br />
[C-x C-s] Save current file<br />
[C-x s] Save all changed files (*)<br />
[M-x] Call arbitrary LISP function by name (*)<br />
[C-s] Incremental search (try it!) (*)<br />
Entries marked with (*) will ask for additional information in the mini-buffer<br />
Exercises<br />
Experiment with bash command line editing and history<br />
Create some files and play with globbing<br />
Write a short text in both vi and emacs<br />
Read the vi and emacs tutorials<br />
Note: You are strongly encouraged to learn the basics of both editors, and to become<br />
proficient in at least one of them. I will not examine you on either, but don’t<br />
complain if you have trouble with any other editor<br />
ed is the standard text editor<br />
When I log into my Xenix system with my 110 baud teletype, both vi<br />
*and* Emacs are just too damn slow. They print useless messages like,<br />
’C-h for help’ and ’"foo" File is read only’. So I use the editor<br />
that doesn’t waste my VALUABLE time.<br />
Ed, man! !man ed<br />
ED(1) UNIX Programmer’s Manual ED(1)<br />
NAME<br />
ed - text editor<br />
SYNOPSIS<br />
ed [ - ] [ -x ] [ name ]<br />
DESCRIPTION<br />
Ed is the standard text editor.<br />
---<br />
Computer Scientists love ed, not just because it comes first<br />
alphabetically, but because it’s the standard. Everyone else loves ed<br />
because it’s ED!<br />
"Ed is the standard text editor."<br />
And ed doesn’t waste space on my Timex Sinclair. Just look:<br />
-rwxr-xr-x 1 root 24 Oct 29 1929 /bin/ed<br />
-rwxr-xr-t 4 root 1310720 Jan 1 1970 /usr/ucb/vi<br />
-rwxr-xr-x 1 root 5.89824e37 Oct 22 1990 /usr/bin/emacs<br />
Of course, on the system *I* administrate, vi is symlinked to ed.<br />
Emacs has been replaced by a shell script which 1) Generates a syslog<br />
message at level LOG_EMERG; 2) reduces the user’s disk quota by 100K;<br />
and 3) RUNS ED!!!!!!<br />
"Ed is the standard text editor."<br />
Let’s look at a typical novice’s session with the mighty ed:<br />
golem> ed<br />
?<br />
help<br />
?<br />
?<br />
?<br />
quit<br />
?<br />
exit<br />
?<br />
bye<br />
?<br />
hello?<br />
?<br />
eat flaming death<br />
?^C<br />
?<br />
^C<br />
?<br />
^D<br />
?<br />
---<br />
Note the consistent user interface and error reportage. Ed is<br />
generous enough to flag errors, yet prudent enough not to overwhelm<br />
the novice with verbosity.<br />
"Ed is the standard text editor."<br />
Ed, the greatest WYGIWYG editor of all.<br />
ED IS THE TRUE PATH TO NIRVANA! ED HAS BEEN THE CHOICE OF EDUCATED<br />
AND IGNORANT ALIKE FOR CENTURIES! ED WILL NOT CORRUPT YOUR PRECIOUS<br />
BODILY FLUIDS!! ED IS THE STANDARD TEXT EDITOR! ED MAKES THE SUN<br />
SHINE AND THE BIRDS SING AND THE GRASS GREEN!!<br />
When I use an editor, I don’t want eight extra KILOBYTES of worthless<br />
help screens and cursor positioning code! I just want an EDitor!!<br />
Not a "viitor". Not a "emacsitor". Those aren’t even WORDS!!!! ED!<br />
ED! ED IS THE STANDARD!!!<br />
TEXT EDITOR.<br />
When IBM, in its ever-present omnipotence, needed to base their<br />
"edlin" on a UNIX standard, did they mimic vi? No. Emacs? Surely<br />
you jest. They chose the most karmic editor of all. The standard.<br />
Ed is for those who can *remember* what they are working on. If you<br />
are an idiot, you should use Emacs. If you are an Emacs, you should<br />
not be vi. If you use ED, you are on THE PATH TO REDEMPTION. THE<br />
SO-CALLED "VISUAL" EDITORS HAVE BEEN PLACED HERE BY ED TO TEMPT THE<br />
FAITHLESS. DO NOT GIVE IN!!! THE MIGHTY ED HAS SPOKEN!!!<br />
?<br />
UNIX from a User’s Perspective<br />
The Goodies<br />
UNIX User Commands: grep<br />
Usage: grep regexp file . . .<br />
– grep will scan the input file(s) and print all lines containing a string that<br />
matches the regular expression regexp<br />
– Important options:<br />
∗ -i: Ignore upper and lower case in the regular expression<br />
∗ -v: Print all lines not matching the regular expression<br />
– The name comes from an old editor command sequence standing for globally<br />
search for regular expression, print matches<br />
– It is one of the most useful UNIX tools!<br />
Regular expressions (much more via man grep):<br />
– A normal character matches itself<br />
– A . matches any single character<br />
– A * after a pattern matches any number of repetitions (including none)<br />
– A range [...] works as for globbing (but use ^ instead of ! for negation)<br />
– Remember that many characters are special for the shell – use quotes!<br />
– Example: grep "Ste.*ulz" file will find many versions of my full name in<br />
file<br />
Input and Output<br />
Each UNIX process is normally created with 3 input/output streams:<br />
– Standard Input or stdin (file descriptor 0) is used for normal input. Many<br />
UNIX programs will read from stdin if no file names are given<br />
– Standard Output or stdout (file descriptor 1) is used for all normal output<br />
– Standard Error or stderr (file descriptor 2) is used for out-of-band output like<br />
warnings or error messages<br />
By default, all three are connected to your terminal<br />
It is possible to redirect the output streams into files<br />
It is possible to make stdin read from a file<br />
It is possible to connect one process’s stdout to another one’s stdin<br />
Simple Output Redirection<br />
You can redirect the normal output of a command by appending > file to<br />
the command.<br />
– Example 1:<br />
$ man stdin > stdin_man_page<br />
$ more stdin_man_page<br />
STDIN(3) System Library Functions Manual STDIN(3)<br />
NAME<br />
stdin, stdout, stderr - standard I/O streams<br />
...<br />
– Example 2: On the lab machines, the global password file is served over the<br />
NIS (or Yellow Pages) protocol, and is shown by the command ypcat passwd.<br />
ypcat passwd > my_passwd gives you a private copy for password “quality<br />
checking”<br />
– Example 3: cat > myfile.c can replace a text editor (hit [C-d] on a line of its<br />
own to indicate the end of input)<br />
Output Redirection II<br />
By default, stderr is not redirected, so you can still see error messages on the<br />
terminal (and discard the normal output if it is useless)<br />
To redirect stderr, use 2> file (redirect file descriptor 2):<br />
– $ man bla will print No manual entry for bla<br />
– $ man bla 2> error will save that error message in the file error<br />
Special case: If you are not interested in any output, you can redirect it to<br />
/dev/null (a UNIX device file that just accepts data and ignores it):<br />
– $ man bla 2> /dev/null will make sure that you do not see the error message<br />
– Alternatively, $ man if bla > /dev/null will give you just the error message<br />
(even though man also prints the man page for the shell built-in if)<br />
Input Redirection<br />
You can also redirect the stdin file descriptor to read from a file<br />
– Append < file to the command<br />
– This is useful e.g. if you always use an interactive program for the same task<br />
(i.e. you always type the same data into the program)<br />
– Some UNIX commands only accept input on stdin (e.g. the tr utility)<br />
Example: cat < file is equivalent to cat file! (Why?)<br />
Shell Pipes<br />
Pipes are a general tool for inter-process communication (IPC)<br />
The shell allows us to easily set up pipes connecting stdout of one process to<br />
stdin of another<br />
Example: man bash | cat will print the bash man page without using the pager<br />
– This can be chained: man bash | grep -i redir | grep -i input will print just<br />
the lines containing information about input redirection<br />
– ypcat passwd | grep schulz will give you just my entry in the password file<br />
Basic Process Control<br />
You can start processes in the foreground or in the background<br />
– Foreground processes are started by just typing a normal command<br />
– Background processes are started by appending an ampersand (&) to the<br />
command. This is particularly useful for graphical applications: emacs &<br />
– While a foreground process is running, the shell is blocked because the process<br />
is using the terminal as its stdin (i.e. you can have at most one non-suspended<br />
foreground process)<br />
– (Most) foreground processes can be terminated by hitting [C-c] (often written<br />
as ^C).<br />
– (Most) foreground processes can be suspended by hitting [C-z]<br />
– A suspended process can be continued by typing fg (to continue it as a<br />
foreground process) or bg (to let it run in the background)<br />
– A background process will be suspended automatically if it needs to read data<br />
from stdin<br />
– jobs gives a numbered list of all processes started by the shell<br />
– You can use fg %n (with n the job number) to take a particular process into the<br />
foreground (bg %n works on the same principle)<br />
– You can use kill %n to terminate job number n<br />
UNIX User Commands: yes<br />
Usage: yes [arg]<br />
If no argument is given, yes will print an infinite sequence of lines containing just<br />
the character y<br />
If an argument is given, yes will print an infinite sequence of lines containing that<br />
argument<br />
Very little more is available via man yes<br />
Job Control Example<br />
$ emacs & (Start emacs in the background – it opens its own window)<br />
$ yes Hello (Start yes in the foreground)<br />
Hello<br />
Hello<br />
Hello<br />
...<br />
^C (Enough of that)<br />
$ jobs<br />
[1] Running emacs (Just my emacs)<br />
$ yes Hi (I can never get enough)<br />
Hi<br />
Hi<br />
...<br />
^Z (Suspend it)<br />
Suspended (Indeed!)<br />
$ jobs<br />
[1] Running emacs<br />
[2] + Suspended yes<br />
$ kill %1 (Ooops! Emacs window closes)<br />
Notice: Lab Hours<br />
At the moment, a TA for CSC322 is in the lab Friday 4-6pm and Sunday 2-6pm.<br />
Programming in C - Basics<br />
A Bird’s Eye View of C<br />
A C program is a collection of<br />
– Declarations<br />
– Definitions<br />
for<br />
– Functions<br />
– Variables<br />
– Datatypes<br />
A program may be spread over multiple files<br />
A program file may contain preprocessor directives that<br />
– Include other files<br />
– Introduce and expand macro definitions<br />
– Conditionally select certain parts of the source code for compilation<br />
A First C Program<br />
Consider the following C program<br />
#include <stdio.h><br />
#include <stdlib.h><br />
int main(void)<br />
{<br />
   printf("Hello World!\n");<br />
   return EXIT_SUCCESS;<br />
}<br />
Assume that it is stored in a file called hello.c in the current working directory.<br />
Then:<br />
$ gcc -o hello hello.c<br />
(Note: Compiles without warning or error)<br />
$ ./hello<br />
Hello World!<br />
A Closer Look (1)<br />
We are including two header files from the standard library<br />
– stdio.h contains declarations for buffered, stream-based input and output<br />
(we include it for the declaration of printf)<br />
– stdlib.h contains declarations for many odds and ends from the standard<br />
library (it gives us EXIT_SUCCESS)<br />
– In general, preprocessor directives start with a hash #<br />
A Closer Look (2)<br />
The program consists of one function named main()<br />
– main() returns an int (integer value) to its calling environment<br />
– In this case, it takes no arguments (its argument list is void)<br />
– In general, any C program is started by a call to its main() function, and<br />
terminates when main() returns<br />
A Closer Look (3)<br />
The function body contains two statements:<br />
– A call to the standard library function printf() with the argument "Hello<br />
World!\n" (a string ending with a newline character)<br />
– A return statement, returning the value of the symbol EXIT_SUCCESS to the<br />
caller of main()<br />
A Closer Look (4)<br />
gcc is the GNU C compiler, the standard compiler on most free UNIX systems<br />
(and often the preferred compiler on many other systems)<br />
– On traditional systems, the compiler is normally called cc<br />
gcc takes care of all stages of compiling:<br />
– Preprocessing<br />
– Compiling<br />
– Linking<br />
It automagically recognizes what to do (by looking at the file name suffix)<br />
Important options:<br />
– -o file: Give the name of the output file<br />
– -ansi: Compile strict ANSI-89 C only<br />
– -Wall: Warn about all dubious lines<br />
– -c: Don’t perform linking, just generate a (linkable) object file<br />
– -O – -O6: Use increasing levels of optimization to generate faster executables<br />
A More Advanced Example<br />
/* A program that prints a Fahrenheit -> Celsius conversion table */<br />
#include <stdio.h><br />
#include <stdlib.h><br />
int main(void)<br />
{<br />
int fahrenheit, celsius;<br />
}<br />
printf("Fahrenheit -> Celsius\n\n");<br />
fahrenheit = 0;<br />
while(fahrenheit
The Fahrenheit-Celsius Example<br />
Compilation:<br />
$ gcc -ansi -Wall -W -o celsius_fahrenheit celsius_fahrenheit.c<br />
Running it:<br />
$ ./celsius_fahrenheit | more<br />
Fahrenheit -> Celsius<br />
0 -17<br />
10 -12<br />
20 -6<br />
30 -1<br />
40 4<br />
50 10<br />
60 15<br />
70 21<br />
80 26<br />
90 32<br />
100 37<br />
--More--<br />
Comments<br />
Comments in C are enclosed in /* and */<br />
Comments can contain any sequence of characters except for */ (although your<br />
compiler may complain if it hits a second occurrence of /* in a comment)<br />
Comments can span multiple lines<br />
In assignments (and in life) use comments wisely<br />
– Do explain important ideas, e.g. what a function or program does<br />
– Do explain clever tricks (if needed)<br />
– Do not repeat things that are obvious from the program code anyway<br />
Bad commenting will affect grading!<br />
Variables<br />
"int fahrenheit, celsius;" declares two variables of type int that can store<br />
a signed integer value from a finite range<br />
– By intention, int is the fastest datatype available on any given C implementation<br />
– On most modern UNIX systems, int is a 32 bit type and interpreted in two’s<br />
complement, giving a range from -2 147 483 648 to 2 147 483 647. This is<br />
not part of the C language definition, though!<br />
In general, a variable in a program corresponds to a memory location and can<br />
store a value of a specific type<br />
– All variables must be declared before they can be used<br />
– Variables can be local to a function (like the variables we have used so far),<br />
local to a single source file, or global to the whole program<br />
A variable’s value is changed by an assignment, an expression of the form<br />
"var = expression;"<br />
(Arithmetic) Expressions<br />
C supports various arithmetic operators, including the usual +, -, *, /<br />
– Subexpressions can be grouped using parentheses<br />
– Normal arithmetic operations can be used on both integer and floating point<br />
values, with the type of the arguments determining the type of the result<br />
– Example: (fahrenheit-32)*5/9 is an arithmetic expression in C, implementing<br />
the well-known formula $C = \frac{5}{9}(F - 32)$ for converting Fahrenheit to Celsius<br />
∗ Since all arguments are integer, all intermediate results are also integer (as<br />
well as the final result)<br />
∗ Therefore we have to multiply by 5 first, then divide by nine (multiplying<br />
by (5/9) would effectively multiply by 0)<br />
Bit-wise, logical, and comparison operators normally also return<br />
numeric values<br />
Possible operands include variables, numerical (and other) constants, and function<br />
calls<br />
Note: In C, any normal statement is an expression and has a value, including the<br />
assignment!<br />
Simple Loops<br />
A while-loop has the form<br />
while(expr)<br />
body<br />
where body can be either a single statement, terminated by a semicolon ’;’,<br />
or a statement block, enclosed in curly braces ’{}’<br />
It operates as follows:<br />
– At the beginning of the loop, the controlling expression expr is evaluated<br />
– If it evaluates to a non-zero value, the loop body is executed once, and control<br />
returns to the while<br />
– If it evaluates to 0, the body is skipped, and the program continues with the<br />
next statement after the loop<br />
Note: The body can also be empty (but this is usually a programming bug)<br />
Formatted Output<br />
printf() is a function for formatted output<br />
It has at least one argument (the format string), but may have an arbitrary<br />
number of additional arguments<br />
– The format string may contain various placeholders, beginning with the<br />
character %, followed by (optional) formatting instructions, and a letter (or<br />
letter combination) indicating the desired output format<br />
– Each placeholder corresponds to exactly one of the additional arguments<br />
(modern compilers will complain if the arguments and the format string do<br />
not match)<br />
– In particular, %3d requests the output of a normal int in decimal representation,<br />
and with a width of at least 3 characters<br />
Note: printf() is not part of the C language proper, but of the (standardized)<br />
C library<br />
UNIX User Commands: cp and mv<br />
cp will either copy one file to another, or it will copy any number of files into a<br />
directory<br />
– Usage: cp file1 file2<br />
Copy file1 to file2<br />
– Usage: cp file1 . . . dir<br />
Copy the named files into dir<br />
mv will likewise move files<br />
– Usage: mv file1 file2<br />
Move file1 to file2<br />
– Usage: mv file1 . . . dir<br />
Move the named files into dir<br />
Warning: Unless used with option -i, both commands will happily overwrite<br />
existing files!<br />
Again, a more complete description is available via man!<br />
Assignment (also see Website)<br />
Write the following two C programs:<br />
– celsius2fahrenheit should print a conversion table from Celsius to Fahrenheit,<br />
from -50 to +150 degrees Celsius, in steps of 5 degrees<br />
– imp_metric should print two tables side by side (equivalent to a 4-column<br />
table), one for the conversion from yards into meters, the other one for the<br />
conversion from km into miles. The output should use int values, but you<br />
can use the floating point conversion factors of 0.9144 (from yards to meters)<br />
and 1.609344 (from miles to km). Try to make the program round correctly!<br />
Sample Output:<br />
Yards Meters Km Miles<br />
0 0 1 1<br />
10 9 2 1<br />
20 18 3 2<br />
30 27 4 2<br />
40 37 5 3<br />
...<br />
100 91 11 7<br />
Programming in C<br />
Extended Introduction<br />
Statements, Blocks, and Expressions<br />
C programs are mainly composed of statements<br />
In C, a statement is either:<br />
– An expression, followed by a semicolon ’;’ (as a special case, an empty expression<br />
is also allowed, i.e. a single semicolon represents the empty statement)<br />
– A flow-control statement (if, while, goto, break, . . . )<br />
– A block of statements (or compound statement), enclosed in curly braces ’{}’.<br />
A compound statement can also include new variable declarations.<br />
Note: The following is actually legal C (although a good compiler will warn you<br />
that some of your statements have no effect):<br />
{<br />
   int a;<br />
   10+20;<br />
   10*(a=printf("Hello\n"));<br />
}<br />
Flow-Control: if<br />
The primary means for conditional execution in C is the if statement:<br />
if(expr)<br />
statement<br />
– If the expression evaluates to a non-zero (“true”) value, then the statement will<br />
be executed<br />
– statement can also be a block of statements – in fact, it is quite often<br />
good style to always use a block, even if it contains only a single statement<br />
An if statement can also have a branch that is taken if the expression is zero<br />
(“false”):<br />
if(expr)<br />
statement1<br />
else<br />
statement2<br />
Flow-Control: while<br />
C supports different structured loop constructs, including a standard while-loop<br />
(see also the example from the last lesson):<br />
while(expr)<br />
body<br />
When control reaches the while at the top of the loop, the expression is tested<br />
– If it evaluates to true (non-zero), the body of the loop is executed and control<br />
returns to the while<br />
– If it evaluates to false (i.e. zero), control directly goes to the statement<br />
following the body of the loop<br />
Note: An empty loop body is possible (and sometimes useful)<br />
Again: In many cases it is recommended to use a block even if it contains only<br />
one statement (or even no statements)<br />
Flow-Control: for<br />
The for-loop in C is a construct that combines initialization, test, and update of<br />
loop variables in one place:<br />
for(init; test; update)<br />
body<br />
– Before the loop is entered, init is evaluated<br />
– Before each loop iteration, test is evaluated<br />
∗ If it is true, the body is executed, then update is evaluated and control<br />
returns to the top of the loop<br />
∗ If it is false, control goes to the first statement after the body<br />
∗ In the typical case, both init and update are assignments to the same<br />
variable, while test checks some property depending on that variable<br />
Example<br />
Here is the Fahrenheit/Celsius conversion using for:<br />
/* A program that prints a Fahrenheit -> Celsius conversion table */<br />
#include <stdio.h><br />
#include <stdlib.h><br />
int main(void)<br />
{<br />
int fahrenheit, celsius;<br />
}<br />
printf("Fahrenheit -> Celsius\n\n");<br />
for(fahrenheit=0; fahrenheit
for vs. while<br />
Note that for is more general than while:<br />
while(expr) body and for(; expr; ) body<br />
are equivalent.<br />
Alternatively, you can achieve the functionality of for using just while (how?)<br />
The preference for one or the other is often a matter of personal choice<br />
– If there are clear initialization and update steps, for is often more convenient<br />
– In other cases, while is more natural<br />
Variable Declarations<br />
Variable names:<br />
– A valid variable name starts with a letter or underscore (_), and may contain<br />
any sequence of letters, underscores, and digits<br />
– Capitalization is significant – a_variable is different from A_Variable<br />
– In addition to the language keywords, certain other names are reserved (by the<br />
standard library or by the implementation). In particular, avoid using names<br />
that start with an underscore!<br />
Variable declarations:<br />
– A (simple) variable declaration has the form type varlist;, where<br />
type is a type identifier (e.g. int), and varlist is a comma-separated list<br />
of variable names<br />
– In ANSI-89 C, variables can only be declared outside any blocks or directly<br />
after an open curly brace. The new standard (C99) relaxes this requirement<br />
– A variable declared in a block is (normally) visible just inside that block<br />
Types: Integers and Characters<br />
C has a large number of integer data types:<br />
– It offers char, short, int, long and (since the last language revision) long<br />
long types, all of which may represent integers from different ranges<br />
– Note that in particular char is an integer data type, i.e. characters are<br />
represented by their numerical encoding in the character set (normally ASCII)<br />
– Any of those can be signed or unsigned, i.e. capable of representing both<br />
negative and positive numbers or non-negative numbers only<br />
– All types can be freely mixed in expressions<br />
– Unsigned types always follow the rules of arithmetic modulo $2^n$, where $n$ is<br />
the width (in bits) of their representation (i.e. values greater than $2^n - 1$ are<br />
reduced by subtracting $2^n$ until the result is in the range $0$ to $2^n - 1$)<br />
Integer constants are normally of type int if they can be represented by that data<br />
type<br />
– 123 is int<br />
– 316L is long<br />
– 922U is unsigned int<br />
Integer Type Table<br />
Type | Alias | Signed/Unsigned? | Width(*)<br />
char | | Implementation-defined | 8 bit<br />
signed char | | Signed | 8 bit<br />
unsigned char | | Unsigned | 8 bit<br />
short int | short | Signed | 16 bit<br />
signed short int | short | Signed | 16 bit<br />
unsigned short int | unsigned short | Unsigned | 16 bit<br />
int | | Signed | 32 bit<br />
signed int | int | Signed | 32 bit<br />
unsigned int | unsigned | Unsigned | 32 bit<br />
long int | long | Signed | 32 bit<br />
signed long int | long | Signed | 32 bit<br />
unsigned long int | unsigned long | Unsigned | 32 bit<br />
long long int | long long | Signed | 64 bit<br />
signed long long int | long long | Signed | 64 bit<br />
unsigned long long int | unsigned long long | Unsigned | 64 bit<br />
Note (*): Width is not defined by the language standard but reflects currently<br />
common implementation choices!<br />
Simple Character I/O<br />
The C library defines the three I/O streams stdin, stdout, and stderr, and<br />
guarantees that they are open for reading or writing, respectively<br />
Reading characters from stdin: int getchar(void)<br />
– getchar() returns the numerical (ASCII) value of the next character in the<br />
stdin input stream<br />
– If there are no more characters available, getchar() returns the special value<br />
EOF that is guaranteed to be different from any normal character (that is why it<br />
returns int rather than char)<br />
Printing characters to stdout: int putchar(int)<br />
– putchar(c) prints the character c on the stdout stream<br />
– (It returns that character, or EOF on failure)<br />
getchar(), putchar(), and EOF are declared in <stdio.h><br />
#include <br />
#include <br />
int main(void)<br />
{<br />
int c;<br />
}<br />
c=getchar();<br />
while(c!=EOF)<br />
{<br />
putchar(c);<br />
c=getchar();<br />
}<br />
return EXIT_SUCCESS;<br />
Example: File Copying<br />
Copies stdin to stdout – to make a a file copy, use<br />
cat file | ourcopy > newfile<br />
Introduces != (“not equal”) relational operator<br />
Example: File Copying (idiomatic)<br />
#include <stdio.h><br />
#include <stdlib.h><br />
int main(void)<br />
{<br />
   int c;<br />
   while((c=getchar())!=EOF)<br />
   {<br />
      putchar(c);<br />
   }<br />
   return EXIT_SUCCESS;<br />
}<br />
Remember: Variable assignments have a value!<br />
Improvement: No duplication of the call to getchar()<br />
Example: Character Counting<br />
#include <stdio.h><br />
#include <stdlib.h><br />
int main(void)<br />
{<br />
   int c;<br />
   long count = 0;<br />
   while((c=getchar())!=EOF)<br />
   {<br />
      count++;<br />
   }<br />
   printf("Number of characters: %ld\n", count);<br />
   return EXIT_SUCCESS;<br />
}<br />
New idiom: ++ operator (increases value of variable by 1)<br />
Test: $ man cat | charcount<br />
1091<br />
Exercises<br />
Write a program that continually increases the value of a short and an<br />
unsigned short variable and prints both (you can use printf("%6d %6u",<br />
var1, var2); to print them). What happens if you run the program for some<br />
time? You can pipe the output into less and search for interesting things (man<br />
less to learn how!). Remember that [C-c] will terminate most programs under<br />
UNIX!<br />
Write a program that counts lines in the input. Hint: The standard library makes<br />
sure that any line in the input ends with a newline ('\n')<br />
Write a program that computes the factorial of a number (given as a constant in<br />
the program). Test it for values of 3, 18, and 55. Any observations?<br />
Programming in C<br />
More on Expressions<br />
Nomination: Most Useless Use of cat Award<br />
If ourcopy is a program that just copies stdin to stdout, then<br />
– cat file | ourcopy > newfile will indeed copy file to newfile<br />
– So will ourcopy < file > newfile<br />
– (So will cat < file > newfile)<br />
UNIX User Commands: wc<br />
Usage: wc ...<br />
wc counts the bytes, words and lines in each file specified (or in stdin if none is<br />
given) and prints the results, including the total for all input files.<br />
Important options:<br />
– -c: Print just the byte count<br />
– -w: Print just the word count<br />
– -l: Print just the line count<br />
Example:<br />
$ wc < wordcount.c<br />
24 53 369<br />
Notice: The program does not print unnecessary headers or footers. The output<br />
can easily be interpreted by other programs!<br />
More: man wc<br />
Example: Word Counting<br />
/* Count words */<br />
#include <stdio.h><br />
#include <stdlib.h><br />
int main(void)<br />
{<br />
   int c, in_word=0;<br />
   long words = 0;<br />
   while((c=getchar())!=EOF)<br />
   {<br />
      if(c == ' ' || c == '\n' || c == '\t')<br />
      {<br />
         in_word = 0;<br />
      }<br />
      else if(!in_word)<br />
      {<br />
         in_word = 1;<br />
         words++;<br />
      }<br />
   }<br />
   printf("%ld words counted\n", words);<br />
   return EXIT_SUCCESS;<br />
}<br />
Character Constants<br />
In C, characters are just small integers<br />
We can write character constants symbolically, by enclosing them in single quotes:<br />
– 'a' is the numerical value of a lower case a in the character encoding (97 for<br />
ASCII)<br />
– 'A' is upper case A (65 for ASCII)<br />
– These values can be assigned to any integer variable!<br />
You can use escape sequences (starting with \) for non-printable characters:<br />
– \t is the tabulator character (HT), ASCII 9<br />
– \n is the newline character (LF), ASCII 10 (and used by C to mark the end of<br />
the current line)<br />
– \a is the BEL character, printing it will normally make the terminal beep<br />
– \0 is the NUL character, ASCII 0, and used by C to mark the end of a string<br />
– \\ is the backslash itself, ASCII 92<br />
You can get all C escape sequences (and more) via man ASCII<br />
Another View at Expressions<br />
Expressions are built from operators and operands, with parentheses for grouping<br />
– Most operators are unary (taking one operand) or binary (taking two)<br />
– Operands can be<br />
∗ (Sub-)Expressions<br />
∗ Constants<br />
∗ Function calls<br />
– In C, binary operators are written in infix, i.e. between the operands: 10+15<br />
– Unary operators are written either in prefix or postfix (some can even be<br />
written either way, with slightly different effects)<br />
All operators have a precedence, defining how to interpret expressions with<br />
multiple operators<br />
– In the absence of parentheses, operators with a higher precedence bind tighter<br />
than those with a lower precedence: 10+3*4 == 22 is true, 10+4*3==42 is<br />
false<br />
– When in doubt, we can always fully parenthesize expressions: 10+3*4 == 10+(3*4),<br />
but (10+4)*3==42<br />
Expressions: Associativity of Binary Operators<br />
Binary operators have an additional property: Associativity<br />
– Example: 25+12+11 can be interpreted as (25+12)+11 or as 25+(12+11)<br />
– Worse: What about 25-12-11?<br />
By convention, standard arithmetic operators are left-associative, i.e. they bind to<br />
the left<br />
– Thus: 25-12-11 == (25-12)-11 has the value 2<br />
We will note associativity for many operators specifically, but unless otherwise<br />
noted, it’s probably left-associative!<br />
Expressions: Relational Operators<br />
Relational operators take two arguments and return a truth value (0 or 1)<br />
We have already seen the equality operators. They apply to all basic data<br />
types and pointers:<br />
– a == b (equal) evaluates to 1 if the two arguments have the same value,<br />
otherwise it evaluates to 0<br />
– a != b evaluates to 1 if the two arguments have different values<br />
– Note: a == b == c is evaluated as (a == b) == c, i.e. it compares c to<br />
either 0 or 1!<br />
We can also compare the same types using the greater/lesser relations:<br />
– a > b evaluates to 1, if the first argument is greater than the second one<br />
– a < b evaluates to 1, if the second argument is greater than the first one<br />
– a >= b evaluates to 1, if either (a > b) == 1 or (a == b) == 1<br />
– a <= b evaluates to 1, if either (a < b) == 1 or (a == b) == 1<br />
Expressions: Logical Operators<br />
Logical operators operate on truth values, i.e. all non-zero values are treated the<br />
same way (representing true)<br />
The binary logical operators are || and &&<br />
– a||b evaluates to 1, if at least one of a or b is non-zero (otherwise it evaluates<br />
to 0)<br />
– a&&b evaluates to 1, if both a and b are non-zero<br />
– Both || and && are evaluated left-to-right, and the evaluation stops as soon<br />
as we can be sure of the result (short-circuit evaluation)<br />
∗ Example: If a!=b, then (a==b)&&c will not evaluate c<br />
∗ Similarly: (a==0 || 10/a >= 1) will never divide by zero!<br />
! is the (unary) logical negation operator, !a evaluates to 1, if a has the value<br />
0, it evaluates to 0 in all other cases<br />
Precedence rules:<br />
– The binary logical operators have lower precedence than the relational ones<br />
– || has lower precedence than &&<br />
– ! has a higher precedence than even arithmetic operators<br />
Expressions: Assignments<br />
The assignment operator is = (a single equal sign)<br />
– a = b is an expression with the value of b<br />
– As a side effect, it will change the value of a to that same value<br />
The expression on the left hand side of an assignment (a) has to be an lvalue,<br />
i.e. something we can assign to. Legal lvalues are<br />
– Variables<br />
– Dereferenced pointers (“memory locations”)<br />
– Elements in a struct, union, or array<br />
The assignment operator is right-associative (so you can write<br />
a = b = c = d = 0; to set all four variables to zero)<br />
The assignment operator has extremely low precedence (lower than all other<br />
operators we have covered up to now)<br />
Floating Point Numbers<br />
C supports three types of floating point numbers, float, double, and long<br />
double<br />
– float is the most memory-efficient representation (typically 32 bits), but has<br />
limited range and precision<br />
– double is the most commonly used floating point type. In particular, most<br />
numerical library functions accept and return double arguments. Doubles<br />
normally take up 64 bits<br />
– long double offers extended range and precision (sometimes using 128 bits)<br />
and is a recent addition<br />
Floating point constants are written using a decimal point, or exponential notation<br />
(or both):<br />
– 1.0 is a floating point constant<br />
– 1 is an integer constant. . .<br />
– . . . but 1e0 and 1.0E0 are both floating point constants<br />
If we mix integer and floating point numbers in an expression, a value of a<br />
“smaller” type is converted to that of the bigger one transparently:<br />
– 9/2 == 4, but 9/2.0 == 4.5 and 9.0/2 == 4.5<br />
Fahrenheit to Celsius – More Exactly<br />
/* A program that prints a Fahrenheit -> Celsius conversion table */<br />
#include <stdio.h><br />
#include <stdlib.h><br />
int main(void)<br />
{<br />
int fahrenheit;<br />
double celsius;<br />
}<br />
printf("Fahrenheit -> Celsius\n\n");<br />
for(fahrenheit=0; fahrenheit
Administrative Notes<br />
Please ssh to lee.cs.miami.edu to use the lab machines over the net.<br />
To change your password on the lab machines, use yppasswd. Also check<br />
http://www.cs.miami.edu/~irina/password.html for the password policy<br />
To submit programming assignments, create a subdirectory with the name<br />
ASSIGNMENT_<n> (where <n> is the number of the current assignment) and<br />
copy the relevant files to it<br />
Example: To submit the current assignment, do e.g.<br />
$ cd ~ (go home)<br />
$ mkdir ASSIGNMENT_2<br />
$ cp mystuff/celsius2fahrenheit* ASSIGNMENT_2<br />
$ cp mystuff/imp_metric* ASSIGNMENT_2<br />
Exercises<br />
Expand the word count program to count characters, words, and lines (of stdin)<br />
as wc does<br />
Write a program that prints useful imperial to metric (and back) conversion<br />
tables to a reasonable precision<br />
Programming in C<br />
Simple Arrays and Functions<br />
Arrays<br />
An array is a data structure that holds elements of one type so that each element<br />
can be (efficiently) accessed using an index<br />
In C, arrays are always indexed by integer values<br />
Indices always run from 0 to some fixed, predetermined value<br />
<type> <name>[<size>]; defines a variable of an array type:<br />
– <type> can be any valid C type, including user-defined types<br />
– <name> is the name of the variable defined<br />
– <size> is the number of elements in the array (Note: Indices run from 0<br />
to <size>-1)<br />
Example: char x[10]; defines the variable x to hold 10 elements of type char,<br />
x[5] accesses the element with index 5 (the sixth element) of that array<br />
#include <br />
#include <br />
#include <br />
int main(void)<br />
{<br />
int freq_count[128];<br />
int i, c;<br />
Example: Counting Character Frequencies<br />
for(i=0; i
}<br />
Example: Counting Character Frequencies (Contd.)<br />
for(i=0; i
Initializing Arrays<br />
In the example, we used an explicit loop to initialize the array<br />
For short arrays we can also list the initial values in the definition of the array:<br />
– int days_per_month[12] = {31,28,31,30,31,30,31,31,30,31,30,31};<br />
– The number of values has to be smaller than or equal to the number of<br />
elements in the array<br />
– Unspecified elements are initialized to zero (i.e. 0 for all basic data<br />
types)<br />
If we give an explicit initializer, we can omit the size of the array:<br />
– int days_per_month[] = {31,28,31,30,31,30,31,31,30,31,30,31};<br />
– The compiler will automatically allocate an array of sufficient size to hold all<br />
the values in the initializer<br />
Array Layout<br />
C arrays are implemented as a sequence of consecutive memory locations of the<br />
right size to hold the elements<br />
Example:<br />
Address | Array Element | Content<br />
0 | |<br />
. . .<br />
112 | | Other data<br />
120 | days_per_month[0] | 31<br />
124 | days_per_month[1] | 28<br />
128 | days_per_month[2] | 31<br />
132 | days_per_month[3] | 30<br />
136 | days_per_month[4] | 31<br />
140 | days_per_month[5] | 30<br />
144 | days_per_month[6] | 31<br />
148 | days_per_month[7] | 31<br />
152 | days_per_month[8] | 30<br />
156 | days_per_month[9] | 31<br />
160 | days_per_month[10] | 30<br />
164 | days_per_month[11] | 31<br />
168 | | Other data<br />
. . .<br />
No Safety Belts and No Air Bag!<br />
C does not check if the index is in the valid range!<br />
– If you access days_per_month[13] you might change some critical other data<br />
– (The operating system may catch some of these wrong accesses, but do not<br />
rely on it!)<br />
This is the source of many of the buffer-overflow errors exploited by crackers and<br />
viruses to hack into systems!<br />
Character Arrays<br />
Character arrays are the most frequent kind of arrays used in C<br />
– They are used for I/O operations<br />
– They are used for implementing string operations in C<br />
To make the use of character arrays easier, we can use string constants to<br />
initialize them. The following definitions are equivalent:<br />
– char hello[] = {'H','e','l','l','o','\0'};<br />
– char hello[] = "Hello";<br />
– char hello[6] = "Hello";<br />
Notice that the string constant is automatically terminated by a NUL character!<br />
Functions<br />
Functions are the primary means of structuring programs in C<br />
A function is a named subroutine<br />
– It accepts a number of arguments, processes them, and (optionally) returns a<br />
result<br />
– Functions also may have side effects, like I/O or changes to global data<br />
structures<br />
– In C, any subroutine is called a function, whether it actually returns a result or<br />
is only called for its side effect<br />
Note: A function hides its implementation<br />
– To use a function, we only need to know its interface, i.e. its name, parameters,<br />
and return type<br />
– We can improve the implementation of a function without affecting the rest of<br />
the program<br />
Functions can be reused in the same program or even different programs, allowing<br />
people to build on existing code<br />
Function Definitions<br />
A function definition consists of the following elements:<br />
– Return type (or void if the function does not return a value)<br />
– Name of the function<br />
– Parameter list<br />
– Function body<br />
The name follows the same rules as variable names<br />
The parameter list is a list of comma-separated pairs of the form <type> <name><br />
The body is a sequence of statements enclosed in curly braces<br />
Example:<br />
int timesX(int number, int x)<br />
{<br />
   return x*number;<br />
}<br />
Function Calling<br />
A function is called from another part of the program by writing its name,<br />
followed by a parenthesized list of arguments (where each argument has to have<br />
a type matching that of the corresponding parameter of the function)<br />
If a function is called, control passes from the call of the function to the function<br />
itself<br />
– The parameters are treated as local variables with the values of the arguments<br />
to the call<br />
– The function is executed normally<br />
– If control reaches the end of the function body, or a return statement is<br />
executed, control returns to the caller<br />
– A return statement may have a single argument of the same type as the<br />
return type of the function. If the statement is executed, the argument of<br />
return becomes the value returned to the caller<br />
We can only call functions that have already been declared or defined at that<br />
point in the program!<br />
Example: Printing Character Frequencies<br />
int print_freq(char c, int freq)<br />
{<br />
int i;<br />
}<br />
printf("%c :", c);<br />
if(freq < 75)<br />
{<br />
for(i=0; i
Example: Printing Character Frequencies (contd.)<br />
Assume that the previous function definition is inserted into the frequency<br />
counting program just in front of the int main(void) line<br />
We can then modify main as follows:<br />
...<br />
for(i=0; i
Exercises<br />
Rewrite the Fahrenheit→Celsius Program to use a function for the actual conversion<br />
Assignment<br />
A prime number is a (positive integer) number greater than 1 that is evenly divisible only by 1<br />
and itself<br />
1. Write a function isprime() that determines if an integer number is prime.<br />
You can use the % modulus operator (remainder of integer division) or work with<br />
plain division. Use your function to implement a program primes_simple<br />
that prints all primes between 0 and 10000.<br />
2. The Sieve of Eratosthenes is a more efficient (and ancient) algorithm for<br />
finding all primes up to a given number. It starts with a list of all numbers<br />
from 2 to the desired limit. It traverses this list, starting at two. Whenever<br />
it encounters a new number, it strikes all multiples of it from the list. What<br />
remains at the end is a list of prime numbers.<br />
Example:<br />
Initial list: 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16<br />
Striking multiples of 2: 2 3 5 7 9 11 13 15<br />
Striking multiples of 3: 2 3 5 7 11 13<br />
(There are no multiples of any remaining number, so we skip the remaining steps)<br />
Use the Sieve algorithm in a second program, primes_sieve, that prints all<br />
primes between 0 and 10000. Hint: Use an array!<br />
Programming in C<br />
More on Functions<br />
Review of Function Properties<br />
A function is a named subroutine<br />
It accepts a number of arguments of a predetermined type and returns a value of<br />
a given type<br />
It can have its own local variables<br />
A function can be called from other places in the program, including other<br />
functions<br />
Functions have to be known (either defined or declared) before they can be called<br />
Example: Reading Integers<br />
We want to write a function that reads a positive integer number from stdin,<br />
using only getchar()<br />
A number is defined as a sequence of decimal digits (characters from the range<br />
'0' to '9')<br />
– We can use the function isdigit(c) from ctype.h to test if a character is a<br />
(decimal) digit<br />
– The C standard guarantees that '0' to '9' have consecutive numerical values.<br />
We can thus get the value of a single character c that represents a digit by<br />
the expression c-'0'<br />
Idea: We read the most significant digits first. So whenever we read a new digit,<br />
the value of what we have read so far increases 10-fold:<br />
Read | Value<br />
1 | 1<br />
13 | 10*1+3 = 13<br />
137 | 10*13+7 = 137<br />
1375 | 10*137+5 = 1375<br />
Example: int read_int10(void)<br />
/* We assume that stdio and ctype have been included */<br />
/* A function that reads a positive integer number in base 10 from<br />
 * stdin. Return number or -1 on failure. Will read one character<br />
 * ahead! */<br />
int read_int10(void)<br />
{<br />
   int res = 0, c, count=0;<br />
   while(isdigit(c=getchar()))<br />
   {<br />
      res = (res*10)+c-'0';<br />
      count++;<br />
   }<br />
   if(count > 0) /* We read something */<br />
   {<br />
      return res;<br />
   }<br />
   return -1;<br />
}<br />
Improving the Function<br />
read_int10(void) works fine, but can only read numbers in decimal notation<br />
We now want to have a function that can read numbers in any base between 2 and<br />
10<br />
Examples:<br />
– 142 in base 8 has the value 1*8^2 + 4*8^1 + 2*8^0 = 1*64 + 4*8 + 2 = 98<br />
– 101010 in base two has the value 1*2^5 + 0*2^4 + 1*2^3 + 0*2^2 + 1*2^1 + 0*2^0 =<br />
32 + 8 + 2 = 42<br />
– 1873 is not a valid number in base 6! All digits have to be smaller than the<br />
base<br />
The principle is the same, we just use a parameter base instead of the hardwired<br />
value 10!<br />
Do we have a Valid Digit?<br />
/* Is a character a valid digit in base b? */<br />
int is_base_digit(int c, int base)<br />
{<br />
   if(c - '0' < 0)<br />
   {<br />
      return 0;<br />
   }<br />
   if(c - '0' >= base)<br />
   {<br />
      return 0;<br />
   }<br />
   return 1;<br />
}<br />
Reading a Number in any Base<br />
...<br />
   if(count > 0) /* We read something */<br />
   {<br />
      return res;<br />
   }<br />
   return -1;<br />
Build General Functions!<br />
Good programs are built by breaking the task into many functions that are:<br />
– Small – at most one screen page (in your favourite editor)<br />
– Simple – they only do one thing, and they do that well<br />
– General – so that they can be reused at other parts in the program<br />
Going from general to specific is (generally) easy:<br />
/* Alternative to read_int10 */<br />
int read_int10b(void)<br />
{<br />
   return read_int_b(10);<br />
}<br />
Recursive Functions<br />
As we stated above, functions can call other functions. They can also call<br />
themselves recursively<br />
A recursive function always has to handle at least two cases:<br />
– The base case handles a simple situation without further calls to the same<br />
function<br />
– The recursive cases may do some work, and in between make recursive calls to<br />
the function for smaller (in some sense) subtasks<br />
Recursion is one of the most important programming principles!<br />
Example: Printing Integers<br />
We now want to print positive integer numbers to stdout, using only putchar()<br />
Consider a number in base 10: 421 = 42 * 10 + 1<br />
We can split the task into two subtasks:<br />
– Print everything but the last digit (recursively)<br />
– Print the last digit<br />
Base case: There are no digits to print any more<br />
Basic operations:<br />
– To get the last digit, we use the modulus operator %<br />
– To get rid of the last digit, we divide the number by the desired base (remember,<br />
integer division truncates)<br />
Example: Decimal Representation of 421<br />
Let’s do an example: We want to print the number 421 in base 10<br />
– Step 1: 421%10 = 1 and 421/10 = 42. Hence the last digit to print is 1<br />
and the rest we still have to print is 42<br />
– Step 2: 42%10 = 2 and 42/10 = 4. The second last digit is 2, the rest is 4<br />
– Step 3: 4%10 = 4 and 4/10 = 0. The next digit is 4<br />
– Step 4: Our rest is 0, hence there is nothing left to do but to print the digits in<br />
the right order<br />
The same principle applies for other bases (just replace 10 by your base)<br />
Writing a Number in any Base (
Writing Integers (Contd.)<br />
We can wrap the simple recursive function to handle the abnormal case (but, as<br />
we saw on the last slide, we don’t need to):<br />
/* Write positive integer in any base to stdout */<br />
void write_int_b(int value, int base)<br />
{<br />
   if(value == 0)<br />
   {<br />
      putchar('0');<br />
   }<br />
   write_int_b_rekursive(value, base);<br />
}<br />
Putting Things Together: A Base Converter<br />
We now use the defined functions to write a program that reads pairs number<br />
base and prints the number back in the new base:<br />
– number is considered to be a decimal number<br />
– base should be a decimal number between 2 and 10 (inclusive)<br />
– Numbers and pairs are separated by a single, arbitrary character (including<br />
space and newline)<br />
– The program terminates if one of the numbers is invalid<br />
The Base Converter<br />
int main(void)<br />
{<br />
   int num, base;<br />
   while(1)<br />
   {<br />
      printf("Input decimal value and desired base!\n");<br />
      num = read_int10();<br />
      if(num == -1)<br />
      {<br />
         return EXIT_SUCCESS;<br />
      }<br />
      base = read_int10();<br />
      if(base == -1 || base < 2 || base > 10)<br />
      {<br />
         printf("Error: No valid base!\n");<br />
         return EXIT_FAILURE;<br />
      }<br />
      write_int_b(num, base);<br />
      putchar('\n');<br />
   }<br />
}<br />
Usage Example<br />
$ ./base_converter<br />
Input decimal value and desired base!<br />
123123 3<br />
20020220010<br />
Input decimal value and desired base!<br />
42 10<br />
42<br />
Input decimal value and desired base!<br />
42 Hallo!<br />
Error: No valid base!<br />
$<br />
Exercise<br />
Extend the base converter to work with base 16, using 0-9 and A-F as digits<br />
(allow both upper and lower case!)<br />
Extend the base converter to accept triplets input-base, value, output-base,<br />
where value is interpreted in input-base (and input-base is a single hexadecimal<br />
digit >=2). Add reasonably robust error handling!<br />
The complete base converter code from the lecture is available from the CSC322<br />
web page or directly at http://www.cs.miami.edu/~schulz/CSC322/base_converter.c<br />
Programming in C<br />
Program Structure and the C Preprocessor<br />
Simple Program Structure<br />
[Diagram with the components Sources (Definitions), Headers (Declarations), C Preprocessor, Compiler, and Executable; the arrow legend includes "Is combined into"]<br />
Program Structure In Detail<br />
[Diagram with the components Sources (Definitions), Headers (Declarations), C Preprocessor, Compiler, Object Files, System Library, Linker, RTE (Shared libs), and Executable; arrow legend: Includes, Translates into, Is combined into]<br />
Program Structure for Multi-File Programs<br />
[Diagram with several Sources (Definitions), several Headers (Declarations), several Object Files, System Library, C Preprocessor, Compiler, Linker, RTE (Shared libs), and Executable; arrow legend: Includes, Translates into, Is combined into]<br />
The C Preprocessor<br />
The C preprocessor performs a textual rewriting of the program text before it is<br />
ever seen by the compiler proper<br />
– It includes the contents of other files<br />
– It expands macro definitions<br />
– It conditionally processes or removes segments of the program text<br />
Preprocessor directives start with a hash # and traditionally are written starting<br />
in the very first column of the program text<br />
After preprocessing, the program text is free of all preprocessor directives<br />
Normally, gcc will transparently run the preprocessor. Run gcc -E if you<br />
want to see the preprocessor output<br />
Including Other Files: #include<br />
The #include directive is used to include other files (the contents of the named<br />
file replace the #include directive)<br />
Form 1: #include "file"<br />
– The preprocessor will search for file in the current directory<br />
– What happens if file is not found in the current directory is implementation-defined<br />
∗ UNIX compilers will typically treat file as a pathname (that may be either<br />
absolute or relative)<br />
∗ If the file is not found, the compiler prints an error message and aborts<br />
Form 2: #include <file><br />
– file will be searched for in an implementation-defined way<br />
– UNIX compilers will typically treat file as a file name relative to the system<br />
include directory, /usr/include on the lab machines<br />
– You can add to the list of directories that will be searched using<br />
gcc -I<directory><br />
Stephan Schulz 156
Example: Include<br />
mary:<br />
Mary had a little lamb,<br />
Its fleece was white as snow;<br />
And everywhere that Mary went<br />
The lamb was sure to go.<br />
myfile.c:<br />
A Poem<br />
#include "mary"<br />
$ gcc -E myfile.c<br />
# 1 "myfile.c"<br />
A Poem<br />
# 1 "mary" 1<br />
Mary had a little lamb,<br />
Its fleece was white as snow;<br />
And everywhere that Mary went<br />
The lamb was sure to go.<br />
# 4 "myfile.c" 2<br />
Include Discussion<br />
Include directives are typically used for sharing common declarations between<br />
different program parts<br />
Libraries (including the standard library) come with header files that define their<br />
interface by<br />
– Defining data types and constants<br />
– Declaring functions (and defining macros)<br />
– Declaring variables<br />
Note that included files can contain further #include directives (that will be<br />
automatically expanded by the preprocessor)<br />
– This is frequent in system files, where the standard-prescribed include files<br />
often include system-specific files actually describing the features<br />
Simple Macro Definitions: #define<br />
The #define directive is used to define macros<br />
Simple Form: #define NAME replacement-text<br />
– This will define a macro for NAME, which has to follow the common rules for<br />
C identifiers (alphanumeric characters and underscore, must not start with a<br />
digit)<br />
– Any normal occurrence of NAME after the definition will be replaced by<br />
replacement-text<br />
– Replacement will not take place in strings!<br />
– The macro definition normally ends at the end of the line; however, it can be<br />
extended to the next line by appending \ as the very last character of the line<br />
Note that macro expansion even takes place within further macro definitions!<br />
Most common use: Symbolic constants (e.g. EOF)<br />
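A minimal sketch of these rules, using hypothetical names (BUFFER_SIZE, DOUBLE_SIZE, LONG_TEXT) chosen only for illustration: a symbolic constant, a macro whose replacement text uses another macro, a definition continued with \, and a string literal in which no replacement takes place.<br />

```c
#include <string.h>

/* Hypothetical names, chosen only for illustration */
#define BUFFER_SIZE 64                 /* symbolic constant */
#define DOUBLE_SIZE (2 * BUFFER_SIZE)  /* replacement text uses another macro */
#define LONG_TEXT   "BUFFER_SIZE stays as it is " \
                    "inside a string literal"

/* DOUBLE_SIZE expands to (2 * 64) */
int double_size(void)
{
    return DOUBLE_SIZE;
}

/* Returns 1 if the text "BUFFER_SIZE" survived inside the string */
int name_kept_in_string(void)
{
    return strstr(LONG_TEXT, "BUFFER_SIZE") != NULL;
}
```

Running gcc -E on such a file should show the number 64 inside double_size(), while the string literal is passed through unchanged.<br />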
Simple #define Example<br />
reality.c:<br />
#define true 1<br />
#define false 0<br />
void reality_check(void)<br />
{<br />
if(true == false)<br />
{<br />
printf("Reality is broken!\n");<br />
}<br />
}<br />
$ gcc -E reality.c<br />
# 4 "reality.c"<br />
void reality_check(void)<br />
{<br />
if(1 == 0)<br />
{<br />
printf("Reality is broken!\n");<br />
}<br />
}<br />
Macros with Arguments<br />
Macro definitions can also contain formal arguments<br />
#define NAME(arg1,...,argn) replacement-text<br />
If a macro with arguments is expanded, any occurrence of a formal argument in<br />
the replacement text is replaced with the text of the corresponding actual argument<br />
of the macro call<br />
This allows a more efficient way of implementing small “functions”<br />
– But: Macros cannot do recursion<br />
– Macro calls have slightly different semantics from function calls<br />
– Therefore macros are usually only used for very simple tasks<br />
By convention, preprocessor-defined constants and many macros are written in<br />
ALL CAPS (using underscores for structure)<br />
#define Examples<br />
macrotest.c:<br />
#define XOR(x,y) ((!(x)&&(y))||((x)&&!(y))) /* Exclusive or */<br />
#define EQUIV(x,y) (!XOR(x,y))<br />
void test_macro(void)<br />
{<br />
   printf("XOR(1,0) : %d\n", XOR(1,0));<br />
   printf("EQUIV(1,0): %d\n", EQUIV(1,0));<br />
}<br />
$ gcc -E macrotest.c<br />
# 4 "macrotest.c"<br />
void test_macro(void)<br />
{<br />
   printf("XOR(1,0) : %d\n", ((!(1)&&(0))||((1)&&!(0))));<br />
   printf("EQUIV(1,0): %d\n", (!((!(1)&&(0))||((1)&&!(0)))));<br />
}<br />
#define Caveats<br />
Since macros work by textual replacement, there are some unexpected effects:<br />
– Consider #define FUN(x,y) x*y + 2*x<br />
∗ Looks innocent enough, but: FUN(2+3,4) expands into 2+3*4 + 2*2+3 (not<br />
(2+3)*4+2*(2+3))<br />
∗ To avoid this, always enclose each formal parameter (and the complete<br />
replacement text) in parentheses (unless you know what you are doing)<br />
– Now consider FUN(var++,1)<br />
∗ It expands into var++*1 + 2*var++<br />
∗ Macro arguments may be evaluated more than once!<br />
∗ Thus, avoid using macros with expressions that have side effects<br />
Other frequent problems:<br />
– Semicolons at the end of a macro definition (wrong!)<br />
– “Invisible” syntax errors (run gcc -E and check the output if you cannot locate<br />
an error)<br />
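The precedence problem can be reproduced with a short sketch. FUN_BAD and FUN_GOOD are hypothetical names for the unparenthesized and the parenthesized variant of the same macro; the two functions only exist to make the different results visible.<br />

```c
/* Hypothetical macro names for illustration */
#define FUN_BAD(x,y)  x*y + 2*x            /* no parentheses */
#define FUN_GOOD(x,y) ((x)*(y) + 2*(x))    /* fully parenthesized */

/* Expands to 2+3*4 + 2*2+3, i.e. 2 + 12 + 4 + 3 */
int bad_result(void)
{
    return FUN_BAD(2+3,4);
}

/* Expands to ((2+3)*(4) + 2*(2+3)), i.e. 20 + 10 */
int good_result(void)
{
    return FUN_GOOD(2+3,4);
}
```

Parentheses only repair the precedence problem: FUN_GOOD(var++,1) would still contain var++ twice.<br />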
Conditional Compilation: #if/#else/#endif<br />
We can use preprocessor directives to conditionally include or exclude parts of<br />
the program:<br />
– Program parts may be enclosed in #if expr/#endif pairs<br />
– expr has to be a constant integer expression<br />
– If it evaluates to 0, the text in the #if expr/#endif bracket is ignored,<br />
otherwise it is included<br />
– There also is an optional #else “branch”<br />
Most frequent use: Test for the definition of macros<br />
– defined(NAME) evaluates to 1 if NAME is defined (even as the empty<br />
string), 0 otherwise<br />
– Short form: #if defined(NAME) is equivalent to #ifdef NAME,<br />
#if !defined(NAME) is equivalent to #ifndef NAME<br />
– E.g.: #ifndef EOF<br />
#define EOF -1<br />
#endif<br />
Example: #ifdef<br />
cond_preproc.c:<br />
#define hallo<br />
#define fred barney<br />
#define test 2+2<br />
#if defined(hallo)<br />
"Hallo"<br />
#else<br />
#ifdef fred<br />
"Fred"<br />
#endif<br />
#endif<br />
#if test<br />
"test"<br />
#endif<br />
$ gcc -E cond_preproc.c<br />
# 5 "cond_preproc.c"<br />
"Hallo"<br />
"test"<br />
Exercises<br />
Search the /usr/include directory (use grep for faster progress) and find out<br />
where the following functions/macros are defined, and, for the macros, what<br />
their value is<br />
– LONG_MAX<br />
– ULONG_MAX<br />
– getchar()<br />
– getc()<br />
– EOF<br />
– EXIT_FAILURE<br />
– EXIT_SUCCESS<br />
– NULL<br />
Programming in C<br />
C Preprocessor/Declarations and Scoping<br />
More on Preprocessor Definitions<br />
You can use #undef to get rid of a definition<br />
– This is most often used to start from a clean slate:<br />
#undef true<br />
#undef false<br />
#define true 1<br />
#define false 0<br />
– It is, however, forbidden to undefine implementation-defined names<br />
You can use the -D option to gcc to cause certain names to be defined throughout<br />
the compilation<br />
– This is often used to select one of many alternatives for compilation<br />
∗ With or without internal consistency checks<br />
∗ With or without certain features (e.g. Demo version vs. commercial version)<br />
∗ . . .<br />
Certain names may be predefined by the implementation (most starting with two<br />
underscores: __FILE__, __STDC__ . . . )<br />
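A small sketch of how a -D definition and the predefined names might be used together. CONSISTENCY_CHECKS is a hypothetical name; it could be switched on with gcc -DCONSISTENCY_CHECKS=1, and the #ifndef block supplies a default if the option is absent.<br />

```c
/* CONSISTENCY_CHECKS is a hypothetical name; enable it with
   gcc -DCONSISTENCY_CHECKS=1 ... */
#ifndef CONSISTENCY_CHECKS
#define CONSISTENCY_CHECKS 0   /* default: checks switched off */
#endif

/* Returns 1 if the file was compiled with checks, 0 otherwise */
int checks_enabled(void)
{
#if CONSISTENCY_CHECKS
    return 1;
#else
    return 0;
#endif
}

/* __FILE__ expands to the name of the file being compiled */
const char *this_file(void)
{
    return __FILE__;
}

/* __LINE__ expands to the current line number */
int this_line(void)
{
    return __LINE__;
}
```

Only one of the two return statements in checks_enabled() reaches the compiler; gcc -E shows which one.<br />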
Combinations <strong>of</strong> #ifdef <strong>and</strong> #include<br />
#ifdef/#endif can also be used to conditionally include or exclude files<br />
Usage: Compile for different operating systems:<br />
#ifdef __LINUX__<br />
#include "linux.h"<br />
#elif defined(__BSD__)<br />
#include "bsd.h"<br />
#else<br />
#include "default.h"<br />
#endif<br />
Usage: Guarding against multiple inclusions<br />
#ifndef THIS_HEADER<br />
#define THIS_HEADER<br />
/* ... contents of the header file ... */<br />
#endif<br />
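To see why the guard matters, the following sketch pastes the contents of a hypothetical header point.h twice into one file, which is what two #include "point.h" directives would produce. Without the guard, the second copy would define struct point again and the compiler would reject the file.<br />

```c
/* Contents of a hypothetical header point.h, pasted twice to
   simulate the effect of two #include "point.h" directives. */

/* --- first inclusion: POINT_H is not yet defined --- */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
int point_sum(struct point p);
#endif

/* --- second inclusion: POINT_H is defined, so everything up to
   #endif is skipped and struct point is not defined twice --- */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
int point_sum(struct point p);
#endif

/* Definition matching the declaration from the header */
int point_sum(struct point p)
{
    return p.x + p.y;
}
```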
Separate Compilation<br />
C supports the separate compilation of multiple source files<br />
– Each source file is translated into an object file<br />
– A linker combines different object files into the final executable<br />
gcc by default tries to create an executable program by performing operations as<br />
follows:<br />
1. Preprocessing<br />
2. Compilation (and assembly)<br />
3. Linking<br />
For multi-file programs, we have to perform separate compilation:<br />
– gcc -c file.c -o file.o will compile file.c into file.o without linking<br />
– gcc -o progname file1.o file2.o file3.o will link the three precompiled object<br />
files into an executable<br />
Definitions <strong>and</strong> Declarations<br />
Definitions cause the defined objects to be created<br />
– Variable definitions allocate an appropriate amount of memory (and associate<br />
it with the variable name)<br />
– Function definitions cause code to be generated<br />
Declarations only state information about an object<br />
– For variables, they state the type<br />
– For functions, they state the return type and the argument types<br />
There can be any number of compatible declarations for an object<br />
There can be only one definition for the object<br />
A function or variable can only be used inside the scope of a matching declaration<br />
Any definition also implicitly declares an object<br />
Explicit Declarations<br />
Variables can be declared by adding the extern keyword to the syntax of a<br />
definition:<br />
– extern int counter;<br />
– extern char filename[MAXPATHLEN];<br />
Function declarations just consist of the function header, terminated by a semicolon:<br />
– int isdigit(int c);<br />
– int putchar(int c);<br />
– bool TermComputeRWSequence(PStack_p stack, Term_p from, Term_p to);<br />
Alternatively, the names of the formal parameters can be omitted<br />
– int isdigit(int);<br />
– int putchar(int);<br />
– bool TermComputeRWSequence(PStack_p, Term_p, Term_p);<br />
– However, the first form is often preferred because the parameter names may<br />
document the purpose of the parameter<br />
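A minimal sketch of the difference between declaring and defining, with hypothetical names (next_value, bump_and_read): the function in the middle uses a variable and a function that have only been declared so far, and the definitions follow further down in the same file.<br />

```c
extern int counter;        /* declaration: states the type only  */
int next_value(void);      /* declaration: function prototype    */

/* counter and next_value() can be used here because they have
   been declared, although their definitions follow later. */
int bump_and_read(void)
{
    counter++;
    return next_value();
}

int counter = 10;          /* definition: memory is allocated    */

int next_value(void)       /* definition: code is generated      */
{
    return 2 * counter;
}
```

In a multi-file program the two declarations would typically live in a header, and the definitions in exactly one source file.<br />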
Scoping Rules<br />
There are two kinds of declarations in C<br />
– Declarations written inside a block are called local declarations<br />
– Declarations outside any block are global declarations<br />
The scope of a local declaration begins at the declaration and ends at the end of<br />
the innermost enclosing block<br />
The scope of a global declaration begins at the declaration and continues to the<br />
end of the source file<br />
– Note that this refers to files after preprocessing, i.e. a declaration in a header file<br />
also is visible in the including file (from the point of the #include directive)<br />
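The "innermost enclosing block" rule can be illustrated with a short sketch (value and scope_demo are hypothetical names): a local declaration inside a nested block hides the global one, but only until that block ends.<br />

```c
int value = 1;               /* global: visible to the end of the file */

int scope_demo(void)
{
    int result = value;      /* refers to the global value (1) */

    {
        int value = 5;       /* local: hides the global inside this block */
        result += value;     /* adds 5 */
    }                        /* scope of the local value ends here */

    result += value;         /* refers to the global again: adds 1 */
    return result;
}
```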
Scope Example<br />
| extern int global_count;<br />
|<br />
| | | double abs_val (double number)<br />
| | | {<br />
| | | | double help = number;<br />
| | | |<br />
| | | | if(number < 0)<br />
| | | | {<br />
| | | | help = -1 * help;<br />
| | | | global_count++;<br />
| | | | }<br />
| | | | return help;<br />
| | | | }<br />
| |<br />
| | | int main()<br />
| | | {<br />
| | | printf("%7f\n", abs_val(-1.0));<br />
| | | }<br />
| | |<br />
| | | int global_count;<br />
Limiting Potential Scope<br />
By default, all declared variables and functions are accessible from any source file<br />
in the program<br />
– Of course, they may have to be declared to be visible<br />
Problems: We have no control over the use of these objects in other source files<br />
– Reuse of libraries may fail because of namespace pollution<br />
– Unintentional or malicious misuse of internal functions may lead to program<br />
misbehaviour<br />
The static keyword, applied to a global definition (or declaration), limits the<br />
accessibility of the declared object to the source file it is defined in<br />
– static int internal_help_fun(int a1, int a2);<br />
In general, it is a good idea to declare everything not expected to be used by<br />
other program parts static<br />
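A sketch of a source file that hides its internals this way. The names call_count, area, and area_calls are hypothetical; only the two non-static functions could be declared and called from other source files.<br />

```c
/* Accessible only within this source file */
static int call_count = 0;

static int internal_help_fun(int a1, int a2)
{
    call_count++;
    return a1 * a2;
}

/* Public interface: other source files may declare and call these */
int area(int width, int height)
{
    return internal_help_fun(width, height);
}

int area_calls(void)
{
    return call_count;
}
```

Another source file that declares extern int call_count; would fail at link time, because the static variable is not exported.<br />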
Lifetime <strong>and</strong> Initialization <strong>of</strong> Variables<br />
Global variables have unlimited lifetime<br />
– They are created and initialized when the program starts<br />
– The expression used in the initialization has to be constant, i.e. it has to be<br />
fully evaluable at compile time<br />
– If not explicitly initialized, they are guaranteed to be initialized to 0<br />
– They keep their values until the program terminates (unless explicitly changed,<br />
of course)<br />
Most local variables (and function parameters) only have limited lifetime<br />
– They are also called automatic variables and are typically allocated on the<br />
stack<br />
– They are created when the variable comes into scope and are destroyed when<br />
the variable goes out of scope; in particular, each recursive call gets a fresh<br />
copy of the variable<br />
– The initializing expression can use all variables and functions currently in scope<br />
– They are reinitialized every time they come into scope. If not initialized<br />
explicitly, they contain undefined values (“junk”)<br />
Persistent Local Variables: static again<br />
static local variables have unlimited lifetime<br />
– They are initialized only once, not on every call (in C this happens before the<br />
program starts, which is why the initializer has to be constant)<br />
– They are shared between different calls to the same function<br />
– They keep their values in between calls<br />
– However, they can only be accessed from inside their corresponding block<br />
Example: Static <strong>and</strong> Automatic Variables<br />
#include &lt;stdio.h&gt;<br />
#include &lt;stdlib.h&gt;<br />
static int global_count = 0;<br />
void counter_fun(void)<br />
{<br />
static int static_count = 0;<br />
int auto_count = 0;<br />
int pseudo_count = global_count;<br />
global_count++; auto_count++; static_count++; pseudo_count++;<br />
printf("Global: %3d Auto: %3d Static: %d Pseudo: %d\n",<br />
global_count,auto_count, static_count, pseudo_count);<br />
}<br />
int main(void)<br />
{<br />
counter_fun();<br />
counter_fun();<br />
global_count = 0;<br />
counter_fun();<br />
counter_fun();<br />
return EXIT_SUCCESS;<br />
}<br />
Example: Static <strong>and</strong> Automatic Variables(Contd.)<br />
$ gcc -o vartest vartest.c<br />
$ ./vartest<br />
Global: 1 Auto: 1 Static: 1 Pseudo: 1<br />
Global: 2 Auto: 1 Static: 2 Pseudo: 2<br />
Global: 1 Auto: 1 Static: 3 Pseudo: 1<br />
Global: 2 Auto: 1 Static: 4 Pseudo: 2<br />
$<br />
Assignment<br />
Write a data safe library offering the following functionality:<br />
– Calling data_safe(ds_register, 0, 0) will return a unique random key (a<br />
positive integer). Use rand() to generate random numbers (and man rand<br />
to find out how).<br />
– Calling data_safe(ds_store, key, value) will store the value (a positive<br />
integer) in the data safe (under the key). It should return the value if everything<br />
worked, -1 otherwise (e.g. if there is no space left)<br />
– Calling data_safe(ds_retrieve, key, n) will retrieve the nth value stored<br />
under the key, or -1 if fewer than n values have been stored under the key<br />
– Calling data_safe(ds_delete, key, 0) will delete all entries stored under<br />
key (you may then reuse key for future register calls, as long as you still<br />
generate a random key)<br />
– Make sure that at least 100 keys can be in use in parallel, and that at least<br />
10000 data items can be stored in total<br />
Make sure that the data is not accessible in any other way (using legal C)<br />
Implement the library in its own source file, with a header file data_safe.h that<br />
contains all necessary declarations<br />
Write a main program ds_test.c that uses the library, storing 10 values under 3<br />
different keys, retrieving them and deleting them. Use a reasonably varied sequence<br />
of storage, retrieval, and registration<br />
Hints:<br />
– Use static local variables to store the necessary data in the data_safe()<br />
function<br />
– Use preprocessor #define directives to define the symbolic constants<br />
ds_register, ds_store, . . .<br />
– Be careful to avoid handing out a key that is already in use on registration. Carefully<br />
design your data structures first; the operations will then be simple to implement<br />
Programming in C<br />
rpn_calc: An Extended Example<br />
Project: An RPN Calculator<br />
Aim: A calculator program that can do simple arithmetic<br />
– Conversion between different bases<br />
– Addition, subtraction, multiplication...<br />
We’ll use Reverse Polish Notation (RPN)<br />
– Operator is written after arguments: 7 5 + = 7+5<br />
– More complicated: 12 2 5 2 * + - = (12-(2+(5*2)))<br />
Advantages of RPN<br />
– Easy to understand<br />
– Easy to implement<br />
– No hassle with recursive parsing of parentheses and precedences<br />
– Can easily and consistently handle operators of any arity (number of arguments)<br />
Some Suggested Operators<br />
Arithmetic operators (others may be added); "first" is the number popped first,<br />
and the result is pushed back onto the stack:<br />
+ Pop two numbers, add them<br />
- Pop two numbers, subtract first from second<br />
* Pop two numbers, multiply them<br />
/ Pop two numbers, divide second by first<br />
% Pop two numbers, divide second by first, giving the remainder<br />
Non-Arithmetic operators (non-exclusive):<br />
p Print the topmost number on the stack<br />
o Pop topmost number on the stack, use it as new output base<br />
i Pop topmost number on the stack, use it as new input base<br />
S Print the whole stack (mainly for debugging)<br />
P Print input and output bases (in decimal)<br />
Usage Example<br />
$ ./rpn_calc<br />
10 8 +<br />
S<br />
18<br />
3 / p<br />
6<br />
3 / p<br />
2<br />
o<br />
p<br />
Stack underflow error<br />
P<br />
Input base (decimal): 10 Output Base (decimal): 2<br />
255<br />
p<br />
11111111<br />
16 p<br />
10000<br />
10 o<br />
S<br />
255 16<br />
Implementation<br />
Basic idea:<br />
– Input is a sequence of numbers and operators<br />
– If a number is read, it is pushed onto a stack<br />
– If an operator is read, the necessary number of arguments is popped off the<br />
stack, the operation is performed, and the result is pushed onto the stack<br />
Input and output can happen in any representation from base 2 to base 16<br />
– There is a strong convention for representing these numbers:<br />
∗ Digits are 0-9 with nominal value, A-F (or a-f) with values 10-15<br />
– Input and output use independent bases (base conversion made easy)<br />
Recognizing numbers and operators<br />
– Any string of valid digits in the current input base is a number<br />
– Any string starting with - and directly followed by valid digits in the current<br />
input base is a number<br />
– Everything else is treated as an operator<br />
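The basic idea can be tried out in a few lines before building the real program. The following is only a sketch under simplifying assumptions, not the rpn_calc implementation: it reads from a string instead of the input stream, handles decimal numbers only, and knows just +, - and *. The names eval_rpn and SKETCH_STACKSIZE are hypothetical.<br />

```c
#include <ctype.h>
#include <stdlib.h>

#define SKETCH_STACKSIZE 64   /* hypothetical limit for this sketch */

/* Evaluate a decimal RPN expression given as a string.
   Stores the topmost stack value in *result and returns 1 on success;
   returns 0 on stack underflow, overflow, or an unknown operator. */
int eval_rpn(const char *expr, int *result)
{
    int stack[SKETCH_STACKSIZE];
    int sp = 0;                      /* index of the next unused cell */
    const char *p = expr;

    while(*p)
    {
        if(isspace((unsigned char)*p))
        {
            p++;
        }
        else if(isdigit((unsigned char)*p) ||
                (*p == '-' && isdigit((unsigned char)p[1])))
        {
            /* A digit, or '-' directly followed by a digit: a number */
            char *end;
            long num = strtol(p, &end, 10);

            if(sp >= SKETCH_STACKSIZE)
            {
                return 0;            /* stack overflow */
            }
            stack[sp++] = (int)num;
            p = end;
        }
        else
        {
            /* Everything else is treated as an operator */
            int first, second;

            if(sp < 2)
            {
                return 0;            /* stack underflow */
            }
            first  = stack[--sp];    /* popped first */
            second = stack[--sp];    /* popped second */
            switch(*p)
            {
            case '+': stack[sp++] = second + first; break;
            case '-': stack[sp++] = second - first; break;
            case '*': stack[sp++] = second * first; break;
            default:  return 0;      /* unknown operator */
            }
            p++;
        }
    }
    if(sp == 0)
    {
        return 0;                    /* nothing on the stack */
    }
    *result = stack[sp-1];
    return 1;
}
```

For example, eval_rpn("12 2 5 2 * + -", &r) leaves 12-(2+(5*2)) in r.<br />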
Subtasks<br />
From the above, we can identify a number of subtasks:<br />
– Reading numbers and operators<br />
– Printing numbers<br />
– Handling the stack<br />
– Executing the actual operations<br />
Input handling is the hardest task!<br />
– We need to read up to 2 characters to decide if we read a number or an<br />
operator ('-+' represents two operators, '-1' a number)<br />
– Rather than handling explicit lookahead variables throughout the program, we<br />
can build a general character I/O library that allows us to read ahead, but to<br />
maintain (or restore) the status of the input queue<br />
Program Organization<br />
[Figure: file dependency diagram. Header files: ctype.h, stdio.h, stdlib.h, chario.h, integerio.h. Source files: chario.c, integerio.c, rpn_calc.c. The sources are compiled (gcc -c) into chario.o, integerio.o and rpn_calc.o, which are linked (gcc) together with libc into the executable rpn_calc. Arrow types: #include, Compile, Link.]<br />
The Character I/O Library: Ideas<br />
Main interface similar to getchar()<br />
Characters that have been read can be “pushed back” into the input queue<br />
Implementation:<br />
– Internal buffer of characters<br />
– Pushed characters go into the buffer<br />
– Reading first tries the buffer, and only reads stdin if the buffer is empty<br />
Additional help functions<br />
– Look at a character, but don’t read it<br />
– Skip white space<br />
The Character I/O Library: chario.h<br />
#ifndef UNGETCHAR<br />
#define UNGETCHAR<br />
#include &lt;ctype.h&gt;<br />
#include &lt;stdio.h&gt;<br />
/* Maximal number of characters that can be pushed back */<br />
#define MAX_BUFFERED_CHARS 1024<br />
/* As getchar(), but with unget capability (provided by PushChar()) */<br />
int GetChar(void);<br />
/* Push back a character into the read queue. Return c or EOF if the<br />
queue is full. */<br />
int PushChar(int c);<br />
/* Return the next character, but do _not_ read it */<br />
int LookChar(void);<br />
/* Skip over white space characters. Return the first non-white<br />
character (but it is not removed from the queue), or EOF if the<br />
pushback queue is full. */<br />
int SkipSpace(void);<br />
#endif<br />
The Character I/O Library: Global Variables <strong>and</strong> Includes<br />
#include "chario.h"<br />
static int char_buff[MAX_BUFFERED_CHARS];<br />
static int buff_pos = 0;<br />
The Character I/O Library: Reading <strong>and</strong> Unreading<br />
int GetChar(void)<br />
{<br />
if(buff_pos)<br />
{<br />
buff_pos--;<br />
return char_buff[buff_pos];<br />
}<br />
return getchar();<br />
}<br />
int PushChar(int c)<br />
{<br />
if(buff_pos < MAX_BUFFERED_CHARS)<br />
{<br />
char_buff[buff_pos] = c;<br />
buff_pos++;<br />
return c;<br />
}<br />
return EOF;<br />
}<br />
The Character I/O Library: Help Functions<br />
int LookChar(void)<br />
{<br />
   int c = GetChar();<br />
   PushChar(c);<br />
   return c;<br />
}<br />
int SkipSpace(void)<br />
{<br />
   int c;<br />
   while(isspace((c=GetChar())))<br />
   { /* Empty body */ }<br />
   return PushChar(c);<br />
}<br />
The Integer I/O Library: Ideas<br />
We use the same algorithms as discussed before<br />
However, because we allow bases up to 16, we add some additional helper<br />
functions for<br />
– Recognizing valid digits<br />
– Converting numerical values to character representation of digits<br />
– Giving the numerical value of digits<br />
Second difference: We allow negative numbers<br />
– We cannot use -1 to signal failure<br />
– Instead: We write a separate function that predicts the presence (or absence)<br />
of a number in the input stream<br />
– The calling functions have to make sure that the integer reading function is<br />
only called if there is valid input (i.e. success is guaranteed)<br />
The Integer I/O Library: integerio.h<br />
#include "chario.h"<br />
/* Read an integer in base base. */<br />
int read_int_base(int base);<br />
/* Check if there is an integer to be read, i.e. a digit or '-'<br />
directly followed by a digit */<br />
int int_available(int base);<br />
/* Write integer in any base to stdout */<br />
void write_int_base(int value, int base);<br />
The Integer I/O Library: Includes<br />
#include "integerio.h"<br />
The Integer I/O Library: Helper functions 1<br />
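/* Consider c as a hexadecimal digit (0..9, a..f, A..F) and return its<br />
numerical value. If not a valid digit, return -1 */<br />
static int hex_digit_value(int c)<br />
{<br />
   if(c >= '0' && c <= '9') { return c - '0'; }<br />
   if(c >= 'a' && c <= 'f') { return c - 'a' + 10; }<br />
   if(c >= 'A' && c <= 'F') { return c - 'A' + 10; }<br />
   return -1;<br />
}<br />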
The Integer I/O Library: Helper functions 2<br />
/* Check if a character c is a valid digit in base. */<br />
static int is_base_digit(int c, int base)<br />
{<br />
   int value = hex_digit_value(c);<br />
   if(value < 0 || value >= base)<br />
   {<br />
      return 0;<br />
   }<br />
   return 1;<br />
}<br />
The Integer I/O Library: Helper functions 3<br />
/* Given an int 0 <= value < 16, return the character representing this digit */<br />
The Integer I/O Library: Reading Integers<br />
/* Read an integer in base base. */<br />
int read_int_base(int base)<br />
{<br />
   int res = 0, c, sign = 1;<br />
   if((c=GetChar())=='-')<br />
   {<br />
      sign = -1;<br />
   }<br />
   else<br />
   {<br />
      PushChar(c); /* Unread Character */<br />
   }<br />
   while(is_base_digit((c=GetChar()),base))<br />
   {<br />
      res = (res*base)+hex_digit_value(c);<br />
   }<br />
   PushChar(c);<br />
   return res*sign;<br />
}<br />
The Integer I/O Library: Checking for Integer Presence<br />
/* Check if there is an integer to be read, i.e. a digit or '-'<br />
directly followed by a digit */<br />
int int_available(int base)<br />
{<br />
   int save_char, res = 0;<br />
   if(is_base_digit(LookChar(), base))<br />
   {<br />
      res = 1;<br />
   }<br />
   else if(LookChar() == '-')<br />
   {<br />
      save_char = GetChar();<br />
      if(is_base_digit(LookChar(), base))<br />
      {<br />
         res = 1;<br />
      }<br />
      PushChar(save_char);<br />
   }<br />
   return res;<br />
}<br />
The Integer I/O Library: Writing Integers<br />
/* Write integer in any base (2 to 16) to stdout */<br />
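A minimal sketch of how writing an integer in an arbitrary base might look. This is not the code from the slides: the helper names digit_char and format_int_base are hypothetical, and the conversion is split off into a function that fills a string so that it can be checked without looking at stdout.<br />

```c
#include <stdio.h>

/* Hypothetical helper: map 0..15 to '0'..'9', 'A'..'F' */
static char digit_char(int value)
{
    return "0123456789ABCDEF"[value];
}

/* Hypothetical helper: write value in base (2..16) into buf, which
   must offer room for a sign, one digit per bit of an int, and the
   terminating '\0'. Returns buf. */
char *format_int_base(int value, int base, char *buf)
{
    char tmp[sizeof(int) * 8 + 1];   /* digits, least significant first */
    int  len = 0, pos = 0;
    /* Use the unsigned magnitude so that the most negative int works */
    unsigned int rest = (value < 0) ? 0u - (unsigned int)value
                                    : (unsigned int)value;

    do
    {
        tmp[len++] = digit_char((int)(rest % (unsigned int)base));
        rest = rest / (unsigned int)base;
    }while(rest > 0);

    if(value < 0)
    {
        buf[pos++] = '-';
    }
    while(len > 0)                   /* reverse the digits */
    {
        buf[pos++] = tmp[--len];
    }
    buf[pos] = '\0';
    return buf;
}

/* Same interface as declared in integerio.h */
void write_int_base(int value, int base)
{
    char buf[sizeof(int) * 8 + 2];

    fputs(format_int_base(value, base, buf), stdout);
}
```

The digits come out least significant first, which is why they are collected in tmp and copied in reverse order.<br />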
Exercises<br />
Download the program from http://www.cs.miami.edu/~schulz/CSC322.html,<br />
compile it, and read the source code. You may want to add more<br />
operators (e.g. t to duplicate the top of the stack, s to switch the two topmost<br />
numbers, . . . )<br />
Programming in C<br />
rpn_calc: An Extended Example (Part 2)<br />
Recapitulation: Some <strong>of</strong> our Library Functions<br />
The integerio library offers functions for reading and printing integers. All<br />
functions have a parameter base for selecting the number system (2–16, or<br />
binary to hexadecimal)<br />
int read_int_base(int base);<br />
– Reads an integer from the standard input (using our GetChar()/PushChar()<br />
interface), returning its value<br />
– If no valid integer can be found, behavior is undefined!<br />
int int_available(int base);<br />
– Returns 1 (true) if a valid integer can be read from standard input, 0 otherwise<br />
– Does not consume any characters from the input stream!<br />
void write_int_base(int value, int base);<br />
– Prints an integer number to stdout, using the number system selected by<br />
base<br />
Additional function from chario.c: int SkipSpace(void)<br />
The Main Calculator Program<br />
Aim: RPN (Postfix) calculator program<br />
– Input: Operators and Numbers (operands)<br />
– Numbers are pushed on a stack<br />
– Operators pop operands and push the result of the operation<br />
while(there is input)<br />
{<br />
   if(input is a number)<br />
   {<br />
      num = read_number();<br />
      push(num);<br />
   }<br />
   else if(input is a valid operator)<br />
   {<br />
      pop operands, apply operator, push result;<br />
   }<br />
   else<br />
   {<br />
      print error message;<br />
   }<br />
}<br />
Case Distinctions<br />
Note: The operator determines which actions we have to perform<br />
– This is a case distinction: Based on a single (integer) value, we have to select<br />
one alternative<br />
– Possible implementation:<br />
if(value == val1)<br />
{<br />
action1;<br />
}<br />
else if(value == val2)<br />
{<br />
action2;<br />
}<br />
...<br />
else<br />
{<br />
default_action;<br />
}<br />
C Alternative: switch<br />
switch(E)<br />
{<br />
case val1: action1;<br />
           break; /* Otherwise we fall through! */<br />
case val2: action2;<br />
           break;<br />
...<br />
default:   default_action;<br />
           break;<br />
}<br />
E has to be an integer-valued expression<br />
val1, val2, . . . have to be constant integer expressions<br />
E is evaluated and the result is compared to each of the constants after the case<br />
labels. Execution starts with the first statement after the matching case. If no<br />
case matches, execution starts with the (optional) default case.<br />
Note: Execution does not stop at the next case label! Use break; to break out<br />
of the switch<br />
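Two small sketches of these rules, with hypothetical function names. operator_kind() uses several case labels for one action, which works precisely because execution does not stop at a case label; steps_from() shows the same fall-through when break is left out on purpose.<br />

```c
/* Classify an input character of the calculator (hypothetical helper).
   Several case labels may share one action. */
int operator_kind(int c)
{
    switch(c)
    {
    case '+': case '-': case '*': case '/': case '%':
        return 1;                 /* arithmetic operator */
    case 'p': case 'o': case 'i': case 'S': case 'P':
        return 2;                 /* non-arithmetic operator */
    default:
        return 0;                 /* not a known operator */
    }
}

/* Without break, execution continues into the following cases */
int steps_from(int start)
{
    int steps = 0;

    switch(start)
    {
    case 1: steps++;              /* fall through */
    case 2: steps++;              /* fall through */
    case 3: steps++;
            break;
    default: steps = -1;
            break;
    }
    return steps;
}
```

Entering at case 1 executes all three increments, entering at case 3 only the last one.<br />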
The Stack Abstract Datatype<br />
A stack is a last-in first-out (LIFO) data structure<br />
– It can store values <strong>of</strong> a given type<br />
– Values can be pushed onto a stack<br />
– The topmost element can be retrieved by popping it off the stack<br />
– Typically, only the top element is accessed (enforced either by convention or<br />
by design)<br />
– Stacks can have a predetermined size (maximal number <strong>of</strong> elements) or grow<br />
as needed<br />
Stack implementation in C:<br />
– Values are stored in an array <strong>of</strong> the correct type<br />
– A stack pointer contains the index <strong>of</strong> the next unused cell<br />
Stack Implementation in rpn_calc.c<br />
We use a fixed maximal stack size:<br />
#define STACKSIZE 1024<br />
– Using a symbolic constant avoids mistyping and misreading, and allows us to<br />
easily change the stack size later!<br />
Our stack data structure is realized by two variables:<br />
– int stack[STACKSIZE]; stores the values<br />
– int sp = 0; is the stack pointer, <strong>and</strong> initially points to the first element <strong>of</strong><br />
stack<br />
Stack operations are implemented as specialized macros<br />
Pushing things onto the stack: PUSH()<br />
/* If stack is full, print an error message,<br />
otherwise push the value onto the stack */<br />
#define PUSH(value) \<br />
if(sp < STACKSIZE) \<br />
{ \<br />
stack[sp] = (value);\<br />
sp++;\<br />
}\<br />
else\<br />
{\<br />
printf("Stack overflow error\n");\<br />
}<br />
Popping values: POP_OR_FAIL()<br />
/* If stack is empty, print an error message and "break;",<br />
otherwise pop the top value into varname */<br />
#define POP_OR_FAIL(varname) \<br />
if(sp > 0)\<br />
{\<br />
sp--;\<br />
(varname) = stack[sp];\<br />
}\<br />
else\<br />
{\<br />
printf("Stack underflow error\n");\<br />
break;\<br />
}<br />
Note that the macro contains a break; statement in the error case<br />
– This limits general usability (the macro only makes sense inside a switch or loop), but. . .<br />
– . . . it exits the case it is used in early!<br />
The Main Program: Preliminaries and Declarations<br />
int main(void)<br />
{<br />
int num, arg1, arg2, i;<br />
int stack[STACKSIZE];<br />
int sp = 0, in_base = 10, out_base = 10;<br />
SkipSpace();<br />
The number systems to be used for input and output are determined by in_base<br />
and out_base<br />
– Both are initialized to 10 (decimal)<br />
Note that the next character to be read is meaningful (not white space) now<br />
– This will be a loop invariant of the main loop<br />
The Main Loop: Overall Structure<br />
while(LookChar()!=EOF)<br />
{<br />
if(int_available(in_base))<br />
{<br />
num = read_int_base(in_base);<br />
PUSH(num);<br />
}<br />
else<br />
{ /* Operator! */<br />
switch(GetChar())<br />
{<br />
case 'o':<br />
... /* Handle different cases */<br />
default:<br />
printf("Unknown operator\n");<br />
break;<br />
}<br />
}<br />
SkipSpace();<br />
}<br />
return EXIT_SUCCESS;<br />
}<br />
The Main Loop: Arithmetic operators<br />
switch(GetChar())<br />
{<br />
...<br />
case '+':<br />
POP_OR_FAIL(arg2);<br />
POP_OR_FAIL(arg1);<br />
num = arg1+arg2;<br />
PUSH(num);<br />
break;<br />
case '-':<br />
POP_OR_FAIL(arg2);<br />
POP_OR_FAIL(arg1);<br />
num = arg1-arg2;<br />
PUSH(num);<br />
break;<br />
case '*':<br />
POP_OR_FAIL(arg2);<br />
POP_OR_FAIL(arg1);<br />
num = arg1*arg2;<br />
PUSH(num);<br />
break;<br />
case '/':<br />
POP_OR_FAIL(arg2);<br />
POP_OR_FAIL(arg1);<br />
num = arg1/arg2;<br />
PUSH(num);<br />
break;<br />
case '%':<br />
POP_OR_FAIL(arg2);<br />
POP_OR_FAIL(arg1);<br />
num = arg1%arg2;<br />
PUSH(num);<br />
break;<br />
...<br />
}<br />
The Main Loop: I/O operators<br />
switch(GetChar())<br />
{<br />
...<br />
case 'p':<br />
POP_OR_FAIL(num);<br />
write_int_base(num,out_base);<br />
putchar('\n');<br />
PUSH(num);<br />
break;<br />
case 'o':<br />
POP_OR_FAIL(num);<br />
if(num < 2 || num >16)<br />
{<br />
printf("Only bases 2-16 (decimal) supported\n");<br />
}<br />
else<br />
{<br />
out_base = num;<br />
}<br />
break;<br />
case 'i':<br />
POP_OR_FAIL(num);<br />
if(num < 2 || num >16)<br />
{<br />
printf("Only bases 2-16 (decimal) supported\n");<br />
}<br />
else<br />
{<br />
in_base = num;<br />
}<br />
break;<br />
case 'S':<br />
for(i=0; i
Manual Compilation<br />
First, we compile all of the source files individually:<br />
$ gcc -ansi -Wall -c -o chario.o chario.c<br />
$ gcc -ansi -Wall -c -o integerio.o integerio.c<br />
$ gcc -ansi -Wall -c -o rpn_calc.o rpn_calc.c<br />
Then we perform the linking step:<br />
$ gcc -ansi -Wall -o rpn_calc chario.o integerio.o rpn_calc.o<br />
Now the program is ready to run:<br />
$ ./rpn_calc<br />
2 o 10 p<br />
1010<br />
UNIX User Commands: dc<br />
dc is an arbitrary precision RPN calculator<br />
– It handles floating point numbers (to any preselected precision)<br />
– It handles bignums, i.e. integers that do not fit into any standard data type<br />
– It has a lot of built-in functionality and can be extended by user-defined macros<br />
Usage is quite similar to our rpn_calc<br />
For more: man dc or (particularly) info dc (or read info in emacs: [C-h i])<br />
Exercises<br />
Read the man and info pages for dc<br />
Play with the program<br />
Enjoy the weekend and be merry<br />
Note: I’ve updated the rpn_calc sources on the web page to the latest version<br />
(changes only comments and style)<br />
Programming in C<br />
More on Operators and Expressions<br />
Increment <strong>and</strong> Decrement Operators<br />
C supports the unary operators ++ <strong>and</strong> -- for incrementing <strong>and</strong> decrementing<br />
variables<br />
– ++ increments a variable by 1<br />
– -- decrements a variable by 1<br />
Both can be used as prefix <strong>and</strong> postfix operators: x++ or ++x<br />
– In both cases, x is incremented by 1<br />
– The difference is in the value <strong>of</strong> the expression:<br />
∗ The expression x++ has the value <strong>of</strong> x before incrementing<br />
∗ ++x has the value <strong>of</strong> x after incrementing, i.e. it is equivalent to the<br />
assignment x=x+1<br />
Both forms are used, but the postfix form is more common<br />
Example<br />
#include <stdio.h><br />
#include <stdlib.h><br />
int main(void)<br />
{<br />
int x,y;<br />
x=5; y=5;<br />
printf("x = %d y = %d\n", x, y);<br />
printf("x++ = %d ++y = %d\n", x++, ++y);<br />
printf("x = %d y = %d\n", x, y);<br />
printf("x-- = %d --y = %d\n", x--, --y);<br />
printf("x = %d y = %d\n", x, y);<br />
return EXIT_SUCCESS;<br />
}<br />
Output:<br />
x = 5 y = 5<br />
x++ = 5 ++y = 6<br />
x = 6 y = 6<br />
x-- = 6 --y = 5<br />
x = 5 y = 5<br />
Binary Number Representation<br />
C guarantees a base 2 representation for all unsigned integer types:<br />
– Example: 16 bit representation (short on many implementations) of 42<br />
Bit value: 2^15 2^14 2^13 2^12 2^11 2^10 2^9 2^8 2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0<br />
Bit: 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0<br />
42 = 2^5 + 2^3 + 2^1 = 32 + 8 + 2<br />
– If an arithmetic operation results in a value not representable by the<br />
result type, it is reduced modulo 2^n, where n is the width of the result type<br />
An unsigned number of a narrower type is converted to a wider type by adding<br />
an appropriate number of leading zeroes:<br />
– The 8 bit representation (char on many implementations) of 42 is:<br />
Bit value: 2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0<br />
Bit: 0 0 1 0 1 0 1 0<br />
The exact representation for signed integers is not fixed; however, positive signed<br />
integers are guaranteed to have the same representation in signed and unsigned<br />
types<br />
Bitwise Operators<br />
Bitwise operators operate on the binary representation <strong>of</strong> numbers<br />
The binary bitwise operators include<br />
– Bitwise and (&) sets a bit in the result if it is set in both operands:<br />
(6 & 3) == 2<br />
– | is the bitwise or, i.e. the result bit is set if at least one of the corresponding<br />
bits in the operands is set:<br />
(6 | 3) == 7<br />
– ^ is the bitwise exclusive or (or xor): the result bit is set if and only if the two<br />
operands differ at that position:<br />
(6 ^ 3) == 5<br />
– The parentheses are needed in real code: == binds more tightly than &, |, and ^<br />
The bitwise not (~, or one’s complement) toggles all bits<br />
– The result value depends on the number format<br />
– For 16 bit unsigned short, ~42 == 65493 (after converting the result back to<br />
unsigned short, since ~ operates on the promoted value)<br />
Bitwise Shifting<br />
C also supports the shifting <strong>of</strong> binary numbers<br />
The binary operator << shifts an integer value left; the new bits on the right become zero<br />
The binary operator >> shifts an integer value right<br />
– For unsigned values, the new bits become zero<br />
– For signed values, either zeroes are shifted in (logical shift), or the first (sign)<br />
bit is replicated (arithmetic shift, corresponding to division by 2^n)<br />
Note: The shift operators are seldom used<br />
– C++ has even recycled them for I/O operations<br />
– Binary and, or, and not, on the other hand, are used frequently to manipulate<br />
binary flags packed into a single integer value<br />
Example<br />
These macros can be used to set <strong>and</strong> query properties in a variable, where each<br />
property is encoded in a single bit<br />
#define SetProp(var, prop) ((var) = (var) | (prop))<br />
#define DelProp(var, prop) ((var) = (var) & ~(prop))<br />
#define FlipProp(var, prop) ((var) = (var) ^ (prop))<br />
/* Absolutely assign properties masked by sel */<br />
#define AssignProp(var, sel, prop) DelProp((var),(sel));\<br />
SetProp((var),(sel)&(prop))<br />
/* Are _all_ properties in prop set in var? */<br />
#define QueryProp(var, prop) (((var) & (prop)) == (prop))<br />
/* Are any properties in prop set in var? */<br />
#define IsAnyPropSet(var, prop) ((var) & (prop))<br />
Assignment Operators<br />
Very frequently, programming tasks require updating a variable based on<br />
its old value<br />
– Frequent example: i=i+1;<br />
In addition to the general assignment operator, C offers operators combining<br />
update and assignment<br />
– If op is a binary operator, then op= is the corresponding assignment<br />
operator<br />
– x op= expr is equivalent to x = x op (expr)<br />
– This is supported for op ∈ { +, -, *, /, %, <<, >>, &, ^, | }<br />
Most frequently used<br />
– += (as in fahrenheit += 10)<br />
– -= (e.g. in the update part <strong>of</strong> a for loop)<br />
Conditional Expressions<br />
Similarly to conditional statements (if/else), C has conditional expressions:<br />
– If E1, E2, E3 are expressions, then E1 ? E2 : E3 is<br />
a conditional expression<br />
∗ If E1 evaluates to true (non-zero), then E2 is evaluated and its value<br />
returned<br />
∗ Otherwise, E3 is evaluated and returned<br />
Example 1:<br />
#define MAX(a,b) (((a)>(b))?(a):(b))<br />
Example 2:<br />
printf("There %s %d item%s left\n",<br />
(count==1)?"is":"are",<br />
count,<br />
(count==1)?"":"s");<br />
Expression Sequences<br />
The comma operator separates two expressions: E1, E2<br />
– Expressions are evaluated left to right<br />
– The value of a comma-separated sequence is the value of the last expression in<br />
it<br />
– Don’t confuse it with the comma separating function call arguments!<br />
Almost the only legitimate use: Initialize and update in for loops:<br />
for(cels=0, fahr=-32; cels
Type Conversion (Casting)<br />
As already stated, C performs type conversion in many situations automatically<br />
– If different numeric types are used in an expression, all values are promoted to<br />
the “largest” type<br />
– If a value of an unsigned integer type is assigned to a variable of a “smaller”<br />
type, excess bits are dropped<br />
– For signed types, conversion is only partially specified<br />
In addition, values can be coerced to a different type<br />
– A cast expression has the syntax (type) expr<br />
Example:<br />
printf("Int: %d Float: %d\n",<br />
(1/2)*2,<br />
(int) (((float)1/2)*2));<br />
Int: 0 Float: 1<br />
Exercises<br />
Write a function that counts the number <strong>of</strong> bits that are one in an unsigned<br />
long number (Footnote: Allegedly the NSA sponsors the inclusion <strong>of</strong> hardware to<br />
make this operation fast in many chips because they need it for speeding up the<br />
cracking <strong>of</strong> encrypted documents)<br />
Rewrite imp metric to use comma-separated expressions to build the tables<br />
Programming in C<br />
Expressions and the Type System<br />
Getting the Size of Objects and Types<br />
A final operator is sizeof<br />
– sizeof can be applied to an expression or to a parenthesized type name<br />
– Applying it to an expression is equivalent to applying it to the type of the<br />
expression<br />
sizeof returns the number of character-sized memory units necessary to store<br />
an object of the type<br />
– By definition, sizeof (char) == 1<br />
Example:<br />
printf("sizeof 1: %lu sizeof (short)1: %lu\n",<br />
(unsigned long)sizeof 1, (unsigned long)sizeof ((short)1));<br />
Output on a typical implementation:<br />
sizeof 1: 4 sizeof (short)1: 2<br />
Note: sizeof will be useful for dynamic memory handling<br />
Order Of Execution<br />
In general, the order <strong>of</strong> execution <strong>of</strong> subexpressions is not defined!<br />
Exceptions:<br />
– &&, ||, ?:, <strong>and</strong> ,<br />
If you need a particular order <strong>of</strong> execution, you must force it<br />
– Since statements are executed sequentially, compute subexpressions in separate<br />
statements (assigning them to different variables)<br />
– Other sequence points are set by the operators listed above<br />
The example on the next page may print One Two One Two or Two One One<br />
Two<br />
Example<br />
#include <stdio.h><br />
#include <stdlib.h><br />
int one(void)<br />
{<br />
printf("One ");<br />
return 1;<br />
}<br />
int two(void)<br />
{<br />
printf("Two ");<br />
return 1;<br />
}<br />
int main(void)<br />
{<br />
one()+two();<br />
one()&&two();<br />
printf("\n");<br />
return EXIT_SUCCESS;<br />
}<br />
Types in C<br />
C offers a set of basic types built into the language<br />
We can define new, quasi-basic types as enumerations<br />
We can construct new types using type construction:<br />
– Arrays over a base type<br />
– Structures, combining different base types in one object<br />
– Unions (can store different type values alternatively)<br />
– Pointer to a base type<br />
This generates a recursive type hierarchy!<br />
– We can use new types to build further on them<br />
– E.g. Arrays <strong>of</strong> Pointers, Structures combining unions <strong>and</strong> enumerations, . . .<br />
Basic Types<br />
Basic types in C:<br />
– char (typically used to represent characters)<br />
– short<br />
– int<br />
– long<br />
– long long<br />
– float<br />
– double<br />
All integer types come in a signed and an unsigned variety<br />
Defining New Types with typedef<br />
The typedef keyword is used to define new names for types in C<br />
General syntax: If we add typedef to a variable definition, it turns into a type<br />
definition<br />
Examples:<br />
unsigned long ulong; /* Define variable */<br />
typedef unsigned long ulong_t; /* Define a new type ulong_t */<br />
ulong_t ulong1; /* Define variable of new type */<br />
char string[80]; /* Defining an array variable */<br />
typedef char string_t[80]; /* Define a string type */<br />
string_t string1; /* Define a variable <strong>of</strong> that type -- we can use<br />
string1[32] now */<br />
Symbolic Names in the Data Safe Assignment<br />
The data safe assignment calls for a function data_safe() with three arguments<br />
– The first argument is a symbolic method: ds_register, ds_store,<br />
ds_retrieve, ds_delete<br />
– We can implement this using an int argument and #define:<br />
#define ds_register 1<br />
#define ds_store 2<br />
#define ds_retrieve 3<br />
#define ds_delete 4<br />
int data_safe(int method, int key, int value_or_index);<br />
Problems:<br />
– Nothing in the declaration of data_safe() tells us that the int is anything<br />
but a number<br />
– The #define statements are independent<br />
Wouldn’t it be nice to create a new type to reflect the intended use?<br />
Enumerations in C<br />
Enumeration data types can represent values from a finite domain using symbolic<br />
names<br />
– The possible values are explicitly listed in the definition of the data type<br />
– Typically, each value can be used in only one enumeration<br />
In C, enumerations are created using the enum keyword<br />
In C, enumeration types are integer types<br />
– A definition <strong>of</strong> an enumeration type just assigns numerical values to the<br />
symbolic name<br />
– Unless explicitly chosen otherwise, the symbolic names are numbered starting<br />
at 0, and increasing by one for each name<br />
– However, any int value can be assigned to a variable of an enumeration type<br />
– Likewise, we can assign any enumeration constant to any integer type variable<br />
C enumerations have only mnemonic value, they do not enable the compiler to<br />
catch bugs resulting from mixing up different types<br />
Enumeration Syntax<br />
An enumeration type is defined by the enum keyword, followed by a list <strong>of</strong><br />
identifiers (enumeration constants) in curly brackets<br />
The following code describes an enumeration data type for the data safe methods:<br />
enum{ds_register, ds_store, ds_retrieve, ds_delete}<br />
It can be used like any other type specifier:<br />
int data_safe(enum{ds_register, ds_store, ds_retrieve, ds_delete}method,<br />
int key, int value_or_index);<br />
...<br />
key = data_safe(ds_register, 0, 0);<br />
enum <strong>and</strong> typedef<br />
Typically, enumeration data types are used to define new types<br />
– The enum keyword describes the new type<br />
– The typedef keyword assigns a name to the type<br />
– The new type can then be used consistently throughout the program<br />
Example:<br />
typedef enum{ds_register, ds_store, ds_retrieve, ds_delete}DS_operation;<br />
int data_safe(DS_operation method, int key, int value_or_index);<br />
...<br />
key = data_safe(ds_register, 0, 0);<br />
Typically, enumerations (<strong>and</strong> other new data types) are declared in header files<br />
(.h files), <strong>and</strong> form part <strong>of</strong> the interface <strong>of</strong> a module<br />
More on Enumerations<br />
Since enumerations are actually integer types, we can assign specific values to the<br />
constants<br />
– We can even assign the same value to different constants!<br />
Example (also note preferred form <strong>of</strong> formatting for enums):<br />
typedef enum<br />
{<br />
ds_register = 1,<br />
ds_store = 2,<br />
ds_retrieve = 3,<br />
ds_delete = 4,<br />
ds_forget = 4<br />
}DS_operation;<br />
Aggregating Data Types<br />
Let’s again look at the data safe assignment<br />
– We somehow have to associate a key <strong>and</strong> a value (or multiple values)<br />
– Simple approach: Use two arrays, one for keys, one for values<br />
– If keys[i] == key, then values[i] holds a value associated with key<br />
However, the association between those two elements is not reflected by this<br />
construction<br />
– The two arrays are independent<br />
– They can be manipulated independently<br />
– There is not even a guarantee that both arrays have the same size!<br />
– If we pass key <strong>and</strong> value to a function, we have to pass them as individual<br />
elements (what if we have 132 different elements?)<br />
Solution: Creating structures that combine different elements into one<br />
struct<br />
A structure is a datatype that may have any number <strong>of</strong> members<br />
– Members can have different types<br />
– Members can have any other type (including arrays or other structures)<br />
– Members are referred to by their name in the structure<br />
Java analogy: A structure type is a class, but:<br />
– No member functions<br />
– All members are public<br />
Structures are defined using the struct keyword, followed by an optional name<br />
<strong>and</strong> a list <strong>of</strong> member definitions in curly braces<br />
– Each member definition is a normal variable definition, giving type <strong>and</strong> name<br />
<strong>of</strong> the member<br />
Structure Example<br />
Consider the following definition:<br />
struct key_assoc {int key; int value;} key_pair;<br />
– It creates a variable key_pair with two members<br />
– They can be referred to by name:<br />
key_pair.value = 10;<br />
...<br />
if(key_pair.key == user_key)<br />
{<br />
count++;<br />
}<br />
struct and typedef<br />
As with enumerations, structures are usually used with typedef:<br />
typedef struct key_assoc<br />
{<br />
int key;<br />
int value;<br />
} key_pair_t;<br />
static key_pair_t key_value_array[10000];<br />
– The first definition defines a new type, key_pair_t<br />
– The second one creates an array of 10000 of these pairs<br />
Using the name (struct key_assoc), we can refer to the type even before we<br />
have seen the full definition<br />
– Important for self-referential data types using pointers<br />
Exercises<br />
Create a function that has two primary colours (red, blue, yellow) as input, <strong>and</strong><br />
returns the colour that results from mixing them<br />
– Use an enumeration type for the colours<br />
– Use a struct to hold triples (colour1, colour2, mix) and an array to store all<br />
associations<br />
– You can use linear search to find matching patterns for your input<br />
Programming in C<br />
Data Structures and Pointers<br />
Representing Related Objects<br />
Assume the following problem:<br />
– In a drawing program, we need to represent geometrical shapes (circles, squares,<br />
rectangles, triangles...)<br />
– There is some common information for all shapes:<br />
∗ Border colour<br />
∗ Line width<br />
∗ Fill colour (if any)<br />
– However, the coordinates are different for each shape:<br />
∗ For a circle, we need center point <strong>and</strong> radius<br />
∗ For a square or rectangle we need two corners<br />
∗ For a triangle we need three corners<br />
Object-oriented languages allow a base class shape, <strong>and</strong> derived classes for the<br />
different shapes<br />
– In C, we have to program this explicitly, using unions!<br />
Unions<br />
Unions <strong>of</strong> base types allow the new type to store one value <strong>of</strong> any <strong>of</strong> its base<br />
types (but only one at a time)<br />
The syntax is analogous to that <strong>of</strong> structures:<br />
– The keyword union is followed by a list <strong>of</strong> member definitions in curly braces<br />
Example<br />
– union {int i; float f; char *str;} numval;<br />
– numval can store either an integer or a floating point number, or a pointer to<br />
a character (normally a string)<br />
– Access is as for structures: numval.i is the integer value<br />
Note: Unions weaken the type system:<br />
– numval.f=1.0; printf("%d\n",numval.i);<br />
– Situations like that are, in general, impossible to detect at compile time<br />
Shape Example Continued (1)<br />
typedef enum<br />
{<br />
circle,<br />
square,<br />
rectangle,<br />
triangle<br />
}ShapeType;<br />
typedef enum<br />
{<br />
red,<br />
green,<br />
blue,<br />
black,<br />
white<br />
}ColourType;<br />
typedef struct<br />
{<br />
int center_x;<br />
int center_y;<br />
int radius;<br />
}CircleCoord;<br />
Shape Example Continued (2)<br />
typedef struct<br />
{<br />
int lower_left_x;<br />
int lower_left_y;<br />
int upper_right_x;<br />
int upper_right_y;<br />
}RectangleCoord;<br />
typedef RectangleCoord SquareCoord;<br />
typedef struct<br />
{<br />
int point1_x;<br />
int point1_y;<br />
int point2_x;<br />
int point2_y;<br />
int point3_x;<br />
int point3_y;<br />
}TriangleCoord;<br />
Shape Example Continued (3)<br />
typedef union<br />
{<br />
CircleCoord circle_coord;<br />
RectangleCoord rect_coord;<br />
SquareCoord square_coord;<br />
TriangleCoord tria_coord;<br />
}ShapeCoord;<br />
typedef struct<br />
{<br />
ShapeType type;<br />
int border_width;<br />
ColourType border_colour;<br />
ColourType fill_colour;<br />
ShapeCoord coords;<br />
}Shape;<br />
Shape Example Continued (4)<br />
void draw_shape(Shape draw_obj)<br />
{<br />
switch(draw_obj.type)<br />
{<br />
case circle:<br />
draw_circle(draw_obj.coords.circle_coord.center_x,<br />
draw_obj.coords.circle_coord.center_y,<br />
draw_obj.coords.circle_coord.radius,<br />
draw_obj.border_width,<br />
draw_obj.border_colour,<br />
draw_obj.fill_colour);<br />
break;<br />
case square:<br />
draw_square(draw_obj.coords.square_coord.lower_left_x,<br />
draw_obj.coords.square_coord.lower_left_y,<br />
draw_obj.coords.square_coord.upper_right_x,<br />
draw_obj.coords.square_coord.upper_right_y,<br />
draw_obj.border_width,<br />
draw_obj.border_colour,<br />
draw_obj.fill_colour);<br />
break;<br />
...<br />
Pointers<br />
Pointers are derived types <strong>of</strong> a base type<br />
– A pointer is the memory address <strong>of</strong> an object <strong>of</strong> the base type<br />
– Given a pointer, we can manipulate the object pointed to<br />
Notice that there are two parts to a pointer:<br />
– The actual memory address (a dynamic feature in the running program)<br />
– The type of the pointer (pointer to int, pointer to Shape. . . ) telling us how<br />
to interpret the data at that address (a static feature that can be determined<br />
at compile time)<br />
C uses the unary * to define variables <strong>of</strong> pointer types:<br />
– int *count; defines the variable count as a pointer to int<br />
– Notice that this pointer does not contain a valid address - there is no object<br />
<strong>of</strong> type int created along with the pointer!<br />
– Pointers can be defined for any valid type in C: struct{double real;double<br />
imag;} *complex defines complex as a pointer to the struct<br />
Basic Pointer Operations in C<br />
The most basic operations on pointers are:<br />
– Given an object, return a pointer to it<br />
– Given a pointer, give the object it points to (dereference the pointer)<br />
C uses the unary * operator for both pointer definition <strong>and</strong> pointer dereferencing,<br />
and & for getting the address of an existing object<br />
– int var;int *p; defines var to be a variable <strong>of</strong> type int <strong>and</strong> p to be a<br />
variable <strong>of</strong> type pointer to int<br />
– p = &var makes p point to var (i.e. p now stores the address <strong>of</strong> var)<br />
– *p = 17; assigns 17 to the int object that p points to (in our example, it<br />
would set var to 17)<br />
– Note that &(*p) == p always is true for a pointer variable pointing to a valid<br />
object, as is *(&var)==var for an arbitrary variable!<br />
Pointers - A simple Example<br />
#include <stdio.h><br />
#include <stdlib.h><br />
void swap(int *x, int *y)<br />
{<br />
int z;<br />
z =*x;<br />
*x =*y;<br />
*y = z;<br />
}<br />
int main(void)<br />
{<br />
int var1=7, var2=42;<br />
printf("var1: %d var2: %d\n", var1, var2);<br />
swap(&var1, &var2);<br />
printf("var1: %d var2: %d\n", var1, var2);<br />
return EXIT_SUCCESS;<br />
}<br />
Example Continued<br />
Output of the program:<br />
var1: 7 var2: 42<br />
var1: 42 var2: 7<br />
Note that this technique is an example <strong>of</strong> a frequent way to simulate call by<br />
reference in C<br />
– Instead <strong>of</strong> passing an object, we pass a reference to it<br />
– Allows changes to the object inside the function<br />
– Often cheaper (especially for big objects)<br />
Why Pointers?<br />
There are two main reasons for using pointers:<br />
– Efficiency<br />
– Dynamically growing data structures<br />
Efficiency Aspects<br />
– Pointers are typically represented by one machine word<br />
– Storing pointers instead of copies of large objects saves memory<br />
– Passing pointers instead <strong>of</strong> large objects is much more efficient<br />
Dynamically growing data structures<br />
– Each data type has a fixed size <strong>and</strong> memory layout<br />
– Pointers allow us to build dynamically growing data structures by adding <strong>and</strong><br />
removing fixed size cells<br />
Pointing at Nothing and Pointing Nowhere<br />
Pointers of type void* are a special case:<br />
– A void* pointer is a generic pointer, without associated base type<br />
– void* pointers can be assigned to variables of any other pointer type (and<br />
vice versa)<br />
– Such pointers are used primarily for dynamic memory handling<br />
C has a special, reserved NULL pointer of type void*<br />
– The NULL pointer is guaranteed to be different from all pointers pointing to<br />
legitimate objects<br />
– It can be written as plain 0 (in a pointer context)<br />
– stdlib.h (like several other standard headers) defines a symbolic name, NULL, for the NULL pointer<br />
– Dereferencing NULL is illegal (the behavior is undefined)!<br />
– Notice that NULL is considered to be false if used in logical expressions<br />
– Note: For most current machines, the NULL pointer actually is address 0.<br />
However, this is not guaranteed (and is false for some machines with strange<br />
memory models)<br />
Exercises<br />
Write a program that prints the sizes of various built-in and self-defined data<br />
types (e.g. the Shape type and its subtypes). Do you see a relation between<br />
them?<br />
Write a program that uses swap() to sort an array of integers and print it. If<br />
you feel adventurous, use read_int_base() from the rpn calc example (or a<br />
similar function) to read integers to fill the array<br />
Notes<br />
Please email the TA, Raghu, at his UMiami address, raghu@lee.cs.miami.edu,<br />
from now on<br />
Your grades for the assignments will be placed into your home directories<br />
Solutions to the prime number assignment will be available shortly after noon<br />
Programming in C<br />
Dynamic Data Structures<br />
Refresher: Pointers<br />
A pointer type is a derived type of a base type<br />
– A pointer is the address of an object of the base type<br />
– Given a pointer p, *p gives us the object it points to<br />
– Given an object o, &o gives us a pointer to that object in memory<br />
An object of type void* is a generic pointer (i.e. a plain address without<br />
associated base type)<br />
– A pointer of type void* can be assigned to a variable of any other pointer<br />
type<br />
– Similarly, a value of any pointer type can be assigned to a void* variable<br />
The special value NULL is a pointer of type void*<br />
– It is guaranteed different from all pointers to valid objects<br />
– Its logical value is false, while that of all other pointers is true<br />
Dynamic Memory Handling<br />
The C library offers functions for dynamic memory handling<br />
– We can request a block of memory of a certain size<br />
– If such a block is available, we will get a void* pointer to it<br />
– This block can be used to store any object that fits into it<br />
– If we do not need that object anymore, we can return the block to the library<br />
Such blocks can be used to build arbitrarily sized data structures<br />
– . . . e.g. by allocating bigger and bigger arrays if the need arises<br />
– . . . or by using pointers within a structure to point to additional structures<br />
(which may contain further pointers)<br />
The malloc() function<br />
We request a block of memory using malloc() (declared in <stdlib.h>)<br />
– It’s declared as void *malloc(size_t size);, i.e. it returns a generic<br />
pointer<br />
– size_t is a new data type from the standard library. It’s guaranteed to be an<br />
unsigned integer data type (often unsigned int or unsigned long)<br />
– malloc() allocates a region big enough to hold the requested number of bytes<br />
on the heap (a reserved memory region) and returns the address of the first<br />
byte (a pointer to that region)<br />
– The sizeof operator is used to get the necessary size for the object datatype<br />
- p = malloc(sizeof(int)); allocates a memory region big enough to store<br />
an integer and makes p point to it<br />
- The void* pointer is silently converted to a pointer to int<br />
– If no memory is available on the heap, malloc() will return the NULL pointer<br />
(also written as plain 0)<br />
Freeing Allocated Memory<br />
The counterpart to malloc() is free()<br />
– It is declared in <stdlib.h> as<br />
void free(void* ptr);<br />
– free() takes a pointer allocated with malloc() and returns the memory to<br />
the heap<br />
Note that it is a bug to call free() with a pointer not obtained by calling<br />
malloc() (e.g. a pointer generated by applying & to a variable)<br />
It also is a bug to call free() with the same pointer more than once<br />
More on Dynamic Memory Allocation<br />
Good programming practice always checks if malloc() succeeded (i.e. did not<br />
return NULL)<br />
– In multi-tasking systems, even small allocations may fail, because other processes<br />
consume resources<br />
– The OS may limit memory usage to small values<br />
– Failing to implement that check can lead to erratic and non-reproducible<br />
failures!<br />
Similarly, each call to malloc() should (eventually) be followed by a call to<br />
free() for the pointer obtained<br />
– If you do not know if you still need a piece of memory, or if a pointer still<br />
points somewhere, you are in deep trouble anyway!<br />
– By consistently freeing all allocated memory, you can easily check if you<br />
return the same number of blocks you allocate!<br />
Pointers are a Mixed Blessing!<br />
Dangling pointers<br />
– A dangling pointer is a pointer not pointing to a valid object<br />
– A call to free() leaves the pointer dangling (the pointer variable still holds<br />
the address of a block of memory, but we are no longer allowed to use it)<br />
– Copying a pointer may also lead to additional dangling pointers if we call<br />
free() on one of the copies<br />
– Trying to access a dangling pointer typically causes hard to find errors, including<br />
crashes<br />
Memory leaks<br />
– A memory leak is a situation where we lose the reference to an allocated piece<br />
of memory:<br />
p = malloc(100000 * sizeof(int));<br />
p = NULL; /* We just lost a huge gob of memory! */<br />
– Memory leaks can cause programs to eventually run out of memory<br />
– Periodically occurring leaks are catastrophic for server programs!<br />
Example: SecureMalloc()<br />
Note: In my programs, there is typically at most a single call to malloc():<br />
void* SecureMalloc(size_t size)<br />
{<br />
    void* res = malloc(size);<br />
    if(!res)<br />
    {<br />
        printf("malloc() failure -- out of memory?");<br />
        exit(EXIT_FAILURE);<br />
    }<br />
    return res;<br />
}<br />
Pointers and Structures/Unions<br />
Most interesting data structures use pointers to structures<br />
– Examples: Linear lists (see below), binary trees, terms, . . .<br />
Most frequent operation: Given a pointer, access one of the elements of the<br />
structure (or union) pointed to<br />
– (*list).value = 0;<br />
– Note that this requires parentheses in C, because . binds more tightly than *<br />
More intuitive alternative:<br />
– The -> operator combines dereferencing and selection<br />
– list->value = 0;<br />
– This is the preferred form (and seen nearly exclusively in many programs)<br />
Example: Linear Lists (of Integers)<br />
A list over a base type can be recursively defined as follows:<br />
– The empty list is a list<br />
– If l is a list and e is an element of the base type, then e . l is a list<br />
We can represent that in C as follows:<br />
– The empty list is represented by the NULL pointer<br />
– A non-empty list is represented by a pointer to a struct containing the<br />
element and a pointer to the rest of the list<br />
Some list operations:<br />
– Insert an element as the first element<br />
– Insert an element as the last element<br />
– Print the list elements in order<br />
– Free the memory taken up by a list<br />
Example Continued<br />
Graphical representation of the list structure for (7,9,13):<br />
[Figure: anchor -> 7 -> 9 -> 13 -> NULL]<br />
Notice the anchor of the list<br />
Example – Declarations<br />
#ifndef INT_LISTS<br />
#define INT_LISTS<br />
#include <stdio.h><br />
#include <stdlib.h><br />
typedef struct int_list_cell<br />
{<br />
    int value;<br />
    struct int_list_cell *next;<br />
}IntListCell;<br />
typedef IntListCell *IntList_p;<br />
void* SecureMalloc(size_t size);<br />
void IntListInsertFirst(IntList_p *list, int new_val);<br />
void IntListInsertLast(IntList_p *list, int new_val);<br />
void IntListFree(IntList_p list);<br />
void IntListPrint(IntList_p list);<br />
#endif<br />
Example – Inserting At the Front<br />
/* Insert a new integer as the first element of an integer list */<br />
void IntListInsertFirst(IntList_p *list, int new_val)<br />
{<br />
    IntList_p handle;<br />
    handle = SecureMalloc(sizeof(IntListCell));<br />
    handle->value = new_val;<br />
    handle->next = *list;<br />
    *list = handle;<br />
}<br />
Example – Inserting At the End<br />
/* Insert a new integer as the last element of an integer list */<br />
void IntListInsertLast(IntList_p *list, int new_val)<br />
{<br />
    IntList_p handle, last;<br />
    handle = SecureMalloc(sizeof(IntListCell));<br />
    handle->value = new_val;<br />
    handle->next = NULL;<br />
    if(!*list)<br />
    {<br />
        *list = handle;<br />
    }<br />
    else<br />
    {<br />
        last = find_last_element(*list);<br />
        last->next = handle;<br />
    }<br />
}<br />
Example – Helper Function<br />
/* Helper function: Given a non-empty list, return last element */<br />
IntList_p find_last_element(IntList_p list)<br />
{<br />
    if(list->next)<br />
    {<br />
        return find_last_element(list->next);<br />
    }<br />
    return list;<br />
}<br />
Example – Freeing Lists<br />
/* Free the memory taken up by a list */<br />
void IntListFree(IntList_p list)<br />
{<br />
if(list)<br />
{<br />
IntListFree(list->next); /* Free rest */<br />
free(list); /* Free this cell */<br />
}<br />
}<br />
Example – Printing Lists<br />
/* Print a list as a sequence of numbers */<br />
void IntListPrint(IntList_p list)<br />
{<br />
    IntList_p handle;<br />
    for(handle = list; handle; handle = handle->next)<br />
    {<br />
        printf("%d ", handle->value);<br />
    }<br />
    putchar('\n');<br />
}<br />
Example – Main Function<br />
int main(void)<br />
{<br />
    int value;<br />
    IntList_p list1 = NULL, list2 = NULL;<br />
    SkipSpace();<br />
    while(int_available(10))<br />
    {<br />
        value = read_int_base(10);<br />
        IntListInsertFirst(&list1, value);<br />
        IntListInsertLast(&list2, value);<br />
        SkipSpace();<br />
    }<br />
    printf("List1: ");<br />
    IntListPrint(list1);<br />
    printf("List2: ");<br />
    IntListPrint(list2);<br />
    IntListFree(list1);<br />
    IntListFree(list2);<br />
    return EXIT_SUCCESS;<br />
}<br />
Assignment<br />
A binary search tree is either empty, or it consists of a node storing a key (the root<br />
of the tree), and a left and right subtree, such that all keys in the left subtree<br />
are smaller than the key in the node, and all keys in the right subtree are bigger<br />
– To print a tree in (left-to-right) preorder, you first print the root, then the left<br />
subtree, then the right subtree<br />
– To print a tree in (left-to-right) postorder, you first print the left subtree, then<br />
the right subtree, then the root<br />
– To print a tree in natural order, you first print the left subtree, then the root, then<br />
the right subtree<br />
Design a data structure for binary search trees with int keys, using dynamic<br />
memory handling<br />
Implement functions to:<br />
– Insert keys into the tree (ignoring keys already in the tree)<br />
– Print a tree in preorder, natural order, and postorder<br />
– Free the memory taken up by the tree<br />
Use this datatype and the functions from integerio to write a program that<br />
reads a list of integers from stdin into a tree, and prints that tree in the three<br />
different orders<br />
You can use the code from the linear list example as a base. The complete code<br />
will be available from the course homepage<br />
Programming in C<br />
Pointers and Arrays<br />
Midterm Exam<br />
Monday, Oct. 14th, 11:00 – 11:50<br />
Topics: Everything we did so far<br />
– UNIX file system layout<br />
– Simple UNIX utilities<br />
– Job Control<br />
– Basic C<br />
– Compilation and the preprocessor<br />
– C flow control and functions<br />
– Data structures in C<br />
– Pointers<br />
Friday we will refresh some of that stuff (but do reread the lecture notes yourself,<br />
and check the example solutions on the web)<br />
Refresher: Dynamic Memory Handling<br />
void* malloc(size_t size); is a function from <stdlib.h><br />
– It will return a pointer to an otherwise unused block of memory with at least<br />
size bytes (or NULL if no memory is available)<br />
– Typical use: int *p = malloc(sizeof(int));<br />
void free(void* ptr); is the counterpart to malloc()<br />
– It takes a pointer to a block allocated with malloc() and returns the block<br />
to the heap<br />
– It is a (usually fatal) bug to call free() more than once for the same block,<br />
or with a pointer not obtained from malloc()<br />
Very frequent case: Allocation of memory for structs<br />
– Accessing elements in a struct: (*list).value = 0;<br />
– More readable alternative: list->value = 0;<br />
Pointers and Arrays in C<br />
In C, arrays and pointers are strongly related:<br />
– In most expressions, an array is converted to a pointer to its first element<br />
(exceptions include its definition and the operand of sizeof or &; an array also cannot be assigned to)<br />
– In particular, arrays are passed to functions by passing their address!<br />
– More exactly: An array degenerates ("decays") to a pointer if passed or used in pointer<br />
contexts<br />
Not only can we treat arrays as pointers, we can also apply array operations to<br />
pointers:<br />
– If p is a pointer to the first element of an array, we can use p[3] to access<br />
the element with index 3 (i.e. the fourth element) of that array<br />
– In general, if p points to some memory address corresponding to an array<br />
element a[j], p[i] refers to a[j+i]<br />
Graphic Example<br />
int array[10];<br />
int *a, *b;<br />
a = array;<br />
b = &(array[0]);<br />
array[0] = 10;<br />
a[1] = 11;<br />
b[3] = *a;<br />
[Figure: a and b both point to array[0]; array[0] holds 10, array[1] holds 11, array[3] holds 10; the other elements up to array[9] are not initialized]<br />
Example<br />
#include <stdio.h><br />
#include <stdlib.h><br />
int main(void)<br />
{<br />
    char a[] = "CSC322\n";<br />
    char *b;<br />
    int i;<br />
    b=a;<br />
    printf("%s", b); /* Safer than printf(b), which misbehaves if b contains % */<br />
    for(i=0;b[i];i++)<br />
    {<br />
        printf("Character %d: %c\n", i, b[i]);<br />
    }<br />
    return EXIT_SUCCESS;<br />
}<br />
Example Output<br />
Compiling: gcc -o csc322 csc322.c<br />
Running:<br />
CSC322<br />
Character 0: C<br />
Character 1: S<br />
Character 2: C<br />
Character 3: 3<br />
Character 4: 2<br />
Character 5: 2<br />
Character 6:<br />
Parameter Passing in C<br />
In C, parameters to functions are always passed by value<br />
– The formal parameter (in the function) is a local variable<br />
– It is initialized to the value of the actual parameter (the expression we used in<br />
the function call)<br />
– Changing the local variable in the function does not change the actual<br />
parameter<br />
Arrays degenerate into pointers to the first element, however!<br />
– That pointer is still passed by value, but in effect the array is passed by<br />
reference<br />
– We can thus change the array elements from inside the function!<br />
This is frequently used for efficient array manipulation!<br />
– Sorting arrays<br />
– Reading elements into an array from stdin<br />
– Applying a transformation to all elements<br />
Example<br />
#include <stdio.h><br />
#include <stdlib.h><br />
#include <ctype.h><br />
void upcase(char *string)<br />
{<br />
    int i;<br />
    for(i=0; string[i]; i++)<br />
    {<br />
        /* toupper() expects an unsigned char value (or EOF) */<br />
        string[i] = toupper((unsigned char)string[i]);<br />
    }<br />
}<br />
int main(void)<br />
{<br />
    char str[] = "A test string.";<br />
    printf("%s\n", str);<br />
    upcase(str);<br />
    printf("%s\n", str);<br />
    return EXIT_SUCCESS;<br />
}<br />
Example Output<br />
A test string.<br />
A TEST STRING.<br />
Midterm Review<br />
UNIX Concepts<br />
UNIX is a multi-user system<br />
– Users have a user name, a numerical user id (e.g. 500), and a home directory<br />
– The privileged user root with UID 0 has (essentially) unlimited access<br />
UNIX is a multi-tasking system, i.e. it can run multiple programs at once. A<br />
running program (with its data) is called a process. Each process has:<br />
– Owner (a user)<br />
– Working directory (a place in the file system)<br />
– Various resources<br />
A shell is a command interpreter, i.e. a process accepting and executing commands<br />
from a user.<br />
– A shell is typically owned by the user using it<br />
– The initial working directory of a shell is typically the user’s home directory<br />
(but can be changed by commands)<br />
The File System<br />
[Figure: directory tree]<br />
/ (Root directory)<br />
    bin (System programs): cp, ls, ps<br />
    dev (Devices): hda, hdb, kbd<br />
    etc (Configuration): passwd, hosts<br />
    home (Home directories): joe, jane, schulz (Private files, e.g. core, Desktop)<br />
    tmp (Temporary files)<br />
    usr (User programs): local (Site-installed; contains lib, bin), lib (Vendor), bin (Vendor)<br />
In UNIX, all files are organized in a single directory tree<br />
There are two main types of files:<br />
– Plain files (containing data)<br />
– Directories, containing both plain files (optionally) and other directories<br />
Globbing<br />
Glob patterns describe sets of file names<br />
A string is a wildcard pattern if it contains one of ?, * or [<br />
A wildcard pattern expands into all file names matching it<br />
– A normal letter in a pattern matches itself<br />
– A ? in a pattern matches any one character<br />
– A * in a pattern matches any string (including the empty string)<br />
– A pattern [l1. . . ln] matches any one of the enclosed letters (exception: ! as<br />
the first letter)<br />
– A pattern [!l1. . . ln] matches any one of the characters not in the set<br />
– A leading . in a filename is never matched by anything except an explicit<br />
leading dot<br />
Important: Globbing is performed by the shell, not an application program!<br />
Some Important UNIX Commands (1)<br />
Orientation and moving around<br />
– whoami<br />
– pwd – print working directory<br />
– cd – change directory<br />
– ls – list files (Important options: -a, -l)<br />
Operating on files<br />
– cat – concatenate and print files<br />
– less and more – print files page by page<br />
– touch – change access dates (or create empty files)<br />
– mv – move files<br />
– cp – copy files<br />
– rm – remove files<br />
– wc – count words (and lines and characters)<br />
Some Important UNIX Commands (2)<br />
Working on Directories:<br />
– mkdir – make a new directory<br />
– rmdir – remove an empty directory<br />
Miscellaneous<br />
– man – read the manual (-k: Search for keywords in the manual)<br />
– info – read info format documentation (also available through emacs)<br />
– echo – Print arguments<br />
– grep – Search lines matching a regular expression<br />
Input and Output Redirection, Piping<br />
The three standard UNIX IO channels are<br />
– stdin (Standard Input)<br />
– stdout (Standard Output)<br />
– stderr (Errors)<br />
Normal output redirection redirects stdout into a file<br />
Input redirection makes stdin read from a file<br />
Piping connects one process’s stdout to the stdin of another process<br />
cat > newfile # Read stdin, write to newfile<br />
cat < newfile # Read newfile, write to terminal<br />
cat > newfile < oldfile # Poor man’s copy<br />
cat newfile | wc # Count words in newfile<br />
Process Control<br />
Processes started from the shell can be<br />
– Running or Suspended<br />
– In the foreground (accepting keyboard input) or in the background<br />
Simple process control:<br />
– Running a command followed by & starts it in the background (normally<br />
commands are executed in the foreground)<br />
– ^Z (Control-Z) will suspend a foreground process<br />
– ^C (Control-C) will terminate it<br />
– fg wakes a suspended process and puts it into the foreground<br />
– bg puts it into the background<br />
– kill can be used to terminate it<br />
– jobs prints a list of active processes started from a shell<br />
C Compiling with gcc<br />
Programs consisting of a single .c file can be compiled in one step<br />
– gcc -o file file.c will compile file.c into an executable program file<br />
Multiple C files are normally compiled separately and then linked!<br />
– gcc -c -o file1.o file1.c compiles the file into an object (.o) file<br />
– gcc -o file file1.o file2.o... links the different object files together to form an<br />
executable<br />
Important gcc options:<br />
– -o : Give the name of the output file<br />
– -ansi: Compile strict ANSI-89 C only<br />
– -Wall: Warn about all dubious lines<br />
– -c: Don’t perform linking, just generate a (linkable) object file<br />
– -O – -O6: Use increasing levels of optimization to generate faster executables<br />
C Datatypes<br />
The language offers a set of basic types built into the language<br />
– char, short, int, long, long long<br />
– float, double<br />
– Integer data types come in signed and unsigned variety!<br />
We can define new, quasi-basic types as enumerations (enum)<br />
We can derive new types as follows:<br />
– Arrays over a base type ([])<br />
– Structures combining base types (struct)<br />
– Unions (able to store alternative types) (union)<br />
– Pointer to a base type (*)<br />
typedef is used to define named new types<br />
Flow Control<br />
if...else<br />
– Conditional execution<br />
switch<br />
– Select between many alternatives, based on a single integer type value<br />
– Remember fall through property and break;!<br />
while<br />
– Loop as long as a condition is true<br />
for<br />
– As while, but includes initialization and update in a single statement<br />
Functions<br />
Any C program is a collection of functions<br />
– There has to be exactly one function called main() in the program<br />
– Execution starts by a call to main() (executed by the OS)<br />
– A function definition consists of a header and a body<br />
The header consists of:<br />
– The return type of the function<br />
– The name of the function<br />
– A parenthesized list of formal arguments<br />
The body of the function is a sequence of declarations and statements<br />
– Execution of the function ends when a return statement is encountered or<br />
the end of the body is reached<br />
– The argument of the return statement is the value returned from the function<br />
call<br />
C Preprocessor<br />
The #include directive is used to include other files (the contents of the named<br />
file replace the #include directive)<br />
The #define directive is used to define macros<br />
– Macros can simply define a textual constant<br />
– Macros can have formal arguments, which will be instantiated in the replacement<br />
text<br />
#if/#else/#endif is used for conditional compilation<br />
– The controlling expression of the #if has to be a constant integer expression<br />
– Special case: #ifdef tests if a macro is defined<br />
– Special case: #ifndef tests if a macro is not defined<br />
Exercises<br />
Reread the lecture notes<br />
Download the C examples from the Web<br />
– Read the code<br />
– Compile them by hand<br />
– Run them<br />
Programming in C<br />
Dynamic Arrays and Pointer Arithmetic<br />
Dynamically Allocated Arrays<br />
Since pointers and arrays can be used interchangeably in many contexts, we can<br />
use malloc() to allocate arrays of whatever size we need!<br />
– The size of an array of n elements of type t is just n*sizeof(t)<br />
Applications:<br />
– We can allocate arrays in a function and return pointers to them (remember<br />
that local variables are destroyed when control leaves a function)<br />
– We can determine array size at run time<br />
– We can dynamically increase array size by:<br />
∗ Allocating a bigger array<br />
∗ Copying the old array into the initial part of the new array<br />
∗ Freeing the old array<br />
Example<br />
#include <stdio.h><br />
#include <stdlib.h><br />
#define BUF_SIZE 1024<br />
int main(void)<br />
{<br />
    int c, count=0;<br />
    char* buffer;<br />
    buffer = malloc(sizeof(char)*BUF_SIZE);/* Missing check! */<br />
    while((c=getchar())!=EOF)<br />
    {<br />
        if(count == BUF_SIZE-1)<br />
        {<br />
            printf("Buffer full\n"); exit(EXIT_FAILURE);<br />
        }<br />
        buffer[count++] = c;<br />
    }<br />
    buffer[count] = '\0';<br />
    printf("%s\n", buffer);<br />
    free(buffer);<br />
    return EXIT_SUCCESS;<br />
}<br />
Changing Allocated Block Size: realloc()<br />
void* realloc(void* ptr, size_t size); is declared in <stdlib.h><br />
– Its first argument is a pointer to a block of memory on the heap (obtained<br />
with malloc(), realloc(), or an equivalent function)<br />
– The second argument is the desired new size of the block<br />
– realloc() returns a pointer to a new block of memory, of the desired size (if<br />
available, otherwise NULL; in that case the old block stays valid)<br />
– If realloc() is successful, the initial part of the new block (up to the smaller<br />
of the two sizes) will be identical to the old block<br />
Special cases:<br />
– If ptr is NULL, realloc() is equivalent to malloc()<br />
– If size is 0, realloc() traditionally behaves like free() (newer C standards no<br />
longer guarantee this, so prefer calling free() directly)<br />
– As with malloc(), we always have to check the return value!<br />
Most common use: Increase the size of some array<br />
Example: Growing the Buffer as Needed<br />
#include <stdio.h><br />
#include <stdlib.h><br />
int main(void)<br />
{<br />
    int c, count=0, size = 2;<br />
    char* buffer;<br />
    buffer = malloc(sizeof(char)*size); /* Missing check! */<br />
    while((c=getchar())!=EOF)<br />
    {<br />
        if(count == size - 1)<br />
        {<br />
            size = size * 2;<br />
            buffer = realloc(buffer, size); /* Missing check! */<br />
        }<br />
        buffer[count++] = c;<br />
    }<br />
    buffer[count] = '\0';<br />
    printf("%s\n", buffer);<br />
    free(buffer);<br />
    return EXIT_SUCCESS;<br />
}<br />
Additional Pointer Properties<br />
Pointers of the same type can be compared using <, >, <=, >=<br />
– The result is only defined when both pointers point at elements in the<br />
same array or struct, or if both pointers point to addresses within the same<br />
malloc()ed block<br />
– Pointers to elements with a smaller index are smaller than pointers to elements<br />
with a larger index<br />
Pointer arithmetic allows addition of integers to (non-void) pointers<br />
– If p points to element n in an array, p+k points to element n+k<br />
– As a special case, p[n] and *(p+n) can again be used interchangeably (and<br />
often are in practice)<br />
– Most frequent case: Use p++ to advance a pointer to the next element in an<br />
array<br />
– Note that pointer arithmetic only works on non-void pointers<br />
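A minimal sketch of these rules: the function below walks an int array with p++ and stops by comparing against a pointer one past the last element (such a pointer may be formed and compared, but not dereferenced). The name sum_ints is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums the n ints starting at arr by walking a
   pointer from the first element to one past the last one. */
long sum_ints(const int *arr, size_t n)
{
    const int *p;
    const int *end;
    long sum = 0;

    assert(arr != NULL);
    end = arr + n;              /* one past the last element */
    for(p = arr; p < end; p++)  /* both pointers refer to the same array */
    {
        sum += *p;              /* same value as arr[p - arr] */
    }
    return sum;
}
```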
Pointer Arithmetic<br />
char *cp, *cq;<br />
int *ip, *iq;<br />
[Figure: cp, cp+1, cp+2 and cq=cp+12 pointing into char arr1[28], which holds the characters a to z, 0 and the terminating \0; ip, ip+1, &ip[2], iq=ip+3 and iq+2 pointing into int arr2[7], which holds 17, 42, −13, 2, 2147483647, 1024, −1]<br />
Pointer Arithmetic Example<br />
#include <stdio.h><br />
#include <stdlib.h><br />
int print_str(char *string)<br />
{<br />
int i = 0;<br />
while(*string)<br />
{<br />
putchar(*string);<br />
string++;<br />
i++;<br />
}<br />
return i;<br />
}<br />
int main(int argc, char* argv[])<br />
{<br />
char message[] = "Hello World!\n";<br />
int count;<br />
count = print_str(message);<br />
printf("Printed %d characters!\n", count);<br />
return EXIT_SUCCESS;<br />
}<br />
Reading the Command Line: argc and argv<br />
The C standard defines a standardized way for a program to access its (command<br />
line) arguments: main() can be defined with two additional arguments<br />
– int argc gives the number of arguments (including the program name)<br />
– char *argv[] is an array of pointers to character strings, each corresponding<br />
to a command line argument<br />
Since the name under which the program was called is included among its<br />
arguments, argc is normally at least one<br />
– argv[0] is the program name<br />
– argv[argc-1] is the last argument<br />
– argv[argc] is guaranteed to be NULL<br />
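Example: Echoing Arguments<br />
#include <stdio.h><br />
#include <stdlib.h><br />
int main(int argc, char* argv[])<br />
{<br />
    int i;<br />
    for(i=1; i<argc; i++)<br />
    {<br />
        printf("%s ", argv[i]);<br />
    }<br />
    putchar('\n');<br />
    return EXIT_SUCCESS;<br />
}<br />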
Example: Echoing Arguments – Idiomatic<br />
#include <stdio.h><br />
#include <stdlib.h><br />
int main(int argc, char* argv[])<br />
{<br />
    char **p;<br />
    for(p=argv+1; *p; p++)<br />
    {<br />
        printf("%s ", *p);<br />
    }<br />
    putchar('\n');<br />
    return EXIT_SUCCESS;<br />
}<br />
Exercises<br />
Write a function that reads a line (terminated by '\n') into an array, and a<br />
program that reads files line by line and prints them back. You can assume a<br />
reasonable fixed length (e.g. 1024 characters) per line<br />
Write a library that implements a dynamic array type for char arrays.<br />
– Implement functions that can assign and retrieve values from arbitrary positions,<br />
e.g. void darrayassign(darray_p array, int index, char newval)<br />
and char darrayvalue(darray_p array, int index)<br />
– Write a function darrayalloc() that returns a pointer to a freshly allocated<br />
dynamic array<br />
– Write a function darrayfree() that frees such an array<br />
– Hint: Use a struct that contains at least a pointer to the dynamically<br />
allocated array proper and the currently allocated array size. If an index<br />
greater than or equal to the size occurs, use realloc() to increase the size<br />
Put the two together: Write a function that can read a line of any length and<br />
returns (a pointer to) it<br />
Making Programming Easier<br />
The rpn_calc Example<br />
[Figure: dependency graph of the rpn_calc program. The headers ctype.h, stdio.h, stdlib.h, chario.h and integerio.h are pulled into the sources chario.c, integerio.c and rpn_calc.c via #include; each source is compiled (gcc -c) into chario.o, integerio.o and rpn_calc.o; the object files are linked (gcc) with libc into rpn_calc]<br />
The rpn_calc Example (Simplified)<br />
[Figure: simplified dependency graph showing only chario.h, integerio.h, the sources chario.c, integerio.c and rpn_calc.c, the object files chario.o, integerio.o and rpn_calc.o, and the executable rpn_calc]<br />
Program Dependencies<br />
In the example, changing one file may make many steps necessary to propagate<br />
the change<br />
– If any .h file has been changed, all .c files that include it may have to be<br />
recompiled<br />
– If any .c file has changed, it has to be recompiled<br />
– If any .o file has changed, we need to relink the program<br />
– In more complex programs, even more such situations exist!<br />
Recompiling all files and relinking (in the right order) solves the problem...<br />
– Very expensive for large programs<br />
∗ Mozilla, Windows NT: Many hours<br />
∗ Linux kernel (on modern machine): Many minutes<br />
∗ E theorem prover: 1-2 minutes<br />
– We still need to know the right order!<br />
Recompiling by hand is error-prone (and inconvenient)<br />
UNIX User Utilities: make<br />
make is a UNIX utility that can automatically update large projects with complex<br />
dependencies<br />
– Dependencies and build instructions are described in a file called Makefile<br />
(preferred form) or makefile<br />
A makefile contains a number of rules for rebuilding the project<br />
A rule consists of a target, a list of prerequisites, and commands for rebuilding<br />
– The target normally is a file that needs rebuilding<br />
– The prerequisites are all files that are needed to rebuild the target<br />
– Finally, the commands describe how to rebuild the target<br />
Semantics:<br />
– Execution begins with the first target (or a target given on the command line)<br />
– First, rules for all prerequisites are activated (if any)<br />
– Then, if the target does not exist, or if any of the prerequisites is newer than<br />
the target, the commands are executed<br />
Example: rpn_calc makefile<br />
# Relink rpn_calc if one of the object files changed<br />
rpn_calc: chario.o integerio.o rpn_calc.o<br />
	gcc -ansi -Wall -o rpn_calc chario.o integerio.o rpn_calc.o<br />
# Recompile chario if either the .h or the .c changed<br />
chario.o: chario.h chario.c<br />
	gcc -ansi -Wall -c -o chario.o chario.c<br />
#...<br />
integerio.o: chario.h integerio.h integerio.c<br />
	gcc -ansi -Wall -c -o integerio.o integerio.c<br />
#...<br />
rpn_calc.o: integerio.h chario.h rpn_calc.c<br />
	gcc -Wall -ansi -c -o rpn_calc.o rpn_calc.c<br />
# General format:<br />
#<br />
# TARGET: PREREQUISITES<br />
# [TAB] command1<br />
# [TAB] command2 ...<br />
Built-In Rules and Makefile Variables<br />
make knows how to remake many types of files!<br />
– In particular, make knows how to run the C compiler to build object (.o) files<br />
from .c files<br />
We could have omitted the compiler command e.g. from the rule for chario.o:<br />
chario.o: chario.h chario.c<br />
make allows the use of variables, both for customization and for more compact<br />
makefiles<br />
– Variables are set using the assignment operator:<br />
RPN=chario.o integerio.o rpn_calc.o<br />
– Variables are referenced using a $: $(RPN)<br />
Important predefined variables:<br />
– CC: Name of the C compiler<br />
– CFLAGS: Additional flags for the C compiler<br />
Example: rpn_calc makefile revisited<br />
CC=gcc<br />
CFLAGS=-Wall -ansi -O6<br />
RPN=chario.o integerio.o rpn_calc.o<br />
# Relink rpn_calc if one of the object files changed<br />
rpn_calc: chario.o integerio.o rpn_calc.o<br />
	gcc -ansi -Wall -o rpn_calc $(RPN)<br />
chario.o: chario.h chario.c<br />
integerio.o: chario.h integerio.h integerio.c<br />
rpn_calc.o: integerio.h chario.h rpn_calc.c<br />
Rebuilding from scratch:<br />
$ rm *.o<br />
$ make<br />
gcc -Wall -ansi -O6 -c -o chario.o chario.c<br />
gcc -Wall -ansi -O6 -c -o integerio.o integerio.c<br />
gcc -Wall -ansi -O6 -c -o rpn_calc.o rpn_calc.c<br />
gcc -ansi -Wall -o rpn_calc chario.o integerio.o rpn_calc.o<br />
Phony Targets<br />
Not all targets need to correspond to files<br />
– Targets not corresponding to a file are called phony<br />
– Since no corresponding file exists, commands in rules with phony targets are<br />
always executed<br />
Frequent use: Cleanup commands<br />
clean:<br />
	rm *.o<br />
	rm rpn_calc<br />
Assignment<br />
Write a program sort_csc322 that reads an arbitrary length file line by line<br />
(allowing for arbitrary line length), sorts the lines in ASCIIbetical order, and prints<br />
them back<br />
– Order: A letter that has a smaller numerical value is smaller than a letter that<br />
has a bigger numerical value. To compare strings, find the first character that<br />
differs (including the terminating '\0')<br />
– Hints:<br />
∗ If you are lazy, reuse the binary tree code for sorting!<br />
∗ Define a data type for the lines, using struct and char*<br />
Include a Makefile for building your final program from the sources!<br />
– More hint: If you are lazy, read man makedepend<br />
Odds And Ends<br />
Errors, Bugs, and Other Unpleasant Animals<br />
Most hard-to-handle errors are not syntax errors<br />
– Most syntax errors go away with experience<br />
– Even if not, they are usually easy to find and fix!<br />
Most serious problems are runtime errors, resulting from faulty program logic<br />
– Finding logic errors is hard<br />
– Not finding them is worse!<br />
Examples:<br />
– Spacecraft may crash (Mars Climate Orbiter) or explode (Ariane-5)<br />
– Medical devices may actually kill patients (Therac-25 cancer treatment device)<br />
– The IRS may decide you are a tax evader, and have you arrested!<br />
Ways to (more) correct software:<br />
– Formal methods and a controlled development process<br />
– Testing<br />
– Internal consistency checks<br />
Assertions<br />
Internal consistency checks are used to verify that assumptions about the state of<br />
the program are true<br />
– Very frequent use: Check if parameters to functions have valid values<br />
– Check loop invariants<br />
– Check array boundaries<br />
Problems<br />
– Checks are inconvenient to program<br />
– The checks may cause unacceptable slowdowns (E theorem prover: Factor of<br />
2–3, depending on input data)<br />
C solution: The header file <assert.h> and its assert() macro<br />
– Convenient way to add simple consistency checks<br />
– Checks can be disabled at compile time (no slowdown for the final product)<br />
<assert.h> and assert()<br />
The assert() macro is defined in <assert.h><br />
It is used with a single argument<br />
If that argument has the truth value “true”, nothing happens<br />
Otherwise, assert() prints an error message and aborts the program<br />
– Error message contains text of the assertion, name of source file, line in file<br />
If the preprocessor macro NDEBUG is defined when <assert.h> is included, assert() is ignored (defined as the<br />
empty macro)<br />
Careful use of assert() while testing makes your programs much more robust<br />
and helps you weed out errors earlier!<br />
Example<br />
#include <assert.h><br />
#include <stdio.h><br />
#include <stdlib.h><br />
int gcd(int a, int b)<br />
{<br />
    assert(a>0);assert(b>0);<br />
    if(a==b)<br />
    {<br />
        return a;<br />
    }<br />
    if(a > b)<br />
    {<br />
        return gcd(a-b,b);<br />
    }<br />
    return gcd(b-a,a);<br />
}<br />
int main(void)<br />
{<br />
    printf("Result: %d\n", gcd(15,3));<br />
    printf("Result: %d\n", gcd(0,2));<br />
    return EXIT_SUCCESS;<br />
}<br />
Example (Continued)<br />
$ gcc -ansi -Wall -o gcd_assert gcd_assert.c<br />
$ ./gcd_assert<br />
Result: 3<br />
gcd_assert: gcd_assert.c:7: gcd: Assertion ‘a>0’ failed.<br />
Abort<br />
$ gcc -ansi -Wall -o gcd_assert gcd_assert.c -DNDEBUG<br />
$ ./gcd_assert<br />
Result: 3<br />
Segmentation fault<br />
Search in Loops<br />
A frequent use of loops is to search for something in a sequence (list or array) of<br />
elements<br />
First attempt: Search for an element with property P in array<br />
for(i=0; (i< array_size) && !P(array[i]); i=i+1)<br />
{ /* Empty Body */ }<br />
if(i!=array_size)<br />
{<br />
do_something(array[i]);<br />
}<br />
– Combines property test and loop traversal test (unrelated tests!) in one<br />
expression<br />
– Property test is negated<br />
– We still have to check if we found something at the end (in a not very intuitive<br />
test)<br />
Is there a better way?<br />
Early Exit: break<br />
C offers a way to handle early loop exits<br />
The break; statement will always exit the innermost (structured) loop (or<br />
switch) statement<br />
Example revisited:<br />
for(i=0; i< array_size; i=i+1)<br />
{<br />
    if(P(array[i]))<br />
    {<br />
        do_something(array[i]);<br />
        break;<br />
    }<br />
}<br />
– I find this easier to read<br />
– Note that the loop is still single entry/single exit, although control flow in the<br />
loop is more complex<br />
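For a concrete, compilable variant of the pattern, the sketch below takes "is negative" as the property P and returns the index of the first match. The function name and the choice of property are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: returns the index of the first negative
   element of array, or -1 if there is none. */
int find_first_negative(const int *array, int array_size)
{
    int i;
    int found = -1;

    assert(array != NULL && array_size >= 0);
    for(i = 0; i < array_size; i = i+1)
    {
        if(array[i] < 0)     /* the property P */
        {
            found = i;
            break;           /* leave the loop as soon as we have a hit */
        }
    }
    return found;            /* single exit point of the function */
}
```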
Selective Operations and Special Cases<br />
Assume we have a sequence of elements, and have to handle them differently,<br />
depending on properties:<br />
for(i=0; i< array_size; i=i+1)<br />
{<br />
    if(P1(array[i]))<br />
    {<br />
        /* Nothing to do */<br />
    }<br />
    else if(P2(array[i]))<br />
    {<br />
        do_something(array[i]);<br />
    }<br />
    else<br />
    {<br />
        do_something_really_complex(array[i]);<br />
    }<br />
}<br />
Because of the special cases, all the main stuff is hidden away in an else<br />
Wouldn’t it be nice to just goto the top of the loop?<br />
Early Continuation: continue<br />
A continue; statement will immediately start a new iteration <strong>of</strong> the current loop<br />
– For C for loops, the update expression will be evaluated!<br />
Example with continue:<br />
for(i=0; i< array_size; i=i+1)<br />
{<br />
    if(P1(array[i]))<br />
    {<br />
        continue;<br />
    }<br />
    if(P2(array[i]))<br />
    {<br />
        do_something(array[i]);<br />
        continue;<br />
    }<br />
    do_something_really_complex(array[i]);<br />
}<br />
do/while Loops<br />
Both while and for loops in C are controlled at the top<br />
– If the controlling expression is false, the loop is not entered at all<br />
Occasionally, we can express some algorithms more conveniently if we have a<br />
controlling expression at the end of the loop<br />
– Loop body is always executed at least once!<br />
C language construct: do/while() loop<br />
do<br />
{<br />
    loop body<br />
}while(E);<br />
– If E evaluates to true at the end of the loop, control is transferred back to the<br />
do<br />
Example<br />
#include <stdio.h><br />
#include <stdlib.h><br />
int main(int argc, char* argv[])<br />
{<br />
    int c;<br />
    do<br />
    {<br />
        printf("Please choose 1 for half of a bad joke or 2 for a cool number!\n");<br />
        c=getchar();<br />
    }while(!(c=='1' || c=='2'));<br />
    if(c=='1')<br />
    {<br />
        printf("Why did the chicken cross the road? ...\n");<br />
    }<br />
    else<br />
    {<br />
        printf("42\n");<br />
    }<br />
    return EXIT_SUCCESS;<br />
}<br />
Some Loop Statistics<br />
E theorem prover<br />
– State of the art automated theorem prover<br />
– About 100000 lines of C code (20000 statements, the rest is comments, white<br />
space, definitions....)<br />
– Total of 942 structured loop statements in code base<br />
521 for() loops<br />
– Most iterate over integer values (for i=0; i
Exercises<br />
Go back over your exercises and assignments, and think about good places to<br />
insert assert() statements<br />
Write a non-recursive function that searches for a value in a binary search tree.<br />
Use break to leave the loop if you found it!<br />
Think about uses for do/while ;-)<br />
Function Pointers<br />
C Standard Library (1)<br />
Functions as Arguments<br />
Occasionally, you want to be able to pass around functions just like data<br />
Examples:<br />
– Configure an event handler (“call this function if the UPS signals power-down”)<br />
– Simulate some object-oriented techniques (virtual functions), e.g. to implement<br />
destructors<br />
– Most importantly: Parameterize algorithms<br />
Functional languages have functions as first class objects<br />
C is less flexible, but gives us function pointers to pass as arguments and store in<br />
variables<br />
– Idea: Pointers are addresses in memory<br />
– Functions are pieces of code in memory<br />
Function Pointers<br />
We can use the address of a function to call it!<br />
– As with normal pointers, we need to know the type of the function (in this case,<br />
the return type and the types of the arguments it takes)<br />
Syntax: Same principle as for other types!<br />
– To declare a function pointer, use a function declaration, but add parentheses<br />
and add a * to denote that it is a pointer:<br />
int (*add)(int x1, int x2);<br />
– This declares add to be a pointer to a function accepting two integer arguments<br />
and returning an integer<br />
To use a function pointer: Just dereference the pointer<br />
a = (*add)(10,20);<br />
To assign a value to the pointer, just get the address of a function:<br />
add = &some_function_name;<br />
Function Pointers (2)<br />
To confuse students (and for convenience), it is possible to omit both the<br />
dereferencing in calling and the ampersand in assigning:<br />
add = some_function_name;<br />
a = add(10,20);<br />
– Since there is nothing else you can do with functions in C, these simplifications<br />
do not create an ambiguity<br />
– They tend to make code easier to read, though, especially with functions that<br />
return pointers<br />
Note: Since declarations quickly become hard to read, it is wise to always use<br />
typedef to define a suitable function pointer type!<br />
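A small sketch of the typedef advice, also showing how a function pointer parameterizes an algorithm. The names BinaryOp, add_ints, multiply_ints and fold are made up for this illustration and do not come from the course code.

```c
#include <assert.h>
#include <stddef.h>

/* BinaryOp is a hypothetical name for "pointer to a function taking
   two ints and returning an int". */
typedef int (*BinaryOp)(int x1, int x2);

int add_ints(int x1, int x2)      { return x1 + x2; }
int multiply_ints(int x1, int x2) { return x1 * x2; }

/* Combines all n values with op, starting from init. */
int fold(BinaryOp op, int init, const int *values, int n)
{
    int i;
    int result = init;

    assert(op != NULL && values != NULL);
    for(i = 0; i < n; i++)
    {
        result = op(result, values[i]);   /* same as (*op)(result, values[i]) */
    }
    return result;
}
```

With the typedef, the parameter list of fold() stays readable; without it, the first parameter would have to be spelled int (*op)(int x1, int x2).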
Example<br />
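#include <stdio.h><br />
#include <stdlib.h><br />
int add(int x1, int x2)<br />
{ return x1+x2; }<br />
int subtract(int x1, int x2)<br />
{ return x1-x2; }<br />
/* Calls fun(20, i) for i = 0 .. limit-1 and prints each result */<br />
void use_fun(int limit, int (*fun)(int x1, int x2))<br />
{<br />
    int i;<br />
    for(i=0; i<limit; i++)<br />
    { printf("Result: %d\n", fun(20, i)); }<br />
}<br />
int main(void)<br />
{<br />
    use_fun(5, add);<br />
    printf("--------\n");<br />
    use_fun(5, subtract);<br />
    return EXIT_SUCCESS;<br />
}<br />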
Example Output<br />
Result: 20<br />
Result: 21<br />
Result: 22<br />
Result: 23<br />
Result: 24<br />
--------<br />
Result: 20<br />
Result: 19<br />
Result: 18<br />
Result: 17<br />
Result: 16<br />
C Library Functions: qsort()<br />
qsort() is a very useful C library function (declared in <stdlib.h>) that is able<br />
to sort any kind of array (and normally does so very efficiently)!<br />
qsort is declared as follows:<br />
void qsort(void *base, size_t nmemb, size_t size,<br />
int(*compar)(const void *, const void *));<br />
– The first argument points to the array to be sorted (i.e. to its first element)<br />
– The second argument is the number of elements in the array<br />
– The third argument gives the size of a single element<br />
– Finally, the last argument is a pointer to a function taking two pointer<br />
arguments, and returning an integer value<br />
C Library Functions: qsort() (2)<br />
qsort definition (repeated):<br />
void qsort(void *base, size_t nmemb, size_t size,<br />
int(*compar)(const void *, const void *));<br />
Purpose of compar: Let the caller define an order on elements<br />
– (*compar)() is called by qsort() to compare two elements<br />
– It gets pointers to two array elements as arguments<br />
– It should compare these elements and return<br />
∗ 0, if the two elements are equal (under the order)<br />
∗ A negative integer, if the first element is smaller<br />
∗ A positive integer, if the first element is greater<br />
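Because the caller defines the order, the same qsort() call can sort in any direction. The sketch below (names chosen for this illustration only) sorts ints in descending order simply by swapping which case returns the negative value.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison function with exactly the signature qsort() expects.
   "Smaller" here means "sorts first", i.e. the larger number. */
int compare_ints_descending(const void *arg1, const void *arg2)
{
    const int *i1 = arg1;   /* convert back to the real element type */
    const int *i2 = arg2;

    if(*i1 > *i2) { return -1; }
    if(*i1 < *i2) { return 1; }
    return 0;
}

/* Hypothetical convenience wrapper: sorts n ints, largest first. */
void sort_descending(int *values, size_t n)
{
    assert(values != NULL);
    qsort(values, n, sizeof(int), compare_ints_descending);
}
```

Explicit comparisons are used instead of returning *i1 - *i2, since the subtraction can overflow for values of large magnitude.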
Example<br />
#include <stdio.h><br />
#include <stdlib.h><br />
typedef int (*CompareFun)(const void* arg1, const void* arg2);<br />
/* Has exactly the type CompareFun, so qsort() can call it safely */<br />
int compare_ints(const void* arg1, const void* arg2)<br />
{<br />
    const int *i1 = arg1, *i2 = arg2;<br />
    if(*i1 < *i2)<br />
    {<br />
        return -1;<br />
    }<br />
    if(*i1 > *i2)<br />
    {<br />
        return 1;<br />
    }<br />
    return 0;<br />
}<br />
int main(int argc, char* argv[])<br />
{<br />
int array[10], i;<br />
}<br />
Example (continued)<br />
for(i=0; i
Example (Output)<br />
Unsorted:<br />
103<br />
70<br />
105<br />
115<br />
81<br />
127<br />
74<br />
108<br />
41<br />
77<br />
Sorted:<br />
41<br />
70<br />
74<br />
77<br />
81<br />
103<br />
105<br />
108<br />
115<br />
127<br />
The C St<strong>and</strong>ard Library<br />
The C Standard Library contains a large number of functions, some data types<br />
and system dependent constants<br />
– Covers many things that other languages handle in the main language<br />
– Also contains primitives for extending some parts of the language<br />
– Notably missing: Any functionality for graphics (only stream-based I/O)<br />
Most parts of the library are automatically linked with C programs (exception:<br />
Floating point math functions)<br />
The standard library is part of the C standard, and has to be supported on any<br />
standards-compliant full C implementation<br />
– Code written using only the standard library should be highly portable<br />
The library has 15 parts with corresponding header files<br />
– Some declarations are repeated in different headers<br />
C Standard Library Organisation<br />
– assert.h: Assertions (*)<br />
– ctype.h: Character classes (+)<br />
– errno.h: Error reporting for library functions<br />
– float.h: Implementation limits for floating point numbers<br />
– limits.h: Limits for other things<br />
– locale.h: Localization support<br />
– math.h: Mathematical functions<br />
– setjmp.h: Non-local function exits<br />
– signal.h: Signal handling<br />
– stdarg.h: Support for functions with a variable number of arguments (as<br />
e.g. printf())<br />
– stddef.h: Standard macros and typedefs<br />
– stdio.h: Input and output (+)<br />
– stdlib.h: Miscellaneous library functions (+)<br />
– string.h: String (character array) handling<br />
– time.h: Functions about time and date<br />
Error Handling: errno.h<br />
Library functions typically signal an error by returning an out of range value, i.e.<br />
a value that cannot possibly be correct<br />
– For many functions that is -1 or NULL<br />
They communicate the cause of the error by setting the global int variable<br />
errno to a specific value<br />
– At the program start, errno is guaranteed to have the value 0<br />
– No library function will ever set errno to 0, but failed library functions will set<br />
it to an implementation-defined value encoding the cause of the error<br />
Error codes have symbolic names (with #define):<br />
– EDOM: (Required by the standard) Domain error for some math functions<br />
– ERANGE: (Required by the standard) Range error for some math functions<br />
– EAGAIN: (UNIX) Temporary problem, try again<br />
– ENOMEM: (UNIX) Out of memory<br />
– EBUSY: (UNIX) Some necessary resource is already in use<br />
– EINVAL: (UNIX) Invalid argument to some function<br />
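Some functions have no spare "impossible" return value, so errno is the only way to detect the error. A minimal sketch using strtol(), which sets errno to ERANGE on overflow: the helper name parse_long and its return convention are assumptions made for this illustration. Since no library function resets errno, the helper clears it before the call.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: converts the string s to a long.
   Returns 0 and stores the value in *result on success, -1 otherwise. */
int parse_long(const char *s, long *result)
{
    char *end;
    long value;

    assert(s != NULL && result != NULL);
    errno = 0;                       /* nobody else resets it for us */
    value = strtol(s, &end, 10);
    if(errno == ERANGE)              /* value does not fit into a long */
    {
        return -1;
    }
    if(end == s || *end != '\0')     /* no digits at all, or trailing junk */
    {
        return -1;
    }
    *result = value;
    return 0;
}
```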
Example<br />
#include <errno.h><br />
#include <stdio.h><br />
#include <stdlib.h><br />
#include <string.h><br />
int main(int argc, char* argv[])<br />
{<br />
    char *res;<br />
    printf("errno: %d\n", errno);<br />
    res = strdup("Hallo"); /* Allocate space, copy the string to it */<br />
    if(!res)<br />
    {<br />
        printf("Could not copy string, errno: %d = %d\n", errno, ENOMEM);<br />
    }<br />
    else<br />
    {<br />
        printf("All is fine, errno: %d\n", errno);<br />
        free(res);<br />
    }<br />
    return EXIT_SUCCESS;<br />
}<br />
Exercises<br />
Write a program that sorts an arbitrary sized array of double values<br />
Think about a program that sorts pointers to char, based on the characters (or<br />
character arrays) the pointers point to (yes, this is a hint for your assignment)<br />
Check out /usr/include/errno.h and /usr/include/asm/errno.h<br />
C Standard Library<br />
Characters and Strings<br />
Character Classes and <ctype.h><br />
The C standard defines several character classes in a portable way<br />
– We can use these functions regardless of the underlying character set of the<br />
implementation<br />
– Most of these functions can be (and are) implemented in a very efficient<br />
manner for ASCII characters<br />
C characters are integer values, typically 8 bits wide<br />
– On most implementations, char is an 8 bit extension to ASCII (in recent times,<br />
ISO Latin-1 or variants have become popular)<br />
– There is limited support for bigger character sets using wchar_t<br />
Character handling functions are declared in <ctype.h><br />
Some C Character Classes<br />
All character class functions accept and return int values<br />
– Behaviour is only defined if the input is from the range of unsigned char or<br />
EOF<br />
– Each function returns true (non-0) if the character is in the class, 0 otherwise<br />
Character class test functions<br />
– isdigit(c): Digits, i.e. {0-9}<br />
– isalpha(c): Upper and lower case characters ({a-z,A-Z}, in some locales<br />
additional characters, e.g. umlauts like ä, Ö,...)<br />
– isalnum(c): Equivalent to (isdigit(c)||isalpha(c))<br />
– iscntrl(c): Control characters, i.e. non-printable characters (in ASCII, those<br />
are characters with codes 0 to 31 and 127)<br />
– isxdigit(c): Hexadecimal digits, {0-9,a-f,A-F}<br />
– islower(c): Lower case letters<br />
– isupper(c): Upper case letters<br />
– ispunct(c): Printing characters that are neither letters, digits, nor space<br />
– isprint(c): Normal, printable characters<br />
Character Class Conversion Functions<br />
There are two functions for converting characters from one class to another:<br />
– tolower(c) converts upper case characters to lower case characters<br />
– toupper(c) converts lower case characters to upper case characters<br />
Both functions return the character unchanged if it is not an upper or lower case<br />
character, respectively<br />
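A short sketch combining a class test with a conversion. The helper name str_toupper is an assumption made for this illustration. Note the cast to unsigned char: a plain char may be negative on some platforms, which is outside the range the character class functions accept.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: converts the string s to upper case in place
   and returns the number of characters that were changed. */
int str_toupper(char *s)
{
    int changed = 0;

    assert(s != NULL);
    for( ; *s; s++)
    {
        /* Cast first: only unsigned char values (and EOF) are valid
           arguments for the character class functions. */
        unsigned char c = (unsigned char)*s;

        if(islower(c))
        {
            *s = (char)toupper(c);
            changed++;
        }
    }
    return changed;
}
```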
#include <ctype.h><br />
#include <stdio.h><br />
#include <stdlib.h><br />
int main(void)<br />
{<br />
int c;<br />
}<br />
Example<br />
while((c=getchar())!=EOF)<br />
{<br />
if(iscntrl(c))<br />
{<br />
printf("
Example Output<br />
$ man man | ctypedemo<br />
MAN(1) MANUAL PAGER UTILS MAN(1)<br />
<br />
<br />
<br />
N<br />
NA<br />
AM<br />
ME<br />
E<br />
MAN - AN INTERFACE TO THE ON-LINE REFERENCE MANUALS<br />
<br />
S<br />
SY<br />
YN<br />
NO<br />
OP<br />
PS<br />
SI<br />
Strings<br />
Strings are not part of the C language proper<br />
– String literals are supported<br />
– Limited support by functions of the C standard library<br />
String-handling functions operate on char* (pointer to char) values<br />
– It is the responsibility of the program to make sure that there is sufficient<br />
space available for the operations!<br />
Convention for strings:<br />
– Strings are \0 terminated arrays of characters<br />
– Important: Size of the array is not taken into account!<br />
char excess[10000] = "a"; /* String length 1, takes up two<br />
characters, a and \0 */<br />
char tooshort[2];<br />
tooshort[0] = 'a';<br />
tooshort[1] = 'b'; /* tooshort is not a valid string, if treated<br />
as one, behaviour is undefined */<br />
String Functions from <string.h> (1)<br />
char *strcpy(char* s, const char *ct)<br />
– Copy a '\0'-terminated string from ct to s<br />
– Returns s<br />
– s must point to a sufficiently large area of memory!<br />
– Note: For all string functions that copy strings, source and target areas must<br />
not overlap (otherwise, behaviour is undefined)<br />
char *strncpy(char* s, const char *ct, size_t n)<br />
– As strcpy(), but copies at most n characters<br />
– Note: If ct has n or more characters, s will not be '\0'-terminated<br />
– If ct is shorter than n, then the result will be padded with additional '\0'<br />
characters (i.e. s must always have space for n characters, even if ct is shorter<br />
than n characters)<br />
size_t strlen(const char *cs)<br />
– Return the length of the string at cs<br />
– Does not count the trailing '\0'<br />
Stephan Schulz 372
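Because strncpy() does not terminate the target if the source is too long, it is commonly wrapped as follows.<br />
This is a sketch under the assumption that truncating the copy is acceptable; the name copy_terminated() is hypothetical.<br />

```c
#include <string.h>

/* Copy src into dst (capacity n bytes) and always terminate the result.
   Returns 1 if src fit completely, 0 if it had to be truncated. */
int copy_terminated(char *dst, size_t n, const char *src)
{
    if (n == 0) {
        return 0;              /* no room, not even for the '\0' */
    }
    strncpy(dst, src, n - 1);  /* copies at most n-1 characters */
    dst[n - 1] = '\0';         /* strncpy() does not do this if src is too long */
    return strlen(src) < n;
}
```

With a 4 byte target, copying "Hello" leaves "Hel" in the target and returns 0, while copying "Hi" returns 1.<br />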
Example: Duplicating Strings<br />
Several UNIX standards define a function strdup() that allocates enough<br />
memory for a string, and then copies it, returning the pointer to the newly<br />
allocated memory<br />
Our version also makes sure that there is memory available:<br />
char* SecureStrdup(char* str)<br />
{<br />
char *newstr = SecureMalloc(strlen(str)+1);<br />
return strcpy(newstr,str);<br />
}<br />
String Functions from <string.h> (2)<br />
char *strcat(char *s, const char *ct)<br />
– Concatenates ct at the end of s<br />
– Returns s<br />
– Result is always '\0'-terminated<br />
char *strncat(char *s, const char *ct, size_t n)<br />
– As strcat(), but copies at most n characters from ct<br />
– Result is always '\0'-terminated (even if ct is longer than n)<br />
Examples:<br />
char *t="World";<br />
char s[10] = "Hello";<br />
strncat(s,t,3); /* Ok, s now contains "HelloWor" */<br />
strcat(s,t); /* Error: Even starting from "Hello", "HelloWorld" requires 11 characters ('\0'!) */<br />
String Functions from <string.h> (3)<br />
int strcmp(const char* cs, const char* ct)<br />
– Compare two strings in the lexical extension of the natural order on characters<br />
– First differing character decides which string is bigger (including terminating<br />
'\0', i.e. a proper prefix is always smaller than the longer string)<br />
– Return value: Integer <0 if cs is smaller, >0 if ct is smaller, or 0 if both are<br />
equal<br />
int strncmp(const char* cs, const char* ct, size_t n)<br />
– As strcmp(), but compare at most n characters<br />
char *strchr(const char *cs, int c)<br />
– Return pointer to the first occurrence of c in cs (or NULL, if c is not present<br />
in cs)<br />
char *strrchr(const char *cs, int c)<br />
– Return pointer to the last occurrence of c in cs (or NULL)<br />
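Two small sketches illustrate typical uses of these functions; both function names are hypothetical.<br />
base_name() uses strrchr() to find the last '/' in a path, and compare_sign() reduces the result of strcmp() to -1, 0 or 1, since only the sign of the return value is specified.<br />

```c
#include <string.h>

/* Return the file name part of a path, i.e. everything after the last
   '/', or the whole path if it contains no '/'. */
const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Normalise the result of strcmp() to -1, 0 or 1. */
int compare_sign(const char *cs, const char *ct)
{
    int res = strcmp(cs, ct);
    return (res > 0) - (res < 0);
}
```

Note that compare_sign("abc", "ab") is 1: the shorter string is a prefix of the longer one and is therefore the smaller of the two.<br />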
String Functions from <string.h> (4)<br />
char *strpbrk(const char *cs, const char *ct)<br />
– Returns pointer to the first character in cs that also occurs in ct (or NULL), i.e. ct is treated as<br />
a set of characters<br />
– Example:<br />
strpbrk("Hello", "eul"); /* Returns pointer to the "e" in "Hello" */<br />
char* strstr(const char *cs, const char *ct)<br />
– Return pointer to first occurrence of ct in cs, or NULL if ct is not a substring<br />
of cs<br />
char *strerror(int n)<br />
– Return a pointer to a string description of the library error with error code n<br />
(as defined in <errno.h>)<br />
– If n is not a known error code, a pointer to a generic “unknown error code”<br />
message is returned<br />
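Since strstr() returns a pointer into the searched string, the search can be continued behind a match.<br />
The sketch below counts non-overlapping occurrences this way; the name count_occurrences() and the decision to return 0 for an empty search string are assumptions made for illustration.<br />

```c
#include <string.h>

/* Count the non-overlapping occurrences of needle in haystack. */
size_t count_occurrences(const char *haystack, const char *needle)
{
    size_t count = 0;
    size_t len = strlen(needle);

    if (len == 0) {
        return 0;   /* strstr() would match the empty string everywhere */
    }
    while ((haystack = strstr(haystack, needle)) != NULL) {
        count++;
        haystack += len;   /* continue behind the match */
    }
    return count;
}
```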
Generic Memory Access Functions<br />
Original (pre-ANSI) C used char* as a generic pointer, hence generic memory<br />
handling functions are lumped in with strings<br />
– Character is just another name for Byte in C, anyways<br />
– However, ANSI C has the generic void* pointer type<br />
The following functions are generally very similar to the string functions, but do<br />
not use a delimiter like '\0'<br />
– All operations specify a length parameter n, and handle exactly n characters<br />
These functions basically treat the virtual memory as one big character array!<br />
– Used to implement many basic operations<br />
– Typically implemented very efficiently (often by processor specific assembler<br />
subroutines)<br />
Memory Functions from <string.h> (1)<br />
void *memcpy(void *s, const void *ct, size_t n)<br />
– Copy a sequence of n bytes from ct to s<br />
– The regions must not overlap!<br />
void *memmove(void *s, const void *ct, size_t n)<br />
– Copy a sequence of n bytes from ct to s<br />
– There are no additional constraints (i.e. memmove() has to handle cases where<br />
the regions overlap)<br />
int memcmp(const void *cs, const void *ct, size_t n)<br />
– Compare the first n characters found at cs and ct<br />
– Return value: As strcmp() (<0, >0, or 0)<br />
Memory Functions from <string.h> (2)<br />
void *memchr(const void *cs, int c, size_t n)<br />
– Search for character c in the first n bytes at cs, return pointer to it (or NULL)<br />
void *memset(void *s, int c, size_t n)<br />
– Place character c into the first n characters at s, returning s<br />
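A typical case of overlapping regions is closing a gap inside an array.<br />
The following sketch (the name remove_at() is hypothetical) moves the tail of an int array one position to the left; since source and target overlap, memmove() is needed and memcpy() would be undefined.<br />

```c
#include <string.h>

/* Remove the element at index pos from an array of n ints by moving
   the tail one position to the left. Requires pos < n. The first n-1
   elements are valid afterwards. */
void remove_at(int *array, size_t n, size_t pos)
{
    memmove(&array[pos], &array[pos + 1], (n - pos - 1) * sizeof(int));
}
```

The length argument counts bytes, not elements, hence the multiplication with sizeof(int).<br />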
Exercises<br />
Write a simple version of grep (looking for plain strings in stdin only)<br />
Write a version of memmove() (the hard part is handling overlapping arrays –<br />
remember that you can compare pointers with <, >, and ==)!<br />
CSC322<br />
C Programming and UNIX<br />
C Standard Library<br />
Memory Handling and IO<br />
Example<br />
#include <stdio.h><br />
#include <stdlib.h><br />
#include <string.h><br />
int main(int argc, char* argv[])<br />
{<br />
char carray[10];<br />
int iarray[10], i;<br />
memset(&carray[0], 'a', 10*sizeof(char));<br />
memset(&iarray[0], 'a', 10*sizeof(int));<br />
for(i=0; i
Example Output<br />
a : 1633771873<br />
a : 1633771873<br />
a : 1633771873<br />
a : 1633771873<br />
a : 1633771873<br />
a : 1633771873<br />
a : 1633771873<br />
a : 1633771873<br />
a : 1633771873<br />
a : 1633771873<br />
b : 1650614882<br />
b : 1650614882<br />
b : 1633772130<br />
b : 1633771873<br />
b : 1633771873<br />
b : 1633771873<br />
b : 1633771873<br />
b : 1633771873<br />
b : 1633771873<br />
b : 1633771873<br />
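The numbers in the output are what one would expect for a 4 byte int whose bytes have all been set to the same character: 'a' is 0x61 in ASCII, and 0x61616161 = 1633771873; likewise 0x62626262 = 1650614882 for 'b'.<br />
The value 1633772130 = 0x61616262 is consistent with only two bytes of that int having been overwritten with 'b' on a little-endian machine, although this is an inference from the output, not something the listing states.<br />
The sketch below reproduces the first two values; the function name fill_bytes() is an assumption, and uint32_t comes from the C99 header <stdint.h>.<br />

```c
#include <stdint.h>
#include <string.h>

/* Return the value of a 32 bit unsigned integer after each of its
   bytes has been set to c with memset(). */
uint32_t fill_bytes(int c)
{
    uint32_t value;

    memset(&value, c, sizeof(value));
    return value;
}
```

memset() always works on bytes, so it cannot be used to set every int of an array to an arbitrary value such as 1.<br />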
Input and Output in the Standard Library<br />
Input and output in C is based on the concept of streams of bytes<br />
– Binary streams are raw, unprocessed bytes (only guarantee: If you write data<br />
to a binary stream, and then read it back, it is unchanged)<br />
– Text streams are composed of (possibly empty) lines, separated by a single newline<br />
('\n') character (the library has to make sure other text representations<br />
are converted properly)<br />
– In UNIX, text and binary streams are identical<br />
– In Windows, the library has to convert the carriage return/linefeed sequence used to<br />
separate lines to a single newline for text streams (but, of course, may not<br />
mangle binary streams)<br />
Streams are represented by FILE* objects in C (“file pointers”)<br />
– The FILE type is defined in <stdio.h><br />
– A stream normally has to be explicitly opened (connected to an input or<br />
output device) and should be closed (made available for reuse)<br />
Standard Streams<br />
By default, each program has three text streams open on startup:<br />
– stdin is the standard input (normally reading from the keyboard)<br />
– stdout is the standard output (normally connected to the terminal)<br />
– stderr is the standard error channel (also connected to the terminal)<br />
The I/O functions we have used so far implicitly use the default streams:<br />
– printf() and putchar() write to stdout<br />
– getchar() reads from stdin<br />
Opening File Streams<br />
In addition to the standard streams, we can create additional streams, normally<br />
associated with a file. The most general function is:<br />
FILE* fopen(const char *filename, const char *mode)<br />
– The first argument has to be a valid filename<br />
– The second argument describes the mode in which the file should be opened<br />
The mode is a string of characters<br />
– "r" opens a file for reading in text mode<br />
– "w" opens a file for writing in text mode (will create a new file, overwriting an<br />
existing file)<br />
– "a" opens a file for writing in text mode (but will append new output to the<br />
end of an existing file)<br />
– Adding a "b" will open the file as a binary file (e.g. "rb": Read binary)<br />
fopen() returns a valid file pointer, if successful, or NULL if it fails<br />
– In the case of failure, it sets errno to an appropriate value!<br />
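The return value of fopen() should always be checked before the stream is used.<br />
A minimal sketch of this pattern follows; the function name open_for_reading() is hypothetical, and the error text relies on errno being set by fopen(), as described above for UNIX systems.<br />

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Try to open filename for reading in text mode. On failure, print the
   reason to stderr and return NULL. */
FILE *open_for_reading(const char *filename)
{
    FILE *fp = fopen(filename, "r");

    if (fp == NULL) {
        fprintf(stderr, "%s: %s\n", filename, strerror(errno));
    }
    return fp;
}
```

The caller remains responsible for calling fclose() on a stream that was opened successfully.<br />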
Closing and Reopening File Streams<br />
Once we are done with a certain file, we have to close it<br />
– The number of simultaneously open files is limited for most operating systems.<br />
Closing a stream makes it available for other purposes<br />
– Streams may be buffered. Closing a stream flushes the buffer (i.e. writes all<br />
remaining characters)<br />
int fclose(FILE *stream) closes the file associated with stream<br />
– It returns 0 if no errors occurred, EOF otherwise<br />
FILE* freopen(const char *filename, const char *mode, FILE *stream)<br />
– This function closes stream and reopens it with a new associated file<br />
– It is useful to e.g. redirect stdout into a file, or to make stdin read from a file (from within the program)<br />
Simple Stream Based I/O Functions (Characters)<br />
int fgetc(FILE *stream)<br />
– Return the next character from the named stream (or EOF if no character is<br />
available or an error occurs)<br />
– Note: getchar() is equivalent to fgetc(stdin)<br />
int fputc(int c, FILE *stream)<br />
– Print the character c to the stream, returning c or EOF in case of error<br />
– putchar(c) is equivalent to fputc(c, stdout)<br />
int getc(FILE *stream) is equivalent to fgetc(), except that it may be<br />
implemented as a macro (and may hence evaluate stream more than once)<br />
Similarly, int putc(int c, FILE *stream) is equivalent to fputc(), but may<br />
be a macro<br />
Simple Stream Based I/O Functions (Strings)<br />
int fputs(const char *s, FILE *stream)<br />
– Writes the string pointed to by the first argument to the denoted stream<br />
– Returns EOF on failure, a non-negative value otherwise<br />
char *fgets(char *s, int n, FILE *stream)<br />
– Read at most n-1 characters into the array pointed to by s, stopping early after a<br />
newline is read (the newline is stored in s)<br />
– The array at s is always '\0'-terminated on success<br />
– Returns s, or NULL on error or if end of file is reached before any character is read<br />
Note: There also is a function char *gets(char *s) that attempts to read a<br />
line of input from stdin<br />
– Never use gets()!<br />
– Since there is no way to specify a maximal number of characters to read, we<br />
cannot ensure that gets() will not result in a buffer overflow error!<br />
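Since fgets() keeps the newline in the buffer, programs often strip it after reading.<br />
A small sketch of such a wrapper, with the hypothetical name read_line(), could look as follows.<br />

```c
#include <stdio.h>
#include <string.h>

/* Read one line from stream into buf (capacity n) and strip the
   trailing newline, if any. Returns buf, or NULL on end of file or
   error. Lines longer than n-1 characters are returned in pieces. */
char *read_line(char *buf, int n, FILE *stream)
{
    if (fgets(buf, n, stream) == NULL) {
        return NULL;
    }
    buf[strcspn(buf, "\n")] = '\0';   /* cut at the first newline */
    return buf;
}
```

With an 8 byte buffer, the line "World, this is long" is delivered in several calls, the first of which yields the 7 characters "World, ".<br />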
Example: Simple cat Implementation<br />
#include <br />
#include <br />
#include <br />
#include <br />
#include <br />
void print_file(FILE *stream)<br />
{<br />
int c;<br />
while((c=fgetc(stream))!=EOF)<br />
{<br />
fputc(c, stdout);<br />
}<br />
}<br />
Example Continued<br />
int main(int argc, char *argv[])<br />
{<br />
int i;<br />
FILE *file;<br />
if(argc == 1)<br />
{<br />
print_file(stdin);<br />
}<br />
else<br />
{<br />
for(i=1; i
}<br />
Example Output<br />
$ man man | ./mycat1<br />
man(1) man(1)<br />
NAME<br />
man - format and display the on-line manual pages<br />
manpath - determine user’s search path for man pages<br />
...<br />
$ ./mycat1 does not exist<br />
./mycat1: No such file or directory<br />
$ ./mycat1 mycat1.c<br />
#include <br />
#include <br />
#include <br />
#include <br />
#include <br />
void print_file(FILE *stream)<br />
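The core of such a cat program can be sketched with fopen(), fgetc(), fputc() and fclose() alone.<br />
The following is an illustrative variant, not the listing above: the names copy_stream() and cat_files() are hypothetical, the output stream is passed as a parameter, and files that cannot be opened are reported with perror() under the file name (the example output above prints the program name instead).<br />

```c
#include <stdio.h>

/* Copy everything from in to out, character by character. */
void copy_stream(FILE *in, FILE *out)
{
    int c;

    while ((c = fgetc(in)) != EOF) {
        fputc(c, out);
    }
}

/* Copy each of the count named files to out. Files that cannot be
   opened are reported on stderr and skipped. Returns the number of
   files that could not be opened. */
int cat_files(int count, char *names[], FILE *out)
{
    int i, failures = 0;

    for (i = 0; i < count; i++) {
        FILE *in = fopen(names[i], "r");

        if (in == NULL) {
            perror(names[i]);
            failures++;
            continue;
        }
        copy_stream(in, out);
        fclose(in);
    }
    return failures;
}
```

A main() function would call copy_stream(stdin, stdout) if argc is 1, and cat_files(argc-1, argv+1, stdout) otherwise.<br />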
Exercises<br />
Write a version of memmove() using pointer assignment (the hard part is handling<br />
overlapping arrays – remember that you can compare pointers with <, >, and ==)!<br />
You may need to cast void* to char* to access individual bytes.<br />
Write a version of wc that more closely mimics the behaviour of the UNIX<br />
version, i.e. that gives separate counts and a total if called with more than one<br />
argument (if called with a single argument, it just gives the counts for that file,<br />
if called with none, it reads from stdin)<br />
CSC322<br />
C Programming and UNIX<br />
C Standard Library<br />
Input and Output<br />
Remark about fgets()<br />
char *fgets(char *s, int n, FILE *stream)<br />
– Read at most n-1 characters into the array pointed to by s, stopping early after a<br />
newline is read<br />
– The array at s is always '\0'-terminated on success<br />
– Returns s, or NULL on error or if end of file is reached before any character is read<br />
Note:<br />
– It is the responsibility of the caller (i.e. your program) to provide enough<br />
memory!<br />
– s already has to point to an array (or malloc()ed area) of sufficient size<br />
This holds for most standard library functions!<br />
– . . . including gets() (never use gets()!)<br />
Reminder: Using stdin<br />
You can redirect files into stdin:<br />
– mycat1 < mycat.c<br />
You can type into stdin from your terminal<br />
– Type [C-d] (^d), "Control-D", to indicate end of input<br />
– Depending on your version of UNIX and your terminal, you may have to type<br />
[C-d] on a line of its own<br />
Buffering and Flushing<br />
Both input and output streams can be buffered<br />
– Unbuffered streams will pass on each individual character as soon as possible<br />
– Fully buffered streams will wait until the (arbitrarily sized) buffer is full before<br />
they pass on the collected data as one chunk<br />
– Text streams can also be line buffered. A line buffered stream will collect at<br />
most one line of data<br />
int fflush(FILE* stream) will flush all buffers associated with an output<br />
stream<br />
– Causes data to be actually written (if the writing process dies, the data is<br />
safe), although the OS may still have another layer of buffers<br />
– Return value: 0 on success, EOF on failure<br />
– Calling fflush(NULL) flushes all open output streams<br />
– Calling fflush() on an input stream invokes undefined behaviour<br />
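A common use of fflush() is to make a prompt visible before the program waits for input, independent of the buffering mode of stdout.<br />
A minimal sketch, with the hypothetical function name prompt():<br />

```c
#include <stdio.h>

/* Print a prompt and make sure it is passed on before the program
   blocks waiting for input. Returns 0 on success, EOF on failure. */
int prompt(const char *text)
{
    if (fputs(text, stdout) == EOF) {
        return EOF;
    }
    return fflush(stdout);
}
```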
Buffering<br />
By default, the standard streams are typically buffered as follows (when connected to a terminal):<br />
– stdin is line buffered<br />
– stdout is line buffered<br />
– stderr is unbuffered<br />
You can change the buffering state with the function<br />
int setvbuf(FILE *stream, char* buff, int mode, size_t size)<br />
– buff points to a buffer of at least size bytes (or it is NULL, in which case a<br />
buffer will be malloc()ed)<br />
– Mode can be one of three predefined values:<br />
∗ _IOFBF for full buffering<br />
∗ _IONBF to disable buffering<br />
∗ _IOLBF to enable line buffering<br />
void setbuf(FILE *stream, char *buff) is a simpler interface:<br />
– If buff is NULL, buffering is switched off<br />
– Otherwise, full buffering with a buffer of size BUFSIZ is enabled (and buff has to<br />
point to a large enough buffer!)<br />
Example<br />
#include <stdio.h><br />
#include <stdlib.h><br />
int main(int argc, char* argv[])<br />
{<br />
char name[80];<br />
char buffer[BUFSIZ];<br />
setbuf(stdout, buffer);<br />
printf("Please enter name: ");<br />
fgets(name,80,stdin);<br />
printf("Your name is: %s\n", name);<br />
setbuf(stdout, NULL);<br />
printf("Please enter name: ");<br />
fgets(name,80,stdin);<br />
printf("Your name is: %s\n", name);<br />
return EXIT_SUCCESS;<br />
}<br />
Example Behaviour<br />
$ ./bufftest<br />
Stephan<br />
Please enter name: Your name is: Stephan<br />
Please enter name: Schulz<br />
Your name is: Schulz<br />
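The same effect can be controlled more precisely with setvbuf().<br />
The sketch below switches a stream to line buffering and lets the library allocate the buffer; the function name make_line_buffered() is hypothetical.<br />

```c
#include <stdio.h>

/* Switch stream to line buffering with a library-allocated buffer.
   Has to be called before any other operation on the stream.
   Returns 0 on success, a non-zero value on failure. */
int make_line_buffered(FILE *stream)
{
    return setvbuf(stream, NULL, _IOLBF, BUFSIZ);
}
```

Passing NULL as the buffer avoids the problem of a user-supplied buffer going out of scope while the stream is still open.<br />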
More Operations on Files<br />
int remove(const char *filename)<br />
– Removes a file (as in rm)<br />
– Return 0 on success, something else on failure<br />
int rename(const char *oldname, const char *newname)<br />
– Rename a file (as in mv)<br />
– Return 0 on success, something else on failure<br />
FILE *tmpfile(void)<br />
– Creates a temporary file with mode wb+ (reading and writing in binary)<br />
– The file will vanish when it is closed or if the program terminates normally<br />
– On failure, NULL will be returned<br />
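A temporary file is convenient as scratch space that cleans up after itself.<br />
The following sketch writes a string to a temporary file and reads it back; the function name roundtrip_length() is hypothetical.<br />

```c
#include <stdio.h>

/* Write text to a temporary file, read it back and return the number
   of characters read, or -1 if the temporary file cannot be created. */
long roundtrip_length(const char *text)
{
    FILE *tmp = tmpfile();
    long count = 0;

    if (tmp == NULL) {
        return -1;
    }
    fputs(text, tmp);
    rewind(tmp);                 /* back to the start before reading */
    while (fgetc(tmp) != EOF) {
        count++;
    }
    fclose(tmp);                 /* the temporary file is removed here */
    return count;
}
```

The call to rewind() is needed because the same stream is used first for writing and then for reading.<br />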
Even More File Operations<br />
char *tmpnam(char *s)<br />
– Creates a file name that is different from any existing name<br />
– If called with argument NULL, will return a pointer to a static buffer containing<br />
the name<br />
– Otherwise, s has to point to an array of at least L_tmpnam bytes<br />
– Note: Using tmpnam() in security-critical applications is discouraged, as it<br />
creates a race condition (what if another process creates a file with the name<br />
in between the call to tmpnam() and fopen()?)<br />
Error Functions<br />
Each FILE data structure stores two pieces <strong>of</strong> information:<br />
– If end-of-file has been reached during reading<br />
– If an error occurred<br />
int feof(FILE *stream) returns true if the end-of-file indicator has been set<br />
int ferror(FILE *stream) returns true if the error indicator is set<br />
void clearerr(FILE *stream) clears both indicators<br />
void perror(const char *s) prints an error message to stderr as follows:<br />
– First, the supplied string is printed, followed by a colon and a space<br />
– Then the error message for the current value of errno is printed, followed by<br />
a newline<br />
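Functions like fgetc() return EOF both at end of file and on error, so ferror() or feof() is needed to tell the two apart.<br />
A minimal sketch of this check, with the hypothetical function name count_chars():<br />

```c
#include <stdio.h>

/* Count the characters remaining in stream. Returns the count, or -1
   if reading stopped because of an error rather than end of file. */
long count_chars(FILE *stream)
{
    long count = 0;

    while (fgetc(stream) != EOF) {
        count++;
    }
    if (ferror(stream)) {        /* EOF was caused by an error */
        clearerr(stream);
        return -1;
    }
    return count;                /* here, feof(stream) is true */
}
```

Note that the indicators are tested after a read has failed; feof() does not predict whether the next read will succeed.<br />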
Exercises<br />
Write a simple database that keeps given name, family name, and date of birth<br />
for a person. Subtasks:<br />
– Create a dialog where people can enter data<br />
– Create an interface for searching for data, based on any criterion<br />
– Create an interface where you can print lists of people, possibly sorted by any<br />
of the data fields<br />
You need to think about the database structure (a flat text file should work, see<br />
e.g. /etc/passwd for ideas)<br />
You need an architecture for your overall program<br />
– The conventional way is to use one monolithic program with a menu<br />
structure (use text menus...)<br />
– The UNIX way would be to write one program for each task<br />
CSC322<br />
C Programming and UNIX<br />
C Standard Library<br />
Formatted Output<br />
Formatted Output<br />
The formatted output functions offer a very convenient way of printing data in a<br />
controlled manner<br />
– They are able to print all basic C datatypes (and strings)<br />
– They can print any number of arguments with one command<br />
– For most datatypes, there are multiple useful formats<br />
– Argument output and descriptive strings can be interspersed easily<br />
Output format is determined by a format string argument<br />
– The format string contains ordinary text that is copied directly to the output<br />
– It also contains conversion specifiers that describe how to format additional<br />
arguments<br />
Formatted output functions are variadic, i.e. they take a variable number of<br />
arguments<br />
– Number of arguments is determined by the number of conversion specifiers<br />
– Modern compilers check this property if the format string is constant<br />
A First Example<br />
printf("%d divided by %d = %f\n",22,7,22/7.0);<br />
– The first argument to printf is the format string<br />
– It contains 3 conversion specifiers:<br />
∗ The first %d specifies an int argument that should be printed in decimal<br />
notation and corresponds to the first extra argument, 22<br />
∗ The second %d corresponds to the second extra argument, 7<br />
∗ Finally, the %f specifies a double (floating point) argument that should be<br />
printed in pure decimal notation (with fractional part after the decimal dot)<br />
The format string also contains additional text<br />
– This text is printed unchanged<br />
– Note that normal conventions hold, i.e. \n in a string literal is the newline<br />
character<br />
Output printed:<br />
22 divided by 7 = 3.142857<br />
The printf() Family of Functions<br />
All functions are declared in <stdio.h><br />
int printf(const char *format, ...);<br />
– Print the additional arguments under control of the format string to stdout<br />
– Returns the number of characters printed, or a negative number on error<br />
int fprintf(FILE *stream, const char *format, ...);<br />
– As printf(), but print to the designated output stream<br />
int sprintf(char *s, const char *format, ...);<br />
– Instead of actually printing anything, sprintf() will store the output characters<br />
in the character array s points to<br />
– The string will be \0 terminated<br />
– It is the responsibility of the programmer to make sure the array at s is big enough<br />
– The returned count of characters does not include the terminating NUL character<br />
(i.e. it is the same value that printf() would return)<br />
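sprintf() is safe only if the maximum length of the output is known in advance.<br />
The sketch below formats a date with fixed field widths; the function name format_date() is hypothetical, and the stated buffer size assumes a year between 0 and 9999 and plausible values for month and day.<br />

```c
#include <stdio.h>

/* Format a date as "YYYY-MM-DD" into buf. For years from 0 to 9999
   and two-digit months and days, buf needs room for 11 characters.
   Returns the number of characters stored, not counting the '\0'. */
int format_date(char *buf, int year, int month, int day)
{
    return sprintf(buf, "%04d-%02d-%02d", year, month, day);
}
```

For 9 March 2004 the function stores "2004-03-09" and returns 10.<br />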
Format Specifiers<br />
Format specifiers always start with a % character, and end in a conversion letter<br />
– The conversion letter describes the basic output format<br />
– It normally also describes which kind of argument has to follow<br />
Optional parts of a format specifier include (in order)<br />
– Flags (affect how the result will be printed)<br />
– Minimum field width (if fewer characters are necessary, padding will be used)<br />
– Precision (number of significant digits/characters)<br />
– Size modifier (e.g. require short or long instead of int)<br />
Some Conversion Letters (1)<br />
d: Convert an int argument and print it in decimal representation<br />
i: Alias for d<br />
u: Convert an unsigned int argument and print it in decimal representation<br />
o: Convert an unsigned int argument and print it in octal representation<br />
x: Convert an unsigned int argument and print it in hexadecimal representation,<br />
using {a, b, c, d, e, f} for the extra hexadecimal digits<br />
X: As x, but use upper case hex digits ({A, B, C, D, E, F})<br />
p: Convert a void* pointer and print it in an implementation-defined manner<br />
(for our system, and for many other systems, the argument is printed as a<br />
hexadecimal number representing the address)<br />
Some Conversion Letters (2)<br />
f: Print a double argument (float is converted automatically) as a sequence<br />
of digits with a decimal point<br />
– Unless otherwise specified via the precision modifier, 6 digits are printed after<br />
the decimal point<br />
e: Print a double argument in normalized exponential form, with 1 digit before<br />
the decimal dot (and by default 6 digits after the dot). Example: 3.141593e+01<br />
(= 3.141593 ∗ 10^1)<br />
E: As e, but print upper case E before exponent<br />
g: “Human-friendly floating point output”. Print a double number either as<br />
with e (for very small or very large numbers) or as with f, cutting off unnecessary trailing<br />
zeros<br />
G: As g, but use E instead of e<br />
Some Conversion Letters (3)<br />
c: Print a single int argument by converting it to char and printing the<br />
corresponding character (use %i to print the numeric value of a character)<br />
s: Print a C style string, converting a char* argument pointing to a \0-terminated<br />
string<br />
%: Convert no arguments, just print a single % character (i.e. %% in the format<br />
string generates a single % in the output)<br />
Remarks:<br />
– We have not covered some of the more esoteric conversions<br />
– The 1995 addendum to C89 and the C99 standard add additional conversion<br />
characters<br />
– For more details, check man 3 printf<br />
Example<br />
#include <stdio.h><br />
#include <stdlib.h><br />
int main(void)<br />
{<br />
char *p = "This is a string";<br />
printf("12 with... %%d: %d, %%u: %u, %%o: %o, %%x: %x, %%X: %X\n",<br />
12, 12, 12, 12, 12);<br />
printf("12.5 with... %%f: %f, %%e: %e, %%E: %E, \n%%g: %g, %%G: %G\n",<br />
12.5,12.5,12.5,12.5,12.5);<br />
printf("Printing a character with %%c: %c and %%d: %d\n",<br />
'a', 'a');<br />
printf("This is a string \"%s\" and its address: %p\n", p,(void*)p);<br />
return EXIT_SUCCESS;<br />
}<br />
Example Output<br />
12 with... %d: 12, %u: 12, %o: 14, %x: c, %X: C<br />
12.5 with... %f: 12.500000, %e: 1.250000e+01, %E: 1.250000E+01,<br />
%g: 12.5, %G: 12.5<br />
Printing a character with %c: a and %d: 97<br />
This is a string "This is a string" and its address: 0x80485a0<br />
Size Modifiers<br />
Size modifiers are used to change the default argument size:<br />
– The l modifier changes integer arguments to their long variants<br />
– The h modifier indicates that the argument is of type short or unsigned<br />
short instead of the default int<br />
The C99 standard introduces additional size modifiers:<br />
– z indicates an argument of type size_t (for integer conversions)<br />
– ll indicates long long versions of the integers<br />
– hh indicates char arguments instead of int types<br />
For us, the %ld version (long integer) is probably the most important one<br />
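The size modifier has to match the type of the argument actually passed.<br />
The sketch below prints a long with %ld and a size_t value with %lu after an explicit cast, which keeps the code within C89 (C99 code could use %zu without the cast); the function name describe() is hypothetical.<br />

```c
#include <stdio.h>
#include <string.h>

/* Store "<value>/<length of text>" in buf and return the number of
   characters stored. %ld expects a long, %lu an unsigned long. */
int describe(char *buf, long value, const char *text)
{
    return sprintf(buf, "%ld/%lu", value, (unsigned long)strlen(text));
}
```

Passing a long where %d is used, or an int where %ld is used, results in undefined behaviour, even if it appears to work on systems where both types have the same size.<br />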
Specifying Minimum Field Width<br />
The minimum field width is an integer literal between the % and the conversion<br />
letter (with optional size modifier)<br />
– It may be preceded by any flags<br />
– The precision, if any, follows it<br />
By default, any value is printed right-justified in its field<br />
– Padding is done with spaces:<br />
printf("|%7d|\n",12);<br />
|     12|<br />
If the natural value representation is bigger than the minimum field width, the<br />
specification has no effect<br />
printf("|%7s|\n", "A long string");<br />
|A long string|<br />
The Flags<br />
-: Left-justify output (only useful in connection with a width specification)<br />
0: Use 0 for padding to requested field width (by default, ’ ’ (space) is used)<br />
+: For numerical values: Always print a sign, either + or -<br />
’ ’ (space): Always print a character for the sign, - for negative numbers, ’ ’<br />
for positive ones<br />
#: Use a variant of the conversion operation<br />
– For %o, print a leading 0<br />
– For %x, print a leading 0x<br />
– For %X, print a leading 0X<br />
– For floating point numbers, the decimal dot is always printed with the # flag<br />
(and %g and %G keep trailing zeros)<br />
The Precision<br />
The precision field is used for a number of different things:<br />
– For any integer conversion character, it gives a minimum number of digits to<br />
print (by adding leading zeros)<br />
– For %e, %E and %f, it gives the number of digits in the fractional part<br />
– For %g and %G, it is the number of significant digits to be printed<br />
– Finally, for strings (%s), it gives the maximal number of characters to be<br />
printed from the string<br />
Example<br />
#include <stdio.h><br />
#include <stdlib.h><br />
int main(void)<br />
{<br />
    printf("Floating point example: |%+8.2f|\n", 3.0/7.0);<br />
    printf("Floating point example: |% 8.2f|\n", 3.0/7.0);<br />
    printf("String: %-7.7s\n", "Longish String");<br />
    printf("String: %-7.7s\n", "short");<br />
    printf("String: %7.7s\n", "short");<br />
    return EXIT_SUCCESS;<br />
}<br />
Output:<br />
Floating point example: |   +0.43|<br />
Floating point example: |    0.43|<br />
String: Longish<br />
String: short  <br />
String:   short<br />
Assignment<br />
Write an archiver program arch322. Your program should accept any number of<br />
arguments (to be treated as filenames). It should write (to stdout) an archive,<br />
i.e. a file that contains enough information to recreate the original files with their<br />
names. For simplicity, allow only files in the current directory to be archived<br />
(check whether any argument contains a / and print an error message if so). Also<br />
print useful error messages if one of the named files does not exist, etc.<br />
Write a dearchiver dearch322 that accepts an archive file (in your format) on<br />
stdin and recreates the original files in the current directory. Print an error<br />
message if the file is not a valid archive.<br />
You are free to design your own archive format, but you may get some ideas from<br />
reading the documentation (man/info) on tar/gtar. Please document your<br />
format in one or two paragraphs. You may assume UNIX I/O, i.e. no difference<br />
between text and binary I/O.<br />
Example:<br />
$ arch322 Makefile sort_csc322.c utilities.c > myarch.arch<br />
$ mkdir NEW<br />
$ cd NEW<br />
$ dearch322 < ../myarch.arch<br />
Recreating Makefile<br />
Recreating sort_csc322.c<br />
Recreating utilities.c<br />
$ ls<br />
Makefile sort_csc322.c utilities.c<br />
$<br />
Asynchronous Events and Signals<br />
Processes<br />
A UNIX process is an instance of a program in execution. It can be described by<br />
– The executable code (stored in the text segment of the virtual memory image<br />
of the process)<br />
– The program data (stored in the data segment)<br />
– The state, including stack pointer and stack, program counter, etc. (usually<br />
collected in a process control block, or PCB)<br />
A process uses certain resources:<br />
– Processor time on a CPU<br />
– Memory, both virtual and real<br />
– File descriptors<br />
– . . .<br />
Some of its important properties are<br />
– Owner<br />
– Process id (pid), a unique non-negative integer<br />
– Parent (exception: init)<br />
UNIX User Commands: ps<br />
Usage: ps<br />
– ps shows information about currently executing processes<br />
– It is one of the least standardized UNIX tools<br />
Our Linux ps can assume many different personalities<br />
– Different personalities show different behaviour<br />
– . . . and accept different options.<br />
Default behaviour (ps without options):<br />
– Show information about all existing processes of the current user controlled by<br />
the same terminal ps was run on<br />
– For each process, list:<br />
∗ Process Id (PID)<br />
∗ Controlling terminal (TTY)<br />
∗ CPU time used by the process<br />
∗ Name of the executable program file<br />
Vanilla ps Example<br />
$ ps<br />
PID TTY TIME CMD<br />
1125 pts/3 00:00:01 tcsh<br />
7157 pts/3 00:00:00 xevil<br />
7189 pts/3 00:00:00 gv<br />
7193 pts/3 00:00:00 gs<br />
7194 pts/3 00:00:00 ps<br />
Some ps Options<br />
Some simple BSD-style options for the default personality (note: BSD-style<br />
options for ps are not preceded by a dash!)<br />
– a: Print information about all processes that are connected to any terminal<br />
– x: Print information about processes not connected to a terminal<br />
– U : Print information about processes owned by the named user<br />
– u: User oriented output with more interesting information:<br />
∗ Owner of a process (USER)<br />
∗ Process Id (PID)<br />
∗ Percentage of available CPU used by the process (%CPU)<br />
∗ Percentage of memory used (%MEM) (note that this measures virtual<br />
memory usage, real memory usage may be lower because of shared pages)<br />
∗ Virtual memory size of the process in KByte (VSZ)<br />
∗ Size of the resident set, i.e. the recently referenced pages not swapped out<br />
(RSS)<br />
∗ Controlling terminal (TTY)<br />
∗ Time or date when the process was started (START)<br />
∗ Seconds of CPU time used (TIME)<br />
∗ Full command used to start the process (COMMAND)<br />
Interesting ps Example<br />
$ ps aux<br />
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND<br />
root 1 0.0 0.1 1368 432 ? S Oct30 0:04 init<br />
root 2 0.0 0.0 0 0 ? SW Oct30 0:03 [keventd]<br />
root 3 0.0 0.0 0 0 ? SW Oct30 0:00 [kapmd]<br />
...<br />
root 486 0.0 0.1 1372 408 ? S Oct30 0:00 /sbin/dhcpcd -n -h wo<br />
root 551 0.0 0.2 1644 668 ? S Oct30 0:00 syslogd -m 0<br />
...<br />
schulz 1095 0.0 4.8 16112 12268 ? S Oct30 4:40 emacs -geometry 96x77<br />
schulz 1096 0.0 0.8 4944 2216 ? S Oct30 0:05 xterm -geometry 80x40<br />
schulz 1997 0.0 0.5 3072 1476 ? S Oct31 0:12 ssh sherman emacs<br />
root 4073 0.0 1.0 7480 2768 pts/3 S Oct31 0:03 /usr/local/lib/xmcd/b<br />
schulz 22637 0.0 0.5 2940 1444 pts/5 S Nov05 0:03 ssh -X sunbroy2.infor<br />
schulz 22645 4.0 18.7 82248 47832 ? S Nov05 31:04 /usr/local/mozilla/mo<br />
schulz 6722 0.0 0.0 0 0 ? Z Nov05 0:00 [plugger ]<br />
schulz 7189 0.0 0.8 3948 2220 pts/3 S 00:15 0:00 gv CSC322_1.pdf<br />
schulz 7235 0.4 2.2 10060 5668 pts/3 S 00:41 0:00 gs -dNOPLATFONTS -sDE<br />
schulz 7236 76.8 38.0 98072 96896 pts/0 R 00:43 0:33 eprover /home/schulz/<br />
news 7237 0.5 1.0 3704 2796 ? S 00:43 0:00 leafnode<br />
schulz 7258 0.0 0.2 2624 708 pts/3 R 00:43 0:00 ps aux<br />
Signals and Signal Handlers<br />
Signals<br />
Signals are a way to signal unusual events to a process<br />
– Run time errors<br />
– User requests<br />
– Pending communication<br />
In general, signals can arrive asynchronously, i.e. at any time<br />
There are many different signals. Depending on the signal, the process can<br />
– Ignore a signal<br />
– Perform a default action (defined by the implementation)<br />
– Invoke an explicit signal handler<br />
Standard C Signals<br />
Standard C defines a small number of signals, UNIX defines many more<br />
Signal    Meaning                   Default Action (UNIX)<br />
SIGABRT   Abort the process         Terminate with core<br />
SIGFPE    Floating point exception  Terminate with core<br />
SIGILL    Illegal instruction       Terminate with core<br />
SIGINT    Interactive interrupt     Terminate<br />
SIGSEGV   Illegal memory access     Terminate with core<br />
SIGTERM   Termination request       Terminate<br />
Note: SIGINT is generated when you press [CTRL-C]!<br />
– The signal is delivered to the process<br />
– The default action is to terminate the process<br />
Some UNIX Signals<br />
UNIX defines about 60 different signals, including all Standard C signals<br />
Some important UNIX signals:<br />
Signal    Meaning                                                            Default Action (UNIX)<br />
SIGHUP    Terminal connection lost (or controlling process dies)             Terminate<br />
SIGKILL   Kill process, cannot be caught or ignored                          Terminate<br />
SIGBUS    Bus error                                                          Terminate with core<br />
SIGSTOP   Stop a process (does not terminate, cannot be caught or ignored)   Suspends process<br />
SIGCONT   Continue suspended process                                         Ignored (*)<br />
SIGURG    Out of band data arrived on a socket                               Ignore<br />
SIGXCPU   CPU time limit reached                                             Terminate with core<br />
(*) OS will still wake process up<br />
[CTRL-Z] generates SIGTSTP, the terminal stop signal, which (unlike SIGSTOP) can be caught or ignored!<br />
UNIX User Command: kill<br />
Note: kill is often implemented as a shell built-in<br />
– Syntax may differ slightly from the kill program<br />
– Allows use of kill in job control<br />
Usage for our kill: kill [-signal] pid ...<br />
– If no signal is specified, SIGTERM is sent<br />
– Signals can be specified symbolically (for a list of names run kill -l) or<br />
numerically (man 7 signal gives a list of signals and their numeric values)<br />
kill accepts a list of process arguments<br />
– Most common case: the argument is a normal process id (a positive integer). The<br />
signal is sent to the corresponding process<br />
– If the argument is -1, the signal is sent to all processes of the user (kill -KILL<br />
-1 is a surefire way to log yourself out)<br />
– Finally, if the argument is any other negative number, the signal is sent to the<br />
corresponding process group<br />
UNIX User Commands: top<br />
top is an interactive version of ps<br />
– It shows various information about the top processes currently running<br />
– Also shows general system information<br />
– All information is periodically updated<br />
– top seems to be more consistent between different UNIX dialects, and is often<br />
preferred for interactive use (or even for scripting)<br />
top can also be used to send signals to processes<br />
– Press [k] and then specify process and signal<br />
Non-interactive use of top (“better ps”):<br />
– top -b -n1 will print a single page in a ps-like manner<br />
For more information: man top or run top and hit [h] for help<br />
top Example<br />
11:09pm up 8 days, 1:15, 7 users, load average: 0.59, 0.21, 0.07<br />
78 processes: 71 sleeping, 4 running, 3 zombie, 0 stopped<br />
CPU states: 95.2% user, 4.7% system, 0.0% nice, 0.0% idle<br />
Mem: 254576K av, 249892K used, 4684K free, 0K shrd, 7428K buff<br />
Swap: 522072K av, 30888K used, 491184K free 68440K cached<br />
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND<br />
12692 schulz 25 0 25548 24M 664 R 89.3 10.0 0:08 eprover<br />
1040 root 15 0 89416 15M 5424 S 5.5 6.4 919:35 X<br />
1097 schulz 15 0 2324 2124 1676 S 3.7 0.8 0:15 xterm<br />
12693 schulz 16 0 924 924 728 R 1.1 0.3 0:00 top<br />
1096 schulz 15 0 2512 2252 1708 R 0.1 0.8 0:07 xterm<br />
1 root 15 0 472 432 416 S 0.0 0.1 0:04 init<br />
2 root 15 0 0 0 0 SW 0.0 0.0 0:04 keventd<br />
3 root 15 0 0 0 0 SW 0.0 0.0 0:00 kapmd<br />
4 root 34 19 0 0 0 SWN 0.0 0.0 0:00 ksoftirqd_CPU0<br />
5 root 15 0 0 0 0 SW 0.0 0.0 0:09 kswapd<br />
6 root 15 0 0 0 0 SW 0.0 0.0 0:00 bdflush<br />
7 root 15 0 0 0 0 SW 0.0 0.0 0:00 kupdated<br />
8 root 25 0 0 0 0 SW 0.0 0.0 0:00 mdrecoveryd<br />
12 root 15 0 0 0 0 SW 0.0 0.0 0:01 kjournald<br />
...<br />
Catching Signals<br />
User programs can set up a signal handler to catch signals<br />
– A signal handler is a normal function<br />
– It has to be explicitly set up for each signal type<br />
– It will be called asynchronously when a signal of the correct type has been<br />
caught<br />
– When the signal handler returns, the program will resume execution at the old<br />
spot<br />
UNIX implements several different ways of handling signals; we will concentrate<br />
on the ANSI C signal handling<br />
– All use the same signals: Signals are small integers<br />
– However, for all existing signals, we use the #defined names shown above<br />
(SIGHUP. . . )<br />
Signal handling stuff is defined in <signal.h><br />
ANSI C Signal Handling with signal.h<br />
signal.h defines the signal() function for establishing signal handlers as<br />
follows:<br />
void (*signal(int sig, void (*handler)(int)))(int)<br />
Huh?<br />
We can break this definition up as follows:<br />
typedef void (*SigHandler)(int);<br />
SigHandler signal(int sig, SigHandler handler);<br />
– The first argument to signal() is the signal to be caught<br />
– The second argument is a pointer to the new signal handler<br />
– Return value is a pointer to the old signal handler for that signal (or SIG_ERR<br />
if no signal handler could be established)<br />
Predefined (pseudo) signal handlers (possible arguments to signal()):<br />
– SIG_DFL: Revert to the default behaviour for that signal<br />
– SIG_IGN: Ignore the signal from now on<br />
ANSI C Signal Handling (Continued)<br />
Additional definitions in signal.h:<br />
sig_atomic_t is an integer type<br />
– We are guaranteed that an assignment to a variable of this type is atomic,<br />
i.e. will not be interrupted by e.g. another signal<br />
– That means that its value will always be well-defined<br />
int raise(int sig) raises a signal to the program<br />
– Return value: 0 on success, something else otherwise<br />
ANSI C Signal Handlers<br />
A signal handler is a function that returns nothing and gets the signal that was<br />
caught as an argument<br />
There are several limitations on signal handlers:<br />
– Since signals can arrive asynchronously, the state of the program is not<br />
well-defined!<br />
– Signals may be handled even within a single C statement<br />
– Therefore a signal handler cannot make many assumptions about the state of<br />
the program<br />
– For maximum portability, a signal handler should only<br />
∗ Reestablish itself by calling signal()<br />
∗ Assign a value to a variable of type volatile sig_atomic_t<br />
∗ Return or terminate the program (e.g. calling exit())<br />
Once a signal has been caught, the signal handler for that signal may be reset to<br />
default behaviour (whether this happens is implementation-defined)<br />
– If you want to catch multiple signals, the signal handler has to reestablish itself<br />
Common UNIX functions: sleep()<br />
Often, a program has to perform a task only occasionally, or it has to wait for<br />
a certain event to happen. ANSI C has no way of delaying a program<br />
– Old-style home computer programmers used busy delay loops<br />
– However, those are unacceptable on multi-user systems<br />
– Moreover, they can usually be optimized away by a good compiler<br />
All UNIX versions address this problem with the sleep() function (normally<br />
defined in <unistd.h>):<br />
unsigned int sleep(unsigned int seconds);<br />
sleep() makes the current process sleep (do nothing ;-) until either<br />
– (At least) seconds seconds have elapsed or<br />
– A non-ignored signal arrives<br />
Return value:<br />
– 0 if sleep terminated because of elapsed time<br />
– Number of seconds left when the process was awakened by a signal<br />
Example: Counting Signals (Fluff)<br />
#include <stdio.h><br />
#include <stdlib.h><br />
#include <signal.h><br />
#include <unistd.h><br />
#include <assert.h><br />
typedef void (*SigHandler)(int);<br />
volatile sig_atomic_t sig_int_flag = 0;<br />
volatile sig_atomic_t sig_term_flag = 0;<br />
void EstablishSignal(int sig, SigHandler handler)<br />
{<br />
    SigHandler res;<br />
    res = signal(sig, handler);<br />
    if(res == SIG_ERR)<br />
    {<br />
        perror("Could not establish signal handler");<br />
        exit(EXIT_FAILURE);<br />
    }<br />
}<br />
Example: Counting Signals (The Signal Handlers)<br />
void sig_int_handler(int sig)<br />
{<br />
    EstablishSignal(SIGINT, sig_int_handler);<br />
    assert(sig == SIGINT);<br />
    printf("Caught SIGINT!\n"); /* Risky */<br />
    sig_int_flag = 1;<br />
}<br />
void sig_term_handler(int sig)<br />
{<br />
    EstablishSignal(SIGTERM, sig_term_handler);<br />
    assert(sig == SIGTERM);<br />
    printf("Caught SIGTERM!\n"); /* Risky! */<br />
    sig_term_flag = 1;<br />
}<br />
Example: Counting Signals (Main)<br />
int main(int argc, char* argv[])<br />
{<br />
    int i;<br />
    int int_counter = 0;<br />
    EstablishSignal(SIGTERM, sig_term_handler);<br />
    EstablishSignal(SIGINT, sig_int_handler);<br />
    for(i=0; i
}<br />
Example Session: Live<br />
(Change to Terminal!)<br />
Exercises<br />
Start a long running process (e.g. top) in one xterm<br />
Send it various signals and see how it behaves<br />
Read man 7 signal, man kill and man 2 kill<br />
Download the source to the signal handler example and play with it<br />
– Send different signals to it<br />
– Add your own signal handler<br />
– Write a generic signal handler function that catches more than one signal (and<br />
works correctly for multiple signals)<br />
The UNIX File System<br />
UNIX File System<br />
UNIX philosophy: Everything is a file<br />
– Plain files<br />
– Hardware devices (Keyboard, mouse, hard drives)<br />
– Network connections<br />
Consequently, UNIX specifies a lot more properties and has more ways of<br />
manipulating a file than ANSI C<br />
– Low-level IO<br />
– File access rights<br />
– Different file types<br />
Note: These are not ANSI C features<br />
– We have to call gcc without the -ansi option to use most of these features<br />
(otherwise, most UNIX extensions are disabled)<br />
UNIX File Types (1)<br />
Regular files<br />
– Boring old data file (most common type of file)<br />
– UNIX does not care what is inside that file<br />
Directories<br />
– Stores names and pointers to more information<br />
– Write access is limited to kernel file system functions to assure the integrity of<br />
the file system<br />
Character special files<br />
– Represent hardware devices that generate individual characters (/dev/kbd,<br />
/dev/mouse)<br />
Block special files<br />
– Represent hardware where data is available in fixed-size blocks (e.g. hard<br />
drives, /dev/hda in Linux)<br />
UNIX File Types (2)<br />
FIFOs (named pipes)<br />
– Special files used for interprocess communication<br />
Sockets<br />
– Special files used for network communication (or local interprocess communication)<br />
– Not available in all UNIX versions (some don’t represent network connections<br />
as files in the file system)<br />
Symbolic links<br />
– A symbolic link is a file containing just a file name<br />
– The kernel normally automatically redirects any access to the link to the named<br />
file<br />
The stat() Functions<br />
The three functions in the stat family all allow us to extract information about<br />
a file<br />
– Who owns it<br />
– How big is it<br />
– What kind of file is it<br />
– . . .<br />
They are specified as follows:<br />
#include <sys/types.h><br />
#include <sys/stat.h><br />
int stat(const char *pathname, struct stat *buf);<br />
int fstat(int filedes, struct stat *buf);<br />
int lstat(const char *pathname, struct stat *buf);<br />
The stat() Functions (2)<br />
All three functions perform the same basic function:<br />
– Write information about a file into the structure buf points to (and which we<br />
have to provide)<br />
– Return 0 if the operation was possible, -1 otherwise (in which case they also<br />
set errno)<br />
Differences:<br />
– fstat() accepts a low level file descriptor referring to an open file<br />
– lstat() will not follow symbolic links, but give information about the link<br />
itself (stat() gives information about the file pointed to)<br />
How exactly struct stat is defined may differ<br />
– It always contains certain standard members<br />
The stat() Functions (3)<br />
struct stat {<br />
    dev_t st_dev; /* device number */<br />
    dev_t st_rdev; /* device type (if inode device) */<br />
    ino_t st_ino; /* inode number */<br />
    mode_t st_mode; /* access rights and file type */<br />
    nlink_t st_nlink; /* number of hard links */<br />
    uid_t st_uid; /* user ID of owner */<br />
    gid_t st_gid; /* group ID of owner */<br />
    off_t st_size; /* total size, in bytes */<br />
    unsigned long st_blksize; /* blocksize for filesystem I/O */<br />
    unsigned long st_blocks; /* number of blocks allocated */<br />
    time_t st_atime; /* time of last access */<br />
    time_t st_mtime; /* time of last modification */<br />
    time_t st_ctime; /* time of last change */<br />
};<br />
Interpretation of some fields is supported by predefined macros<br />
– E.g. st_mode<br />
Example: Simple ls -l Version<br />
#include <stdio.h><br />
#include <stdlib.h><br />
#include <sys/types.h><br />
#include <sys/stat.h><br />
#include <unistd.h><br />
void err_sys(char* message)<br />
{<br />
    perror(message);<br />
    exit(EXIT_FAILURE);<br />
}<br />
void stat_file(char *fname)<br />
{<br />
    struct stat buff;<br />
    char* type = "Unknown";<br />
    if(lstat(fname, &buff) < 0)<br />
    {<br />
        err_sys("lstat");<br />
    }<br />
    if(S_ISREG(buff.st_mode))<br />
    {<br />
        type = "Regular file";<br />
    }<br />
    else if(S_ISDIR(buff.st_mode))<br />
    {<br />
        type = "Directory";<br />
    }<br />
    else if(S_ISCHR(buff.st_mode))<br />
    {<br />
        type = "Character special file";<br />
    }<br />
    else if(S_ISBLK(buff.st_mode))<br />
    {<br />
        type = "Block special file";<br />
    }<br />
    else if(S_ISFIFO(buff.st_mode))<br />
    {<br />
        type = "Pipe or FIFO";<br />
    }<br />
    else if(S_ISLNK(buff.st_mode))<br />
    {<br />
        type = "Symbolic link";<br />
    }<br />
    else if(S_ISSOCK(buff.st_mode))<br />
    {<br />
        type = "Socket";<br />
    }<br />
    /* st_size has type off_t: cast it to match %ld */<br />
    printf("%-30s %10ld Bytes %s\n", fname, (long)buff.st_size, type);<br />
}<br />
int main(int argc, char *argv[])<br />
{<br />
    int i;<br />
    /* Assumption: the source is cut off after "for(i=1; i"; the loop below is a reconstruction */<br />
    for(i=1; i<argc; i++)<br />
    {<br />
        stat_file(argv[i]);<br />
    }<br />
    return EXIT_SUCCESS;<br />
}<br />
Example Output<br />
$ /SOURCES/CSC 322/myls *<br />
BINTREE 533 Bytes Directory<br />
LIST_DEMO 549 Bytes Directory<br />
Makefile 1322 Bytes Regular file<br />
Makefile~ 1277 Bytes Regular file<br />
RPN_CALC 630 Bytes Directory<br />
RPN_CALC.tgz 10197 Bytes Regular file<br />
SORT 373 Bytes Directory<br />
a.out 13756 Bytes Regular file<br />
base_converter 14634 Bytes Regular file<br />
base_converter.c 1918 Bytes Regular file<br />
base_converter.c~ 430 Bytes Regular file<br />
celsius2fahrenheit 13633 Bytes Regular file<br />
celsius2fahrenheit.c 395 Bytes Regular file<br />
charcount 13639 Bytes Regular file<br />
charcount.c 216 Bytes Regular file<br />
charcount.c~ 114 Bytes Regular file<br />
charuniq 13643 Bytes Regular file<br />
charuniq.c 571 Bytes Regular file<br />
...<br />
Example Output (of device directory /dev/)<br />
$ /SOURCES/CSC 322/myls *<br />
...<br />
cdrom 8 Bytes Symbolic link<br />
cdu535 0 Bytes Block special file<br />
cfs0 0 Bytes Character special file<br />
cm205cd 0 Bytes Block special file<br />
cm206cd 0 Bytes Block special file<br />
console 0 Bytes Character special file<br />
core 11 Bytes Symbolic link<br />
cpu 196 Bytes Directory<br />
cua0 0 Bytes Character special file<br />
cua1 0 Bytes Character special file<br />
...<br />
ham 0 Bytes Character special file<br />
hda 0 Bytes Block special file<br />
hda1 0 Bytes Block special file<br />
hda10 0 Bytes Block special file<br />
hda11 0 Bytes Block special file<br />
hda12 0 Bytes Block special file<br />
...<br />
Links<br />
Links form a connection between a file name and the actual file<br />
There are two kinds of links:<br />
– Hard links<br />
– Symbolic (or soft) links<br />
A hard link links a name and a file<br />
– Each file can have multiple hard links<br />
– All are equivalent (no concept of “original link”), access is equally efficient for<br />
all hard links<br />
– rm actually only removes a link; only when the number of links becomes 0 is the<br />
file finally removed<br />
– Typically, it is only possible to have hard links to a file on the same physical<br />
partition or medium<br />
Links (2)<br />
Soft links create indirect aliases for a file<br />
– They are just files that contain another file name<br />
– Following a soft link incurs a small performance penalty<br />
– Symbolic links can point anywhere in the file system (no limitations as to physical<br />
medium, networked file system, . . . )<br />
– Symbolic links do not influence the file pointed to at all!<br />
– If the file does not exist any more, the link still exists, but is broken<br />
Most user-created links are soft links nowadays<br />
– Used to share files<br />
– Used to hide file system reorganization<br />
UNIX User Commands: ln<br />
ln is used to create both hard and symbolic links<br />
Usage is similar to mv and cp:<br />
– ln TARGET: Create a link to TARGET in the current directory (under the<br />
same file name)<br />
– ln TARGET LINK_NAME: Make a link named LINK_NAME to TARGET<br />
– ln TARGET ... DIRECTORY: Create links to all targets in the given<br />
directory<br />
Important option: -s (create symbolic links)<br />
More: man ln<br />
Exercises<br />
Read man stat and extend the ls example to show more information (e.g.<br />
everything ls -l shows)<br />
Explain the difference between mv filea fileb, cp filea fileb and ln<br />
filea fileb<br />
The UNIX File System<br />
File modes<br />
File Ownership<br />
All files have an owner (a user)<br />
– ls -l displays the user name (if available) or the numerical user id (e.g. for<br />
files of a user that no longer exists)<br />
Similarly, each file has a group associated with it<br />
– This will be similarly displayed by ls -l<br />
Owner and group of a file determine who has what kind of access to that file.<br />
Access types are<br />
– Read access (open a file for reading, reading data)<br />
– Write access (change a file)<br />
– Execute access (run a file as a program, or, for directories, access file names in<br />
that directory)<br />
ls -l Output Explained<br />
−rw−r−−r−− 1 schulz schulz 1283190 Nov 11 10:35 CSC322.pdf<br />
File size in Bytes (st_size)<br />
Group that owns the file (st_gid)<br />
User that owns the file (st_uid)<br />
File access rights (encoded in st_mode)<br />
File type (encoded in st_mode)<br />
Number <strong>of</strong> hard links (st_nlink)<br />
Filename<br />
Modification time (st_mtime)<br />
Note: All information (except for the file name) is available by calling one of<br />
the stat() functions!<br />
User Groups<br />
Groups are used in UNIX to give a group of users the ability to access a common<br />
resource<br />
– Most obvious use: Share files on the disk<br />
– In practice more important: Allow access to a hardware device (Note: A<br />
modem is a file, e.g. /dev/modem!)<br />
Every user belongs to a primary group<br />
– The primary group for a user is listed in the passwd file (as a numerical group<br />
id or gid):<br />
schulz:x:500:500:Stephan Schulz:/home/schulz:/bin/tcsh<br />
∗ For normal UNIX systems, /etc/passwd<br />
∗ For systems running NIS, see the file with ypcat passwd<br />
– After logging in, the user's primary group is active (the gid of the shell has the<br />
value for the primary group)<br />
∗ Processes started by another process (including the shell) inherit the gid<br />
Groups (Continued)<br />
Additional group information is in /etc/group:<br />
– For each group, a symbolic name (displayed by ls -l) <strong>and</strong> a list <strong>of</strong> users<br />
belonging to that group:<br />
daemon:x:2:root,bin,daemon<br />
schulz:x:500:<br />
Secondary groups are additional groups which list the user as a member<br />
– A user can explicitly change to such a group using the newgrp command<br />
(man newgrp)<br />
<strong>UNIX</strong> User Utilities: chown <strong>and</strong> chgrp<br />
chown is used to change the owner <strong>of</strong> a file<br />
– Usage: chown owner file ...<br />
– On most systems, only root is allowed to use chown (there are security issues<br />
even with giving away files!)<br />
chgrp changes the group <strong>of</strong> a file<br />
– Usage: chgrp group file ...<br />
– On most systems, you can only change the group <strong>of</strong> a file to a group in which<br />
you are a member (see above)<br />
Important option for both: -R<br />
– Recursively apply the operation to subdirectories <strong>and</strong> files in them<br />
File Mode Bits<br />
The status word of a file (the st_mode field in struct stat) also contains 9 bits<br />
describing file access rights<br />
– Note: These rights exist for all files, including special files and directories!<br />
There are three different groups with potentially different access rights:<br />
– The user who owns the file<br />
– Members <strong>of</strong> the group associated with the file<br />
– Other users<br />
There are also three different types <strong>of</strong> access:<br />
– Read access<br />
– Write access<br />
– Execute access<br />
There are three more bits describing special properties<br />
– The setuid bit: If true, the file will run under the effective user id <strong>of</strong> the<br />
program owner (not the one who started it)<br />
– The setgid bit: Same thing for the group id<br />
– The sticky bit with complex semantics <strong>and</strong> interesting history<br />
Symbolic Encoding<br />
ls -l prints a string of 9 letters to represent the 12 file mode bits<br />
Normal case: The setuid, setgid <strong>and</strong> sticky bit are all cleared (0):<br />
– The mode has the form uuugggooo to encode user, group, <strong>and</strong> other access<br />
rights<br />
– Each letter may be - to denote that that bit is clear<br />
– Or it may have the mnemonic value <strong>of</strong> that right:<br />
∗ r for read (first letter)<br />
∗ w for write (second letter)<br />
∗ x for execute (third letter)<br />
If one <strong>of</strong> the special bits is set, this is denoted by changing the last letter <strong>of</strong> each<br />
group (x) to another letter. Common cases (more: info ls):<br />
– s in the user executable position: The file is user executable <strong>and</strong> the setuid<br />
bit is set<br />
– s in the group executable position: The file is group executable, <strong>and</strong> the<br />
setgid bit is set<br />
Numerical Encoding<br />
The 12 permission bits are normally represented by 4 octal digits (each digit<br />
represents 3 bits):<br />
– 0001 represents execute access for others<br />
– 0002 represents write access for others<br />
– 0004 represents read access for others<br />
– 0010 represents execute access for group<br />
– 0020 represents write access for group<br />
– 0040 represents read access for group<br />
– 0100 represents execute access for user<br />
– 0200 represents write access for user<br />
– 0400 represents read access for user<br />
– 1000 is the sticky bit<br />
– 2000 is the setgid bit<br />
– 4000 is the setuid bit<br />
To generate a composite mode, just add up the individual modes<br />
Leading zeroes (especially the first one) are <strong>of</strong>ten omitted<br />
Examples<br />
rw-r--r-- is the most common mode for a regular file on a conventional <strong>UNIX</strong><br />
system:<br />
– The user is allowed to read <strong>and</strong> write the file<br />
– Everyone else is allowed to read the file (no secrets ;-)<br />
– Corresponding numerical value:<br />
0004 Other read<br />
0040 Group read<br />
0400 User read<br />
0200 User write<br />
---------------<br />
0644<br />
Numeric mode 666 (the number <strong>of</strong> the beast) gives full read <strong>and</strong> write access for<br />
everyone (rw-rw-rw-)<br />
– Some people claim that this is not coincidence. . .<br />
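The relation between the octal and the symbolic encoding can be sketched in C as follows. The helper name mode_to_string is hypothetical, and the sketch ignores the setuid, setgid and sticky bits.<br />

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: write the 9-letter rwx string for mode into out,
   which must have room for at least 10 chars. Special bits are ignored. */
void mode_to_string(mode_t mode, char out[10])
{
    static const char letters[] = "rwxrwxrwx";
    int i;

    for (i = 0; i < 9; i++) {
        /* Bit 8 (0400) is user read, bit 0 (0001) is execute for others */
        out[i] = (mode & (1u << (8 - i))) ? letters[i] : '-';
    }
    out[9] = '\0';
}
```

With this helper, mode 0644 yields the string rw-r--r-- and mode 0666 yields rw-rw-rw-.<br />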
<strong>UNIX</strong> User Utilities: chmod<br />
chmod is used to change the file access bits<br />
Usage 1: chmod octal-mode files<br />
– Sets the file mode of the named files to the given octal mode absolutely<br />
Usage 2: chmod symbolic-mode files<br />
– The symbolic mode command can add or remove privileges for the different<br />
groups<br />
– Format: who, operator, rights, written without spaces (e.g. go-rwx)<br />
∗ who can be any sequence of letters from ugo or a (equivalent to ugo)<br />
∗ operator can be<br />
· + to add rights<br />
· - to remove rights<br />
· = to absolutely assign rights<br />
∗ rights can be any combination of letters from rwx<br />
Important option: -R<br />
– Recursively modify files <strong>and</strong> subdirectories<br />
chmod Examples<br />
chmod ugo+rwx myfile # Grant full access rights to everybody<br />
chmod 777 myfile # Grant full access rights to everybody<br />
chmod -R go-rwx . # Paranoid: Remove read, write, and execute<br />
# rights for all other people on the current<br />
# directory and all files and subdirectories<br />
chmod -R 644 . # Trying to fix things, but removed all<br />
# execute rights from programs _<strong>and</strong>_<br />
# directories (makes things hard to fix ;-)<br />
File Mode Creation Mask<br />
Each process maintains a file mode creation mask<br />
– This mask determines which access rights are granted for newly created files<br />
and directories<br />
– The colloquial name is umask<br />
– The umask is inherited by new processes started (i.e. your files will be created<br />
with rights based on the umask <strong>of</strong> your shell)<br />
The umask contains 9 bits, corresponding to the rwxrwxrwx access rights<br />
– Rights whose bits are set in the mask are always cleared for newly created files<br />
– All other rights are granted by default (with the x bits only set for executables<br />
and directories)<br />
The shell maintains a umask that can be set with the umask comm<strong>and</strong> (which<br />
is normally in a user configuration file)<br />
– Example: umask 022<br />
– Removes write permissions for everybody but the owner<br />
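The effect of the mask can be observed from C with the umask() system call. The following is a minimal sketch; the helper name mode_after_umask is hypothetical, and it assumes that path does not exist yet and lies in a writable directory.<br />

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create path requesting mode 0666 under the given
   umask and return the permission bits the file actually received,
   or -1 on failure. The file is removed again before returning. */
int mode_after_umask(const char *path, mode_t mask)
{
    struct stat sb;
    mode_t old_mask = umask(mask);   /* umask() returns the previous mask */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
    int result = -1;

    umask(old_mask);                 /* restore the caller's mask */
    if (fd == -1) {
        return -1;
    }
    if (fstat(fd, &sb) == 0) {
        result = (int)(sb.st_mode & 0777);
    }
    close(fd);
    unlink(path);
    return result;
}
```

With a mask of 022, a file requested with mode 0666 ends up with mode 0644: the write bits for group and others are cleared.<br />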
Exercises<br />
Read the man <strong>and</strong> info pages on chmod, chown <strong>and</strong> chgrp<br />
The <strong>UNIX</strong> comm<strong>and</strong>s chmod <strong>and</strong> chown correspond to system calls <strong>of</strong> the same<br />
name. To find out how they work, read:<br />
– man 2 chmod<br />
– man 2 chown<br />
Use this information to implement a rudimentary version <strong>of</strong> chmod<br />
The <strong>UNIX</strong> File System<br />
File Descriptors<br />
File Descriptors<br />
The kernel identifies open files by file descriptors<br />
– A file descriptor is a small, non-negative integer<br />
– It’s used as an index into the file descriptor table <strong>of</strong> a process to obtain more<br />
information<br />
For many purposes, file descriptors are quite similar to file pointers (FILE*) from<br />
the C st<strong>and</strong>ard I/O library<br />
However, file descriptor I/O is much more low-level<br />
– No formatted I/O<br />
– No buffering – each I/O operation directly causes a system call to actually<br />
perform the data transfer<br />
Notes:<br />
– <strong>UNIX</strong>’s st<strong>and</strong>ard I/O library is implemented using file descriptors<br />
– Network communication also works via file descriptors<br />
Opening Files: open()<br />
The open() system call opens a named file <strong>and</strong> returns a file descriptor (or -1<br />
on failure)<br />
It is defined as follows:<br />
#include <sys/types.h><br />
#include <sys/stat.h><br />
#include <fcntl.h><br />
int open(const char *pathname, int oflag, mode_t mode);<br />
Arguments:<br />
– pathname is a standard UNIX file name as for fopen()<br />
– oflag contains the options. The value is created by bitwise ORing one of<br />
the following values with any number of option flags:<br />
∗ O_RDONLY: Open the file for reading<br />
∗ O_WRONLY: Open the file for writing<br />
∗ O_RDWR: Open for reading and writing<br />
– The third argument is only interpreted if open() is used for file creation (<strong>and</strong><br />
can be omitted otherwise)<br />
Option flags for open()<br />
Note: All of the following flags have to be ORed (using the bitwise or operator |)<br />
with the main access mode (O_RDONLY, O_WRONLY, O_RDWR)<br />
Options:<br />
– O_APPEND: All output on this file descriptor is appended at the end of the file<br />
– O_CREAT: If the file does not exist, create it<br />
– O_EXCL: Only used with O_CREAT – give an error if the named file already<br />
exists<br />
– O_TRUNC: If the file exists and is opened for writing or r/w, truncate it to<br />
length 0<br />
– O_SYNC: Only return from writes to that file when the physical output is<br />
complete<br />
There are some more flags that we only discuss when necessary<br />
Example: fd = open("/tmp/testfile", O_WRONLY|O_APPEND|O_SYNC)<br />
Using open() to create files<br />
If the option O_CREAT is given, open() will create a file if no file with the given<br />
name exists<br />
This also requires the third argument to open() (which otherwise is ignored or<br />
can be omitted)<br />
– This argument describes the access rights set for the new file<br />
– It is created by bitwise ORing of the following constants:<br />
S_IRUSR Read permission for the user<br />
S_IWUSR Write permission for the user<br />
S_IXUSR Execute permission for the user<br />
S_IRGRP Read permission for the group<br />
S_IWGRP Write permission for the group<br />
S_IXGRP Execute permission for the group<br />
S_IROTH Read permission for the others<br />
S_IWOTH Write permission for the others<br />
S_IXOTH Execute permission for the others<br />
– Note: These are the same values used by st_mode in struct stat<br />
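A minimal sketch of file creation with open(), combining O_CREAT and O_EXCL with a mode built from the constants above. The helper name create_new_file is hypothetical.<br />

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: create a new file that is readable and writable by
   the owner and readable by group and others (0644 before the umask is
   applied). Returns a file descriptor, or -1 if the file already exists
   or cannot be created. */
int create_new_file(const char *path)
{
    mode_t mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;

    return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
}
```

Because of O_EXCL, a second call with the same path fails with -1 instead of silently reusing the existing file.<br />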
Notes on open() <strong>and</strong> close()<br />
The mode given to open() is modified by the umask <strong>of</strong> the process<br />
The file mode is only set if the file is actually created (not if it already exists, even if it is<br />
truncated with O_TRUNC)<br />
A file is closed using the close() function:<br />
#include <unistd.h><br />
int close(int fd);<br />
– Return value: 0 on success, -1 on failure<br />
There are three predefined file descriptors that are open by default, corresponding<br />
to the 3 standard I/O channels:<br />
– STDIN_FILENO (traditionally 0)<br />
– STDOUT_FILENO (traditionally 1)<br />
– STDERR_FILENO (traditionally 2)<br />
File Descriptor I/O: read() <strong>and</strong> write()<br />
The functions read() <strong>and</strong> write() perform unbuffered input <strong>and</strong> output:<br />
#include <unistd.h><br />
ssize_t read(int fd, void *buf, size_t count);<br />
ssize_t write(int fd, const void *buf, size_t count);<br />
– ssize_t is a signed integer type defined in <sys/types.h><br />
– fd is the file descriptor for input or output<br />
– buf is a pointer to an area <strong>of</strong> memory<br />
∗ write() reads the data to write from this buffer<br />
∗ read() stores the read data in the buffer<br />
– count is the number of bytes to transfer (and should not be bigger than the<br />
size of the buffer that buf points to!)<br />
Both functions return the number of bytes transmitted<br />
– For write(), -1 signals an error; a smaller number than requested means that only<br />
part of the data was written (simple programs may treat this as an error)<br />
– For read():<br />
∗ 0 indicates end <strong>of</strong> file<br />
∗ -1 signals error<br />
∗ Everything else is normal (there may be fewer characters than requested<br />
currently available)<br />
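Since write() may transfer fewer bytes than requested without failing (for example on pipes), robust code often wraps it in a loop. The following is a sketch of that idea, not part of the course material; the helper name write_all is hypothetical.<br />

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all count bytes are
   written. Returns 0 on success, -1 on error (errno is set by write()). */
int write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;

    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n == -1) {
            if (errno == EINTR) {
                continue;            /* interrupted by a signal: retry */
            }
            return -1;
        }
        p += n;                      /* skip the part already written */
        count -= (size_t)n;
    }
    return 0;
}
```

A call such as write_all(STDOUT_FILENO, buf, count) can then replace a bare write() whose result would otherwise have to be compared against count.<br />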
Example: Simple cat<br />
#include <stdio.h><br />
#include <stdlib.h><br />
#include <sys/types.h><br />
#include <sys/stat.h><br />
#include <fcntl.h><br />
#include <unistd.h><br />
#define BUF_SIZE 1024<br />
/* Print message together with the error text for errno, then terminate */<br />
void err_sys(char* message)<br />
{<br />
    perror(message);<br />
    exit(EXIT_FAILURE);<br />
}<br />
int main(int argc, char* argv[])<br />
{<br />
    int fd;<br />
    char buf[BUF_SIZE];<br />
    ssize_t count, check;<br />
    if(argc!=2)<br />
    {<br />
        fprintf(stderr, "USAGE: mycat2 file\n");<br />
        exit(EXIT_FAILURE);<br />
    }<br />
    fd = open(argv[1], O_RDONLY);<br />
    if(fd == -1)<br />
    {<br />
        err_sys("open");<br />
    }<br />
    /* read() returns 0 at end of file, which ends the loop */<br />
    while((count = read(fd, buf, BUF_SIZE)))<br />
    {<br />
        if(count==-1)<br />
        {<br />
            err_sys("read");<br />
        }<br />
        /* Copy exactly the bytes just read to standard output */<br />
        check = write(STDOUT_FILENO, buf, count);<br />
        if(check!=count)<br />
        {<br />
            err_sys("write");<br />
        }<br />
    }<br />
    if(close(fd) == -1)<br />
    {<br />
        err_sys("close");<br />
    }<br />
    return EXIT_SUCCESS;<br />
}<br />
The St<strong>and</strong>ard I/O Library <strong>and</strong> File Descriptors<br />
Remember that a file pointer is actually <strong>of</strong> type FILE*<br />
It typically points to a structure in an array<br />
– stdin points to element number 0<br />
– stdout points to element number 1<br />
– stderr points to element number 2<br />
– More elements are filled in for each use <strong>of</strong> fopen()<br />
Each <strong>of</strong> the structures contains:<br />
– A buffer<br />
– Some counters <strong>and</strong> positions to manage the buffer<br />
– A file descriptor<br />
– Flags for the access mode (read or write)<br />
Consider the case <strong>of</strong> writing:<br />
– All write comm<strong>and</strong>s just write into the buffer space<br />
– If the buffer is full or a fflush() comm<strong>and</strong> is issued (or the stream is closed),<br />
all <strong>of</strong> the buffer is written using a single write() comm<strong>and</strong><br />
Reading similarly reads a large block <strong>and</strong> h<strong>and</strong>s it out piecewise<br />
Cheating with fdopen()<br />
Formatted, buffered output is very convenient <strong>and</strong> quite efficient for many small<br />
I/O operations (getchar(), fprintf(), . . . )<br />
– Normally much better than read() <strong>and</strong> write()<br />
– But some I/O methods only give us file descriptors (dammit!)<br />
Solution: The function fdopen() will generate an entry in the FILE array from<br />
a file descriptor <strong>and</strong> return the pointer to it<br />
#include <stdio.h><br />
FILE *fdopen(int fildes, const char *mode);<br />
fildes has to be an open file descriptor<br />
mode is a string as for fopen() ("r", "w", . . . ) and must be compatible with<br />
the flags of the file descriptor<br />
Exercises<br />
Write a simple version of cp using open(), read() and write(). Use a default<br />
buffer size, but support an option -b that allows you to set the buffer size from<br />
the command line. Measure the speed of copying a large file for different buffer sizes<br />
Examples:<br />
– mycp file1 file2 copies file1 to file2, using the default buffer size<br />
– mycp -b3 file1 file2 copies the file using a buffer of 3 bytes<br />
Use the fstat() function on both files to get the native block size of the file<br />
systems for both files (the st_blksize field in struct stat). What do you<br />
notice? Can you write a better cp now?<br />
More on File Descriptors<br />
More on the <strong>UNIX</strong> I/O System<br />
The file descriptor typically is an index into a table that contains information<br />
about all open files <strong>of</strong> the process<br />
– That table contains just the flags (read/write) for that file descriptor and a<br />
pointer to the kernel's global file table<br />
The file table is global and shared by all processes. It has one entry per opened<br />
file, containing:<br />
– File status flags (read, write, append, sync, . . . ; the things we passed to open())<br />
– Current offset into the file: The position where the next read or write will start<br />
– A pointer to the vnode <strong>of</strong> the file<br />
∗ The vnode contains the file type <strong>and</strong> information about how to actually<br />
access the file, as well as the current real file size<br />
∗ It also gives us a way to access the inode that contains all the information<br />
we get with stat()<br />
∗ There is only one vnode per file, i.e. the vnode is the same for all file<br />
descriptors <strong>and</strong> all processes that access the same file<br />
The UNIX File I/O System<br />
[Figure: a FILE* (myfile) from the standard I/O library points to a FILE array entry holding a buffer and a file descriptor. The file descriptor indexes the per-process table (fd n: flags and a pointer), whose entry points to a file table entry (status flags, offset) in the global, shared data structures, which in turn points to a vnode table entry (actual current file size, ...).]<br />
Blocking vs. Nonblocking I/O<br />
All I/O we have seen so far is blocking<br />
– read() waits (blocks) until some input becomes available<br />
– It then returns the read data<br />
– Similarly, if write() temporarily cannot write the data, it blocks until it can<br />
Non-blocking I/O always returns immediately from the I/O function<br />
– If the I/O cannot be performed right now, the functions return -1<br />
– errno is set to EWOULDBLOCK (or EAGAIN; on many systems both have the same value)<br />
Question: How do we achieve non-blocking I/O?<br />
Answer: By manipulating the file descriptor<br />
– Each file descriptor has a number <strong>of</strong> associated flags<br />
– One <strong>of</strong> these selects blocking vs. non-blocking behaviour<br />
Manipulating File Descriptors: fcntl()<br />
fcntl() is a catch-all function for manipulating file descriptors<br />
#include <unistd.h><br />
#include <fcntl.h><br />
int fcntl(int fd, int cmd);<br />
int fcntl(int fd, int cmd, long arg);<br />
int fcntl(int fd, int cmd, struct flock *lock);<br />
We are only interested in the use <strong>of</strong> fcntl() for getting <strong>and</strong> changing the file<br />
status flags:<br />
– O_RDONLY, O_WRONLY, O_RDWR<br />
– O_APPEND<br />
– O_NONBLOCK<br />
– O_SYNC<br />
– . . . (depending on <strong>UNIX</strong> version)<br />
fcntl() may return various values, depending on cmd<br />
– On error, it always returns -1 <strong>and</strong> sets errno<br />
fcntl() Continued<br />
Using fcntl() to get the file status flags:<br />
flags = fcntl(fd, F_GETFL);<br />
– To interpret the result, we need to bitwise AND it with the flag we are<br />
interested in (see example)<br />
– To get the read/write status, AND the result with O_ACCMODE<br />
To set the file status flags:<br />
fcntl(fd, F_SETFL, newflags);<br />
– If we only want to change a single flag, we have to get the old value and use<br />
bitwise operations to change just that flag!<br />
– Example:<br />
int flags = fcntl(STDIN_FILENO, F_GETFL);<br />
flags = flags | O_NONBLOCK;<br />
fcntl(STDIN_FILENO, F_SETFL, flags);<br />
Example: Printing Flags for a File Descriptor<br />
#include <stdio.h><br />
#include <stdlib.h><br />
#include <unistd.h><br />
#include <fcntl.h><br />
/* Print message together with the error text for errno, then terminate */<br />
void err_sys(char* message)<br />
{<br />
    perror(message);<br />
    exit(EXIT_FAILURE);<br />
}<br />
/* Print the access mode and selected file status flags of fd */<br />
void print_fd_file_status(int fd)<br />
{<br />
    int flags = fcntl(fd, F_GETFL);<br />
    if(flags == -1)<br />
    {<br />
        err_sys("fcntl");<br />
    }<br />
    printf("Flags for file descriptor %d\n", fd);<br />
    /* The access mode is a small value, not a single bit: mask, then compare */<br />
    switch(flags & O_ACCMODE)<br />
    {<br />
    case O_RDONLY:<br />
        printf("Read only\n");<br />
        break;<br />
    case O_WRONLY:<br />
        printf("Write only\n");<br />
        break;<br />
    case O_RDWR:<br />
        printf("Read/Write\n");<br />
        break;<br />
    default:<br />
        printf("Strange\n");<br />
    }<br />
    if(flags & O_APPEND)<br />
    {<br />
        printf("Append is set\n");<br />
    }<br />
    if(flags & O_NONBLOCK)<br />
    {<br />
        printf("Non-blocking\n");<br />
    }<br />
    if(flags & O_SYNC)<br />
    {<br />
        printf("Synchronous writes\n");<br />
    }<br />
}<br />
int main(int argc, char* argv[])<br />
{<br />
    print_fd_file_status(STDIN_FILENO);<br />
    print_fd_file_status(STDOUT_FILENO);<br />
    print_fd_file_status(STDERR_FILENO);<br />
    /* Descriptor 42 is normally not open, so this call ends the program */<br />
    print_fd_file_status(42);<br />
    return EXIT_SUCCESS;<br />
}<br />
Example Output<br />
$ ./fcntl_example<br />
Flags for file descriptor 0<br />
Read/Write<br />
Flags for file descriptor 1<br />
Read/Write<br />
Flags for file descriptor 2<br />
Read/Write<br />
fcntl: Bad file descriptor<br />
$ ./fcntl_example < signal_test.c<br />
Flags for file descriptor 0<br />
Read only<br />
Flags for file descriptor 1<br />
Read/Write<br />
Flags for file descriptor 2<br />
Read/Write<br />
fcntl: Bad file descriptor<br />
Multiplexing I/O<br />
Often, a program has to be able to read data from multiple sources<br />
– Data from the user<br />
– Data from the network<br />
– Data from a file that is in the process <strong>of</strong> being written<br />
Bad solution: Polling<br />
– Switch all file descriptors to non-blocking<br />
– Test them one after the other, until one <strong>of</strong> them has data<br />
– Uses too many system resources!<br />
Minimally better: Polling with a short waiting time between I/O attempts<br />
– But: Lousy reaction time<br />
Right solution: Use the right tool (select())<br />
Multiplexing I/O: select()<br />
select() is used to watch a set <strong>of</strong> file descriptors for one <strong>of</strong> three conditions:<br />
– A file descriptor is ready for reading<br />
– A file descriptor is ready for writing<br />
– A file descriptor has an exceptional condition pending<br />
We can tell the function to either<br />
– Return immediately, telling us the current status<br />
– Wait until at least one <strong>of</strong> the conditions becomes true<br />
– Wait until at least one <strong>of</strong> the conditions becomes true, but at most a fixed<br />
amount <strong>of</strong> time<br />
Specification:<br />
#include <sys/time.h><br />
#include <sys/types.h><br />
#include <unistd.h><br />
int select(int max_fdp1, fd_set *readfds, fd_set *writefds, fd_set *exceptfds,<br />
struct timeval *tvptr);<br />
select() Arguments<br />
fd_set is defined in sys/types.h (on current POSIX systems, in sys/select.h)<br />
– It is a data type that can store a set of file descriptors<br />
– We only know how to manipulate it:<br />
∗ FD_ZERO(fd_set *set) removes all file descriptors from the set<br />
∗ FD_SET(int fd, fd_set *set) inserts fd into the set<br />
∗ FD_CLR(int fd, fd_set *set) removes fd from the set<br />
∗ FD_ISSET(int fd, fd_set *set) returns true if fd is contained in *set<br />
The three fd_set* arguments are used for input and output of select()<br />
– The fd_set structures the arguments point to describe which file descriptors<br />
we are interested in<br />
– If the pointer is NULL, we are not interested in any file descriptor for the<br />
corresponding property<br />
– When select() returns, the sets have been modified to contain just the descriptors<br />
for which the property is true<br />
int max_fdp1 has to be at least one bigger than the biggest file descriptor in<br />
any one of the three sets<br />
– It is used to speed up things in the UNIX kernel<br />
select() Arguments <strong>and</strong> Return Value<br />
The last argument to select() is a pointer to a struct timeval<br />
This struct has two fields:<br />
– long tv_sec; /* Seconds */<br />
– long tv_usec; /* Microseconds */<br />
There are two possible cases:<br />
– tvptr is NULL: In this case, select() waits until one <strong>of</strong> the file descriptors is<br />
ready (or a signal is caught)<br />
– tvptr points to a valid struct timeval: In this case, select() waits at<br />
most the specified time<br />
Return value:<br />
– -1 on error or if select() returned because of a signal (errno will be set!)<br />
– 0 if the timeout expired before any file descriptor became ready<br />
– Otherwise, the number of file descriptors for which the specified condition is<br />
true is returned<br />
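A minimal sketch of select() with a timeout, watching a single descriptor for readability. The helper name wait_readable is hypothetical; the sketch includes sys/select.h, the header that POSIX names for select().<br />

```c
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait at most the given number of seconds for fd to
   become readable. Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, long seconds)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = seconds;
    tv.tv_usec = 0;

    /* First argument: the highest descriptor in any set, plus one */
    return select(fd + 1, &readfds, NULL, NULL, &tv);
}
```

With seconds set to 0, the call only reports the current status and returns immediately.<br />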
Example<br />
#include <stdio.h><br />
#include <stdlib.h><br />
#include <sys/time.h><br />
#include <sys/types.h><br />
#include <unistd.h><br />
int main(int argc, char* argv[])<br />
{<br />
    fd_set readfds;<br />
    fd_set writefds;<br />
    int res;<br />
    FD_ZERO(&readfds);<br />
    FD_ZERO(&writefds);<br />
    FD_SET(STDIN_FILENO, &readfds);<br />
    FD_SET(STDOUT_FILENO, &writefds);<br />
    FD_SET(STDERR_FILENO, &writefds);<br />
    /* The highest descriptor is 2 (stderr), so the first argument is 3 */<br />
    res = select(3, &readfds, &writefds, NULL, NULL);<br />
    if(res == -1)<br />
    {<br />
        perror("select");<br />
        exit(EXIT_FAILURE);<br />
    }<br />
    printf("%d file descriptors are ready\n", res);<br />
    if(FD_ISSET(STDIN_FILENO, &readfds))<br />
    {<br />
        printf("STDIN is ready for reading\n");<br />
    }<br />
    if(FD_ISSET(STDOUT_FILENO, &writefds))<br />
    {<br />
        printf("STDOUT is ready for writing\n");<br />
    }<br />
    if(FD_ISSET(STDERR_FILENO, &writefds))<br />
    {<br />
        printf("STDERR is ready for writing\n");<br />
    }<br />
    return EXIT_SUCCESS;<br />
}<br />
Example Output<br />
$ ./select_example<br />
2 file descriptors are ready<br />
STDOUT is ready for writing<br />
STDERR is ready for writing<br />
$ ./select_example < select_example.c<br />
3 file descriptors are ready<br />
STDIN is ready for reading<br />
STDOUT is ready for writing<br />
STDERR is ready for writing<br />
Internet Assignment (I)<br />
On the assignment home page you will find links to two binary programs, a chat<br />
server <strong>and</strong> a chat client. In the end, you should turn in a program that has the<br />
same functionality as the client<br />
Step 1:<br />
– Download the programs and understand what they do<br />
– To start the server, type ./chat_server PORT, where PORT is an integer<br />
greater than 1024<br />
– To connect to the server, type ./chat_client ADDRESS PORT NICK<br />
∗ ADDRESS is the IP address of the server host (use 127.0.0.1 if the server<br />
runs on the same host, use nslookup for other hosts)<br />
∗ PORT is the same number as used for the server<br />
∗ NICK is the nickname under which you will chat<br />
Caveats:<br />
– Due to firewalling, do not expect to be able to reach a server from outside the<br />
lab<br />
– The binaries only run under Linux<br />
I will try to keep a server running on lee on port 6666 for all <strong>of</strong> you to share<br />
Basic <strong>UNIX</strong> Network <strong>Programming</strong><br />
Introduction<br />
Networking<br />
The Internet<br />
Networking Concepts<br />
Is communication occurring between two partners, or is it a broadcast communication?<br />
– How are partners identified (addressed)?<br />
Is traffic stream oriented or packet oriented?<br />
– Stream-oriented: Messages arrive as stream <strong>of</strong> bytes (similar to reading from<br />
a file)<br />
– Packet-oriented: Traffic arrives in the form of distinct packets of a fixed (or<br />
fixed maximal) size<br />
Is the communication reliable or unreliable?<br />
– Can messages disappear?<br />
– Can the order <strong>of</strong> messages change?<br />
– Can messages be duplicated?<br />
Network Layers<br />
Level 0: Physical or Hardware layer<br />
– Copper wires or optical fiber<br />
– Radio waves or laser beams for wireless protocols<br />
Level 1: Data Link Layer<br />
– How is data transported?<br />
– Examples: Ethernet, Token ring, ATM, ISDN<br />
Level 2: Network layer<br />
– How are individual hosts or networks assembled into a network?<br />
– Examples: Internet protocol (IP)<br />
Level 3: Transport layer<br />
– Converts from standard user packets to network layer packets<br />
– May include error checking <strong>and</strong> correcting<br />
– Examples: TCP <strong>and</strong> UDP<br />
Higher layers. . .<br />
– Take care <strong>of</strong> data representation at various levels<br />
The Internet Protocol (IP)<br />
Level 2 protocol (Hardware-Agnostic)<br />
Prevalent protocol today: IPv4<br />
– Unreliable (“best effort”)<br />
– Packet-oriented (IP-Datagrams)<br />
– Can be addressed to individual hosts or broadcast addresses<br />
– Addresses are 32 bit numbers (“4 binary octets”), normally written as dotted<br />
decimal numbers: 127.0.0.1<br />
– Addresses denote individual hosts!<br />
Currently being deployed: IPv6<br />
– Shares many properties<br />
– But: 128 bit addresses (8 4-digit hex numbers, written in Hex <strong>and</strong> separated<br />
by colons: 21DA:00D3:0000:2A3B:02AA:00BF:FE28:9C5A)<br />
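The dotted-decimal notation can be converted to the four address octets with the standard inet_pton() function, which these slides do not cover. The following is a minimal sketch; the helper name ipv4_octets is hypothetical.<br />

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: parse a dotted-decimal IPv4 address into its four
   octets. Returns 0 on success, -1 if text is not a valid address. */
int ipv4_octets(const char *text, unsigned char octets[4])
{
    struct in_addr addr;

    if (inet_pton(AF_INET, text, &addr) != 1) {
        return -1;
    }
    /* s_addr is stored in network byte order, so its bytes are already
       in the order in which the address is written */
    memcpy(octets, &addr.s_addr, 4);
    return 0;
}
```

For the text 127.0.0.1, the helper stores the octets 127, 0, 0 and 1.<br />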
The User Datagram Protocol (UDP)<br />
Based on IP<br />
Still. . .<br />
– packet-oriented<br />
– unreliable<br />
Adds: Service multiplexing<br />
– The same host can have many different communications<br />
– Each communication uses a different port<br />
Supported by UNIX with sockets with socket type SOCK_DGRAM<br />
Used for:<br />
– DNS (Domain name service)<br />
– NFS (Network file system)<br />
The Transmission Control Protocol (TCP)<br />
Based on IP, but. . .<br />
– Connection-based<br />
– Stream-oriented<br />
– Reliable<br />
– Service multiplexing (with ports)<br />
Supported by UNIX with sockets with socket type SOCK_STREAM<br />
Addresses for TCP (<strong>and</strong> UDP) have two parts:<br />
– The IP number for specifying the host<br />
– The port number for specifying the port<br />
Most services are associated with a fixed port number:<br />
– HTTP (WWW): Port 80<br />
– SMTP (Email transport): Port 25<br />
– FTP (File Transfer): Port 21<br />
– For a semi-complete list: more /etc/services<br />
– Server port numbers below 1024 are normally reserved for root<br />
UNIX Sockets
Sockets are special file descriptors used for many different kinds of inter-process communication
– Local
– Networked
We can create sockets for different communication styles
– Stream oriented
– Datagram
Sockets are used on both sides of a communication
– The receiver creates a socket and associates it with a port
– The sender creates a socket and connects it to the receiver
Client/Server Model
A server offers a certain service
– It is ready to accept connections on a certain port
A client initiates communication by trying to connect to that port
Basic UNIX Network Programming
Simple Connections
The Client Side for TCP Connections<br />
The client has to perform the following steps:<br />
– Create a socket for stream-oriented communication over IP<br />
– Create an address structure for the server address<br />
– Connect the socket to the server port<br />
– Use the connection<br />
Creating Sockets with socket() (1)
To create a socket, we call socket()
#include <sys/types.h>
#include <sys/socket.h>
int socket(int domain, int type, int protocol);
On success, socket() returns a valid file descriptor just like open()
– After enough black magic, we can use it with read() and write()
On failure, the function returns -1 and sets errno
Creating Sockets with socket() (2)
int socket(int domain, int type, int protocol);
The domain argument describes the protocol family that will be used. Interesting values:
– PF_INET: Internet with IPv4
– PF_INET6: Internet with IPv6
– PF_LOCAL: Local communication
The type describes the communication style:
– SOCK_STREAM for connection based streams
– SOCK_DGRAM for datagrams
The last argument specifies the protocol
– There normally is only a single protocol for each domain/type pair, use 0 to select this (the default)
– PF_INET/SOCK_STREAM gives us TCP/IPv4
– PF_INET/SOCK_DGRAM gives us UDP/IPv4
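The domain/type pairs can be tried out with a small sketch. It creates an IPv4 socket of the requested style, asks the kernel for the socket's type with getsockopt() and the SO_TYPE option (a standard call that these notes do not otherwise cover), and closes the socket again. The helper name created_socket_type() is an assumption made for this example.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: creates an IPv4 socket of the requested style,
   asks the kernel which type it really is, and closes it again.
   Returns the socket type, or -1 on error. */
int created_socket_type(int type)
{
    int       sock, actual;
    socklen_t len = sizeof(actual);

    sock = socket(PF_INET, type, 0);   /* 0: default protocol for the pair */
    if (sock == -1) {
        return -1;
    }
    if (getsockopt(sock, SOL_SOCKET, SO_TYPE, &actual, &len) == -1) {
        actual = -1;
    }
    close(sock);                       /* a socket is a file descriptor */
    return actual;
}
```

Passing SOCK_STREAM should give back SOCK_STREAM (a TCP socket), passing SOCK_DGRAM should give back SOCK_DGRAM (a UDP socket).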
Socket Addresses
This is a reasonably ugly topic!
Because sockets are used for so many things, there is no single data type for socket addresses
– Instead, each address family has its own format
– We have to pass this by address (cast to the generic type struct sockaddr*)
– Additionally, we have to tell the system the size of our address format
Because different computer models use different data formats (Big Endian vs. Little Endian), we have to convert values to network byte order using:
#include <arpa/inet.h>
uint32_t htonl(uint32_t hostlong); /* Host to Network conversion for long */
uint16_t htons(uint16_t hostshort);
uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort); /* Network to host conversion for short */
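A small sketch shows what the conversion does to a port number. It stores the value in network byte order and copies out the two bytes in the order in which they lie in memory. The helper name port_wire_bytes() is made up for this illustration.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: converts port to network byte order and copies
   the two bytes, in memory order, to out. */
void port_wire_bytes(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port);

    memcpy(out, &net, sizeof(net));
}
```

For port 80 this yields the bytes 0x00 0x50 on every host, whatever its native byte order, because network byte order is Big Endian. Converting back with ntohs() returns the original value.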
Socket Addresses for IPv4
For IPv4 addresses, we use the data type struct sockaddr_in
It contains the following fields we have to fill:
u_char sin_family; /* Internet address family */
u_short sin_port; /* Port number */
struct in_addr sin_addr; /* Holds the IP address */
For sin_family, we use a predefined constant AF_INET
For the port, we use the port number, converted with htons()
sin_addr is filled in by the function inet_pton():
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
int inet_pton(int af, const char *src, void *dst);
inet_pton()
int inet_pton(int af, const char *src, void *dst);
inet_pton() converts an internet address in string form to a network address structure
First argument: Address family
– AF_INET for IPv4 addresses
– AF_INET6 for IPv6 addresses
Second argument: Pointer to string containing address
– For IPv4: Dotted decimal notation (4 numbers separated by dots)
– For IPv6: Hex representation (eight 4-digit hex numbers separated by colons)
Third argument: Pointer to the destination
– Normally a pointer to the sin_addr field in a struct sockaddr_in
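Putting the pieces together, filling in an IPv4 socket address could look like the following sketch. The helper name make_ipv4_address() and its argument order are assumptions made for this example, not part of the socket library.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fills *addr for the given dotted decimal IP and
   port (in host byte order). Returns 1 on success and 0 if ip is not a
   valid IPv4 address, just like inet_pton(). */
int make_ipv4_address(struct sockaddr_in *addr, const char *ip,
                      unsigned short port)
{
    memset(addr, 0, sizeof(*addr));     /* clear all fields first */
    addr->sin_family = AF_INET;         /* address family */
    addr->sin_port   = htons(port);     /* port in network byte order */
    return inet_pton(AF_INET, ip, &addr->sin_addr);
}
```

A caller would check the return value before handing the address (cast to struct sockaddr*) to the socket functions.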
Connecting to a Remote Port: connect()
After we have prepared an address in a struct sockaddr_in, we can connect an existing socket to a remote port:
#include <sys/types.h>
#include <sys/socket.h>
int connect(int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen);
– sockfd: Socket you want to connect
– serv_addr: Pointer to the carefully prepared address you want to connect to, cast to struct sockaddr*
– addrlen: Size of your actual structure, i.e. sizeof(struct sockaddr_in)
∗ Remember that by casting the second argument, we actually lie to the compiler about the data structure we are pointing to
∗ That’s ok – the socket library knows that we are probably lying
∗ Passing the length helps the library to straighten things out
Return value: 0 on success, -1 on failure
Example: Getting an Insult
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

void err_sys(char* message)
{
    perror(message);
    exit(EXIT_FAILURE);
}

int main(int argc, char* argv[])
{
    int sock;
    struct sockaddr_in server_addr;
    char buf[80];
    int msg_len, res;

    sock = socket(PF_INET, SOCK_STREAM, 0);
    if(sock == -1)
    {
        err_sys("socket");
    }
    memset(&server_addr, 0, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_port = htons(1695);
    res = inet_pton(AF_INET, "128.138.196.16", &server_addr.sin_addr);
    if(res < 0)
    {
        err_sys("inet_pton (no valid address family)");
    }
    if(res == 0)
    {
        fprintf(stderr, "Not a valid IP address\n");
        exit(EXIT_FAILURE);
    }
    res = connect(sock, (struct sockaddr *) &server_addr,
                  sizeof(server_addr));
    if(res == -1)
    {
        err_sys("connect");
    }
    while(1)
    {
        msg_len = read(sock, buf, sizeof(buf));
        if(msg_len == -1)
        {
            err_sys("read");
        }
        if(msg_len == 0) /* End of file: server closed the connection */
        {
            break;
        }
        write(STDOUT_FILENO, buf, msg_len);
    }
    close(sock);
    return EXIT_SUCCESS;
}
The Server Side
A server has a more complex task than a client
General steps:
– Create a socket (we know how to do this)
– Create an address (on its own machine)
– Bind the socket to the address
– Listen for incoming connections on that socket
For each client:
– Accept the connection (on a new socket)
– Use the connection
– Close the connection
Server Side Addresses
We need to specify a local address for the listening port
– It contains the address family, IP address, and port
Instead of actually digging out the server's IP address (which may be complex), we use the special address 0.0.0.0 or INADDR_ANY
Given this address, the server will accept connections on any IP address which refers to it
Example:
struct sockaddr_in sock_name;
int sock;
short port;
/* ...Get socket, set port to some value... */
memset(&sock_name, 0, sizeof(sock_name));      /* Clear address */
sock_name.sin_family = AF_INET;                /* Set address family */
sock_name.sin_port = htons(port);              /* Set port */
sock_name.sin_addr.s_addr = htonl(INADDR_ANY); /* Set address */
Naming a Socket (Binding a Socket to an Address)
Once we have created a socket and a local address, we need to bind the socket to that address
– All future operations will make use of that address
#include <sys/types.h>
#include <sys/socket.h>
int bind(int sockfd, struct sockaddr *my_addr, socklen_t addrlen);
– sockfd: Socket we want to bind
– my_addr: Pointer to the address
– addrlen: Length of that address
– See remarks for connect()!
Return value:
– 0 on success
– -1 on failure
Listening for Incoming Connections
We use the listen() function call to make a socket listen for incoming connections:
#include <sys/socket.h>
int listen(int sock, int backlog);
– sock is the file descriptor we want to set to listening state
– backlog is the number of pending connections allowed at any one time
∗ If more unanswered connection requests are received, they will be refused or ignored
∗ If we accept a connection, that slot becomes available again
∗ A good value is 5 ;-)
Return value: 0 on success, -1 on failure
Accepting Connections
To finally establish a connection, we have to accept it:
#include <sys/types.h>
#include <sys/socket.h>
int accept(int sock, struct sockaddr *addr, socklen_t *addrlen);
– sock: The socket we expect connections on
– addr: A pointer to an address structure (or NULL)
– addrlen: A pointer to an integer variable of type socklen_t that initially has to contain the size of *addr
If accept() returns...
– The return value is the file descriptor of a new socket (or -1)
– If addr is not NULL, the address of the remote socket is written into it
– *addrlen is changed to the actual size of the stored address
By default, accept() blocks until a connection request is received
– If we set the socket to non-blocking (using fcntl()), it will return with -1 and set errno to EWOULDBLOCK (or EAGAIN) if there are no pending requests
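The role of addr and addrlen is easiest to see in a complete round trip. The following sketch is only an illustration under some assumptions: it keeps both ends of the connection inside one process, uses the loopback address INADDR_LOOPBACK, lets the system pick a free port by binding to port 0, and looks that port up with getsockname(). It also uses inet_ntop(), the counterpart of inet_pton(). The function name loopback_peer() is made up.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: a listening socket and a client socket inside one
   process, connected over the loopback interface. Writes the client's
   address as reported by accept() into peer (dotted decimal) and
   returns 0, or returns -1 on any error. */
int loopback_peer(char *peer, socklen_t peer_size)
{
    struct sockaddr_in name, remote;
    socklen_t len;
    int listener, client, con = -1, result = -1;

    listener = socket(PF_INET, SOCK_STREAM, 0);
    client   = socket(PF_INET, SOCK_STREAM, 0);
    if (listener == -1 || client == -1) {
        goto done;
    }

    memset(&name, 0, sizeof(name));
    name.sin_family      = AF_INET;
    name.sin_port        = htons(0);  /* 0: let the system pick a port */
    name.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    len = sizeof(name);
    if (bind(listener, (struct sockaddr *)&name, len) == -1
        || listen(listener, 5) == -1
        || getsockname(listener, (struct sockaddr *)&name, &len) == -1) {
        goto done;
    }

    /* The connection request waits in the backlog until accept() */
    if (connect(client, (struct sockaddr *)&name, sizeof(name)) == -1) {
        goto done;
    }

    len = sizeof(remote);             /* in: size of our variable */
    con = accept(listener, (struct sockaddr *)&remote, &len);
    if (con == -1) {
        goto done;
    }
    /* out: len now holds the size of the address actually stored */
    if (len == sizeof(remote)
        && inet_ntop(AF_INET, &remote.sin_addr, peer, peer_size) != NULL) {
        result = 0;
    }

done:
    if (con != -1)      close(con);
    if (client != -1)   close(client);
    if (listener != -1) close(listener);
    return result;
}
```

With a buffer of INET_ADDRSTRLEN bytes, the reported peer should be 127.0.0.1, since the client connected via the loopback interface.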
Example: Greeting the World
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

void err_sys(char* message)
{
    perror(message);
    exit(EXIT_FAILURE);
}

int main(int argc, char* argv[])
{
    int sock, con_sock;
    struct sockaddr_in sock_name;

    if(argc!=2)
    {
        fprintf(stderr, "Usage: simple_server <port>\n");
        exit(EXIT_FAILURE);
    }
    sock = socket(PF_INET, SOCK_STREAM, 0);
    if(sock == -1)
    {
        err_sys("socket");
    }
    memset(&sock_name, 0, sizeof(sock_name)); /* Clear address */
    sock_name.sin_family = AF_INET;
    sock_name.sin_port = htons(atoi(argv[1]));
    sock_name.sin_addr.s_addr = htonl(INADDR_ANY);
    if(bind(sock, (struct sockaddr *) &sock_name, sizeof(sock_name)) < 0)
    {
        err_sys("bind");
    }
    if(listen(sock, 1) == -1)
    {
        err_sys("listen");
    }
    while(1)
    {
        con_sock = accept(sock, NULL, NULL);
        if(con_sock == -1)
        {
            err_sys("accept");
        }
        write(con_sock, "Hiho and welcome!\n", strlen("Hiho and welcome!\n"));
        if(close(con_sock) == -1)
        {
            err_sys("close(con_sock)");
        }
    }
    /* sock closed automatically when we exit via ^C */
}
More Information
man pages:
– man socket
– man 2 bind
– man accept
The GNU C library documentation on sockets
– Available by doing info libc
– In emacs: [C-h i]
– On the internet, e.g. at:
∗ http://www.gnu.org/manual/glibc-2.2.3/html_chapter/libc_16.html
∗ http://www.gnuenterprise.org/doc/glibc-doc/html/chapters_16.html
Internet Assignment (II)
Step 2: Write a simple client that
– Reads IP address, port and nickname from the command line
– Connects to the specified server
– Uses select() or non-blocking read() to read everything the server transmits
– Closes the connection
Step 3: Modify the client to keep on reading. Be sure to use select() now!
Step 4: Write a second client that connects, reads input from the terminal, and sends it to the server (prepended with the nickname and a colon).
– You should be able to see what you send if you simultaneously connect with the client from step 3.
– If the user types [C-D] to signal end of input, close the connection to the server and terminate
Step 5: Put everything together, using select() on the network connection and standard input.
– Send input from the terminal to the server
– Send input from the network to standard output
Process Creation and Termination (I)
Subprocesses
UNIX is a multi-process operating system
– Many processes run at the same time
– Processes can be created and can terminate
Processes form a hierarchy
– All processes have a unique parent
– In the end, all (real) processes descend from the init process
Parent and child share a special relationship
– The parent has to retrieve the termination status of the child
– The child can get its parent's process id
– If a parent dies, its special role is taken over by the init process
Process Properties
For each process, we can get various identifiers:
– The process id
– The process id of the parent
– The real user id of the process (i.e. the user id of the owner)
– The effective user id of the process (i.e. the user id that is used to check access rights). It can differ e.g. for programs with the setuid bit set
– The real group id
– The effective group id
#include <sys/types.h>
#include <unistd.h>
pid_t getpid(void);  /* Get process id */
pid_t getppid(void); /* Get parent process id */
uid_t getuid(void);  /* Get real user id */
uid_t geteuid(void); /* Get effective user id */
gid_t getgid(void);  /* Get real group id */
gid_t getegid(void); /* Get effective group id */
Standard Execution of a UNIX Program
Creation of the process
– Can only happen via the fork() system call
Execution of a program
– Via the kernel system call exec()
– Comes in various handy library variants
Running
– Process runs in its own process space (virtual memory)
Termination
– Normal exit
– Call to abort()
– Receiving a signal for which the default action is to terminate the process
Exiting
There are three normal ways of terminating a program
Calling return st; from main() (ANSI C)
– In that case the exit status of the program is st
– Interpretation of the exit status is implementation-defined for ANSI C (but defined for UNIX)
Calling exit(st); from anywhere in the program (ANSI C)
– Exit status is st
– In main(), exit() and return are equivalent
– In both cases, some cleanup actions are performed
∗ Exit handlers are called
∗ All open files are flushed and closed
Calling _exit(st) (UNIX) or _Exit(st) (new in ANSI C99, may not be widely supported)
– Program is immediately terminated
– Exit status is st
Exit Formalities
#include <stdlib.h>
void exit(int status);
void _Exit(int status); /* New in C99 */
#include <unistd.h>
void _exit(int status);
ANSI C defines three different exit statuses:
– EXIT_SUCCESS (in stdlib.h)
– EXIT_FAILURE (in stdlib.h)
– 0 (equivalent to EXIT_SUCCESS)
In practice, EXIT_SUCCESS is nearly always just #defined as 0
Cleaning up: atexit()
ANSI C allows us to register at least 32 functions that will be called whenever the program terminates normally:
#include <stdlib.h>
int atexit(void (*func)(void));
– Argument is a pointer to a function that neither takes an argument nor returns a value
– Return value for atexit() is 0 on success, nonzero on error
Each call to atexit() results in a single call to the registered function
– Registered functions are called in reverse order of registration
– We can register the same function more than once
Note: Exit handlers should only access global variables
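That exit handlers run for exit() but not for _exit() can be checked with a small experiment. The sketch below is one possible way to do it, not the only one: it uses fork() and wait-style calls (both covered later in these notes) plus a pipe, so that a child process can report whether its handler was called. The names report_fd, report_handler() and handler_ran() are made up for this illustration.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;   /* global, so the exit handler can reach it */

static void report_handler(void)
{
    /* Tell the parent that the handler was called */
    if (write(report_fd, "x", 1) != 1) {
        _exit(EXIT_FAILURE);
    }
}

/* Hypothetical demo: a child registers an exit handler and terminates
   via exit() (use_exit != 0) or _exit() (use_exit == 0). Returns 1 if
   the handler ran, 0 if it did not, and -1 on error. */
int handler_ran(int use_exit)
{
    int     fds[2];
    char    c;
    ssize_t n;
    pid_t   pid;

    if (pipe(fds) == -1) {
        return -1;
    }
    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                    /* child */
        close(fds[0]);
        report_fd = fds[1];
        if (atexit(report_handler) != 0) {
            _exit(EXIT_FAILURE);
        }
        if (use_exit) {
            exit(EXIT_SUCCESS);        /* runs the exit handlers */
        }
        _exit(EXIT_SUCCESS);           /* terminates immediately */
    }
    close(fds[1]);                     /* parent only reads */
    n = read(fds[0], &c, 1);           /* 0 bytes: child did not report */
    close(fds[0]);
    if (waitpid(pid, NULL, 0) == -1 || n == -1) {
        return -1;
    }
    return n == 1;
}
```

handler_ran(1) should return 1 and handler_ran(0) should return 0, matching the description of exit() and _exit() above.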
Example
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int handler_counter=0;

void err_sys(char* message)
{
    perror(message);
    exit(EXIT_FAILURE);
}

void handler1(void)
{
    printf("Handler1, counter = %d\n", handler_counter);
    handler_counter++;
}

void handler2(void)
{
    printf("Handler2, counter = %d\n", handler_counter);
    handler_counter++;
}

int main(void)
{
    if(atexit(handler1) != 0)
    {
        err_sys("atexit");
    }
    if(atexit(handler2) != 0)
    {
        err_sys("atexit");
    }
    if(atexit(handler1) != 0)
    {
        err_sys("atexit");
    }
    if(atexit(handler1) != 0)
    {
        err_sys("atexit");
    }
    printf("My PID is %d and my parents PID is %d\n", getpid(), getppid());
    return EXIT_SUCCESS;
}
Example Output
My PID is 2019 and my parents PID is 746
Handler1, counter = 0
Handler1, counter = 1
Handler2, counter = 2
Handler1, counter = 3
Running other Programs: system()
The system() function is defined by ANSI C
#include <stdlib.h>
int system(const char *command);
system() hands the string pointed to by command to the system's command processor for execution
– system() returns when the command returns
– Return value of system() in this case is implementation-defined
If command is NULL, system() checks if the implementation has a command processor
– It returns 0, if not
– Anything else, otherwise
system() in UNIX
On UNIX, there always is a command processor
– The command is handed to the standard shell, /bin/sh
– It can make use of all shell facilities, including I/O redirection
The return value of the system() command normally is an encoding of the exit status of the executed command
– If for some reason no new process for the shell can be created, -1 is returned (and errno is set to specify what went wrong)
– If the shell cannot be executed, it is treated as if the shell returned 127
– Otherwise, the return value is an encoding of the exit status of the shell (which always returns the exit status of the command, if it could be executed)
Termination Status Interpretation
Termination status can come from multiple sources
– system() (which nicely packs up all the work for us)
– Functions that retrieve the exit status of a child process: wait() and waitpid() (more later)
Interpretation depends on the cause of the termination of the child process. Assume that status is the termination status
– If WIFEXITED(status) is true, the process terminated normally (i.e. via exit(), _exit() or return from main)
∗ WEXITSTATUS(status) returns the (lower 8 bits of) the value that was passed to exit()
– If WIFSIGNALED(status) is true, the process was terminated because of an uncaught signal with default action abort
∗ WTERMSIG(status) gives the number of the signal
If WIFSTOPPED(status) is true, the process is currently stopped (via SIGSTOP or SIGTSTP)
– WSTOPSIG(status) returns the number of the stop signal
Example: Executing Commands
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
int handler_counter=0;
void err_sys(char* message)
{
    perror(message);
    exit(EXIT_FAILURE);
}
int main(int argc, char* argv[])
{
    int i, status;
    for(i=1; i
Example Output
$ ./system_example "date" "man does_not_exist" "whoami -q"
Tue Nov 26 04:59:29 CET 2002
Exited normally, returning 0
No manual entry for does_not_exist
Exited normally, returning 16
whoami: invalid option -- q
Try ‘whoami --help’ for more information.
Exited normally, returning 1
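A program with this kind of output has to run each command with system() and then decode the termination status. The following is a minimal sketch of that step, written under assumptions: the helper name run_and_describe() and all message texts except "Exited normally, returning" are made up for this illustration and are not taken from the original program.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: runs command via system() and describes the
   result in buf. Returns the exit status of the command if it exited
   normally, -1 otherwise. */
int run_and_describe(const char *command, char *buf, size_t size)
{
    int status = system(command);

    if (status == -1) {
        /* No process for the shell could be created */
        snprintf(buf, size, "system() failed");
        return -1;
    }
    if (WIFEXITED(status)) {
        snprintf(buf, size, "Exited normally, returning %d",
                 WEXITSTATUS(status));
        return WEXITSTATUS(status);
    }
    if (WIFSIGNALED(status)) {
        snprintf(buf, size, "Terminated by signal %d", WTERMSIG(status));
        return -1;
    }
    snprintf(buf, size, "Unknown termination status %d", status);
    return -1;
}
```

A main() that calls this helper for argv[1] up to argv[argc-1] and prints buf after each call would produce lines of the "Exited normally, returning ..." kind shown in the example output.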
Exercises
Write a program that prints its parent's PID and modify the last example to print its PID. Run the program via the example code. What do you notice? Why?
Extend the example to a shell that reads commands from the user and executes them
– Handle all cases of why a process can terminate, and print a useful message for all cases
Process Creation and Termination (II)
Creating new Processes: fork()
The only way of creating a new process under UNIX is via the fork() function
#include <sys/types.h>
#include <unistd.h>
pid_t fork(void);
fork() creates a new child process that is in nearly all ways an exact copy of the parent
Execution continues in both parent and child
Only (major) differences:
– New PID and new parent PID
– Return value of fork()
Return value of fork()
– On failure: -1, errno will be set
– On success:
∗ In the child, 0 will be returned
∗ In the parent, the PID of the child (a value >0) will be returned
Example
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

void err_sys(char* message)
{
    perror(message);
    exit(EXIT_FAILURE);
}

int main(int argc, char* argv[])
{
    pid_t pid, ppid, child_pid;
    int some_var = 42;

    pid = getpid();
    printf("Parent. My PID is %d and I am about to procreate\n", pid);
    child_pid = fork();
    if(child_pid < 0)
    {
        err_sys("fork"); /* error branch reconstructed */
    }
    if(child_pid == 0)
    {
        pid = getpid();
        ppid = getppid();
        printf("Child. My PID is %d, my parent is %d\n", pid, ppid);
        printf("Child: some_var=%d - Changing it now!\n", some_var);
        some_var=7;
        printf("Child: some_var=%d\n", some_var);
    }
    else
    {
        printf("Parent. My PID is %d, my child is %d\n", pid, child_pid);
        printf("Parent: some_var=%d\n", some_var);
        printf("Going to sleep now, waiting for my child to die...\n");
        sleep(5);
        printf("I'm awake again. some_var is still %d\n", some_var);
    }
    return EXIT_SUCCESS;
}
Example Output
Parent. My PID is 12625 and I am about to procreate
Parent. My PID is 12625, my child is 12626
Parent: some_var=42
Going to sleep now, waiting for my child to die...
Child. My PID is 12626, my parent is 12625
Child: some_var=42 - Changing it now!
Child: some_var=7
I'm awake again. some_var is still 42
Note the snapshot of the processes taken with top:
PID   USER   PRI ... SHARE STAT %CPU %MEM TIME COMMAND
12625 schulz 16  ... 280   S    0.0  0.1  0:00 fork_example
12626 schulz 16  ... 0     Z    0.0  0.0  0:00 fork_example <defunct>
– As long as the parent lives, the child remains around as a zombie
– When the parent dies, the init process collects the termination status and the child is delivered from its undead state
Comments on fork()
Order of execution for parent and child is unpredictable!
Forked processes behave as if an actual copy has been made
– All of the process's memory is accessible in both parent and child
– Changing it in one does not affect the other
On modern UNIX versions, fork() is implemented with copy on write
– Both processes actually share the same pages in memory
– Only when a process actually tries to change a value in memory is a private copy created
– Consequence: Forking is very cheap – it only has to copy basic process structures
UNIX programmers use forking a lot!
– Servers may fork one process for each connection!
– Shells fork for executing commands
Don’t Do This!
#include <unistd.h>
int main(int argc, char* argv[])
{
    while(1)
    {
        fork();
    }
}
It is the simplest version of a fork bomb
– Will create an exponentially growing number of processes
– Quickly consumes all system resources
– Makes system essentially unusable
Forking and I/O
As the example showed, both parent and child were able to write to stdout
– In general, parent and child share file descriptors open at the time of fork()
– This can be problematic, as the order in which output is written is undefined
– Even worse for input or output to files or sockets (on the screen, we can usually figure things out)
If responsibility for file descriptors is clear, parent can delegate communication to child
– Example: Parent just accept()s connections
– Child actually performs communication on the file descriptor
– Both parent and child need to close an open file descriptor!
Parent and child share file descriptors, but not standard I/O library buffers
– Can have unexpected effects!
I/O Setup before Forking
[Figure: Per-process data structures: the standard I/O library FILE array entry for FILE* myfile (buffer, file descriptor) and the process table entry (fd n: flags). Global, shared data structures: the file table entry (status flags, offset) and the vnode table entry (actual current file size, ...).]
I/O Setup after Forking
[Figure: Two sets of per-process data structures, each with its own standard I/O library FILE array entry for FILE* myfile (buffer, file descriptor) and its own process table entry (fd n: flags), next to a single set of global, shared data structures: the file table entry (status flags, offset) and the vnode table entry (actual current file size, ...).]
Example: Buffered I/O and Forking
/* Usual includes and stuff omitted */
int main(int argc, char* argv[])<br />
{<br />
pid_t child_pid;<br />
}<br />
printf("Hiho "); /*
Example Output
$ fork_example2
Hiho from the parent!
Hiho from the child!
stdout is line buffered
– Since we did not print a full line (and did not call fflush()), the string was not printed
– Calling fork() duplicated the buffer contents
– Then, both parent and child caused a flush
Waiting for Children to Die
As stated above, parents need to get the termination status of their children (otherwise those children become zombies)
They can do so by calling wait()
#include <sys/types.h>
#include <sys/wait.h>
pid_t wait(int *status);
– wait() waits until a child terminates
– It returns the PID of the terminated child
– If status is not equal to NULL, it writes the termination status of the child into the variable it points to
– Note: If some children have already terminated, wait() picks one of those and returns its data
– If there are no children, wait() returns -1 and sets errno
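With wait() the parent can also observe what its child did. In the following sketch the child changes its copy of a variable and reports the new value as its exit status, while the parent's copy stays untouched. The function name child_value() and the use of the exit status as a reporting channel are choices made for this illustration.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: the child changes its copy of some_var and reports
   the new value as its exit status. Stores the parent's value in
   *parent_value and returns the child's value, or -1 on error. */
int child_value(int *parent_value)
{
    int   some_var = 42;
    int   status;
    pid_t child_pid;

    child_pid = fork();
    if (child_pid == -1) {
        return -1;
    }
    if (child_pid == 0) {
        some_var = 7;          /* changes the child's private copy only */
        _exit(some_var);       /* report it via the exit status */
    }
    /* Parent: block until the child has terminated */
    if (wait(&status) != child_pid || !WIFEXITED(status)) {
        return -1;
    }
    *parent_value = some_var;  /* unchanged in the parent */
    return WEXITSTATUS(status);
}
```

The function should return 7 (the child's value) and store 42 (the parent's value). Because the parent has called wait(), the child does not linger as a zombie.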
Example
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

void err_sys(char* message)
{
    perror(message);
    exit(EXIT_FAILURE);
}

int main(int argc, char* argv[])
{
    pid_t pid, ppid, child_pid;
    int i, status;

    pid = getpid();
    printf("Parent. My PID is %d and I am about to procreate\n", pid);
    fflush(stdout);
Example (2)<br />
for(i=0; i
}<br />
Example (3)<br />
printf("Parent: Waiting for my children\n");<br />
while((child_pid = wait(&status))!=-1)<br />
{<br />
printf("Child %d terminated with termination status %d\n", child_pid, status);<br />
if(WIFEXITED(status))<br />
{<br />
printf("Termination normal, exit status %d\n", WEXITSTATUS(status));<br />
}<br />
}<br />
return EXIT_SUCCESS;<br />
Stephan Schulz 584
Example Output<br />
Parent. My PID is 13565 and I am about to procreate<br />
Child. My PID is 13567, my parent is 13565<br />
Child. My PID is 13568, my parent is 13565<br />
Child. My PID is 13569, my parent is 13565<br />
Parent: Waiting for my children<br />
Child 13569 terminated with termination status 512<br />
Termination normal, exit status 2<br />
Child 13568 terminated with termination status 256<br />
Termination normal, exit status 1<br />
Child 13567 terminated with termination status 0<br />
Termination normal, exit status 0<br />
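The loop that creates the children is not preserved in this copy of the slides. Judging from the output (three children, exit statuses 0, 1 and 2), it could have looked roughly like the sketch below. The function name fork_and_reap() and the parameter n are assumptions, and the children exit silently here instead of printing their PIDs.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks n children, child i exits with status i.
 * Returns the sum of all collected exit statuses, or -1 on error. */
int fork_and_reap(int n)
{
    int i, status, sum = 0, reaped = 0;
    pid_t child_pid;

    for (i = 0; i < n; i++) {
        child_pid = fork();
        if (child_pid == -1)
            return -1;
        if (child_pid == 0)
            _exit(i);                /* child: terminate with status i */
    }
    /* wait() returns -1 (errno == ECHILD) once no children are left */
    while ((child_pid = wait(&status)) != -1) {
        if (WIFEXITED(status)) {
            sum += WEXITSTATUS(status);
            reaped++;
        }
    }
    if (errno != ECHILD || reaped != n)
        return -1;
    return sum;
}
```

The raw status value is not the exit status itself, which is why the example decodes it with WIFEXITED() and WEXITSTATUS() instead of printing it directly.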
Exercises<br />
Here is a function that computes the rollercoaster numbers<br />
long rollercoaster(long i)<br />
{<br />
printf("%ld\n", i);<br />
if(i==1)<br />
{<br />
return 0;<br />
}<br />
if(i%2==0)<br />
{<br />
return 1+rollercoaster(i/2);<br />
}<br />
return 1+rollercoaster(3*i+1);<br />
}<br />
Write a program that forks off 10 processes, each of which computes the<br />
rollercoaster numbers for one of the numbers from 11 to 20 and prints it<br />
Make the parent wait for all children and print the PIDs and the exit status of<br />
each in the order in which the children terminate, then terminate the parent<br />
Process Control (System Calls)<br />
Process Groups<br />
UNIX processes are organized in process groups<br />
– A process group has a group leader<br />
– All processes in the group have the same process group id (which is the process<br />
id of the group leader)<br />
Some operations can be done not just for single processes, but for a whole group:<br />
– Delivering signals with kill<br />
– Waiting for process termination with waitpid() (later)<br />
By default, a process inherits the process group id from its parent<br />
– Processes can change their own process group id<br />
∗ . . . to become process group leaders in a new process group, or<br />
∗ . . . to join an existing process group<br />
– Parents can change the process group id of their children (unless the children<br />
already called exec())<br />
Note: Don’t confuse the pgid (process group) with the gid (user/owner group)<br />
Getting and Changing Process Groups<br />
#include <sys/types.h><br />
#include <unistd.h><br />
pid_t getpgrp(void);<br />
int setpgid(pid_t pid, pid_t pgrp);<br />
getpgrp() always returns the process group id of the current process<br />
– No error condition!<br />
setpgid(pid_t pid, pid_t pgrp) sets the process group id of the process<br />
with the PID pid to pgrp<br />
– Return value: 0 on success, -1 on error (errno set)<br />
– Special values:<br />
∗ If pid is 0, the PID of the calling process is assumed<br />
∗ If pgrp is 0, the process id denoted by the first argument is assumed (i.e.<br />
that process is made into a process group leader of a new process group)<br />
– Note that this means that setpgid(0,0) makes the current process into a<br />
process group leader<br />
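As a small illustration of setpgid(0,0), the sketch below lets a child make itself a process group leader and report back through its exit status whether getpgrp() now equals its own PID. The function name child_becomes_leader() is an assumption made for this sketch.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a forked child could turn itself into
 * the leader of a new process group, 0 if not, -1 on error. */
int child_becomes_leader(void)
{
    int status;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0) {
        if (setpgid(0, 0) == -1)     /* pid 0: myself, pgrp 0: use my PID */
            _exit(2);
        _exit(getpgrp() == getpid() ? 0 : 1);
    }
    if (waitpid(child, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```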
Example<br />
#include <stdio.h><br />
#include <stdlib.h><br />
#include <unistd.h><br />
#include <sys/types.h><br />
void err_sys(char* message)<br />
{<br />
perror(message);<br />
exit(EXIT_FAILURE);<br />
}<br />
int main(int argc, char* argv[])<br />
{<br />
pid_t pid, pgid, child_pid;<br />
int i, res;<br />
pid = getpid();<br />
pgid = getpgrp();<br />
printf("Parent. My PID is %d and my process group is %d\n",pid,pgid);<br />
Example (2)<br />
res = setpgid(0,0);<br />
if(res==-1)<br />
{<br />
err_sys("setpgid");<br />
}<br />
printf("Parent. I'm now the process group leader.\n");<br />
for(i=0; i...<br />
...<br />
}<br />
Example (3)<br />
if(child_pid == 0)<br />
{<br />
pid = getpid();<br />
pgid = getpgrp();<br />
printf("Child %d. My PID is %d, my process group is %d.\n", i, pid, pgid);<br />
sleep(1);<br />
res = setpgid(0,0);<br />
if(res==-1)<br />
{<br />
err_sys("setpgid");<br />
}<br />
pid = getpid();<br />
pgid = getpgrp();<br />
printf("Child %d. I'm now independent, pid %d and pgid %d\n",i, pid,pgid);<br />
printf("Child %d exiting\n", i);<br />
exit(EXIT_SUCCESS);<br />
}<br />
printf("Parent, sleeping.\n");<br />
sleep(3);<br />
printf("Parent, exiting.\n");<br />
return EXIT_SUCCESS;<br />
Example Output<br />
$ ./pg_example<br />
Parent. My PID is 1946 and my process group is 1946<br />
Parent. I'm now the process group leader.<br />
Parent, sleeping.<br />
Child 0. My PID is 1947, my process group is 1946.<br />
Child 1. My PID is 1948, my process group is 1946.<br />
Child 2. My PID is 1949, my process group is 1946.<br />
Child 0. I'm now independent, pid 1947 and pgid 1947<br />
Child 0 exiting<br />
Child 1. I'm now independent, pid 1948 and pgid 1948<br />
Child 1 exiting<br />
Child 2. I'm now independent, pid 1949 and pgid 1949<br />
Child 2 exiting<br />
Parent, exiting.<br />
Note that the parent starts out as a process group leader!<br />
– Most shells with built-in job control will always execute commands in their<br />
own process group<br />
UNIX System Call: kill<br />
#include <signal.h><br />
int kill(pid_t pid, int sig);<br />
kill() sends the signal sig to the process or processes specified by pid<br />
– pid > 0: Signal is sent to process with PID pid<br />
– pid == 0: Signal is sent to all processes in the same process group (if process<br />
has permission to send it)<br />
– pid < 0: Signal is sent to all processes with process group id |pid|<br />
– Special case: pid == -1: Most UNIX versions send signal to all processes<br />
with the same user id (real or effective) as the caller<br />
Possible signals: As for the kill command (defined in <signal.h>)<br />
– Also see man signal<br />
Note: kill() is the function used to implement the kill command<br />
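The negative-pid form of kill() is easiest to see with a small group of workers. The following sketch is an illustration only: the function name kill_worker_group() is an assumption, the workers are placed into a new group by the parent, and SIGKILL is used because it cannot be caught or ignored.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: starts n workers in a fresh process group, signals
 * the whole group with one kill() call and returns how many workers were
 * terminated by that signal (-1 on error). */
int kill_worker_group(int n)
{
    int i, status, killed = 0;
    pid_t pgid = 0, child;

    for (i = 0; i < n; i++) {
        child = fork();
        if (child == -1)
            break;
        if (child == 0) {
            for (;;)
                pause();             /* worker: sleep until a signal arrives */
        }
        if (pgid == 0)
            pgid = child;            /* first worker becomes the group leader */
        if (setpgid(child, pgid) == -1) {
            kill(child, SIGKILL);    /* could not move it: remove it directly */
            waitpid(child, NULL, 0);
        }
    }
    if (pgid == 0)
        return -1;
    if (kill(-pgid, SIGKILL) == -1)  /* negative pid: whole process group */
        return -1;
    while (waitpid(-pgid, &status, 0) > 0) {
        if (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL)
            killed++;
    }
    return killed;
}
```

Because the workers live in their own group, the caller is not hit by the signal it sends.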
Is this good for Something?<br />
There are many possible situations where an application consists of a set of<br />
processes:<br />
– Server may have one process that accept()s connections, multiple workers<br />
that serve individual connections<br />
– Competitive theorem prover runs many search strategies in parallel<br />
If we make the top level control program into a process group leader, termination<br />
becomes a lot easier<br />
– We can kill whole process group with one command<br />
– The leader can be made to automatically kill all processes<br />
Example<br />
#include <stdio.h><br />
#include <stdlib.h><br />
#include <unistd.h><br />
#include <sys/types.h><br />
#include <signal.h><br />
void err_sys(char* message)<br />
{<br />
perror(message);<br />
exit(EXIT_FAILURE);<br />
}<br />
int main(int argc, char* argv[])<br />
{<br />
pid_t pid, pgid, child_pid;<br />
int i, res;<br />
res = setpgid(0,0);<br />
if(res==-1)<br />
{<br />
err_sys("setpgid");<br />
}<br />
Example (2)<br />
pid = getpid();<br />
pgid = getpgrp();<br />
printf("Queen bee:PID is %d process group is %d\n",pid,pgid);<br />
for(i=0; i...<br />
...<br />
}<br />
Example (3)<br />
if(child_pid == 0)<br />
{<br />
while(1)<br />
{<br />
printf("Worker bee %d gathering honey\n", i);<br />
sleep(1);<br />
}<br />
}<br />
for(i=0; i...<br />
...<br />
Example Output with kill<br />
Queen bee:PID is 2412 process group is 2412<br />
Queen bee sleeping<br />
Worker bee 0 gathering honey<br />
Worker bee 1 gathering honey<br />
Worker bee 2 gathering honey<br />
Queen bee sleeping<br />
Worker bee 0 gathering honey<br />
Worker bee 1 gathering honey<br />
Worker bee 2 gathering honey<br />
Queen bee sleeping<br />
Worker bee 0 gathering honey<br />
Worker bee 1 gathering honey<br />
Worker bee 2 gathering honey<br />
Queen bee terminates<br />
Example Output without kill<br />
schulz@leonardo 4:31am [CSC_322] ./pgkill_example<br />
Queen bee:PID is 2460 process group is 2460<br />
Queen bee sleeping<br />
Worker bee 0 gathering honey<br />
Worker bee 1 gathering honey<br />
Worker bee 2 gathering honey<br />
Queen bee sleeping<br />
Worker bee 0 gathering honey<br />
Worker bee 1 gathering honey<br />
Worker bee 2 gathering honey<br />
Queen bee sleeping<br />
Worker bee 0 gathering honey<br />
Worker bee 1 gathering honey<br />
Worker bee 2 gathering honey<br />
Queen bee terminates<br />
schulz@leonardo 4:32am [CSC_322] Worker bee 0 gathering honey<br />
Worker bee 1 gathering honey<br />
Worker bee 2 gathering honey<br />
Worker bee 0 gathering honey<br />
Worker bee 1 gathering honey<br />
Worker bee 2 gathering honey<br />
Worker bee 0 gathering honey<br />
Worker bee 1 gathering honey<br />
Worker bee 2 gathering honey<br />
Worker bee 1 gathering honey<br />
Worker bee 2 gathering honey<br />
Worker bee 0 gathering honey<br />
Worker bee 1 gathering honey<br />
Worker bee 2 gathering honey<br />
Worker bee 0 gathering honey<br />
...<br />
Waiting for Termination: waitpid()<br />
The wait() function waits for termination of any child of a process<br />
– It blocks until a child terminates<br />
– It cannot check the status of a specific child<br />
POSIX introduced waitpid() as a more general interface solving this problem:<br />
#include <sys/types.h><br />
#include <sys/wait.h><br />
pid_t waitpid(pid_t wpid, int *status, int options);<br />
waitpid() continued<br />
Return value: PID of terminated child (or 0 if WNOHANG was given and no child<br />
has terminated yet, or -1 on error)<br />
wpid: Process id describing processes we are waiting for<br />
– wpid == -1: Wait for any child process<br />
– wpid > 0: Wait for the child with PID wpid<br />
– wpid < -1: Wait for any child in the process group with PGID |wpid|<br />
– wpid == 0: Wait for any child with the PGID of the caller<br />
status: As for wait(), if !=NULL, termination status is written into it<br />
options: (Can be combined with |)<br />
– 0: Normal blocking wait<br />
– WNOHANG: Return immediately with 0 if no child is available<br />
– WUNTRACED: Used for job control and stopped processes<br />
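The difference between a WNOHANG poll and a blocking wait can be shown without relying on timing. In the sketch below (function name poll_then_wait() and the exit status 7 are assumptions), the child blocks on a pipe until the parent closes its end, so the first poll is guaranteed to find the child still running.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the child's exit status (7) if the WNOHANG
 * poll returned 0 while the child was still running, otherwise -1. */
int poll_then_wait(void)
{
    int fds[2], status;
    char c;
    pid_t child, polled;

    if (pipe(fds) == -1)
        return -1;
    child = fork();
    if (child == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        close(fds[1]);
        while (read(fds[0], &c, 1) > 0)
            ;                        /* blocks until the parent closes its end */
        _exit(7);
    }
    close(fds[0]);
    polled = waitpid(child, &status, WNOHANG);  /* child is still blocked */
    close(fds[1]);                              /* now let the child finish */
    if (waitpid(child, &status, 0) != child)    /* normal blocking wait */
        return -1;
    if (polled != 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```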
Exercises<br />
Write a program that keeps a network server alive (or reanimates it):<br />
– The server accepts connections<br />
– For each connection, it forks a child that reads input from the net and appends<br />
it to a log file<br />
– All those processes should be in the same process group<br />
The monitor program just starts the main server process, makes it the group<br />
leader, and waits for the server to terminate<br />
– In that case, it kills all of the server processes and restarts the server<br />
Program Execution<br />
Process Environment<br />
Each UNIX process has an environment<br />
– The environment consists of a list of strings<br />
– Normally, those strings have the form name=value (and most functions for<br />
manipulating the environment assume this form)<br />
– The name is called an environment variable<br />
– Since most environment variables are created and maintained by the shell, they<br />
are often also called shell variables<br />
Children inherit the environment of their parents<br />
– Note that children get a copy of the environment<br />
– Each process can change its own environment, but not that of its parent<br />
Environment variables are used for a large number of things<br />
– Where to look for executable programs<br />
– Which editor to use (in well-written applications)<br />
– What is the user's username?<br />
– Some mandated by standards (POSIX, SUSv2), others just customary<br />
Environment and the Shell<br />
You can print the environment using the printenv program<br />
– Just printenv prints all environment variables (and their values)<br />
– printenv VAR prints the value of the variable with name VAR<br />
Since no process can modify its parent's environment, you need to use a built-in<br />
command to change a shell's environment<br />
– tcsh: setenv VAR VALUE and unsetenv VAR<br />
– bash: export VAR=VALUE and unset VAR<br />
Example: Part of my Environment<br />
$ printenv<br />
PWD=/home/schulz/SOURCES/CSC_322<br />
VENDOR=intel<br />
HOSTNAME=wombat<br />
QTDIR=/usr/lib/qt3-gcc2.96<br />
LESSOPEN=|/usr/bin/lesspipe.sh %s<br />
USER=schulz<br />
LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:<br />
MACHTYPE=i386<br />
XDM_MANAGED=/var/run/xdmctl/xdmctl-:0,maysd,mayfn,sched<br />
XMODIFIERS=@im=none<br />
EDITOR=emacsclient<br />
LANG=C<br />
HOST=wombat<br />
DISPLAY=:0.0<br />
FROM=Stephan Schulz <br />
LOGNAME=schulz<br />
SHLVL=3<br />
GROUP=schulz<br />
TEXINPUTS=:~/TEXT/TEXLIB/<br />
SUPPORTED=en_US.iso885915:en_US:en:de_DE@euro:de_DE:de<br />
SHELL=/bin/tcsh<br />
HOSTTYPE=i386-linux<br />
CVSROOT=stephan@gw.safelogic.se:/CVS<br />
OSTYPE=linux<br />
HOME=/home/schulz<br />
SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass<br />
PATH=/home/schulz/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/local/bin:/usr/X11R6/bin:.<br />
_=/usr/X11R6/bin/xterm<br />
TERM=xterm<br />
WINDOWID=18874382<br />
Some Important Environment Variables<br />
PATH (POSIX) determines where the shell looks for executable programs<br />
– List of directory names, separated by colons<br />
– Can contain . to include working directory (Dangerous on multi-user systems)<br />
EDITOR (traditional) is used by good UNIX programs to determine which editor<br />
to run if you have to edit text<br />
LOGNAME (POSIX) is your user name<br />
TERM (POSIX) is your (text) terminal type<br />
– If you have trouble with remote logins, set it to vt100<br />
HOME (POSIX) is your home directory<br />
DISPLAY (X11 Window System) is the name of your display<br />
– UNIX can run programs on one host, and display them on another<br />
– DISPLAY tells it where to show output for X programs<br />
Accessing the Environment from a Program<br />
There are two ways to access the environment of a process:<br />
– Via the environ variable<br />
– Via getenv() and putenv()<br />
If we want to go through all of the environment, we need to declare the environ<br />
variable:<br />
extern char **environ;<br />
– It points to a NULL-terminated array of pointers<br />
– Each array element points to a \0-terminated C string of the form<br />
name=value<br />
Example<br />
#include <stdio.h><br />
#include <stdlib.h><br />
extern char **environ;<br />
int main(int argc, char* argv[])<br />
{<br />
char **handle;<br />
for(handle=environ; *handle; handle++)<br />
{<br />
printf("%s\n", *handle);<br />
}<br />
return EXIT_SUCCESS;<br />
}<br />
The POSIX Interface to the Environment<br />
#include <stdlib.h><br />
char *getenv(const char *name);<br />
int putenv(char *string);<br />
getenv() takes a pointer to an environment variable name and returns its value<br />
(or NULL if the variable does not exist)<br />
– It’s even part of ANSI C (but ANSI C says nothing about the environment)<br />
putenv() takes a single string of the form name=value<br />
– Adds the string (i.e. the name=value pair) to the environment<br />
– If name exists, the old definition is changed<br />
– Note that some versions of UNIX include just the pointer in the environment,<br />
while others create a copy of the string<br />
Additional functions of interest:<br />
– clearenv(): Clears environment (available on some systems, but not part of POSIX.1-2001)<br />
– unsetenv(): Remove a single variable (traditional, now also in POSIX.1-2001)<br />
– setenv(): More flexible version of putenv() (traditional, now also in POSIX.1-2001)<br />
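That children only receive a copy of the environment can be checked with getenv() and setenv(). The sketch below is an illustration under assumptions: the function name env_is_copied() and the variable name CSC322_DEMO are made up for this purpose.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the child inherited the parent's value
 * and the child's own change did not leak back into the parent. */
int env_is_copied(void)
{
    int status, ok;
    const char *val;
    pid_t child;

    if (setenv("CSC322_DEMO", "parent", 1) == -1)   /* 1: overwrite if set */
        return -1;
    child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        val = getenv("CSC322_DEMO");
        if (!val || strcmp(val, "parent") != 0)
            _exit(1);                               /* value not inherited */
        setenv("CSC322_DEMO", "child", 1);          /* changes the copy only */
        _exit(0);
    }
    if (waitpid(child, &status, 0) == -1)
        return -1;
    val = getenv("CSC322_DEMO");
    ok = WIFEXITED(status) && WEXITSTATUS(status) == 0
         && val != NULL && strcmp(val, "parent") == 0;
    unsetenv("CSC322_DEMO");                        /* clean up again */
    return ok;
}
```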
Executing New Programs<br />
A process can cause the execution of a new program via one of the exec functions<br />
– Causes this same process to replace its own program, data, and stack with<br />
new data<br />
– Program code is loaded from disk<br />
– Heap and stack are reinitialized<br />
– New program starts running at its main() function<br />
There are 6 different exec functions that differ in:<br />
– How they look for the program to run (via path or via absolute filename)<br />
– How they accept arguments for the new program (as additional arguments to<br />
the exec function or via an array of pointers)<br />
– How they handle the environment (inheritance or completely new environment)<br />
The 6 exec Functions<br />
#include <unistd.h><br />
int execl(const char *path, const char *arg, ...);<br />
int execlp(const char *file, const char *arg, ...);<br />
int execle(const char *path, const char *arg , ..., char *const envp[]);<br />
int execv(const char *path, char *const argv[]);<br />
int execvp(const char *file, char *const argv[]);<br />
int execve(const char *filename, char *const argv [], char *const envp[]);<br />
All return -1 on error, and not at all on success<br />
execlp() and execvp() take a filename and search the PATH directories for the<br />
program<br />
execl(), execlp() and execle() take arguments for the new program as<br />
additional arguments<br />
– The list has to end with an additional NULL argument<br />
– The others take a pre-created argv vector<br />
Finally, execle() and execve() take an explicit environment pointer<br />
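As an illustration of the list-style calls with an explicit environment, the sketch below runs /bin/sh through execle() with a one-entry environment and lets the shell check that the variable arrived. The function name run_with_env() and the variable GREETING are assumptions, as is the presence of /bin/sh on the system.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the exit status of the shell command
 * (0 if GREETING was passed on correctly), or -1 on error. */
int run_with_env(void)
{
    int status;
    pid_t child;
    char *const envp[] = { "GREETING=hello", NULL };  /* complete new environment */

    child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        /* argument list ends with NULL, the environment pointer follows it */
        execle("/bin/sh", "sh", "-c", "test \"$GREETING\" = hello",
               (char *)NULL, envp);
        _exit(127);                  /* only reached if execle() failed */
    }
    if (waitpid(child, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```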
The execvp() function<br />
execvp(const char *file, char *const argv[]) is reasonably easy to use:<br />
– First argument is a file name (not containing any /)<br />
– The program to be executed is found as by the shell, by looking through all<br />
the directories in PATH<br />
Second argument is a pointer to an array of argument pointers<br />
– Same format and conventions as argv in main<br />
– First argument should be program name<br />
– Array should be NULL terminated<br />
Upon execution, the new program runs<br />
– Keeps old PID, GID, PGID, working directory, . . .<br />
– Normal file descriptors stay open (unless the flag FD_CLOEXEC is set using<br />
fcntl())<br />
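The usual fork/exec/wait pattern around execvp() can be sketched as follows. The function name run_command() and the exit code 127 for a failed exec are assumptions borrowed from common shell practice, not something the slides prescribe.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] (searched via PATH) with the given
 * NULL-terminated argument vector and returns its exit status, or -1. */
int run_command(char *const argv[])
{
    int status;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0) {
        execvp(argv[0], argv);       /* returns only on failure */
        _exit(127);
    }
    if (waitpid(child, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child uses _exit() after a failed exec so that stdio buffers inherited from the parent are not flushed a second time.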
Example: A mini-Shell<br />
#include <stdio.h><br />
#include <stdlib.h><br />
#include <string.h><br />
#include <ctype.h><br />
#include <unistd.h><br />
#include <sys/wait.h><br />
#define MAX_LINE 1024<br />
void err_sys(char* message)<br />
{<br />
perror(message);<br />
exit(EXIT_FAILURE);<br />
}<br />
Example (2)<br />
void* secure_malloc(int size)<br />
{<br />
void* res = malloc(size);<br />
if(!res)<br />
{<br />
fprintf(stderr, "malloc() failure -- out of memory?");<br />
exit(EXIT_FAILURE);<br />
}<br />
return res;<br />
}<br />
char* secure_strdup(char* source)<br />
{<br />
void* res = secure_malloc(strlen(source)+1);<br />
strcpy(res, source);<br />
return res;<br />
}<br />
Example (3)<br />
int count_words(char* line)<br />
{<br />
int words=0, in_word=0;<br />
while(*line)<br />
{<br />
if(isspace((unsigned char)*line)) /* cast avoids undefined behaviour for negative chars */<br />
{<br />
in_word = 0;<br />
}<br />
else<br />
{<br />
if(in_word == 0)<br />
{<br />
words++;<br />
in_word = 1;<br />
}<br />
}<br />
line++;<br />
}<br />
return words;<br />
}<br />
Example (4)<br />
char **build_argv(char* line)<br />
{<br />
int argc = count_words(line);<br />
int i;<br />
char *new;<br />
char **argv;<br />
if(argc == 0)<br />
{<br />
return NULL;<br />
}<br />
argv = secure_malloc(sizeof(char*)*(argc+1));<br />
Example (5)<br />
for(i=0; i...<br />
...<br />
}<br />
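The loop that fills the argument vector is not preserved in this copy. The sketch below shows one way to split a line at whitespace into a NULL-terminated vector; it is not the original build_argv(). The names split_words() and free_words() are assumptions, and the sketch returns NULL on allocation failure instead of exiting.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits line at whitespace into a NULL-terminated,
 * malloc()ed vector of malloc()ed strings. Returns NULL for an empty line
 * or if memory runs out. */
char **split_words(const char *line)
{
    size_t words = 0, len, i;
    const char *p, *start;
    char **argv;

    for (p = line; *p; ) {                       /* first pass: count words */
        while (*p && isspace((unsigned char)*p))
            p++;
        if (!*p)
            break;
        words++;
        while (*p && !isspace((unsigned char)*p))
            p++;
    }
    if (words == 0)
        return NULL;
    argv = malloc(sizeof(char *) * (words + 1));
    if (!argv)
        return NULL;
    for (p = line, i = 0; i < words; i++) {      /* second pass: copy words */
        while (isspace((unsigned char)*p))
            p++;
        start = p;
        while (*p && !isspace((unsigned char)*p))
            p++;
        len = (size_t)(p - start);
        argv[i] = malloc(len + 1);
        if (!argv[i]) {
            while (i > 0)
                free(argv[--i]);
            free(argv);
            return NULL;
        }
        memcpy(argv[i], start, len);
        argv[i][len] = '\0';
    }
    argv[words] = NULL;                          /* execvp() needs the NULL */
    return argv;
}

/* Releases a vector created by split_words() */
void free_words(char **argv)
{
    size_t i;

    if (!argv)
        return;
    for (i = 0; argv[i]; i++)
        free(argv[i]);
    free(argv);
}
```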
Example (6)<br />
void print_argv(char **argv)<br />
{<br />
int i;<br />
printf("Command: %s\n", argv[0]);<br />
printf("Arguments:\n");<br />
for(i=0; argv[i]; i++)<br />
{<br />
printf("%s\n", argv[i]);<br />
}<br />
printf("=======\n");<br />
}<br />
Example (7)<br />
int main(void)<br />
{<br />
pid_t child_pid;<br />
char line[MAX_LINE];<br />
char *line_res;<br />
char **argv;<br />
while(1)<br />
{<br />
printf("# ");fflush(NULL);<br />
line_res = fgets(line, MAX_LINE, stdin);<br />
if(!line_res)<br />
{<br />
break;<br />
}<br />
argv = build_argv(line);<br />
if(!argv)<br />
{<br />
continue;<br />
}<br />
print_argv(argv);<br />
Example (8)<br />
child_pid = fork();<br />
if(child_pid == -1)<br />
{<br />
err_sys("fork");<br />
}<br />
if(child_pid == 0) /* Child! */<br />
{<br />
setpgid(0,0);<br />
if(execvp(argv[0], argv) == -1)<br />
{<br />
err_sys("execvp");<br />
}<br />
}<br />
else<br />
Example (9)<br />
{ /* Parent */<br />
setpgid(child_pid, child_pid);<br />
if(wait(NULL) == -1)<br />
{<br />
err_sys("wait");<br />
}<br />
free(argv);<br />
}<br />
}<br />
return EXIT_SUCCESS;<br />
}<br />
Example Usage<br />
schulz@wombat 2:01am [CSC_322] ./shell_example<br />
# echo Hallo<br />
Command: echo<br />
Arguments:<br />
echo<br />
Hallo<br />
=======<br />
Hallo<br />
# ls -l macrotest.c wordcount env_example.c<br />
Command: ls<br />
Arguments:<br />
ls<br />
-l<br />
macrotest.c<br />
wordcount<br />
env_example.c<br />
=======<br />
-rw-rw-r-- 1 schulz schulz 233 Dec 3 21:31 env_example.c<br />
-rw-rw-r-- 1 schulz schulz 206 Nov 26 23:41 macrotest.c<br />
-rwxrwxr-x 1 schulz schulz 13715 Nov 26 23:47 wordcount<br />
Example Usage (2)<br />
# ls *<br />
Command: ls<br />
Arguments:<br />
ls<br />
*<br />
=======<br />
ls: *: No such file or directory<br />
# hallo<br />
Command: hallo<br />
Arguments:<br />
hallo<br />
=======<br />
execvp: No such file or directory<br />
Exercises<br />
Extend the shell example (code is on the web page) to<br />
– Have better error handling<br />
– Do background processing (with &)<br />
– Support job control<br />
– Offer I/O redirection with > and <<br />
Read the man pages on popen() and pipe() to see how we could achieve piping<br />
If you are adventurous, implement:<br />
– Piping<br />
– Globbing (read man glob)<br />
Final Review<br />
Final Exam<br />
Place and Time:<br />
– Room LC 192 (the normal room)<br />
– Monday, Dec. 16th, 11:00 a.m. – 1:30 p.m.<br />
Topics:<br />
– Everything we covered in this class<br />
– Emphasis will be on second half<br />
You may bring:<br />
– Lecture notes, your own notes, books, printouts of your (or my) solutions to<br />
the exercises. . .<br />
– . . . but no computers, PDAs, mobile phones (switch them off and stow them<br />
away) or similar items<br />
Note: I’ll only review material from the second half of the semester today<br />
– Check lecture notes, pages 299–312 for overview of first half<br />
Pointers and Dynamic Arrays<br />
Arrays are passed as pointers to the first element<br />
– Arrays and pointers (to an allocated memory region) can be used in the same<br />
way (i.e. we can index a pointer: p[5])<br />
– We can use realloc() to dynamically enlarge dynamically allocated arrays<br />
Pointer arithmetic: We can add integers to and subtract integers from pointers to step through<br />
an array<br />
– p[5] is equivalent to *(p+5)<br />
The following two program snippets are equivalent:<br />
/* Assume some initialization in both versions */<br />
int a[SIZE], i;<br />
for(i=0; a[i]; i++)<br />
{<br />
printf("%d\n", a[i]);<br />
}<br />
int a[SIZE], *handle;<br />
for(handle = a; *handle; handle++)<br />
{<br />
printf("%d\n", *handle);<br />
}<br />
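Growing an array with realloc() could look like the following sketch. The type IntVec and the function intvec_push() are hypothetical names introduced here, and doubling the capacity is just one common growth strategy.

```c
#include <stdlib.h>

/* Hypothetical growable array of ints */
typedef struct {
    int *data;      /* allocated memory region, NULL while empty */
    size_t len;     /* number of elements in use */
    size_t cap;     /* number of elements allocated */
} IntVec;

/* Appends value, enlarging the array if necessary. Returns 0 on success,
 * -1 if realloc() failed (the old contents stay valid in that case). */
int intvec_push(IntVec *v, int value)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? 2 * v->cap : 4;
        int *tmp = realloc(v->data, new_cap * sizeof(int));
        if (!tmp)
            return -1;
        v->data = tmp;               /* the block may have moved */
        v->cap = new_cap;
    }
    v->data[v->len++] = value;       /* same as *(v->data + v->len) = value */
    return 0;
}
```

Assigning the result of realloc() to a temporary first avoids losing the old block if the call fails.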
Make<br />
Make is a tool for automating multi-program builds<br />
– Rule-based (rules are stored in Makefile)<br />
– Performs just the necessary operations to update all program parts<br />
– You specify dependencies <strong>and</strong> actions<br />
Example:<br />
PROGS=hello fahrenheit2celsius fahrenheit2celsius2 fahrenheit2celsius3 \<br />
charcount ourcopy wordcount escape base_converter inc_example<br />
all: $(PROGS)<br />
clean:<br />
rm $(PROGS)<br />
hello: hello.c<br />
gcc -ansi -Wall -o hello hello.c<br />
fahrenheit2celsius: fahrenheit2celsius.c<br />
gcc -ansi -Wall -o fahrenheit2celsius fahrenheit2celsius.c<br />
...<br />
New Flow Control Constructs<br />
break is used to break out of loops (and switch statements)<br />
– Immediately transfers control to the first statement after the loop<br />
continue allows early continuation of a loop<br />
– Transfers control back to the beginning of the loop<br />
– In case of for loops, update expression will be evaluated<br />
do/while loops test the condition at the end of the loop<br />
– Loop body always gets executed at least once<br />
– Otherwise similar to plain while loop<br />
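A short sketch of the three constructs; both function names are made up for this illustration.

```c
/* do/while: the body runs at least once, so 0 is counted as one digit */
int count_digits(long n)
{
    int digits = 0;

    do {
        digits++;
        n /= 10;
    } while (n != 0);
    return digits;
}

/* Sums the odd numbers below limit, but stops early when i reaches stop_at */
int sum_odd_below(int limit, int stop_at)
{
    int i, sum = 0;

    for (i = 0; i < limit; i++) {
        if (i == stop_at)
            break;                   /* leave the loop completely */
        if (i % 2 == 0)
            continue;                /* skip the rest; i++ is still evaluated */
        sum += i;
    }
    return sum;
}
```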
Function Pointers and qsort()<br />
We can use pointers to functions (of a specific type) to<br />
– Implement generic functions and data types<br />
– Emulate object-oriented constructs (virtual functions)<br />
– Implement callbacks and signal handlers<br />
Using function pointer:<br />
– Just use the function name or use the address operator (&fun)<br />
– Calling the function: Either use pointer as is, or dereference: (*fun)(arg1)<br />
Example for function pointer usage: qsort() from stdlib<br />
void qsort(void *base, size_t nmemb, size_t size,<br />
int(*compar)(const void *, const void *));<br />
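A minimal sketch of passing a comparison function to qsort(); the names compare_ints() and sort_ints() are assumptions.

```c
#include <stdlib.h>

/* Comparison callback: receives pointers to two array elements and
 * returns <0, 0 or >0, as qsort() expects */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);        /* avoids the overflow of x - y */
}

/* Sorts n ints in ascending order */
void sort_ints(int *values, size_t n)
{
    /* the function name decays to a function pointer; &compare_ints
     * would work as well */
    qsort(values, n, sizeof(int), compare_ints);
}
```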
Standard Library: Characters and Strings<br />
ctype.h contains character classification functions:<br />
– isspace(c)<br />
– isprint(c)<br />
– isdigit(c)<br />
– isalpha(c)<br />
– isalnum(c) . . .<br />
– Also: toupper(c), tolower(c)<br />
String (\0 terminated sequence of characters) functions are defined in string.h<br />
– strcpy(to,from) copies a \0-terminated string to existing memory<br />
– strcat(to,from) appends a string at the end of an existing string<br />
– strcmp(s1,s2) compares two strings, returns a value <0, 0, or >0<br />
– strncpy(), strncat(), strncmp() limit operation to a given number of<br />
characters<br />
– strpbrk() searches for characters in a string<br />
– strstr() searches for a substring<br />
Standard Library: Memory Accesses<br />
Memory access functions treat memory as a large array of characters<br />
– Important difference to string functions: Not \0-terminated, you always have<br />
to give a length<br />
Functions:<br />
– memcpy(to, from, n) copies n bytes<br />
– memmove(to, from, n) does the same even for overlapping regions of memory<br />
– memcmp(s1,s2,n) compares two memory regions<br />
– memchr(s, c, n) searches for character c in memory region starting at s<br />
– memset(s,c,n) writes n copies of character c into memory (used e.g. to zero<br />
out socket address data structures)<br />
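Two small sketches of the points above: memmove() on overlapping regions, and memcmp() looking past \0 bytes. The function names are made up for this illustration.

```c
#include <string.h>

/* Shifts the first n bytes of buf one position to the right.
 * buf must have room for at least n+1 bytes. Source and destination
 * overlap, so memmove() is required; memcpy() would be undefined here. */
void shift_right(char *buf, size_t n)
{
    memmove(buf + 1, buf, n);
}

/* Compares n bytes, including anything after a \0 byte */
int regions_equal(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n) == 0;
}
```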
Standard Library: Buffered I/O<br />
Standard library supports buffered I/O via streams<br />
– Stream creation: fopen(filename, mode)<br />
– Stream destruction: fclose(stream)<br />
– Predefined streams: stdin, stdout, stderr<br />
– Text streams: Lines separated by \n<br />
– Binary streams: Raw data (under UNIX, no difference)<br />
Basic I/O functions:<br />
– fgetc(stream) reads a single character and returns it as an int (and EOF on<br />
end of file)<br />
– fputc(c, stream) writes a single character to a stream<br />
– fgets(s,n,stream) reads a single line or n-1 characters (whichever is less) into<br />
the preallocated memory at s<br />
– fputs(s, stream) writes a \0-terminated string to the stream<br />
Streams can be fflush()ed, and we can change buffering with setvbuf() and<br />
setbuf()<br />
Standard Library: Formatted Output<br />
printf(format,...) and fprintf(stream, format, ...) write an arbitrary<br />
number of arguments under the control of a format string<br />
– Format string contains plain characters and conversion specifiers starting with<br />
a %<br />
– Each conversion specifier must have a matching argument<br />
– Conversion specifiers specify in which form argument is printed<br />
Conversion specifier format:<br />
– %, followed by optional flags, field width, precision, size modifier<br />
– Ends in a conversion letter<br />
Example: printf("%-5ld\n", i)<br />
– Prints a long integer, at least 5 characters, left-justified (fills up with spaces), followed<br />
by a newline<br />
Important conversion letters: d (int), s (string), c (character), g (floating point<br />
number)<br />
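The effect of the %-5ld specifier can be checked by formatting into a buffer instead of printing. This sketch uses snprintf() (C99) and a made-up function name.

```c
#include <stdio.h>

/* Formats value with "%-5ld\n" into out (at most size bytes including the
 * terminating \0). Returns the number of characters the full output needs. */
int format_left(char *out, size_t size, long value)
{
    return snprintf(out, size, "%-5ld\n", value);
}
```

For the value 42 the buffer holds the two digits, three padding spaces and the newline; a value with more than five digits is not truncated, since the field width is only a minimum.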
Processes and Signals<br />
Processes are running programs and have a number of properties<br />
– Owner, PID, GID, PGID, Parent<br />
– Each process has its own virtual memory and cannot (directly) access other<br />
processes' data<br />
– Multiple processes can run “at the same time”<br />
We can use a number of tools to work with running processes:<br />
– ps lists running processes<br />
– top gives an interactive view of running processes<br />
kill can be used to send signals to a process<br />
– By default sends SIGTERM<br />
– You can also send other signals, e.g. kill -HUP <br />
Signals can also be generated by other events, e.g.<br />
– Floating point exception<br />
– Illegal memory access<br />
Signal Handling<br />
Each signal has a default action (either abort, abort with core dump, or ignore)<br />
– Action can be changed!<br />
The signal(sig, handler) function can be used to change how a process<br />
responds to a signal<br />
– sig is the signal to respond to<br />
– handler is a pointer to a function that returns void and takes an int (the<br />
signal) as an argument<br />
– Predefined pseudo-handlers: SIG_DFL (re-establish default behaviour),<br />
SIG_IGN (ignore signal)<br />
Established signal handlers catch a single signal (traditional signal() semantics;<br />
some systems keep the handler installed)!<br />
– Must reestablish handler from within the handler<br />
Signals can occur at any time, so the state of the program may be undefined<br />
– It’s dangerous to do much beyond exiting, manipulating variables of type<br />
volatile sig_atomic_t, and calling signal() again<br />
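A minimal sketch of such a handler follows. The names got_signal, on_signal and catch_one_signal are hypothetical, and SIGUSR1 with raise() is just a convenient way to deliver a signal to ourselves for the demonstration.

```c
#include <signal.h>

/* Flag shared between the handler and the rest of the program. */
static volatile sig_atomic_t got_signal = 0;

/* Handler: returns void and takes the signal number as an int. */
static void on_signal(int sig)
{
    signal(sig, on_signal);   /* re-establish, in case the handler was reset */
    got_signal = 1;           /* do nothing beyond setting the flag */
}

/* Hypothetical demo: installs the handler, sends SIGUSR1 to this process,
 * and reports whether the handler ran (1) or not (0); -1 on error. */
int catch_one_signal(void)
{
    int seen;

    got_signal = 0;
    if (signal(SIGUSR1, on_signal) == SIG_ERR) {
        return -1;
    }
    raise(SIGUSR1);               /* returns after the handler has run */
    seen = got_signal;
    signal(SIGUSR1, SIG_DFL);     /* restore the default action */
    return seen;
}
```

The main program only ever inspects the flag; all real work in response to the signal would happen outside the handler.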
UNIX File System (I)<br />
UNIX: Everything is a file<br />
File types:<br />
– Regular file<br />
– Directory<br />
– Character special file<br />
– Block special file<br />
– Socket<br />
– Symbolic link<br />
stat() functions give us information about files<br />
– Owner<br />
– Mode<br />
– Size<br />
– Access and modification times<br />
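The fields listed above can be read from the struct stat that stat() fills in. The sketch below assumes a POSIX system; the helper names is_directory and print_file_info are hypothetical.

```c
#include <stdio.h>
#include <time.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: 1 if path is a directory, 0 for any other
 * file type, -1 if stat() fails (e.g. the file does not exist). */
int is_directory(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == -1) {
        return -1;
    }
    return S_ISDIR(sb.st_mode) ? 1 : 0;
}

/* Hypothetical helper: prints owner, mode, size and modification time.
 * Returns 0 on success, -1 if stat() fails. */
int print_file_info(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == -1) {
        perror(path);
        return -1;
    }
    printf("owner uid: %ld\n", (long)sb.st_uid);
    printf("mode:      %lo\n", (unsigned long)(sb.st_mode & 07777));
    printf("size:      %ld bytes\n", (long)sb.st_size);
    printf("modified:  %s", ctime(&sb.st_mtime));  /* ctime() adds the \n */
    return 0;
}
```

For example, is_directory("/") should report 1 on any UNIX system, since the root is always a directory.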
UNIX File System (II)<br />
Important concepts:<br />
– File ownership and group ownership<br />
– Access rights (read, write, execute for user/group/others)<br />
– Links: Connect a name to a file<br />
∗ Hard links: Directory entries<br />
∗ Soft links: Files with the name of another file as data<br />
Important utilities:<br />
– ln: Creates links (both symbolic and hard)<br />
– ls: Shows files and file information<br />
– chmod: Allows us to change the mode of a file<br />
– chgrp: Changes group<br />
– chown: Changes owner<br />
File Descriptors and select()<br />
File descriptors are used by the UNIX kernel to represent open files<br />
– File descriptors are small integers (indices into the process file table)<br />
– Can be associated with a number of flags we can manipulate with fcntl() or<br />
set when we open the file: O_NONBLOCK, O_APPEND, . . .<br />
– Predefined: STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO<br />
– Opening files: open()<br />
– Using files: read(fd, buf, n) and write(fd, buf, n)<br />
– Closing: close()<br />
select(maxfd, readfds, writefds, exceptfds, time) waits for certain<br />
things to become true for sets of file descriptors<br />
– maxfd is the highest file descriptor in any of the sets, plus one<br />
– Any of the file descriptors in readfds is ready for reading<br />
– Any of the file descriptors in writefds is ready for writing<br />
– An exceptional circumstance happens for one of the file descriptors in<br />
exceptfds<br />
– Return value: Number of file descriptors for which condition is true<br />
– Also removes all file descriptors from sets for which condition is not true<br />
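As a sketch of how select() is typically called, the following waits for a single descriptor to become readable. The helper names wait_readable and pipe_demo are hypothetical, and the pipe() call is only an assumption made to obtain a pair of descriptors that can be tested without any external file.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: waits up to timeout_sec seconds for fd to become
 * readable. Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_sec)
{
    fd_set         readfds;
    struct timeval tv;

    FD_ZERO(&readfds);            /* start with an empty set */
    FD_SET(fd, &readfds);         /* add the descriptor we care about */
    tv.tv_sec  = timeout_sec;
    tv.tv_usec = 0;

    /* First argument: highest descriptor in any set, plus one */
    if (select(fd + 1, &readfds, NULL, NULL, &tv) == -1) {
        return -1;
    }
    /* select() has removed fd from the set if it is not ready */
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}

/* Hypothetical demo: an empty pipe is not readable, but becomes
 * readable once a byte has been written. Returns 1 if both hold. */
int pipe_demo(void)
{
    int  fds[2];
    int  before, after;
    char c = 'x';

    if (pipe(fds) == -1) {
        return -1;
    }
    before = wait_readable(fds[0], 0);
    after  = (write(fds[1], &c, 1) == 1) ? wait_readable(fds[0], 0) : -1;
    close(fds[0]);
    close(fds[1]);
    return before == 0 && after == 1;
}
```

A timeout of 0 makes select() return immediately, which turns it into a poll; passing NULL instead of a struct timeval would block until the descriptor is ready.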
Networking Concepts<br />
Communication can be<br />
– Broadcast vs. dedicated partners<br />
– Stream-oriented vs. packet-oriented<br />
– Reliable vs. unreliable<br />
Communication partners need to be uniquely identified<br />
– For IP: IP addresses (denote hosts) (four 8-bit numbers, e.g. 127.0.0.1)<br />
– For TCP/IP: IP address and port (16-bit integer)<br />
UNIX uses sockets (a special kind of file descriptor) for communication<br />
– Bi-directional streams<br />
– Use with read() and write()<br />
TCP/IP (v4) Connections<br />
Reliable, stream-oriented, between two partners<br />
Client:<br />
– Create a socket: socket(PF_INET, SOCK_STREAM, 0)<br />
– Fill in struct sockaddr_in address structure<br />
∗ sin_family = AF_INET<br />
∗ sin_port = htons(port)<br />
∗ sin_addr filled in with inet_pton()<br />
– Connect socket to address: connect(sock, addr, addr_len)<br />
– Use socket and close() it<br />
Server:<br />
– Create socket<br />
– Create its own address (normally with INADDR_ANY)<br />
– bind() the socket to the address<br />
– listen() on the socket<br />
– accept() the connection (giving a new socket)<br />
– Use and close the socket<br />
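The client and server steps above can be sketched together in one function that talks to itself over the loopback interface. This is an illustration under several assumptions: the name loopback_roundtrip is hypothetical, port 0 is used so the kernel picks a free port (read back with getsockname()), and a real server and client would of course live in separate processes.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical demo: returns 1 if "ping" written by the client side
 * arrives on the accepted server socket, 0 if not, -1 on error. */
int loopback_roundtrip(void)
{
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof addr;
    char    buf[5] = "";
    ssize_t n;
    size_t  got = 0;
    int     server, client = -1, conn = -1, result = -1;

    /* Server: create socket, build own address, bind(), listen() */
    server = socket(PF_INET, SOCK_STREAM, 0);
    if (server == -1) return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons(0);          /* 0: kernel picks a port */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    if (bind(server, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(server, 1) == -1 ||
        getsockname(server, (struct sockaddr *)&addr, &addr_len) == -1) {
        goto done;
    }
    /* addr.sin_port now holds the assigned port in network byte order */

    /* Client: create socket, fill in server address, connect() */
    client = socket(PF_INET, SOCK_STREAM, 0);
    if (client == -1) goto done;
    if (inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr) != 1) goto done;
    if (connect(client, (struct sockaddr *)&addr, sizeof addr) == -1) goto done;

    /* Server: accept() yields a new socket for this connection */
    conn = accept(server, NULL, NULL);
    if (conn == -1) goto done;

    /* Use the sockets like any other file descriptor */
    if (write(client, "ping", 4) != 4) goto done;
    while (got < 4 && (n = read(conn, buf + got, 4 - got)) > 0) {
        got += (size_t)n;                     /* read() may return less */
    }
    result = (got == 4 && memcmp(buf, "ping", 4) == 0);

done:
    if (conn != -1) close(conn);
    if (client != -1) close(client);
    close(server);
    return result;
}
```

The read loop is there because a stream socket gives no guarantee that one write() arrives in one read().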
Creating and Ending Processes<br />
fork() creates a new process<br />
– Both parent and child execute the same program<br />
Parent has to wait() or waitpid() to pick up the child’s termination status<br />
– Otherwise the terminated child remains a zombie<br />
– But orphans are inherited by init<br />
Process termination<br />
– exit()<br />
– return from main()<br />
– Abort (from a signal)<br />
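A minimal sketch of the fork()/waitpid() pattern, with the hypothetical helper name run_child. The child uses _exit() rather than exit() here, an assumption made so that stdio buffers inherited from the parent are not flushed a second time.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that terminates immediately with
 * the given exit code, then collects it so it does not stay a zombie.
 * Returns the child's exit code, or -1 on error or abnormal termination. */
int run_child(int code)
{
    int   status;
    pid_t pid = fork();

    if (pid == -1) {
        return -1;                 /* fork() failed, no child created */
    }
    if (pid == 0) {
        _exit(code);               /* child: fork() returned 0 */
    }
    /* parent: fork() returned the child's PID */
    if (waitpid(pid, &status, 0) == -1) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, run_child(7) should return 7 in the parent.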
Process Environment and Program Execution<br />
Processes have access to environment variables<br />
– Inherited from (or set up by) parent<br />
– Can be modified<br />
To start a new program:<br />
– fork() to create a new process<br />
– Call one of the exec functions with:<br />
∗ Executable name (filename or path name)<br />
∗ Arguments (individual or as array)<br />
∗ For some functions, environment pointer<br />
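The fork-then-exec recipe can be sketched as follows, assuming /bin/sh exists as on any usual UNIX system. The helper names run_shell_command and child_sees_variable and the variable name CSC322_DEMO are hypothetical; execl() is the variant that takes its arguments individually and passes on the current environment.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs "sh -c command" in a child process and
 * returns its exit status, or -1 on error. */
int run_shell_command(const char *command)
{
    int   status;
    pid_t pid = fork();

    if (pid == -1) {
        return -1;
    }
    if (pid == 0) {
        /* Child: replace this program; argument list ends with NULL */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);                /* only reached if execl() failed */
    }
    if (waitpid(pid, &status, 0) == -1) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical demo: a variable set in the parent's environment is
 * inherited by the executed program. Returns 1 if the child saw it. */
int child_sees_variable(void)
{
    if (setenv("CSC322_DEMO", "42", 1) == -1) {
        return -1;
    }
    return run_shell_command("test \"$CSC322_DEMO\" = 42") == 0;
}
```

Because exec only returns on failure, the _exit(127) line doubles as the error path; 127 mirrors the shell's convention for "command not found".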
Exercises<br />
Learn hard ;-)<br />