System-level routines for the UNIX operating system and their applications
We discuss here some system-level implementations of functions in C.
Context
Kernighan and Ritchie’s book on the C Programming Language is widely recognized as a standard reference for learning C. The book is culminated in Chapter 8, where it presents implementations of several system-level routines for the UNIX operating system and the applications of them. Providing a concise summary of how they’re used can be useful for quick reference.
1) Fopen
First of all, fopen
utilizes a lower level, finer control routine called open.
They are slightly different:
int open(char *name, int flags, int perms);
FILE* fopen(char *name, char *mode);
As we can see, open
returns file descriptor, while fopen
returns a pointer to a higher-level file stream
object. Further, fopen
uses a string mode such as "r"
, "w"
, "a"
for read, write, append, etc. In the end,
fopen
is part of the C standard library and is used by a user directly, while open
is specific to systems that
support POSIX. With this in mind, let us dive into the implementation of fopen
.
#include <fcntl.h>
#include "syscalls.h"
#define PERMS 0666 /* RW for owner, group, others */
FILE *fopen(char *name, char *mode) {
int fd;
FILE *fp;
if (*mode != 'r' && *mode != 'w' && *mode != 'a')
return NULL;
for (fp = _iob; fp < _iob + OPEN_MAX; fp++)
if ((fp->flag & (_READ | _WRITE)) == 0)
break; /* found free slot */
if (fp >= _iob + OPEN_MAX) /* no free slots */
return NULL;
if (*mode == 'w')
fd = creat(name, PERMS);
else if (*mode == 'a') {
if ((fd = open(name, O_WRONLY, 0)) == -1)
fd = creat(name, PERMS);
lseek(fd, 0L, 2);
} else
fd = open(name, O_RDONLY, 0);
if (fd == -1) /* couldn't access name */
return NULL;
fp->fd = fd;
fp->cnt = 0;
fp->base = NULL;
fp->flag = (*mode == 'r') ? _READ : _WRITE;
return fp;
}
Explanation of fopen
In order to manage file opening, the function initially ensures that the mode parameter corresponds to
one of the accepted modes (validated in the first if statement). It then scans for an unused entry within
an array of file structures (_iob
array) to use this file (executed in the for loop). If all entries are in use,
the function yields null return (checked in the second if statement).
Depending on the specified mode, we generate file descriptor (the third if statement), and in append
mode ('a'
), the file’s offset for write position is adjusted (lseek
).
Subsequently, the function populates the relevant data to the file and return its pointer (last part of the code).
Inquiry Points
1) What is (fp->flag & (_READ _WRITE)) == 0
?
If this evaluates to true, it means fp->flag
is not _READ
nor _WRITE
, which says that FILE
structure is
available.
2) What is PERMS?
There is an array of flags available for file operations. For example, O_RDONLY
is for read-only access,
O_WRONLY
is for write-only access, and O_RDWR
is for both reading and writing.
The flag O_CREAT
says “create a file if it does not exist.” When the O_CREAT
flag is employed, the PERMS
parameter is
used to set the file’s access permissions. This permission parameter is composed of nine bits, grouped
into three sets of three bits each, which define the access rights for the file’s owner, the group the owner
belongs to, and all other users, respectively.
For example:
int fd = open("example.txt", O_CREAT | O_WRONLY, 0644);
means ‘open the file example.txt
for writing only; if the file does not exist, create it with read and
write permissions for the owner, and read-only permissions for the group and others.’
2) fillbuf
This routine is utilized within the routine getchar()
function, which retrieves a single character from the
standard input (usually keyboard input). It is treated as if it were a file stream by the C standard I/O.
#define getchar() getc(stdin)
#define getc(p) (--(p)->cnt >= 0 ? (unsigned char) *(p)->ptr++ : _fillbuf(p))
getc
macro says that ‘if there are characters left in the buffer (cnt >= 0
), increment buffer pointer; if the
buffer is empty, refill the buffer with new data from the file.’ With this motivation, let us look at the
implementation of _fillbuf
.
#include "syscalls.h"
/* _fillbuf: allocate and fill input buffer */
int _fillbuf(FILE *fp) {
int bufsize;
if ((fp->flag & (_READ | _EOF | _ERR)) != _READ)
return EOF;
bufsize = (fp->flag & _UNBUF) ? 1 : BUFSIZ;
if (fp->base == NULL) { /* no buffer yet */
if ((fp->base = (char *) malloc(bufsize)) == NULL)
return EOF; /* can't get buffer */
}
fp->ptr = fp->base;
fp->cnt = read(fp->fd, fp->ptr, bufsize);
if (--fp->cnt < 0) {
if (fp->cnt == -1)
fp->flag |= _EOF;
else
fp->flag |= _ERR;
fp->cnt = 0;
return EOF;
}
return (unsigned char) *fp->ptr++;
}
In words:
- The
_fillbuf
function performs a series of checks before returning a character from the current position of the pointer (*fp->ptr
) and then advancing the pointer. - Initially, the function verifies if the file is open for reading (
READ
flag) and whether it has not reached the end of the file (EOF
) or encountered an error (ERR
); if the condition is not met, the function returnsEOF
immediately (the first if statement). - Next, the function determines the size of the buffer to use. It sets
bufsize
to 1 if the file is marked as unbuffered (UNBUF
flag is set), and toBUFSIZ
(1024) otherwise. - The second if statement checks whether a buffer has already been allocated for this file stream (
fp->base
). If not, the function attempts to allocate it usingmalloc
. Should memory allocation fail (indicating no available memory), it then returnsEOF
. - Subsequently, the function updates the file stream structure (
fp
) with the necessary information, including setting up the buffer pointers. - The last if statement examines whether the count of characters read (
cnt
) indicates an error or end-of- file condition. Ifcnt
is less than zero after decrementing, it checks whether this is because the end of the file has been reached or because of an error, updating the file stream’s status flags accordingly. - Finally, if no errors are present, it returns the character currently pointed to by the buffer pointer (
*fp->ptr
).
3) fsize
In the context of file system operations, directories are essentially treated as a special type of file. Each directory has an associated inode, which functions as a unique identifier within the file system.
#define NAME_MAX 14 /* longest filename component; */
/* system-dependent */
typedef struct { /* portable directory entry */
long ino; /* inode number */
char name[NAME_MAX+1]; /* name + '\0' terminator */
} Dirent;
typedef struct { /* minimal DIR: no buffering, etc. */
int fd; /* file descriptor for the directory */
Dirent d; /* the directory entry */
} DIR;
As we can see DIR
contains all the information on Dirent
plus file descriptor fd
.
DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);
char *name;
struct stat stbuf;
int stat(char *, struct stat *);
stat(name, &stbuf);
Analogous to the way we open and close files, there are distinct functions designed for working with
directories: opendir
, readdir
, and closedir
.
It might seem confusing why we would need readdir
, where we could simply return Dirent
within DIR
. The
reason we nontrivially need readdir
is because accessing Dirent
via DIR
is protected, and so we need
other ways to get around it.
The stat
structure serves as a repository for a file’s metadata, and the stat
function is tasked with
populating this structure with the file’s details.
struct stat { /* inode information returned by stat */
dev_t st_dev; /* device of inode */
ino_t st_ino; /* inode number */
short st_mode; /* mode bits */
short st_nlink; /* number of links to file */
short st_uid; /* owners user id */
short st_gid; /* owners group id */
dev_t st_rdev; /* for special files */
off_t st_size; /* file size in characters */
time_t st_atime; /* time last accessed */
time_t st_mtime; /* time last modified */
time_t st_ctime; /* time originally created */
};
#define S_IFMT 0160000 /* type of file: */
#define S_IFDIR 0040000 /* directory */
#define S_IFCHR 0020000 /* character special */
#define S_IFBLK 0060000 /* block special */
#define S_IFREG 0100000 /* regular */
/* ... */
The last five macros are entries for st_mode
.
With this motivation, let us investigate the implementation of fsize
.
int stat(char *, struct stat *);
void dirwalk(char *, void (*)(char *));
/* fsize: print the name of file "name" */
void fsize(char *name) {
struct stat stbuf;
if (stat(name, &stbuf) == -1) {
fprintf(stderr, "fsize: can't access %s\n", name);
return;
}
if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
dirwalk(name, fsize);
printf("%8ld %s\n", stbuf.st_size, name);
}
In words:
If we could not populate the information on stbuf
into name
, there was an error, so we use
stderr
to print out the error message (the first if statement). If the file itself is directory, we use recursion
to traverse through underlying files (the second if statement). Finally, stbuf
has a correct information of
the size of file.
Inquiry points
- what does %81d mean in the final print statement?
In the format ±*.*
, the first * determines left/right adjustment, the first * determines maximum width, and
the second * determines precision. So in our case, it means the maximum width of 81 characters.
Let us see how dirwalk
, which was used in fsize
, is defined:
#define MAX_PATH 1024
/* dirwalk: apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *)) {
char name[MAX_PATH];
Dirent *dp;
DIR *dfd;
if ((dfd = opendir(dir)) == NULL) {
fprintf(stderr, "dirwalk: can't open %s\n", dir);
return;
}
while ((dp = readdir(dfd)) != NULL) {
if (strcmp(dp->name, ".") == 0 || strcmp(dp->name, "..") == 0)
continue; /* skip self and parent */
if (strlen(dir)+strlen(dp->name)+2 > sizeof(name))
fprintf(stderr, "dirwalk: name %s/%s too long\n", dir, dp->name);
else {
sprintf(name, "%s/%s", dir, dp->name);
(*fcn)(name);
}
}
closedir(dfd);
}
Just like before, this code conducts a series of checks to ensure that the operation is error-free before applying the function.
Initially, it verifies whether the directory specified by dir
can be opened successfully (the first if
statement). Inside the while
loop, it determines if the current directory (denoted by .
) or the parent
directory (..
) is being referenced, and skips them (the first if statement within while loop).
Subsequently, it assesses if the concatenation of the directory name and the file name exceeds the
maximum allowable length (the second if statement within while loop). If these conditions are met
without errors, the function is executed within the else
block by calling (*fcn)(name)
.
Inquiry points (continued)
1) Why is there +2
when checking whether string literal is too long?
Well, 1 is for directory separation (/
) and 1 is for null terminator \0
.
2) The function readdir(dfd)
progresses through the directory entries within the while
loop.
This occurs because readdir
internally calls the read
system call to fetch each entry from the directory stream.
Finally, let us check how opendir
, closedir
, and readdir
, used in dirwalk
, are implemented.
opendir
and closedir
are relatively straightforward.
int fstat(int fd, struct stat *);
/* opendir: open a directory for readdir calls */
DIR *opendir(char *dirname) {
int fd;
struct stat stbuf;
DIR *dp;
if ((fd = open(dirname, O_RDONLY, 0)) == -1
|| fstat(fd, &stbuf) == -1
|| (stbuf.st_mode & S_IFMT) != S_IFDIR
|| (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
return NULL;
dp->fd = fd;
return dp;
}
/* closedir: close directory opened by opendir */
void closedir(DIR *dp) {
if (dp) {
close(dp->fd);
free(dp);
}
}
In opendir
, the if
statement verifies several conditions:
First, it checks if the directory can be successfully opened;
Second, it assesses if the stbuf
structure can be correctly filled with information;
Third, it ensures that the item being opened is indeed a directory;
And lastly, it confirms that sufficient memory can be allocated for the operation.
In closedir
, we use close()
on dp->fd
and then free(dp)
.
#include <sys/dir.h> /* local directory structure */
/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp) {
struct direct dirbuf; /* local directory structure */
static Dirent d; /* return: portable structure */
while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf)) == sizeof(dirbuf)) {
if (dirbuf.d_ino == 0) /* slot not in use */
continue;
d.ino = dirbuf.d_ino;
strncpy(d.name, dirbuf.d_name, DIRSIZ);
d.name[DIRSIZ] = '\0'; /* ensure termination */
return &d;
}
return NULL;
}
4) A storage Allocator
The malloc
function in C allows for dynamic memory allocation, which is advantageous because it lets us
release memory with free
, optimizing memory utilization. The operating system typically orchestrates this
management scheme. Each block of memory allocated by malloc
is prefaced with a header containing
metadata about the block, including its size.
typedef long Align; /* for alignment to long boundary */
union header { /* block header */
struct {
union header *ptr; /* next block if on free list */
unsigned size; /* size of this block */
} s;
Align x; /* force alignment of blocks */
};
typedef union header Header;
To comply with the system’s alignment requirements, which is crucial for ensuring efficient and error-free
memory access, the header is structured using a union. This approach aligns the memory to the most
restrictive type within the union, often a long
, to accommodate any data type that might be stored in the
allocated block.
Let us now see how malloc
is implemented.
static Header base; /* empty list to get started */
static Header *freep = NULL; /* start of free list */
/* malloc: general-purpose storage allocator */
void *malloc(unsigned nbytes) {
Header *p, *prevp;
Header *morecore(unsigned);
unsigned nunits;
nunits = (nbytes + sizeof(Header) - 1) / sizeof(Header) + 1;
if ((prevp = freep) == NULL) {
base.s.ptr = freep = prevp = &base;
base.s.size = 0;
}
for (p = prevp->s.ptr; ; prevp = p, p = p->s.ptr) {
if (p->s.size >= nunits) {
if (p->s.size == nunits)
prevp->s.ptr = p->s.ptr;
else {
p->s.size -= nunits;
p += p->s.size;
p->s.size = nunits;
}
freep = prevp;
return (void *)(p + 1);
}
if (p == freep) {
if ((p = morecore(nunits)) == NULL)
return NULL;
}
}
}
Explanation:
Note that freep
is a global pointer. The first if statement initializes the free list if it’s empty.
In the for
loop, the function iterates over the free list looking for a block of memory that’s large enough to satisfy the request.
The if statement within for loop checks exactly this and makes necessary adjustments accordingly.
If no block is found after checking all entries (second if statement in the for loop), p
will
eventually be equal to freep
and so we use morecore
function to request for more memory (from
paging space or disk, but details are abstracted away).
morecore implementation
#define NALLOC 1024 /* minimum #units to request */
/* morecore: ask system for more memory */
static Header *morecore(unsigned nu) {
char *cp, *sbrk(int);
Header *up;
if (nu < NALLOC)
nu = NALLOC;
cp = sbrk(nu * sizeof(Header));
if (cp == (char *) -1) /* no space at all */
return NULL;
up = (Header *) cp;
up->s.size = nu;
free((void *)(up + 1));
return freep;
}
Here, sbrk
stands for space break, which is the system call to increase the program’s data space.
We do nu * sizeof(Header)
since nu
is in units of the size of Header
.
The last four lines represent putting the block in a free list.
In particular, free
inserts the block just after the header up + 1
into the free list, making it
available for future allocations.
free implementation
/* free: put block ap in free list */
void free(void *ap) {
Header *bp, *p;
bp = (Header *)ap - 1; /* point to block header */
for (p = freep; !(bp > p && bp < p->s.ptr); p = p->s.ptr) {
if (p >= p->s.ptr && (bp > p || bp < p->s.ptr))
break; /* freed block at start or end of arena */
}
if (bp + bp->s.size == p->s.ptr) { /* join to upper nbr */
bp->s.size += p->s.ptr->s.size;
bp->s.ptr = p->s.ptr->s.ptr;
} else {
bp->s.ptr = p->s.ptr;
}
if (p + p->s.size == bp) { /* join to lower nbr */
p->s.size += bp->s.size;
p->s.ptr = bp->s.ptr;
} else {
p->s.ptr = bp;
}
freep = p;
}
The for
loop starts with the pointer p
set to freep
, the beginning of the free list, and checks whether the
block pointer bp
falls between the current block p
and the next block p->s.ptr
. If bp
is not in this range, the
loop continues to the next block in the free list.
Subsequent two if
statements are executed to merge adjacent free blocks in the memory arena;
this process is known as coalescing.
Inquiry points
1) What is meant by the following line of code within a for loop?
if (p >= p->s.ptr && (bp > p || bp < p->s.ptr))
This means that the iteration reached a full cycle.
It suggests that bp
does not fit between any two existing blocks and should be inserted at the edges of
the memory arena managed by the free list.
2) Why do we need to subtract by 1 when assigning bp
?
Because we need to move the pointer back to the start of the header, instead of start of the memory block itself.
References
M. Ritchie, B. W. K. & D. (1988). C Programming Language: Vol. Second Edition. AT&T Bell Laboratories.