My favourite aspects of POSIX/BSD systems

By Joel Yliluoma, Wednesday 26th May 2004

I love programming on POSIX/BSD systems, generally referred to as unix systems. I love it because it's so easy, well-thought-out, readable and standard. I decided to dedicate a webpage to listing the things I love about it.
Here goes.

File-descriptor (FD) way of thinking

open(), which opens a file, returns a FD.
socket(), which creates a network socket, returns a FD.
pipe(), which creates a pipe, creates two FDs (one for writing, one for reading).
accept(), which receives an incoming connection, returns a FD.

Everything is a FD. The type of a FD is int.
You don't need writetosocket(), writetopipe() or writetofile(); just write(). The standard function write() writes to anything that has a FD.

The standard function close() closes anything that has a FD.
This also means that you can pass a socket as the stdin/stdout of a program. You don't need a converter in between.
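To make the "everything is a FD" point concrete, here is a minimal sketch that sends the same bytes to a regular file, a pipe and a socket using nothing but write(). The helper names write_all() and broadcast() are made up for this example; they are not standard functions.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX.1-2008 + XSI declarations */
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Write the whole buffer to fd, retrying on short writes and EINTR.
   Works the same whether fd is a file, a pipe, a socket or a terminal. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: try again */
            return -1;             /* real error, errno is set */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send the same message to every FD in the array; returns 0 on success.
   Nothing here asks what kind of object each FD refers to. */
int broadcast(const int *fds, size_t nfds, const char *msg)
{
    for (size_t i = 0; i < nfds; i++)
        if (write_all(fds[i], msg, strlen(msg)) < 0)
            return -1;
    return 0;
}
```

For example, the array passed to broadcast() can hold one FD from open(), one from pipe() and one from socketpair(), and the same loop serves all three.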

File descriptors can be turned into FILE structure pointers with fdopen(). This means that you can do your network communication even with fread() and fwrite() from ANSI C, gaining the buffering and error management from the standard C library.
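A small sketch of fdopen(), assuming a pipe as the FD (a connected socket is used the same way). The helper name stdio_over_pipe() is made up for illustration. Both ends of the pipe are wrapped in FILE pointers, so ordinary fprintf() and fgets() do the work:

```c
#define _XOPEN_SOURCE 700   /* fdopen() is POSIX, not ANSI C */
#include <stdio.h>
#include <unistd.h>

/* Push one line through a pipe using stdio on both ends.
   Returns 0 on success and copies the line (with its '\n') into out. */
int stdio_over_pipe(const char *line, char *out, int outsize)
{
    int fd[2];
    if (pipe(fd) < 0)
        return -1;

    FILE *w = fdopen(fd[1], "w");    /* buffered writer on the write end */
    FILE *r = fdopen(fd[0], "r");    /* buffered reader on the read end */
    if (!w || !r) {
        if (w) fclose(w); else close(fd[1]);
        if (r) fclose(r); else close(fd[0]);
        return -1;
    }

    fprintf(w, "%s\n", line);        /* lands in w's buffer first */
    fclose(w);                       /* flushes, then closes fd[1] */

    char *ok = fgets(out, outsize, r);
    fclose(r);                       /* also closes fd[0] */
    return ok ? 0 : -1;
}
```

Note that fclose() on a FILE obtained from fdopen() also closes the underlying FD, so close() must not be called on it afterwards.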

Process pipelining

unistd.h: fork(), pipe()

Thanks to these features, we don't need to have every feature in the same program. You can have one program that records sound from the sound card, a second that mixes it with music, a third that converts the result to MP3, and a fourth that serves HTTP clients, and there you have a web radio. Each part is separately replaceable.
The pipeline can be deep and wide if needed: it may combine data from many sources and then spread it out again in many directions.
Named pipes (a.k.a. FIFOs) are also very handy.
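The basic building block behind all of this is pipe() + fork() + dup2(). The sketch below (the name capture_child_output() is made up) runs a producer in a child process whose stdout is the write end of a pipe, and the parent collects the output. A real pipeline stage would normally call execvp() on another program right after dup2(); calling a function instead keeps the example self-contained.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run producer() in a child whose stdout is a pipe; store what it prints
   in out (NUL-terminated, truncated to cap-1 bytes). Returns 0 on success. */
int capture_child_output(void (*producer)(void), char *out, size_t cap)
{
    int fd[2];
    size_t used = 0;

    if (cap == 0 || pipe(fd) < 0)
        return -1;
    fflush(NULL);                      /* don't duplicate unflushed stdio buffers */

    pid_t pid = fork();
    if (pid < 0) {
        close(fd[0]); close(fd[1]);
        return -1;
    }
    if (pid == 0) {                    /* child: one stage of the pipeline */
        close(fd[0]);
        dup2(fd[1], STDOUT_FILENO);    /* stdout now goes into the pipe */
        if (fd[1] != STDOUT_FILENO)
            close(fd[1]);
        producer();                    /* a real stage would execvp() here */
        fflush(stdout);
        _exit(0);
    }

    close(fd[1]);                      /* parent only reads */
    char chunk[512];
    ssize_t n;
    while ((n = read(fd[0], chunk, sizeof chunk)) > 0) {
        size_t room = cap - 1 - used;
        size_t take = (size_t)n < room ? (size_t)n : room;
        memcpy(out + used, chunk, take);   /* excess is drained and dropped */
        used += take;
    }
    out[used] = '\0';
    close(fd[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The parent keeps reading until EOF even after its buffer is full, so a chatty child can never block forever on a full pipe.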

Memory mapped files

sys/mman.h: mmap(), munmap()

Sometimes, with (big) data files, it's extremely handy to be able to mmap() the file into memory.

You don't need to allocate buffers. You don't need to care about allocation/deallocation and memory constraints. You just open the file, mmap it, and close it; the mapping stays valid after close(). Then you continue using the memory pointer returned by mmap. The system does disk access only when you actually touch those parts, and if it is low on memory, it automatically drops unmodified parts you aren't using (a file mapping is backed by the disk cache, not by the program's own memory).

You can do the memory mapping in read-only, read-write, write-only or even executable mode (on many architectures, write access implies read access). If you violate the given mode, you'll receive a SIGSEGV signal, which kills the program if uncaught.
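A minimal sketch of the open, mmap, close, use pattern: counting newlines in a file with no read() calls and no buffer management. The function name count_lines_mmap() is made up. The mapping is PROT_READ, so any attempt to write through the pointer would trigger the SIGSEGV described above.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count '\n' bytes in a file by mapping it read-only. Returns -1 on error. */
long count_lines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {            /* mmap() of length 0 fails; nothing to count */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                        /* the mapping survives close() */
    if (data == MAP_FAILED)
        return -1;

    long lines = 0;
    for (size_t i = 0; i < len; i++)  /* pages are read from disk on first touch */
        if (data[i] == '\n')
            lines++;

    munmap((void *)data, len);
    return lines;
}
```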

In Linux, executables and dynamic libs are loaded with memory mapping, which means their read-only images don't take up program memory. Programs are literally run from the disk cache. If a program is inactive and the system is low on memory, its image doesn't need to be swapped out: it can simply be discarded and reloaded from the executable file when it's needed again.

Polling, that is: multiple FD management with minimal load

poll.h (sys/poll.h on some older systems): poll()

Because almost everything is used via a FD, poll() is a really powerful utility.

The poll() function is the center of any program that handles many data sources at the same time.
poll() watches a given set of FDs for a given time and reports back if any of them has an error, pending data to read, or buffer space free for writing.

With poll(), a single-threaded process can act as a webserver serving hundreds of clients simultaneously, while still accepting new connections and handling errors.

poll() doesn't necessarily block, but you can make it block. The timeout is given in milliseconds: you can tell it to wait 34.52 seconds (34520 ms) for incoming data, or indefinitely (-1), or to return immediately (0) if nothing is happening.

It's a really powerful function. On older systems, you can use select() for the same purpose; it follows the same idea with a somewhat different interface.
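A minimal sketch of the typical poll() loop body, assuming all FDs are watched for incoming data only (a server would also set POLLOUT on sockets it wants to write to). The function name first_ready() and the MAX_WATCHED limit are made up for this example.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <poll.h>

#define MAX_WATCHED 64   /* arbitrary limit for this sketch */

/* Wait up to timeout_ms milliseconds (-1 = forever, 0 = just check) until
   one of the FDs has something to report. Returns the index of the first
   such FD, -1 on timeout, -2 on error. */
int first_ready(const int *fds, int nfds, int timeout_ms)
{
    struct pollfd pfds[MAX_WATCHED];
    if (nfds < 0 || nfds > MAX_WATCHED)
        return -2;

    for (int i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;      /* interested in incoming data */
        pfds[i].revents = 0;
    }

    int n;
    do
        n = poll(pfds, (nfds_t)nfds, timeout_ms);
    while (n < 0 && errno == EINTR);  /* retry; this restarts the full timeout */

    if (n < 0)
        return -2;
    if (n == 0)
        return -1;                    /* timed out, nothing happened */

    for (int i = 0; i < nfds; i++)
        if (pfds[i].revents != 0)     /* data, hangup, error or invalid FD */
            return i;
    return -1;
}
```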

Timers

sys/time.h: setitimer()

Timers are sometimes handy. You don't need to play with IRQs in modern operating systems. Just call setitimer() to create a timer, and you'll receive an alarm signal (SIGALRM, for an ITIMER_REAL timer) whenever you want, for example 10 ms in the future.
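A minimal sketch of a one-shot setitimer() timer, assuming ITIMER_REAL and a SIGALRM handler that only sets a flag. The function name sleep_with_itimer() is made up. SIGALRM is blocked before the timer is armed and released by sigsuspend(), so the signal cannot arrive in the gap before the program starts waiting.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t alarm_fired = 0;

static void on_alarm(int sig)
{
    (void)sig;
    alarm_fired = 1;                 /* only async-signal-safe work in a handler */
}

/* Arm a one-shot timer for ms milliseconds and wait for SIGALRM.
   Returns 0 on success, -1 on error or ms <= 0. */
int sleep_with_itimer(long ms)
{
    if (ms <= 0)
        return -1;                   /* a zero it_value would disarm the timer */

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -1;

    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    sigprocmask(SIG_BLOCK, &block, &old);

    struct itimerval tv;
    memset(&tv, 0, sizeof tv);       /* it_interval = 0: fire only once */
    tv.it_value.tv_sec = ms / 1000;
    tv.it_value.tv_usec = (ms % 1000) * 1000;
    alarm_fired = 0;
    if (setitimer(ITIMER_REAL, &tv, NULL) < 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    sigset_t wait_mask = old;
    sigdelset(&wait_mask, SIGALRM);
    while (!alarm_fired)
        sigsuspend(&wait_mask);      /* atomically unblock SIGALRM and wait */

    sigprocmask(SIG_SETMASK, &old, NULL);
    return 0;
}
```

Setting it_interval as well would turn this into a periodic timer. Newer POSIX editions mark setitimer() obsolescent in favour of timer_settime(), but it is still widely available.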

Dynamic libs

dlfcn.h: dlopen(), dlsym()

Loading dynamically linked routines and executing them is just awesomely cool. You can literally conjure up a world from a file.
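A minimal sketch of loading a library at run time and calling a function from it. The wrapper name call_double_fn() is made up, and the usage assumes a glibc-based Linux system, where the math library is called "libm.so.6"; other systems use other file names.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Load lib, look up a double(double) function called name, call it with x,
   store the result and unload the library. Returns 0 on success. */
int call_double_fn(const char *lib, const char *name, double x, double *result)
{
    void *handle = dlopen(lib, RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    double (*fn)(double);
    dlerror();                                /* clear any stale error */
    *(void **)(&fn) = dlsym(handle, name);    /* cast idiom from the dlsym docs */
    const char *err = dlerror();
    if (err) {
        fprintf(stderr, "dlsym: %s\n", err);
        dlclose(handle);
        return -1;
    }

    *result = fn(x);
    dlclose(handle);
    return 0;
}
```

For example, call_double_fn("libm.so.6", "cos", 0.0, &r) stores 1.0 in r. On glibc versions older than 2.34, the program must be linked with -ldl.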

The LD_PRELOAD environment variable.

Overriding dynamic functions is just awesomely cool. If you want to extend the standard fopen() to log all fopen() calls, you can: just create a new shared library, define your own version of fopen() there, remember to call the original fopen() if necessary, and name your new library in LD_PRELOAD when starting programs.
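A minimal sketch of such a logging fopen(), assuming glibc (RTLD_NEXT is a GNU extension, hence _GNU_SOURCE). The counter logged_fopen_calls and the file names below are made up for illustration. dlsym(RTLD_NEXT, "fopen") finds the next fopen() in the dynamic linker's search order, which is the C library's original.

```c
#define _GNU_SOURCE            /* for RTLD_NEXT */
#include <dlfcn.h>
#include <stdio.h>

/* Number of intercepted calls (hypothetical, handy for checking behaviour). */
unsigned long logged_fopen_calls = 0;

/* This fopen() shadows the C library's one when the library is preloaded. */
FILE *fopen(const char *path, const char *mode)
{
    static FILE *(*real_fopen)(const char *, const char *);
    if (!real_fopen)
        *(void **)(&real_fopen) = dlsym(RTLD_NEXT, "fopen");  /* the original */

    logged_fopen_calls++;
    fprintf(stderr, "[fopen] %s (%s)\n", path, mode);  /* stderr needs no fopen() */
    return real_fopen ? real_fopen(path, mode) : NULL;
}
```

Built with something like gcc -shared -fPIC -o libfopenlog.so fopenlog.c -ldl, it can be used as LD_PRELOAD=./libfopenlog.so followed by any dynamically linked program that calls fopen(). Programs compiled for large files may call fopen64() instead, so a real logger would probably wrap that too.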

Shared memory

sys/shm.h: shmget(), shmat(), shmdt()

When you create new processes, sometimes it's handiest to let them share the same memory area instead of using pipes to talk between them. Some people use threads as a solution to this, but the shared memory feature allows you to control which memory areas are shared. It keeps the borders clean.
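A minimal sketch of a parent and child sharing one System V segment instead of talking through a pipe. The function name sum_in_child() is made up. It uses IPC_PRIVATE, so only this process and its children can reach the segment; unrelated programs would instead agree on a key, for example one made with ftok().

```c
#define _XOPEN_SOURCE 700
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child computes 1 + 2 + ... + n and leaves the answer in shared memory;
   the parent reads it after the child exits. Returns the sum, -1 on error. */
long sum_in_child(long n)
{
    int id = shmget(IPC_PRIVATE, sizeof(long), 0600);   /* new private segment */
    if (id < 0)
        return -1;

    long *shared = shmat(id, NULL, 0);   /* attach; inherited across fork() */
    if (shared == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    *shared = 0;

    pid_t pid = fork();
    if (pid == 0) {                      /* child: write into the shared area */
        long s = 0;
        for (long i = 1; i <= n; i++)
            s += i;
        *shared = s;
        _exit(0);                        /* exiting also detaches the segment */
    }

    long result = -1;
    if (pid > 0 && waitpid(pid, NULL, 0) == pid)
        result = *shared;                /* parent sees the child's write */

    shmdt(shared);                       /* detach ... */
    shmctl(id, IPC_RMID, NULL);          /* ... and mark the segment for removal */
    return result;
}
```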

Here's what's really curious: shared memory can be shared between two completely distinct programs that are started at different times. The client program only needs to know the magic key to access the shared memory, and of course have permissions (man shmctl) to handle the memory.

(But I admit that threads are often more useful than shared memory. Threading can be quite simple too.)

The file system

I like symbolic links. (man ln, man 2 symlink, man 2 readlink)
I like "hard" links. (man ln, man 2 link, man 2 unlink)
I like named pipes. (man mkfifo, man 2 mknod)

And I like the fact that all of them can be stat()ted!
By the way, fstat() also accepts any FD you can think of.

Symbolic links are really useful for having shortcuts around your directory hierarchy.

Hard links are really useful for having the same file appear in different places. You can have your handytools.cc file hard-linked into all of your source directories, and if you edit it in one of them, the changes are immediately shared by all of the projects. It saves disk space and effort.

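Unlike a symbolic link, a hard link is not a special file. A hard link is simply another directory entry referring to the same file (the same inode); all the names are equal, and the data stays until the last name is removed. You can hard-link anything except directories, as long as both names are on the same filesystem.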
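Because a hard link is just another name for the same inode, stat() can show it directly. A minimal sketch (the helper names same_file() and link_count() are made up): two paths name the same file exactly when their st_dev and st_ino match, and st_nlink counts the names.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>

/* Returns 1 if both paths name the same file, 0 if not, -1 on error. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) < 0 || stat(b, &sb) < 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Number of directory entries (hard links) naming the file, or -1. */
long link_count(const char *path)
{
    struct stat st;
    if (stat(path, &st) < 0)
        return -1;
    return (long)st.st_nlink;
}
```

After link("a", "b"), same_file("a", "b") is 1 and both names report a link count of 2; after unlink("a"), "b" still holds the data with a count of 1.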

Device interfaces

One thing I didn't like about DOS was that every program had to provide its own drivers for the hardware it wanted to use. Modern operating systems are wiser: the casual programmer no longer needs to know about IRQs, DMA and all the various kinds of sound cards to be able to output sound.

The easiest way to produce sound (without audio-oriented programs) in a Linux system is: cat audio.raw > /dev/dsp

Once you have opened and set up the audio device, you can pass it as the stdout/stdin of other programs and rest in peace.

Solutions are similar for all devices. Some devices require more ioctl()s than others, but all of them are used through a /dev/ entry somehow.

While the details of the device interfaces depend on the operating system and/or kernel version, the underlying idea is the same in all unix systems.

Yes, this also means that you can write to your sound card with the same functions you use to write to a network socket or a console terminal.
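A minimal sketch of the "same functions for everything" idea: a plain read()/write() copy loop that never asks what it is connected to. The function name copy_fd() is made up. Pointing it at an opened audio.raw and an opened /dev/dsp would be the C equivalent of the cat command above, assuming a system that still provides the OSS /dev/dsp device; /dev/zero and /dev/null work anywhere.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <unistd.h>

/* Copy up to nbytes from one FD to another. Returns the number of bytes
   copied (less if the input ends early), or -1 on error. */
long copy_fd(int from, int to, size_t nbytes)
{
    char buf[4096];
    size_t done = 0;

    while (done < nbytes) {
        size_t want = nbytes - done < sizeof buf ? nbytes - done : sizeof buf;
        ssize_t n = read(from, buf, want);
        if (n < 0 && errno == EINTR)
            continue;                    /* interrupted: retry the read */
        if (n < 0)
            return -1;
        if (n == 0)
            break;                       /* end of input */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(to, buf + off, (size_t)(n - off));
            if (w < 0 && errno == EINTR)
                continue;                /* interrupted: retry the write */
            if (w < 0)
                return -1;
            off += w;                    /* handle short writes */
        }
        done += (size_t)n;
    }
    return (long)done;
}
```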

Sparse files

Unix systems allow users to create multi-megabyte files that take only a couple of kilobytes of disk space. This is done by creating a file but not writing to every position in it, i.e. leaving holes in it. It is useful for maps and databases, where the amount of actual data may be small compared to the range between the smallest and greatest offsets used in the file.

In FAT filesystems, creating a file, seeking to a high position (say around 200 MB) and writing would cause the first 200 MB to be filled with zeros, immediately eating 200 MB of disk space. In unix systems, only the data you actually write uses disk space (on filesystems that support sparse files).
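A minimal sketch of creating a sparse file by seeking past the end and writing one byte, plus a way to see how much is really allocated. The function names make_sparse() and allocated_bytes() are made up. The 512-byte unit for st_blocks holds on Linux and the BSDs, but POSIX leaves it unspecified, so treat the allocation figure as an assumption on other systems.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a file of logical size `size` by writing a single byte at the
   last position. Everything before it is a hole that reads back as zeros.
   Returns 0 on success. */
int make_sparse(const char *path, off_t size)
{
    if (size < 1)
        return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    int ok = lseek(fd, size - 1, SEEK_SET) == size - 1
             && write(fd, "", 1) == 1;          /* writes one NUL byte */
    ok = (close(fd) == 0) && ok;
    return ok ? 0 : -1;
}

/* Bytes actually allocated on disk, assuming 512-byte st_blocks units. */
long long allocated_bytes(const char *path)
{
    struct stat st;
    if (stat(path, &st) < 0)
        return -1;
    return (long long)st.st_blocks * 512;
}
```

After make_sparse("map.dat", 200LL * 1024 * 1024), stat() reports a size of 200 MB while allocated_bytes() typically shows a few kilobytes on a filesystem with hole support.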

Curiously, Microsoft also recently introduced this feature in their NTFS format, where it has to be explicitly enabled file by file. They also made an API for querying the ranges of holes in a file, which seems to be one area where they win over POSIX.



Last edited at: 2007-12-04T14:29:44+00:00