By Joel Yliluoma, Wednesday 26th May 2004
I love programming on POSIX/BSD systems, generally referred to as unix systems.
I love it because it's so easy, well thought out, readable and standard.
I decided to dedicate a web page to listing the things I love about it.
Here goes.
File-descriptor (FD) way of thinking
open(), which opens a file, returns a FD.
socket(), which creates a network socket, returns a FD.
pipe(), which creates a pipe, creates two FDs (one for writing, one for reading).
accept(), which receives an incoming connection, returns a FD.
Everything is a FD. The type of a FD is int.
You do not need writetosocket(), writetopipe(), writetofile(), but just write().
The standard function write() writes to anything that has a FD.
The standard function close() closes anything that has a FD.
This also means that you can pass a socket as the stdin/stdout
of a program. You don't need a converter in between.
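As a minimal sketch of the "one write() for everything" idea: the helper names write_all() and demo_same_write() below are made up for illustration, and the demo assumes a system that provides socketpair(). It pushes the same bytes through a pipe and through a socket with the very same function.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the whole buffer to any descriptor: file, pipe, socket, terminal.
 * Returns 0 on success, -1 on error (errno is set by write()). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical demo: send one greeting through a pipe and through a
 * socket pair, then read both back. Returns 0 if both copies arrive. */
int demo_same_write(void)
{
    static const char msg[] = "hello";
    char a[sizeof msg] = {0}, b[sizeof msg] = {0};
    int p[2], s[2], ok = -1;

    if (pipe(p) < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, s) < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (write_all(p[1], msg, sizeof msg) == 0 &&
        write_all(s[1], msg, sizeof msg) == 0 &&
        read(p[0], a, sizeof a) == (ssize_t)sizeof a &&
        read(s[0], b, sizeof b) == (ssize_t)sizeof b &&
        strcmp(a, msg) == 0 && strcmp(b, msg) == 0)
        ok = 0;

    close(p[0]); close(p[1]);
    close(s[0]); close(s[1]);
    return ok;
}
```

Nothing in write_all() knows or cares what kind of object sits behind the FD.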
File descriptors can be turned into FILE structure pointers with fdopen().
This means that you can do your network communication even with fread() and
fwrite() from ANSI C, gaining the buffering and error management of the
standard C library.
Process pipelining
unistd.h: fork(), pipe()
Thanks to these features, we don't need to have all features in the
same program. You can have one program that records sound from the sound card,
a second that mixes it together with music, a third that converts the result
to MP3, and a fourth that serves HTTP clients, and there you have a web radio.
All of the parts are separately replaceable.
The pipeline can be deep and wide, if needed. It may combine data from
many sources and then again spread to many directions.
Named pipes (a.k.a. FIFOs) are also very handy.
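Here is a rough sketch of a one-stage pipeline built by hand with fork(), pipe() and dup2(). The function name upcase_via_pipeline() is made up, and the example assumes that a tr program is found in PATH and that the input is small enough to fit into the pipe buffer (otherwise the single write() could block).

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Feed `in` to the stdin of `tr a-z A-Z` and collect its stdout in `out`.
 * Returns the number of bytes collected, or -1 on error. */
ssize_t upcase_via_pipeline(const char *in, char *out, size_t outsz)
{
    int to_child[2], from_child[2], status;
    ssize_t n, total = 0;
    pid_t pid;

    assert(in != NULL && out != NULL && outsz > 0);
    if (pipe(to_child) < 0)
        return -1;
    if (pipe(from_child) < 0) {
        close(to_child[0]); close(to_child[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: the pipes become its stdin and stdout, then it turns into tr. */
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        execlp("tr", "tr", "a-z", "A-Z", (char *)NULL);
        _exit(127);                     /* only reached if exec failed */
    }

    /* Parent: keep only the ends it uses. */
    close(to_child[0]);
    close(from_child[1]);
    if (write(to_child[1], in, strlen(in)) < 0)
        total = -1;
    close(to_child[1]);                 /* this is the child's end-of-file */

    while (total >= 0 && (size_t)total < outsz - 1 &&
           (n = read(from_child[0], out + total,
                     outsz - 1 - (size_t)total)) > 0)
        total += n;
    close(from_child[0]);
    if (total >= 0)
        out[total] = '\0';

    waitpid(pid, &status, 0);
    return total;
}
```

The tr program never learns that its input and output are pipes created by another program; it just reads stdin and writes stdout.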
Memory mapped files
sys/mman.h: mmap(), munmap()
Sometimes with (big) data files, it's extremely handy
that you can mmap() the file into memory.
You don't need to allocate buffers. You don't need to care
about allocation / deallocation and memory constraints.
You just open the file, mmap it, and close it. Then you
continue using the memory pointer returned by mmap.
It will do disk access only when you need to actually access
those parts, and if the system is low on memory, it will
automatically forget parts you don't need (mmap uses disk cache,
not the program memory).
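The open / mmap / close sequence described above could look roughly like this. The function name count_byte_mmap() is hypothetical; the sketch maps the file read-only and simply walks over it as if it were an ordinary array.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count the occurrences of byte `c` in a file by mapping it read-only.
 * Returns the count, or -1 on error. */
long count_byte_mmap(const char *path, char c)
{
    struct stat st;
    const char *data;
    long count = 0;
    off_t i;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {              /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    for (i = 0; i < st.st_size; i++)    /* pages are read in on first touch */
        if (data[i] == c)
            count++;

    munmap((void *)data, (size_t)st.st_size);
    return count;
}
```

There is no read buffer anywhere in this code, and no malloc() either.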
You can do the memory mapping in read-only mode, read-write mode,
write-only mode or even executable mode. If you violate the given
mode, you'll receive a SIGSEGV signal, which kills the program if uncaught.
In Linux, executables and dynamic libs are loaded with memory mapping,
which means the readonly images don't take program memory. Programs
are literally run from disk cache. If the program is inactive and the
system is low on memory, the image doesn't need to be swapped - it
can simply be discarded, and reloaded from the executable file when
it's needed again.
Polling, that is: multiple FD management with minimal load
sys/poll.h: poll()
Because almost everything is used via a FD, poll() is a really powerful
utility.
The poll() function is the center of any program that handles many
data sources at the same time.
poll() will listen to a given set of FDs for a given time and report
back if some of them had errors, pending data for reading, or buffer
free for writing.
With poll(), a single threadless process can act as a webserver serving
hundreds of clients simultaneously while waiting for more and still
being able to handle errors.
poll() doesn't necessarily block. But you can make it block. You can
tell it to wait 34.52 seconds for incoming data. Or indefinitely.
Or return immediately if there's nothing happening.
It's a really powerful function.
On older systems, you can use select() for the same purpose, with
the same ideology but somewhat different syntax.
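A minimal sketch of the poll() idea, assuming at most 16 descriptors for simplicity. The function name first_readable() is made up, and the sketch includes <poll.h>, which is the header name POSIX specifies. The timeout is given in milliseconds: 34520 would be the 34.52 seconds mentioned above, -1 waits indefinitely and 0 returns immediately.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for any of the n descriptors to become readable.
 * Returns the index of the first ready descriptor, -1 on timeout,
 * -2 on error. */
int first_readable(const int *fds, int n, int timeout_ms)
{
    struct pollfd pfds[16];
    int i, r;

    assert(fds != NULL && n > 0 && n <= 16);
    for (i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;        /* we care about pending input */
        pfds[i].revents = 0;
    }

    r = poll(pfds, (nfds_t)n, timeout_ms);
    if (r < 0)
        return -2;
    if (r == 0)
        return -1;                      /* nothing happened in time */

    for (i = 0; i < n; i++)
        if (pfds[i].revents & (POLLIN | POLLHUP | POLLERR))
            return i;                   /* data, hangup or error here */
    return -1;
}
```

A real server would loop over all ready descriptors instead of returning just the first one, but the structure stays the same.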
Timers
sys/time.h: setitimer()
Timers are sometimes handy. You don't need to play with IRQs in
modern operating systems. Just call setitimer() to create a timer,
and you'll receive an alarm signal (SIGALRM) whenever you want, for
example 10 ms in the future.
Dynamic libs
dlfcn.h: dlopen(), dlsym()
Loading dynamically linked routines and executing them
is just awesomely cool.
You can literally conjure up a world from a file.
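A small sketch of the dlopen() / dlsym() pair. The function name call_int_fn() is made up for illustration. It looks up a function of type int(int) by name and calls it; passing NULL as the library name is assumed to search the symbols already loaded into the running program (including the C library when it is dynamically linked). On older systems you may need to link with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Look up an int(int) function by name and call it.
 * lib is the path of a shared object, or NULL for the symbols already
 * loaded into the running program.
 * On success stores the result in *result and returns 0; else -1. */
int call_int_fn(const char *lib, const char *name, int arg, int *result)
{
    void *handle;
    int (*fn)(int);

    assert(name != NULL && result != NULL);
    handle = dlopen(lib, RTLD_NOW);
    if (handle == NULL)
        return -1;                      /* dlerror() would tell why */

    /* dlsym() returns void *; store it through a void ** view of fn. */
    *(void **)&fn = dlsym(handle, name);
    if (fn == NULL) {
        dlclose(handle);
        return -1;
    }

    *result = fn(arg);
    dlclose(handle);
    return 0;
}
```

With a real plugin you would pass the path of your own shared library as the first argument and agree on the function names and types beforehand.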
The LD_PRELOAD environment variable.
Overriding dynamic functions is just awesomely cool.
If you want to extend the standard fopen() to log all fopen() calls,
you can do it: just create a new shared library, define your version of
fopen() there, remember to call the original fopen() routine if necessary,
and name your new library in the LD_PRELOAD environment variable when
starting programs.
Shared memory
sys/shm.h: shmget(), shmat(), shmdt()
When you create new processes, sometimes it's handiest to let them
share the same memory area instead of using pipes to talk between them.
Some people use threads as a solution to this, but the shared memory
feature allows you to control which memory areas are shared. It keeps
the borders clean.
Here's what's really curious: shared memory can be shared between
two completely distinct programs that are started at different times.
The client program only needs to know the magic key to access the shared
memory, and of course have permissions (man shmctl) to handle the memory.
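A rough sketch of shmget() / shmat() / shmdt(), assuming System V shared memory is enabled on the system. The function name share_int_with_child() is hypothetical. To stay self-contained it uses IPC_PRIVATE and a forked child instead of a magic key; two unrelated programs would pass the same agreed key (for example one made with ftok()) to shmget() instead.

```c
#include <assert.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a shared segment, let a forked child store `value` in it, and
 * return what the parent then reads from the same memory (-1 on error). */
int share_int_with_child(int value)
{
    int id, status, seen = -1;
    int *cell;
    pid_t pid;

    assert(value >= 0);
    id = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    cell = shmat(id, NULL, 0);
    /* Mark the segment for removal now; it actually goes away when the
     * last attached process detaches, so nothing is left behind. */
    shmctl(id, IPC_RMID, NULL);
    if (cell == (void *)-1)
        return -1;

    *cell = -1;
    pid = fork();
    if (pid == 0) {
        *cell = value;                  /* child writes to shared memory */
        shmdt(cell);
        _exit(0);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid)
        seen = *cell;                   /* parent sees the child's write */

    shmdt(cell);
    return seen;
}
```

Only the one int inside the segment is shared; every other variable of the two processes stays private, which is the clean border mentioned above.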
(But I admit that threads are often more useful than shared memory.
Threading can be quite simple too.)
The file system
I like symbolic links. (man ln, man 2 symlink, man 2 readlink)
I like "hard" links. (man ln, man 2 link, man 2 unlink)
I like named pipes. (man mkfifo, man 2 mknod)
And I like the fact that all of them can be stat()ted!
Btw, fstat() also accepts any FD you can think of.
Symbolic links are really useful for having shortcuts around
your directory hierarchy.
Hard links are really useful for having certain files appearing in
different places. You can have your handytools.cc file hardlinked
into all of your source directories, and if you edit it in one of
the directories, the changes are immediately shared by all of the
projects. Saves diskspace, saves effort.
Unlike the symbolic link, a hard link is not a special file.
A hard link is simply another directory entry that refers to the same file,
so every name of the file is equally "the original".
You can actually hardlink anything except directories.
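To illustrate how stat() tells these things apart, here is a small sketch with two made-up helpers, same_file() and entry_kind(). The first follows symbolic links and compares device and inode numbers, so it reports hard links and symlinks to a file as the same file; the second uses lstat() and returns a one-letter type code in the style of ls -l.

```c
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return 1 if both paths name the same file (same device and inode),
 * 0 if they do not, -1 on error. Symbolic links are followed. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    assert(a != NULL && b != NULL);
    if (stat(a, &sa) < 0 || stat(b, &sb) < 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Classify a directory entry without following symbolic links:
 * 'l' symlink, 'p' named pipe, 'd' directory, '-' regular file,
 * '?' anything else, 0 if the path cannot be examined. */
char entry_kind(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (lstat(path, &st) < 0)
        return 0;
    if (S_ISLNK(st.st_mode))
        return 'l';
    if (S_ISFIFO(st.st_mode))
        return 'p';
    if (S_ISDIR(st.st_mode))
        return 'd';
    if (S_ISREG(st.st_mode))
        return '-';
    return '?';
}
```

For a hard link, entry_kind() reports the type of the file itself, because there is no separate "link" object to look at. A symbolic link, in contrast, shows up as 'l'.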
Device interfaces
One thing I didn't like in DOS was that every program had to provide its
own drivers for the hardware it wanted to use. Modern operating systems
are wiser, and the casual programmer no longer needs to know about IRQs
or DMAs and all the various kinds of sound cards to be able to output sound.
The easiest way to produce sound (without audio-oriented programs)
in a Linux system is:
cat audio.raw > /dev/dsp
When you open the audio device and set it up, you can then pass
it as the stdout/stdin of programs and rest in peace.
Solutions are similar for all devices. Some devices require
more ioctl()s than others, but all of them are used
through a /dev/ entry somehow.
While the details of the device interfaces depend on the operating system
and/or kernel version, this actual ideology is the same in all unix systems.
Yes, this also means that you can write to your soundcard
with the same functions you use to write to a network socket
or to a console terminal.
Sparse files
Unix systems allow users to create multi-megabyte files that only take
a couple of kilobytes of disk space. This is accomplished by creating a file,
but not writing to all positions of it, i.e. leaving holes in it.
It is useful for maps and databases, where the amount of actual data may be
small compared to the difference between the greatest and smallest index of
the file.
In FAT filesystems, creating a file, seeking to a high position, say around
200 MB, and writing, would cause the first 200 MB to be filled with zeros,
immediately eating 200 MB of disk space. In unix systems, only the data you
actually write will use disk space.
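Creating such a hole takes nothing more than lseek() and write(). In this sketch the function name make_sparse() is made up; it seeks to the given offset, writes a single byte and hands back the stat() data of the result. Whether the hole really stays unallocated depends on the filesystem, so that part is an assumption you should check on your own system.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a file with a hole: seek `offset` bytes in, then write one byte.
 * Fills *st with the metadata of the resulting file.
 * Returns 0 on success, -1 on error. */
int make_sparse(const char *path, off_t offset, struct stat *st)
{
    int fd, ok = -1;

    assert(path != NULL && st != NULL && offset >= 0);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    if (lseek(fd, offset, SEEK_SET) == offset &&   /* jump over the hole */
        write(fd, "x", 1) == 1 &&                  /* the only real data */
        fstat(fd, st) == 0)
        ok = 0;

    close(fd);
    return ok;
}
```

Afterwards st_size is the apparent size (offset + 1), while st_blocks, counted in 512-byte units, tells how much disk space is actually in use. Comparing ls -l with du on the file shows the same difference.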
Curiously, Microsoft also recently introduced this feature to their NTFS format,
where it has to be enabled explicitly, file by file, if it's used. They
also made an API for querying ranges of holes in the files, which seems to
be one thing in which they win over POSIX.