Cromfs: Compressed ROM filesystem for Linux (user-space)

0. Contents

   1. Purpose
   2. News
   3. Overview
   4. Limitations
   5. Development status
   6. Comparing to other filesystems
      6.1. Compression tests
      6.2. Speed tests
   7. Getting started
   8. Tips
         8.0.1. To improve compression
         8.0.2. To improve mkcromfs speed
         8.0.3. To control the memory usage
         8.0.4. To control the filesystem speed
         8.0.5. Using cromfs with automount
   9. Understanding the concepts
         9.0.1. Inode
         9.0.2. Block
         9.0.3. Fblock
         9.0.4. Block number and block table
         9.0.5. Data locator
         9.0.6. Block indexing (mkcromfs only)
         9.0.7. Random compress period (mkcromfs only)
         9.0.8. Where are the inodes stored then?
   10. Using cromfs in bootdisks and tiny Linux distributions
   11. Other applications of cromfs
   12. Copying and contributing
      12.1. Contribution wishes
   13. Requirements
   14. Links
   15. Downloading

1. Purpose

Cromfs is a compressed read-only filesystem for Linux. It uses the LZMA compression algorithm from 7-zip and a powerful block-merging mechanism that is especially efficient with gigabytes of large files containing lots of redundancy.

The primary design goal of cromfs is compression power. It is much slower than its peers, and uses more RAM. If all you care about is "powerful compression" and "random file access", then you will be happy with cromfs.

The creation of cromfs was inspired by Squashfs and Cramfs.

The downloading section is at the bottom of this page.

2. News

See the ChangeLog.

3. Overview

  • Data, inodes, directories and block lists are stored compressed
  • Files are divided into fragments and those fragments are stored as offsets to solid blocks (fblocks) containing data, meaning that parts of different files are compressed together for effective compression, and identical fragments are compressed only once.
    • Duplicate inodes, files and even duplicate file portions are detected and stored only once without extra overhead
  • Most inode types recognized by Linux are supported (see comparisons).
  • The LZMA compression is used for fblocks. In the general case, LZMA compresses better than gzip and bzip2.
  • Because cromfs is a filesystem, files on a cromfs volume can be accessed randomly, in arbitrary order, by all the means one would expect, including memory mapping.
  • Works on 64-bit and 32-bit systems.
See the documentation of the cromfs format for technical details (also included in the source package as doc/FORMAT).

4. Limitations

  • The filesystem is write-once, read-only. It is not possible to append to a previously created filesystem, nor to mount it read-write.
  • Max filesize: 2⁶⁴ bytes (16777216 TB), but 256 TB with default settings.
  • Max number of files in a directory: 2³⁰ (smaller if filenames are longer, but still more than 100000 in almost all cases)
  • Max number of inodes (all files, dirs etc combined): 2⁶⁰, but depends on file sizes
  • Max filesystem size: 2⁶⁴ bytes (16777216 TB)
  • There are no "." or ".." entries in directories. This does not matter in Linux.
  • cromfs and mkcromfs are slower than their peers.
  • The cromfs-driver consumes a lot of memory. It is not suitable for very size-constrained systems.
  • Maximum filename length: 4294967295 bytes
  • Maximum symlink length: 65535 bytes
  • Being a user-space filesystem, it might not be suitable for root filesystems of rescue, tiny-Linux and installation disks. (Facts needed.)
  • For device inodes, a hardlink count of 1 is assumed. (This has no effect on compression efficiency.)

5. Development status

Development status: Stable. (Really: progressive.)
(Fully functional release exists, but is updated from time to time.)

Cromfs has been in beta stage for over a year, during which time very few bugs have been reported, and no known bugs remain at this time.

It does not make sense to keep it as "beta" indefinitely, but since there is never going to be a "final" version — new versions may always be released — it is now labeled as "progressive".

In practice, the author trusts it works as advertised, but as per GPL policy, there is NO WARRANTY whatsoever. The entire risk to the quality and performance of the program suite is with you.

#include "GNU gdb/show warranty"

6. Comparing to other filesystems

This comparison is probably very biased, partly hypothetical, and by no means a scientific study, but here goes:

Feature | Cromfs | Cramfs (1.1) | Squashfs (4.2) | Cloop
Compression unit | adjustable arbitrarily (2 MB default) | adjustable, must be power of 2 (4 kB default) | adjustable, must be power of 2 (1 MB max) | adjustable in 512-byte units (1 MB max)
Files are compressed (up to block size limit) | Together | Individually | Individually, except for fragments | Together
Maximum file size | 16 EB (2⁶⁴ bytes) (theoretical; actual limit depends on settings) | 16 MB (2²⁴ bytes) | 16 EB (2⁶⁴ bytes) (4 GB before v3.0) | Depends on slave filesystem
Maximum filesystem size | 16 EB (2⁶⁴ bytes) | 272 MB | 16 EB (2⁶⁴ bytes) (4 GB before v3.0) | 16 EB (2⁶⁴ bytes)
Duplicate whole file detection | Yes | No | Yes | No
Hardlinks detected and saved | Yes | Yes | Yes, since v3.0 | Depends on slave filesystem
Near-identical file detection | Yes (identical blocks) | No | No | No
Compression method | LZMA | gzip (patches exist to use LZMA) | gzip, LZO (since 4.1), XZ (LZMA2, since 4.2) | gzip or LZMA
Ownerships | uid, gid (since version 1.1.2) | uid, gid (but gid truncated to 8 bits) | uid, gid | Depends on slave filesystem
Timestamps | mtime only | None | mtime only | Depends on slave filesystem
Endianness safety | Theoretically safe (untested on big-endian) | Safe, but not exchangeable | Safe, but not exchangeable | Depends on slave filesystem
Linux kernel driver | No | Yes | Yes | Yes
Userspace driver | Yes (fuse) | No | An extraction tool (unsquashfs) | Yes (third-party, using fuse); cloop itself provides an extraction tool (extract_compressed_fs), but it cannot be used to extract a single file
Windows driver | No | No | No | No
Appending to a previously created filesystem | No | No | Yes | No (the slave filesystem can be decompressed, modified, and compressed again, but in a sense, so can any of the others)
Mounting as read-write | No | No | No | No
Supported inode types | All | All | All | Depends on slave filesystem
… (good for compression, bad for access speed) | Depends on compression settings | None | File tails only | Depends on slave filesystem
Holes (aka sparse files); storage optimization of blocks consisting entirely of nul bytes | Any two identical blocks are merged and stored only once | Supported | Supported | Depends on slave filesystem
Padding (partially filled sectors, wastes space) | No | Unknown | Mostly not | Depends on slave filesystem, usually yes
Extended attributes | No | Unknown | Unknown | Unknown, may depend on slave filesystem

Note: If you notice that this table contains wrong information, please contact me and tell me what it is, and I will change it.

Note: cromfs now saves the uid and gid in the filesystem. However, when the uid is 0 (root), the cromfs-driver returns the uid of the user who mounted the filesystem, instead of root. Similarly for gid. This is both for backward compatibility and for security.
If you mount as root, this behavior has no effect.

6.1. Compression tests

Note: I use the -e and -r options in all of these mkcromfs tests to avoid unnecessary decompression+recompression steps, in order to speed up the filesystem generation. This has no effect on the compression ratio.

In this table, k equals 1024 bytes (2¹⁰) and M equals 1048576 bytes (2²⁰).

Note: Again, these tests have not been peer-verified, so this is not a real scientific study. But I attest that these are the results I got.
Test data sets:
  • NES ROMs: 10783 NES ROMs (2523 MB)
  • Firefox: Firefox source code (233 MB) (MD5sum 5a6ca3e4ac3ebc335d473cd3f682a916)
  • DSL: Damn Small Linux liveCD (113 MB) (size taken from "du -c" output in the uncompressed filesystem)

Cromfs
  • NES ROMs: mkcromfs -s65536 -c16 -a… -b… -f…
    • With 16M fblocks, 2k blocks: 198,553,574 bytes (v1.4.1)
    • With 16M fblocks, 1k blocks: 194,813,427 bytes (v1.4.1)
    • With 16M fblocks, ¼k blocks: 187,575,926 bytes (v1.5.0)
  • Firefox: with default options: 33,866,164 bytes (v1.5.2)
    (Peak memory use (RSS): 97 MB, mostly consisting of memory-mapped files)
  • DSL: mkcromfs -f1048576
    • With 64k blocks (-b65536): 39,778,030 bytes (v1.2.0)
    • With 16k blocks (-b16384): 39,718,882 bytes (v1.2.0)
    • With 1k blocks (-b1024): 40,141,729 bytes (v1.2.0)

Cramfs v1.1
  • NES ROMs: mkcramfs -b65536: dies prematurely, "filesystem too big"
  • Firefox:
    • With 2M blocks (-b2097152): 65,011,712 bytes
    • With 64k blocks (-b65536): 64,618,496 bytes
    • With 4k blocks (-b4096): 77,340,672 bytes
  • DSL: mkcramfs -b65536: 51,445,760 bytes

Squashfs v3.2
  • NES ROMs: mksquashfs -b65536 (using an optimized sort file): 1,185,546,240 bytes
  • Firefox: 49,139,712 bytes
  • DSL: mksquashfs -b65536: 50,028,544 bytes

Cloop v2.05~20060829 (create_compressed_fs)
  • NES ROMs: using an iso9660 image created with mkisofs -R
    • Using 7zip, 1M blocks (-B1048576 -t2 -L-1): 1,136,789,006 bytes
  • Firefox: using an iso9660 image created with mkisofs -RJ
    • Using 7zip, 1M blocks (-B1048576 -L-1): 46,726,041 bytes
    (1 MB is the maximum block size in cloop)
  • DSL: using an iso9660 image
    • Using 7zip, 1M blocks (-B1048576 -L-1): 48,328,580 bytes
    • Using zlib, 64k blocks (-B65536 -L9): 50,641,093 bytes

7-zip (p7zip) v4.30 (an archive, not a filesystem)
  • NES ROMs: 7za -mx9 -ma=2 a
    • With 32M blocks (-md=32m): 235,037,017 bytes
    • With 128M blocks (-md=128m): 222,523,590 bytes
    • With 256M blocks (-md=256m): 212,533,778 bytes
  • Firefox: 7za -mx9 -ma=2 -md=256m a: 29,079,247 bytes
    (Peak memory use: 2545 MiB)
  • DSL: 7za -mx9 -ma2 a: 37,205,238 bytes
Why mkcromfs beats 7-zip in the NES ROM packing test:

7-zip packs all the files together as one stream. The maximum dictionary size in 32-bit mode is 256 MB. (Note: The default for "maximum compression" is 32 MB.) When 256 MB of data has been packed and more data comes in, similarities between the first megabytes of data and the latest data are not utilized. For example, Mega Man and Rockman are two almost identical versions of the same image, but because there are more than 400 MB of files between them when they are processed in alphabetical order, 7-zip does not see that they are similar and compresses each one separately.
7-zip's chances could be improved by sorting the files so that it processes similar images sequentially. It already attempts this by sorting the files by filename extension and filename, but that is not always the optimal order, as shown here.

mkcromfs, however, keeps track of all blocks it has encoded, and remembers similarities no matter how long ago they were added to the archive (see section 9.0.6, "Block indexing", for how it does that). This is why it outperforms 7-zip in this case, even though it only used 16 MB fblocks.

In the liveCD compressing test, mkcromfs does not beat 7-zip because this advantage is too minor to overcome the overhead needed to provide random access to the filesystem. It still beats cloop, squashfs and cramfs though.
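The difference between a bounded dictionary and an unbounded block index can be illustrated with a toy model. This is a simplification, not how 7-zip or mkcromfs actually work: it counts blocks in a sequence that are identical to an earlier block, once with a matcher that only remembers the previous `window` blocks (loosely analogous to a dictionary size, but counted in blocks rather than bytes) and once with an index that remembers everything. The function name and block strings are hypothetical.

```python
def count_repeated_blocks(blocks, window=None):
    """Count blocks identical to an earlier block that is still remembered.

    window=None models an unbounded index (in the spirit of mkcromfs's block
    index); an integer models a matcher that only sees the previous `window`
    blocks, loosely analogous to a fixed-size LZMA dictionary.
    """
    last_seen = {}  # block content -> position where it was last seen
    hits = 0
    for pos, block in enumerate(blocks):
        prev = last_seen.get(block)
        if prev is not None and (window is None or pos - prev <= window):
            hits += 1
        last_seen[block] = pos
    return hits


# Two ROM images sharing a block, separated by 1000 unrelated blocks.
blocks = ["shared-block"] + ["filler-%d" % n for n in range(1000)] + ["shared-block"]
print(count_repeated_blocks(blocks, window=256))  # 0: the match is out of reach
print(count_repeated_blocks(blocks))              # 1: the global index finds it
```

The window-limited matcher misses the repeat no matter how similar the two images are, which is the effect described for the Mega Man / Rockman pair.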

6.2. Speed tests

Speed testing hasn't been done yet. It is difficult to test the speed, because it depends on factors such as cache (with compressed filesystems, decompression consumes CPU power but usually only needs to be done once) and block size (bigger blocks need more time to decompress).

However, in the general case, it is quite safe to assume that mkcromfs is the slowest of all. The same goes for RAM usage.

cromfs-driver requires an amount of RAM proportional to a few factors. It can be approximated with this formula:

Max_RAM_usage = FBLOCK_CACHE_MAX_SIZE × fblock_size + READDIR_CACHE_MAX_SIZE × 60k + 8 × num_blocks


  • fblock_size is the value of "--fblock" used when the filesystem was created
  • FBLOCK_CACHE_MAX_SIZE is a constant defined in (default: 10)
  • READDIR_CACHE_MAX_SIZE is a constant defined in (default: 3)
  • 60k is an estimate of a large directory size (2000 files with average name length of 10-20 letters)
  • num_blocks is the number of block structures in the filesystem (maximum size is ceil(total_size_of_files / block_size), but it may be smaller.)
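For example, for a 500 MB archive with 16 kB blocks and 1 MB fblocks, the memory usage would be around 10.4 MB: 10 MB for the fblock cache, about 180 kB for the readdir cache, and 8 × 32000 = 256000 bytes for the blocks.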
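The formula can be evaluated directly. This is a minimal sketch assuming the default constants listed above and taking 60k as 60 × 1024 bytes; the function name is hypothetical, and the result is only as accurate as the approximation itself.

```python
import math

MiB = 1024 * 1024


def estimate_driver_ram(fblock_size, num_blocks,
                        fblock_cache_max_size=10,
                        readdir_cache_max_size=3,
                        large_dir_bytes=60 * 1024):
    """Approximate peak RAM of cromfs-driver, following the formula above."""
    fblock_cache = fblock_cache_max_size * fblock_size        # decompressed fblocks
    readdir_cache = readdir_cache_max_size * large_dir_bytes  # cached directories
    blocks = 8 * num_blocks                                   # the 8 x num_blocks term
    return fblock_cache + readdir_cache + blocks


# 500 MB of files, 16 kB blocks, 1 MB fblocks.
num_blocks = math.ceil(500 * MiB / (16 * 1024))  # upper bound: 32000
est = estimate_driver_ram(fblock_size=1 * MiB, num_blocks=num_blocks)
print(est, round(est / MiB, 2))  # 10926080 10.42
```

With these constants the fblock cache dominates, which is why the --fsize tips in section 8.0.3 have the largest effect on the driver's memory use.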

7. Getting started

  1. Install the development requirements: make, gcc-c++ and fuse
    • Remember that for fuse to work, the kernel must also include fuse support. Run "modprobe fuse", then check that "/dev/fuse" exists and that it works.
      • If "/dev/fuse" does not exist after loading the "fuse" module, create it manually (as root):
        # cd /dev
        # mknod fuse c 10 229
      • If an attempt to read from "/dev/fuse" (as root) gives "no such device", it does not work. If it gives "operation not permitted", it might work.
  2. Configure the source code:
    $ ./configure
    It will automatically determine your software environment (mainly, the features supported by your compiler).
  3. Build the programs:
    $ make

    This builds the programs "cromfs-driver", "cromfs-driver-static", "util/mkcromfs", "util/cvcromfs" and "util/unmkcromfs".

  4. Create a sample filesystem:
    $ util/mkcromfs . sample.cromfs
  5. Mount the sample filesystem:
    $ mkdir sample
    $ ./cromfs-driver sample.cromfs sample
  6. Observe the sample filesystem:
    $ cd sample
    $ du
    $ ls -al
  7. Unmount the filesystem:
    $ cd ..
    $ fusermount -u sample

8. Tips

8.0.1. To improve compression

To improve the compression, try these tips:
  • Do not change --lzmafastbytes. The default value is 273, which is the maximum possible.
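  • Do not specify fixed values for --lzmabits (such as --lzmabits 2,0,3) if compression power is your priority. Fixed values make the final compression phase considerably faster, but they override the default "auto" heuristic.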
  • Adjust the block size (--bsize) in mkcromfs. If your files have a lot of identical content aligned at a certain boundary, use that boundary as the block size. If you are uncertain, use a small value (500-5000) rather than a bigger one (20000-400000). However, very small values make inodes large, so keep it reasonable.
    Note: The value does not need to be a power of two.
  • Adjust the fblock size (--fsize) in mkcromfs. Larger values almost always give better compression. However, large values also increase memory consumption when the filesystem is mounted, so keep it reasonable. If uncertain, use the default value (2097152).
    Note: The value does not need to be a power of two.
  • Adjust the --autoindexperiod option (-A). A smaller value will increase the chances of mkcromfs finding an identical block from something it already processed (if your data has that opportunity). Finding that two blocks are identical always means better compression.
  • Sort your files. Files with similar or partially identical content should be processed one right after another.
  • Adjust the --bruteforcelimit option (-c). Larger values will require mkcromfs to check more fblocks for each block it encodes (making the encoding much slower), in the hope it improves compression.
    Basically, --bruteforcelimit is a way to virtually multiply the --fsize (thus improving compression) by an integer factor without increasing the memory or CPU usage of cromfs-driver. Using it is recommended, unless you want mkcromfs to be fast.
    The upper limit on meaningful values for the -c option is the number of fblocks on the resulting filesystem.
    If uncertain, try something like the value of 33554432 / fsize. For 2 MB fblocks, that would make -c16.
  • You can approximate how many blocks your filesystem will have by this formula: total_amount_of_unique_data / bsize.
    • If the value is less than 65536, use the --16bitblocknums (-2) option. It will theoretically save (number_of_blocks*2) bytes of uncompressed room by making inodes smaller.
    • If the value is less than 16777216, use the --24bitblocknums (-3) option. It will theoretically save (number_of_blocks) bytes of uncompressed room by making inodes smaller.
    Due to LZMA compression, the saving in file size may become negligible, but these options make cromfs-driver slightly faster, and there are no speed penalties.
  • Adjust the --lzmabits values. This affects the compression phase of mkcromfs (the last phase after blockifying)
    • Use "--lzmabits full" if you have absolutely no regard for compression time — it will try each and every combination of pb, lp and lc and choose the one that results in best LZMA compression — for every compressed item separately. It is 225 times slower than the normal way.
    • Use "--lzmabits auto" if you want mkcromfs to use a heuristic algorithm to choose the parameters based on a few experiments. It is 27…200 times slower than the normal way, depending on the data. This is enabled by default. Specifying "full" or giving the values manually overrides it.
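Several of the rules of thumb above are simple arithmetic, so they can be combined into a helper. This is a hypothetical sketch, not part of cromfs: you supply your own estimate of the amount of unique data, and the short flag spellings (-b for --bsize, -f for --fsize, -c, -2, -3) are the ones used in this document's test commands. Capping -c at the estimated number of fblocks follows the note that larger values are not meaningful.

```python
import math


def suggest_options(unique_bytes, bsize, fsize):
    """Turn the rules of thumb above into a list of mkcromfs arguments (sketch)."""
    est_blocks = math.ceil(unique_bytes / bsize)
    est_fblocks = max(1, math.ceil(unique_bytes / fsize))
    args = ["-b%d" % bsize, "-f%d" % fsize]
    # Narrower block numbers make inodes smaller (see the -2 / -3 tips).
    if est_blocks < 65536:
        args.append("-2")  # --16bitblocknums
    elif est_blocks < 16777216:
        args.append("-3")  # --24bitblocknums
    # -c of about 33554432 / fsize, but no more than the number of fblocks.
    c = min(33554432 // fsize, est_fblocks)
    if c > 1:
        args.append("-c%d" % c)
    return est_blocks, args


blocks, args = suggest_options(300 * 1024 * 1024, bsize=4096, fsize=2097152)
print(blocks, " ".join(args))  # 76800 -b4096 -f2097152 -3 -c16
```

The result is only a starting point; the block size in particular should still be chosen according to how your data is aligned.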

8.0.2. To improve mkcromfs speed

To improve the filesystem generation speed, try these tips:
  • Use the --decompresslookups option (-e), if you have the diskspace to spare.
  • Use a large value for the --randomcompressperiod option, for example -r100000. Together with -e, this significantly improves the speed of mkcromfs, at the cost of temporary disk space. A small value causes mkcromfs to randomly compress one of the temporary fblocks more often. It has no effect on the compression ratio of the resulting filesystem.
  • Use the TEMP environment variable to control where the temp files are written. Example: TEMP=~/cromfs-temp ./mkcromfs …
  • Specify a low value for --lzmafastbytes in the mkcromfs command line. This will cause LZMA to consume less memory and be faster, at the cost of compression power. The default value is 273 (maximum). The minimum possible value is 5.
  • Use larger block size (--bsize). Smaller blocks mean more blocks which means more work. Larger blocks are less work.
  • Do not use the --bruteforcelimit option (-c). The default value 0 means that the candidate fblock will be selected straightforwardly.
  • If you have a multicore system, add the --threads option. Select --threads 2 if you have a dual-core system, for example. You can also use a larger value than the number of cores, but the same guidelines apply as with -j in GNU make. Currently this option does not affect compression power, so using it is recommended.
  • Use "--lzmabits 2,0,3" (or other values of your choice) to make LZMA compression about 27 times faster, with a slight cost of compression power. The default option is "auto", which tests a number of different lzmabits values to end up with hopefully optimal compression.

8.0.3. To control the memory usage

To control the memory usage, use these tips:
  • Adjust the fblock size (--fsize). The memory used by cromfs-driver is directly proportional to the size of your fblocks. It keeps at most 10 fblocks decompressed in the RAM at a time. If your fblocks are 4 MB in size, it will use 40 MB at max.
  • In mkcromfs, adjust the --autoindexperiod option (-A). This will not have effect on the memory usage of cromfs-driver, but it will control the memory usage of mkcromfs. If you have lots of RAM, you should use smaller --autoindexperiod (because it will improve the chances of getting better compression results), and use bigger if you have less RAM.
  • Find the CACHE_MAX_SIZE settings in and edit them. This will require recompiling the source. (In future, this should be made a command line option for cromfs-driver.)
  • In mkcromfs, adjust the block size (--bsize). The RAM usage of mkcromfs is directly proportional to the number of blocks (and the filesystem size), so smaller blocks require more memory and larger require less.
  • Adjust the --blockindexmethod option. Different values of this option have different effects on the virtual memory use of mkcromfs (it does not affect cromfs-driver, though). Use "--blockindexmethod none" and "-A0" if you want the smallest possible memory usage for your selected block size. It has an impact on the compression power, but you can compensate for it by using a large value for the --bruteforcelimit option instead, if you don't mind a longer runtime.

8.0.4. To control the filesystem speed

To control the filesystem speed, use these tips:
  • The speed of the underlying storage matters.
  • The bigger your fblocks (--fsize), the bigger the latencies are. cromfs-driver caches the decompressed fblocks, but opening a non-cached fblock requires decompressing it entirely, which will block the user process for that period of time.
  • The smaller your blocks (--bsize), the bigger the latencies are, because there will be more steps to process for handling the same amount of data.
  • Use the most powerful compiler and compiler settings available for building cromfs-driver. This helps the decompression and cache lookups.
  • Use fast hardware…
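The latency effect of the fblock cache can be modelled in a few lines. This sketch assumes least-recently-used eviction, which this document does not specify, and uses Python's lzma module with the xz container as a stand-in for the real fblock format (described in doc/FORMAT). The class name is hypothetical. The point it shows is that a cache miss costs a decompression of the whole fblock, however little of it is read.

```python
import lzma
from collections import OrderedDict


class FblockCache:
    """Keep at most `max_size` decompressed fblocks in memory (sketch).

    Least-recently-used eviction is an assumption made for this example.
    """

    def __init__(self, load_compressed, max_size=10):
        self.load_compressed = load_compressed  # fblock number -> compressed bytes
        self.max_size = max_size
        self.cache = OrderedDict()              # fblock number -> decompressed bytes
        self.decompressions = 0

    def get(self, fblock_no):
        if fblock_no in self.cache:
            self.cache.move_to_end(fblock_no)   # mark as recently used
            return self.cache[fblock_no]
        # Cache miss: the whole fblock is decompressed before any byte is read.
        data = lzma.decompress(self.load_compressed(fblock_no))
        self.decompressions += 1
        self.cache[fblock_no] = data
        if len(self.cache) > self.max_size:
            self.cache.popitem(last=False)      # evict the least recently used
        return data

    def read(self, fblock_no, offset, length):
        return self.get(fblock_no)[offset:offset + length]


# Twelve small fblocks; lzma.compress() (xz format) stands in for the real format.
store = {n: lzma.compress(bytes([n]) * 4096) for n in range(12)}
cache = FblockCache(store.__getitem__, max_size=10)
for n in range(12):
    cache.read(n, 0, 16)
cache.read(11, 100, 16)  # still cached: no decompression
cache.read(0, 100, 16)   # evicted earlier: decompressed again
print(cache.decompressions)  # 13
```

Doubling --fsize doubles both the work done on every miss and the memory held by a full cache, which matches the trade-off described in the tips above.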

8.0.5. Using cromfs with automount

Since version 1.3.0, you can use cromfs in conjunction with the automount (autofs) feature of the Linux kernel. This allows you to mount cromfs volumes automatically on demand, and unmount them when they are not used, conserving free memory.

This line in your autofs file (such as auto.misc) will do the trick (assuming the path you want is "books", and your volume is located at "/home/myself/books.cromfs"):

books -fstype=fuse,ro,allow_other    :/usr/local/bin/cromfs-driver\#/home/myself/books.cromfs

9. Understanding the concepts

Skip this section if you don't consider yourself technically inclined.

This section explains the workings of cromfs in a nutshell.

9.0.1. Inode

Every object in a filesystem (from user's side) is an "inode". This includes at least symlinks, directories, files, fifos and device entries. The inode contains the file attributes and its contents, but not its name. (The name is contained in a directory listing, along with the reference to the inode.) This is the traditional way in *nix systems.

When a file is "hardlinked" into multiple locations in the filesystem, the inode is not copied; the inode number is simply listed in multiple directories.
A symlink however, is an entirely new inode unrelated to the file it points to.

The file attributes and the file contents are stored separately. In cromfs, the inode contains an array of block numbers, which are necessary in finding the actual contents of the file.

9.0.2. Block

The contents of every file (denoted by the inode) are divided into "blocks". The size of this block is controlled by the --bsize commandline parameter. For example, if your file is 10000 bytes in size, and your bsize is 4000, the file contains three blocks: 4000 + 4000 + 2000 bytes. The inode contains thus three block numbers, which refer to entries in the block table.
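The division described above is plain fixed-size chunking with a shorter final block. A minimal sketch (the function name is hypothetical):

```python
def split_into_blocks(file_size, bsize):
    """Return (offset, length) pairs for each block of a file of file_size bytes."""
    return [(offset, min(bsize, file_size - offset))
            for offset in range(0, file_size, bsize)]


print(split_into_blocks(10000, 4000))
# [(0, 4000), (4000, 4000), (8000, 2000)]
```

The number of entries is ceil(file_size / bsize), which is also the number of block numbers the inode has to hold.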

Only regular files, symlinks and directories have "contents" that need storing. Device entries for example, do not have associated contents.
The contents of a directory are a list of file names and inode numbers.

Every time mkcromfs stores a new block, a new block number is generated to denote that particular block (this number is stored in the inode), and a new data locator is stored to describe where the block is found (the locator is stored in the block table).

If mkcromfs reused a previously generated data locator, only the block number needs to be stored.

9.0.3. Fblock

Fblock is a storage unit in a cromfs filesystem. It is the physical container of block data for multiple files.
When mkcromfs creates a new filesystem, it splits each file into blocks (see above), and for each of those blocks, it determines which fblock they go to. The maximum fblock size is mandated by the --fsize commandline parameter.

Each fblock is compressed separately, so a few big fblocks compress better than many small fblocks. Cromfs automatically creates as many fblocks as are needed to store the contents of the entire filesystem being created.

An fblock is merely storage. Regardless of the sizes of the blocks and fblocks, an fblock may contain any number of blocks, from one upwards (there is no upper limit). It is beneficial for blocks to overlap, and this is an important source of the power of cromfs.

The working principle behind fblocks is: What is the shortest string that can contain all these substrings?
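A greatly simplified illustration of that principle, assuming a single fblock kept as a byte string: if a block already occurs in the fblock, its existing offset is reused; otherwise only the part that does not overlap the fblock's tail is appended. This is not mkcromfs's actual placement algorithm, which among other things searches several fblocks (see --bruteforcelimit); the function name is hypothetical.

```python
def place_block(fblock: bytearray, block: bytes) -> int:
    """Store `block` in `fblock` and return its offset, appending as little as possible."""
    pos = fblock.find(block)
    if pos >= 0:
        return pos  # the block is already present: reuse it, append nothing
    # Longest suffix of the fblock that equals a prefix of the block.
    overlap = 0
    for k in range(min(len(fblock), len(block)), 0, -1):
        if fblock.endswith(block[:k]):
            overlap = k
            break
    offset = len(fblock) - overlap
    fblock.extend(block[overlap:])
    return offset


fblock = bytearray()
offsets = [place_block(fblock, b) for b in (b"digital", b"tall", b"git", b"organ", b"ant")]
print(offsets, fblock.decode())  # [0, 4, 2, 8, 11] digitallorgant
```

Five blocks totalling 22 bytes end up in a 14-byte fblock, and each returned offset is exactly what a data locator would record.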

9.0.4. Block number and block table

The filesystem contains a structure called "blktab" (block table), which is a list of data locators. This list is indexed by a block number.
Each locator describes where to find the particular block denoted by this block number.

At the end of the filesystem creation process, the blktab is compressed and becomes "blkdata" before being written into the filesystem.
(These names are only useful when referencing the filesystem format documentation; they are not found in the filesystem itself.)

9.0.5. Data locator

A data locator tells cromfs where to find the contents of a particular block. It is composed of an fblock number and an offset into that fblock. These locators are stored in the global block table, as explained above.

Multiple files may share the same data locators, and multiple data locators may point to the same or partially overlapping data.
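The lookup chain described in sections 9.0.2 to 9.0.5 (inode, block numbers, block table, data locator, bytes in an fblock) can be sketched as follows. The type and function names are hypothetical, the fblocks are assumed to be already decompressed, and the real on-disk encoding is the one described in doc/FORMAT.

```python
from typing import List, NamedTuple


class DataLocator(NamedTuple):
    fblock_no: int  # which fblock holds the data
    offset: int     # byte offset inside that (decompressed) fblock


def read_file(block_numbers: List[int], file_size: int, bsize: int,
              blktab: List[DataLocator], fblocks: List[bytes]) -> bytes:
    """Reassemble a file from its inode's block numbers (illustrative only)."""
    out = bytearray()
    for i, blockno in enumerate(block_numbers):
        loc = blktab[blockno]                        # block number -> data locator
        length = min(bsize, file_size - i * bsize)   # the last block may be shorter
        out += fblocks[loc.fblock_no][loc.offset:loc.offset + length]
    return bytes(out)


fblocks = [b"digitallorgant"]
blktab = [DataLocator(0, 0), DataLocator(0, 4), DataLocator(0, 5)]
print(read_file([0, 1], 8, 4, blktab, fblocks))  # b'digitall'
print(read_file([1], 4, 4, blktab, fblocks))     # b'tall'  (shares block 1)
print(read_file([2], 4, 4, blktab, fblocks))     # b'allo'  (overlaps block 1)
```

Two files referring to block number 1 share one locator, while locators 1 and 2 point into overlapping bytes of the same fblock.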

9.0.6. Block indexing (mkcromfs only)

When mkcromfs stores blocks, it remembers where it stored them, so that if it later finds an identical block in another file (or the same file), it does not need to search the fblocks again to find the best placement.
The index is a map of block hashes to data locators and block numbers.

The --autoindexperiod (-A) setting extends this mechanism: in addition to the blocks it has already encoded, mkcromfs memorizes more locations in those fblocks, creating "just in case" data locators for future use that are not saved in the block table unless they are utilized later. This helps compression when the number of fblocks searched (--bruteforcelimit) is low compared to the number of fblocks generated, at the cost of memory consumed by mkcromfs. It can also make mkcromfs faster (but also slower).
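A minimal sketch of such an index, assuming SHA-1 digests as the block hashes (the hash function mkcromfs actually uses is not specified here) and a trivial placement callback that simply appends to a single fblock. The names are hypothetical, and the "just in case" locators of --autoindexperiod are not modelled.

```python
import hashlib


def blockify(data, bsize, index, blktab, place):
    """Split `data` into blocks and return their block numbers.

    `index` maps a block hash to an existing block number, so identical
    blocks (in this file or any earlier one) are stored only once.
    """
    block_numbers = []
    for off in range(0, len(data), bsize):
        block = data[off:off + bsize]
        key = hashlib.sha1(block).digest()
        if key not in index:
            # New content: find a place for it and add a block table entry.
            # (A careful implementation would also compare bytes on a hash hit.)
            index[key] = len(blktab)
            blktab.append(place(block))
        block_numbers.append(index[key])
    return block_numbers


fblock = bytearray()


def append_to_fblock(block):
    """Trivial placement: append to fblock 0 and return (fblock number, offset)."""
    fblock.extend(block)
    return (0, len(fblock) - len(block))


index, blktab = {}, []
print(blockify(b"AAAABBBBAAAA", 4, index, blktab, append_to_fblock))  # [0, 1, 0]
print(blockify(b"BBBBCC", 4, index, blktab, append_to_fblock))        # [1, 2]
print(blktab, bytes(fblock))  # [(0, 0), (0, 4), (0, 8)] b'AAAABBBBCC'
```

The second file reuses block 1 without any search of the fblock, which is the saving the index exists for.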

9.0.7. Random compress period (mkcromfs only)

When mkcromfs runs, it generates a temporary file for each fblock of the resulting filesystem. If the resulting filesystem is large, these temporary fblocks take even more space; a lot, in any case.
To save disk space, mkcromfs compresses those fblocks when they are not accessed. However, if it needs to access them again (to search the contents for a match), it will need to decompress them first.

This compressing+decompressing may consume lots of time. It does not help the size of the resulting filesystem; it only saves some temporary disk space.

If you are not concerned about temporary disk space, you should give the --randomcompressperiod option a large number (such as 10000) to prevent it from needlessly decompressing+compressing the fblocks over and over again. This will improve the speed of mkcromfs.

The --decompresslookups option is related. If you use the --randomcompressperiod option, you should also enable --decompresslookups.

By the way, the temporary files are written to wherever your TEMP environment variable points. TMP is also recognized.

9.0.8. Where are the inodes stored then?

All the inodes of the filesystem are also stored together in a file. That file is packed like any other file: split into blocks and scattered into fblocks. The data locator list of that file is stored in a special inode called "inotab", which is not seen in any directory. The "inotab" has its own place in the cromfs file.

10. Using cromfs in bootdisks and tiny Linux distributions

Cromfs can be used in bootdisks and tiny Linux distributions only by starting the cromfs-driver from a ramdisk (initrd), and then pivot_rooting into the mounted filesystem (but not before the filesystem has been initialized; there is a delay of a few seconds).

Theoretical requirements to use cromfs in the root filesystem:

  • Cromfs-driver should probably be statically linked (the Makefile automatically builds a static version since version 1.2.2).
  • An initrd that contains the cromfs-driver program
  • Fuse driver in the kernel (it may be loaded from the initrd).
  • Constructing an unionfs mount from a ramdisk and the cromfs mountpoint to form a writable root
Do not use cromfs in machines that are low on RAM!

11. Other applications of cromfs

The compression algorithm in cromfs can be used to determine how similar some files are to each other.

This is an example of the output of the following command, run on a sample filesystem (excerpt):

$ unmkcromfs --simgraph fs.cromfs '*.qh' > result.xml

<?xml version="1.0" encoding="UTF-8"?>
  <inode id="5595"><file>45/qb5/ir/basewc.qh</file></inode>
  <inode id="5775"><file>45/qb5/ir/edit.qh</file></inode>
  <inode id="5990"><file>45/qb5/ir/help.qh</file></inode>
  <inode id="6220"><file>45/qb5/ir/oemwc.qh</file></inode>
  <inode id="6426"><file>45/qb5/ir/qbasic.qh</file></inode>
  <inode id="18833"><file>c6ers/newcmds/toolib/doc/contents.qh</file></inode>
  <inode id="19457"><file>c6ers/newcmds/toolib/doc/index.qh</file></inode>
  <match inode1="5595" inode2="5990"><bytes>396082</bytes><ratio>0.5565442944</ratio></match>
  <match inode1="5595" inode2="6220"><bytes>456491</bytes><ratio>0.6414264256</ratio></match>
  <match inode1="5990" inode2="6220"><bytes>480031</bytes><ratio>0.6732618693</ratio></match>
It reads a cromfs volume generated earlier, and outputs statistics of it. Such statistics can be useful in refining further compression, or just finding useful information regarding the redundancy of the data set.

It follows this DTD:

 <!ENTITY % int "CDATA">
 <!ENTITY % INTEGER "#PCDATA">
 <!ENTITY % REAL "#PCDATA">
 <!ELEMENT simgraph (volume, inodes, matches)>
 <!ELEMENT volume (total_size, num_inodes, num_files)>
 <!ELEMENT total_size (%INTEGER;)>
 <!ELEMENT num_inodes (%INTEGER;)>
 <!ELEMENT num_files (%INTEGER;)>
 <!ELEMENT inodes (inode*)>
 <!ELEMENT inode (file+)>
 <!ATTLIST inode id %int; #REQUIRED>
 <!ELEMENT file (#PCDATA)>
 <!ELEMENT matches (match*)>
 <!ELEMENT match (bytes, ratio)>
 <!ATTLIST match inode1 %int; #REQUIRED>
 <!ATTLIST match inode2 %int; #REQUIRED>
 <!ELEMENT bytes (%INTEGER;)>
 <!ELEMENT ratio (%REAL;)>
Once you have generated the filesystem, running the --simgraph query is relatively cheap (though still O(n²) in the number of files): it analyzes the structures created by mkcromfs and does not require any search of the actual file contents. However, the similarity information it reports can only be as fine-grained as the options (the level of compression) used when the filesystem was generated.
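The XML output is easy to post-process. A minimal sketch using Python's standard xml.etree.ElementTree, assuming the element layout given by the DTD above; the function name is hypothetical, and the inline sample reuses lines from the example output but omits the volume element for brevity (the parser does not need it).

```python
import xml.etree.ElementTree as ET


def top_matches(root, limit=10):
    """Return (ratio, bytes, files1, files2) tuples, most similar pairs first."""
    files = {inode.get("id"): [f.text for f in inode.findall("file")]
             for inode in root.iter("inode")}
    matches = [(float(m.findtext("ratio")), int(m.findtext("bytes")),
                files.get(m.get("inode1"), []), files.get(m.get("inode2"), []))
               for m in root.iter("match")]
    matches.sort(key=lambda t: t[0], reverse=True)
    return matches[:limit]


# Trimmed sample following the DTD (the volume element is omitted here).
SAMPLE = """<simgraph>
 <inodes>
  <inode id="5595"><file>45/qb5/ir/basewc.qh</file></inode>
  <inode id="5990"><file>45/qb5/ir/help.qh</file></inode>
  <inode id="6220"><file>45/qb5/ir/oemwc.qh</file></inode>
 </inodes>
 <matches>
  <match inode1="5595" inode2="5990"><bytes>396082</bytes><ratio>0.5565442944</ratio></match>
  <match inode1="5595" inode2="6220"><bytes>456491</bytes><ratio>0.6414264256</ratio></match>
  <match inode1="5990" inode2="6220"><bytes>480031</bytes><ratio>0.6732618693</ratio></match>
 </matches>
</simgraph>"""

for ratio, nbytes, a, b in top_matches(ET.fromstring(SAMPLE)):
    print("%.3f %8d  %s <-> %s" % (ratio, nbytes, ", ".join(a), ", ".join(b)))
```

For a real result file, use ET.parse("result.xml").getroot() instead of the inline sample. An inode with several hardlinked names contributes all of them to the file list.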

12. Copying and contributing

cromfs has been written by Joel Yliluoma, a.k.a. Bisqwit,
and is distributed under the terms of the General Public License version 3 (GPL3).
The LZMA code from the LZMA SDK is in the public domain.
The LZO code from liblzo2.03 embedded within is licensed under GPL version 2 or later.

Patches and other related material can be submitted to the author by e-mail.

The author would also like to hear whether you use cromfs, what you use it for, and what you think of it.

You can discuss CROMFS at Freenode, on #cromfs.

12.1. Contribution wishes

The author wishes for the following things to be done to this package.
  • Topic: Mature enough to be included in distributions.
    • Manual pages of each utility (hopefully somehow autogenerated so that they won't be useless when new options are added)
    • Improve the configure script to make it cope better with different Fuse API versions and different compiler versions
    • Install and uninstall rules in Makefile
  • Topic: Increasing usability
    • A proof of concept example of utilizing cromfs in a root filesystem (with initramfs)
    • Add appending support (theoretically doable, just not very fast)
    • Add threading in cromfs-driver. Needs write-locks in fblock_cache and readdir_cache. Possibly in BWT too. Also blktab and fblktab if those are being changed.
  • Topic: Documentation
    • Graphical illustration of the filesystem structure (the filesystem consists of fblocks, and files are split into blocks, which are actually indexes to various fblocks)
    • Document the modular structure of the source code
  • Topic: Portability
  • Topic: Increasing compression power
    • mkcromfs needs a fast and powerful approximation algorithm for the shortest common superstring problem.
      Input description: A set of strings S₁, …, Sₙ.
      Problem description: What is the shortest string S′ such that, for each Sᵢ (1 ≤ i ≤ n), Sᵢ appears as a substring of S′?
      For example, for the input ["digital","organ","tall","ant"], it would produce "organtdigitall" or "digitallorgant".
      Note: This problem seems to reduce to an Asymmetric Travelling Salesman Problem, which is NP-hard (its decision version is NP-complete). The task here is to find a good approximation that does not consume a lot of resources.
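For reference, the classic greedy heuristic for the shortest common superstring problem repeatedly merges the two strings with the largest suffix/prefix overlap. The sketch below is a naive version of that textbook heuristic, not code from mkcromfs, and the names `overlap` and `greedy_superstring` are hypothetical. As written, its cost is roughly cubic in the number of strings times the cost of each overlap test, so it is far from the resource-light approximation wished for above. It does show the shape of the problem on the example given.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings: list[str]) -> str:
    # Drop duplicates and strings already contained in another string.
    unique = list(dict.fromkeys(strings))
    parts = [s for s in unique if not any(s != t and s in t for t in unique)]
    if not parts:
        return ""
    while len(parts) > 1:
        # Find the ordered pair (i, j) with the largest overlap of parts[i] onto parts[j].
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(parts):
            for j, b in enumerate(parts):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge that pair, writing the shared overlap only once.
        merged = parts[best_i] + parts[best_j][best_k:]
        parts = [p for idx, p in enumerate(parts) if idx not in (best_i, best_j)]
        parts.append(merged)
    return parts[0]


print(greedy_superstring(["digital", "organ", "tall", "ant"]))  # → digitallorgant
```

Greedy merging commits to the locally best overlap, so it is not guaranteed to find the shortest superstring in general. On this input, merging "digital"+"tall" (overlap 3) and then "organ"+"ant" (overlap 2) yields one of the two 14-character answers listed above.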

13. Requirements

  • GNU make and gcc-c++ are required to recompile the source code.
  • The filesystem works under the Fuse user-space filesystem framework. You need to install both the Fuse kernel module and the user-space programs before mounting Cromfs volumes. Fuse version 2.5.2 or newer is required.
  • liblzo2-dev is recommended on i386 platforms. If it is missing, mkcromfs uses the version bundled with the package.

15. Downloading

Downloading help

  • Do not download everything: you only need one file (the newest version for your platform)!
  • Do not use download accelerators, or you will be banned from this server before your download is complete!

The most recent (bleeding-edge) source code for cromfs can also be downloaded by cloning the Git repository:

YYYY-MMDD-hhmm acc       Bytes Name
2014-0108-0736 r--      608053 cromfs-
2014-0108-0736 r--      646550 cromfs-
2012-0411-1033 r--      607756 cromfs-
2012-0411-1033 r--      646215 cromfs-
2011-0729-1421 r--      607599 cromfs-1.5.10.tar.bz2
2011-0729-1421 r--      645952 cromfs-1.5.10.tar.gz
2010-1221-1617 r--      609519 cromfs-
2010-1221-1617 r--      648135 cromfs-
2009-1217-1044 r--      609458 cromfs-1.5.9.tar.bz2
2009-1217-1044 r--      647770 cromfs-1.5.9.tar.gz
2009-0731-1805 r--      608866 cromfs-
2009-0731-1805 r--      647501 cromfs-
2009-0721-1541 r--      608740 cromfs-
2009-0721-1541 r--      647473 cromfs-
2009-0427-1100 r--      609078 cromfs-
2009-0427-1100 r--      647300 cromfs-
2009-0326-0917 r--      608802 cromfs-
2009-0326-0917 r--      647196 cromfs-
2009-0323-1635 r--      607965 cromfs-
2009-0323-1635 r--      646899 cromfs-
2009-0323-1637 r--      601663 cromfs-
2009-0323-1636 r--      639679 cromfs-
2009-0314-1428 r--      602037 cromfs-
2009-0314-1428 r--      639570 cromfs-
2009-0314-1304 r--      602065 cromfs-
2009-0314-1304 r--      640296 cromfs-
2009-0314-1155 r--      602123 cromfs-
2009-0314-1155 r--      639685 cromfs-
2009-0313-1124 r--      601754 cromfs-1.5.8.tar.bz2
2009-0313-1124 r--      639275 cromfs-1.5.8.tar.gz
2009-0202-0056 r--      531325 cromfs-1.5.7.tar.bz2
2009-0202-0056 r--      558083 cromfs-1.5.7.tar.gz
2008-1216-2220 r--      499828 cromfs-
2008-1216-2219 r--      517925 cromfs-
2008-1216-1625 r--      499297 cromfs-
2008-1216-1625 r--      517344 cromfs-
2008-1216-1618 r--      499299 cromfs-1.5.6.tar.bz2
2008-1216-1618 r--      517328 cromfs-1.5.6.tar.gz
2008-1214-0110 r--      495140 cromfs-
2008-1214-0110 r--      513442 cromfs-
2008-1213-1536 r--      495104 cromfs-
2008-1213-1536 r--      513446 cromfs-
2008-1213-1119 r--      494870 cromfs-
2008-1213-1119 r--      513129 cromfs-
2008-1209-1454 r--      492220 cromfs-
2008-1209-1454 r--      510252 cromfs-
2008-1208-1559 r--      491494 cromfs-1.5.5.tar.bz2
2008-1208-1559 r--      509284 cromfs-1.5.5.tar.gz
2008-1208-0958 r--      501688 cromfs-
2008-1208-0958 r--      525163 cromfs-
2008-1124-0950 r--      497141 cromfs-
2007-1015-0112 r--      519072 cromfs-1.5.4.tar.gz
2007-1220-1637 r--       29776 patch-cromfs-
2007-1220-1637 r--       33459 patch-cromfs-
2007-1220-1637 r--       34511 patch-cromfs-1.5.3-1.5.4.bz2
2007-1220-1637 r--       38779 patch-cromfs-1.5.3-1.5.4.gz
2007-0904-1643 r--      485305 cromfs-
2007-0904-1643 r--      516952 cromfs-
2007-0904-1643 r--        8873 patch-cromfs-
2007-0904-1643 r--        8503 patch-cromfs-
2007-0901-0003 r--      484632 cromfs-
2007-0901-0003 r--      515833 cromfs-
2007-0901-0003 r--        4104 patch-cromfs-1.5.3-
2007-0901-0003 r--        3607 patch-cromfs-1.5.3-
2007-0829-2201 r--      483393 cromfs-1.5.3.tar.bz2
2007-0829-2201 r--      513776 cromfs-1.5.3.tar.gz
2007-0829-2201 r--       44430 patch-cromfs-1.5.2-1.5.3.bz2
2007-0829-2201 r--       50206 patch-cromfs-1.5.2-1.5.3.gz
2007-0817-1005 r--      486472 cromfs-1.5.2.tar.bz2
2007-0817-1005 r--      516889 cromfs-1.5.2.tar.gz
2007-0817-1005 r--        9146 patch-cromfs-1.5.1-1.5.2.bz2
2007-0817-1005 r--        9001 patch-cromfs-1.5.1-1.5.2.gz
2007-0813-1922 r--      485958 cromfs-1.5.1.tar.bz2
2007-0813-1922 r--      515559 cromfs-1.5.1.tar.gz
2007-0813-1922 r--       19038 patch-cromfs-1.5.0-1.5.1.bz2
2007-0813-1922 r--       20536 patch-cromfs-1.5.0-1.5.1.gz
2007-0729-2316 r--      482571 cromfs-1.5.0.tar.bz2
2007-0729-2316 r--      512623 cromfs-1.5.0.tar.gz
2007-0729-2316 r--       90809 patch-cromfs-1.4.1-1.5.0.bz2
2007-0729-2316 r--      127932 patch-cromfs-1.4.1-1.5.0.gz
2007-0715-0350 r--      458771 cromfs-1.4.1.tar.bz2
2007-0715-0350 r--      486564 cromfs-1.4.1.tar.gz
2007-0715-0350 r--      132253 patch-cromfs-1.4.0-1.4.1.bz2
2007-0715-0350 r--      197145 patch-cromfs-1.4.0-1.4.1.gz
2007-0714-0010 r--      446317 cromfs-1.4.0.tar.bz2
2007-0714-0010 r--      473850 cromfs-1.4.0.tar.gz
2007-0714-0010 r--      103390 patch-cromfs-1.3.0-1.4.0.bz2
2007-0714-0010 r--      137352 patch-cromfs-1.3.0-1.4.0.gz
2007-0630-2017 r--      417746 cromfs-1.3.0.tar.bz2
2007-0630-2017 r--      444519 cromfs-1.3.0.tar.gz
2007-0630-2017 r--       31191 patch-cromfs-1.2.5-1.3.0.bz2
2007-0630-2017 r--       34755 patch-cromfs-1.2.5-1.3.0.gz
2007-0214-1532 r--      407660 cromfs-1.2.5.tar.bz2
2007-0214-1532 r--      432049 cromfs-1.2.5.tar.gz
2007-0214-1532 r--      309728 patch-cromfs-
2007-0214-1532 r--      306968 patch-cromfs-
2007-0214-1532 r--      324788 patch-cromfs-1.2.4-1.2.5.bz2
2007-0214-1532 r--      331183 patch-cromfs-1.2.4-1.2.5.gz
2007-0212-1600 r--      138210 cromfs-
2007-0212-1600 r--      170720 cromfs-
2007-0212-1600 r--        8886 patch-cromfs-
2007-0212-1600 r--        7644 patch-cromfs-
2007-0127-0305 r--      135354 cromfs-
2007-0127-0305 r--        7006 patch-cromfs-
2007-0126-1220 r--      134486 cromfs-
2007-0126-1220 r--        3025 patch-cromfs-
2007-0111-1006 r--      133931 cromfs-
2007-0111-1006 r--        3396 patch-cromfs-
2006-1205-1435 r--      133860 cromfs-
2006-1205-1435 r--        5723 patch-cromfs-1.2.4-
2006-1204-1845 r--      132924 cromfs-1.2.4.tar.bz2
2006-1204-1845 r--       11771 patch-cromfs-1.2.3-1.2.4.bz2
2006-0831-2028 r--      127894 cromfs-1.2.3.tar.bz2
2006-0831-2028 r--        4871 patch-cromfs-1.2.2-1.2.3.bz2
2006-0823-1444 r--      127087 cromfs-1.2.2.tar.bz2
2006-0823-1444 r--        3252 patch-cromfs-1.2.1-1.2.2.bz2
2006-0809-1448 r--      126659 cromfs-1.2.1.tar.bz2
2006-0809-1448 r--        9672 patch-cromfs-1.2.0-1.2.1.bz2
2006-0612-1056 r--      125628 cromfs-1.2.0.tar.bz2
2006-0612-1056 r--       22629 patch-cromfs-1.1.7-1.2.0.bz2
2006-0605-1412 r--      120064 cromfs-1.1.7.tar.bz2
2006-0605-1412 r--       12686 patch-cromfs-
2006-0605-1412 r--       17457 patch-cromfs-1.1.6-1.1.7.bz2
2006-0604-2358 r--      118414 cromfs-
2006-0604-2358 r--       10020 patch-cromfs-1.1.6-
2006-0604-0522 r--      116451 cromfs-1.1.6.tar.bz2
2006-0604-0522 r--       18575 patch-cromfs-1.1.5-1.1.6.bz2
2006-0603-1838 r--      114056 cromfs-1.1.5.tar.bz2
2006-0603-1838 r--        7808 patch-cromfs-
2006-0603-1838 r--       38568 patch-cromfs-1.1.4-1.1.5.bz2
2006-0601-0018 r--      113242 cromfs-
2006-0601-0018 r--       27213 patch-cromfs-
2006-0522-1446 r--       88166 cromfs-
2006-0522-1446 r--        8090 patch-cromfs-
2006-0522-0240 r--       87138 cromfs-
2006-0522-0240 r--        5613 patch-cromfs-1.1.4-
2006-0521-1652 r--       86597 cromfs-1.1.4.tar.bz2
2006-0521-1652 r--       16222 patch-cromfs-
2006-0521-1652 r--       17317 patch-cromfs-1.1.3-1.1.4.bz2
2006-0518-1341 r--       82859 cromfs-
2006-0518-1341 r--       10110 patch-cromfs-1.1.3-
2006-0517-1724 r--       81120 cromfs-1.1.3.tar.bz2
2006-0517-1724 r--       12567 patch-cromfs-
2006-0517-1724 r--       22044 patch-cromfs-1.1.2-1.1.3.bz2
2006-0517-0025 r--       81029 cromfs-
2006-0517-0025 r--        8218 patch-cromfs-
2006-0515-1626 r--       78767 cromfs-
2006-0515-1626 r--        8072 patch-cromfs-
2006-0515-1133 r--       78219 cromfs-
2006-0515-1133 r--        6621 patch-cromfs-1.1.2-
2006-0515-0354 r--       77801 cromfs-1.1.2.tar.bz2
2006-0515-0354 r--       10590 patch-cromfs-1.1.1-1.1.2.bz2
2006-0515-0058 r--       76576 cromfs-1.1.1.tar.bz2
2006-0515-0058 r--        8703 patch-cromfs-
2006-0515-0058 r--       11511 patch-cromfs-1.1.0-1.1.1.bz2
2006-0514-1854 r--       75585 cromfs-
2006-0514-1854 r--        4727 patch-cromfs-
2006-0514-1742 r--       74978 cromfs-
2006-0514-1742 r--        3306 patch-cromfs-1.1.0-
2006-0514-0552 r--       74930 cromfs-1.1.0.tar.bz2
2006-0514-0552 r--       16084 patch-cromfs-1.0.6-1.1.0.bz2
2006-0512-1507 r--       71269 cromfs-1.0.6.tar.bz2
2006-0512-1507 r--        6088 patch-cromfs-
2006-0512-1507 r--        6292 patch-cromfs-1.0.5-1.0.6.bz2
2006-0512-1250 r--       70040 cromfs-
2006-0512-1250 r--        2160 patch-cromfs-1.0.5-
2006-0512-0034 r--       69996 cromfs-1.0.5.tar.bz2
2006-0512-0034 r--        6525 patch-cromfs-1.0.4-1.0.5.bz2
2006-0511-2305 r--       69545 cromfs-1.0.4.tar.bz2
2006-0511-2305 r--        6477 patch-cromfs-1.0.3-1.0.4.bz2
2006-0511-2214 r--       68724 cromfs-1.0.3.tar.bz2
2006-0511-2214 r--        5640 patch-cromfs-1.0.2-1.0.3.bz2
2006-0511-1548 r--       68156 cromfs-1.0.2.tar.bz2
2006-0511-1548 r--       15307 patch-cromfs-1.0.1-1.0.2.bz2
2006-0511-0135 r--       64646 cromfs-1.0.1.tar.bz2
2006-0511-0135 r--        9948 patch-cromfs-1.0.0-1.0.1.bz2
2006-0510-2341 r--       67115 cromfs-1.0.0.tar.bz2

Last updated: Wed, 08 Jan 2014 07:38:23 +0200