Posted on July 8, 2011

HArchive is a prototype archiving tool geared towards storing databases of small text units. Solid compression is often not an option for such databases, because it does not allow random access. Compressing each text individually, on the other hand, gives poor compression ratios, because the compressor has to start from scratch for every text. HArchive generates a “prefix” file that puts the compressor/decompressor in a more useful state, so that compressing small files individually becomes effective.
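The effect of priming a compressor can be illustrated with the preset dictionary support in zlib. This is only a sketch of the general idea and not HArchive's actual implementation or file format: the function names, the sample prefix, and the sample record below are all made up for illustration.

```python
import zlib

# Hypothetical "prefix": text that is typical of the small units in the
# database. HArchive generates its prefix file automatically; here it is
# simply written by hand for illustration.
SAMPLE_PREFIX = b'{"user": "", "status": "active", "comment": ""}'


def compress_unit(text: bytes, prefix: bytes) -> bytes:
    """Compress one small text with the compressor primed by `prefix`."""
    compressor = zlib.compressobj(level=9, zdict=prefix)
    return compressor.compress(text) + compressor.flush()


def decompress_unit(blob: bytes, prefix: bytes) -> bytes:
    """Decompress one unit; the same prefix must be supplied again."""
    decompressor = zlib.decompressobj(zdict=prefix)
    return decompressor.decompress(blob) + decompressor.flush()


if __name__ == "__main__":
    unit = b'{"user": "alice", "status": "active", "comment": "hello"}'
    plain = zlib.compress(unit, 9)
    primed = compress_unit(unit, SAMPLE_PREFIX)
    print(len(unit), len(plain), len(primed))
    assert decompress_unit(primed, SAMPLE_PREFIX) == unit
```

Each unit is still compressed on its own, so any single unit can be decompressed without touching the others. Only the shared prefix has to be kept alongside the archive.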

I haven’t made the source code publicly available, but you can mail me if you are interested.

Slides and a paper are available.