2012-09-28

Testing bzip2 files quickly using hexdump

My work sometimes involves dealing with large amounts of data stored in .tar.bz2 files. Recently, we found that some of the files were corrupted. The files are large enough that even a successful "bunzip2 -t" takes prohibitively long. As a shortcut, we can just check for the bzip2 magic number at the beginning of each file: the bzip2 format requires that every file starts with the string "BZh". Our files are stored in batches by calendar date (MMDD). I generate a log of the files' magic numbers like so:

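# Log the first three bytes of every archive in each MMDD directory.
for d in ????; do
  for f in "$d"/*bz2; do
    # The command substitution is left unquoted on purpose, so that echo
    # collapses hexdump's padding to single spaces ("B Z h").
    echo "$f" $(hexdump -c -n3 "$f" | grep '0000000')
  done
done > magic_numbers.txt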

And then I can find the bad files like so:

grep -v 'B Z h' magic_numbers.txt

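hexdump is a nice tool for inspecting binary files on the command line. This method of testing the files runs fast because it reads only the first few bytes of each file instead of decompressing it. Note that it only catches files whose header is damaged; corruption further into a file still needs a full "bunzip2 -t" to detect.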
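The same check can also be done without a log file, by comparing the first three bytes directly. This is only a sketch, not what I actually ran: has_bzip2_magic is a name made up for illustration, and it assumes a head that supports -c (GNU and BSD both do).

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the file starts with the bytes "BZh".
# Assumes head supports -c; a missing or empty file fails the check.
has_bzip2_magic() {
    [ "$(head -c 3 "$1" 2>/dev/null)" = "BZh" ]
}

# Print the name of every archive in the date directories that fails the check.
for f in ????/*bz2; do
    [ -e "$f" ] || continue   # skip the unexpanded glob when nothing matches
    has_bzip2_magic "$f" || echo "$f"
done
```

This prints the suspect files directly, so there is no second grep step. Like the hexdump version, it only looks at the header.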

2012-09-04

Enabling hibernate on Ubuntu 12.04 LTS

I recently decided I wanted the option of using hibernate on my laptop running Ubuntu 12.04 LTS. Hibernate is disabled by default on that release and has to be enabled manually. On top of that, I had insufficient swap space, so I had to resize my partitions.
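A quick way to see whether swap is likely to be the problem is to compare it with the amount of RAM. The sketch below assumes the common rule of thumb that swap should be at least as large as RAM for hibernate; that threshold is my assumption, not a hard requirement. swap_covers_ram is a made-up name, and it reads the MemTotal and SwapTotal fields from /proc/meminfo (or from a file in the same format passed as its argument).

```shell
#!/bin/sh
# Hypothetical helper: report RAM and swap sizes from a meminfo-style file
# and succeed only if swap is at least as large as RAM (assumed rule of thumb).
swap_covers_ram() {
    awk '/^MemTotal:/  {mem = $2}
         /^SwapTotal:/ {swap = $2}
         END {
             printf "RAM %d kB, swap %d kB\n", mem, swap
             exit !(swap >= mem)
         }' "${1:-/proc/meminfo}"
}

swap_covers_ram || echo "swap is smaller than RAM; hibernate may fail"
```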

Now hibernate would work, but resume would not! It turns out the reason is that I elected to encrypt my $HOME folder during installation. This requires encrypted swap, and by default the swap is encrypted with a fresh, random key on every boot, so the hibernate image written before shutdown can no longer be read afterwards.

The fix is to insist on using the same key each time, which requires you to enter a password during boot. This lets the boot system detect and reload the hibernate image. If you forget the password, you boot with no swap partition.

Detailed instructions are here. It worked for me!

2012-08-22

Formatting terminal output in Python

From Nadia Alramli's Blog: Terminal Controller for Python. This page provides a simple module for printing formatted text to the terminal using low-level commands in the curses module.