Command line tools for binary files

Unix command-line tools are usually meant for processing line-oriented text. Some of them, however, can also handle byte-oriented content. In this post I go through some simple use cases. For more complicated tasks, a hex editor or a self-made tool may be more appropriate.

Let’s first create two example files:
echo -n 1234567890 > file.bin
echo -e "12345678901234567890\n12345678901234567890" > file2.bin
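The behaviour of echo -n and echo -e differs between shells, so a minimal alternative sketch uses printf instead, assuming a POSIX shell. The size check with wc -c and the variable names are additions for illustration only: file.bin should hold 10 bytes and file2.bin 42 (two 20-digit lines, each ending in a newline).

```shell
# Create the same two example files with printf instead of echo.
printf '1234567890' > file.bin
printf '12345678901234567890\n12345678901234567890\n' > file2.bin

# Count the bytes in each file to confirm nothing extra was written.
size1=$(wc -c < file.bin)
size2=$(wc -c < file2.bin)
echo "file.bin: $size1 bytes, file2.bin: $size2 bytes"
```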

The default output format of od is octal, which is hardly used anywhere anymore except in Unix file permissions. To get the more common hex dump, use
od -A x -t x1z -v file.bin
Since this is a bit hard to remember, one can use hexdump instead:
hexdump -C file.bin
Even easier is hd, which on systems that provide it (e.g. Debian-based ones) is an alias for this hexdump call:
hd file.bin

The following commands convert binary bytes to text form. To create a list of hexpairs use:
hexdump -v -e '/1 "%02x "' file.bin ; echo
To create a list of decimal bytes use:
hexdump -v -e '/1 "%i "' file.bin ; echo
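Where hexdump is not installed, od can produce the same lists. The following is a minimal sketch assuming a POSIX od; the variable names are only for illustration.

```shell
printf '1234567890' > file.bin

# Hex pairs: -An drops the offset column, -v disables the collapsing of
# repeated lines, and -t x1 prints one byte per field in hex.
# The unquoted $(...) inside echo squeezes od's padding into single spaces.
hexpairs=$(echo $(od -An -v -t x1 file.bin))
echo "$hexpairs"

# Decimal bytes: -t u1 prints each byte as an unsigned decimal number.
decimals=$(echo $(od -An -v -t u1 file.bin))
echo "$decimals"
```

For the example file this prints 31 32 33 34 35 36 37 38 39 30 and then 49 50 51 52 53 54 55 56 57 48.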

The following command converts hex text to binary bytes:
echo 414243 | xxd -p -r
In the input above, spaces can also be used to separate the hex pairs.

To get the first 5 bytes of a file, use
head -c 5 file.bin
and correspondingly
tail -c 5 file.bin
gives the last 5 bytes of a file. To get all but the last 5 bytes, use
head -c -5 file.bin
To get all bytes starting from the 5th byte, use
tail -c +5 file.bin
An example use case is removing a three-byte UTF-8 byte order mark from the beginning of a file with
tail -c +4
The last two cases show that head and tail don’t work in a completely symmetric way: head -c -5 drops the last 5 bytes, whereas tail -c +5 starts the output at the 5th byte and so drops only the first 4. I don’t know why it was done like this. For a range of bytes, use dd. Modern versions of GNU dd support efficient byte-level access instead of block-level access. For example, to get bytes 3 to 6 (counting from 1), use:
dd iflag=count_bytes,skip_bytes count=4 skip=2 < file.bin 2> /dev/null
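The offset conventions of head and tail can be checked with a short sketch. It assumes a head and tail that accept -c (as the GNU and BSD versions do) and builds the byte range from them instead of dd, which may be useful where dd lacks the count_bytes and skip_bytes flags. The file bom.txt is a hypothetical example; a UTF-8 byte order mark is the three bytes EF BB BF, so the output has to start at byte 4.

```shell
printf '1234567890' > file.bin

# tail -c +N starts the output at byte N (counting from 1), so +3 skips
# two bytes; head -c 4 then keeps four bytes, giving bytes 3 to 6.
range=$(tail -c +3 file.bin | head -c 4)
echo "$range"

# Write a UTF-8 byte order mark (octal escapes for EF BB BF) followed by text.
printf '\357\273\277hello\n' > bom.txt

# Skip the three BOM bytes by starting the output at byte 4.
stripped=$(tail -c +4 bom.txt)
echo "$stripped"
```

The first command prints 3456 and the second prints hello without the leading mark.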

The tr command allows filtering and conversion of bytes. To replace character 2 with byte \x20 (space), use:
tr 2 $'\x20' < file.bin
The tr program only accepts a byte given literally, as an octal escape (e.g. \123), or as one of a few named escapes such as \n, so the example above relies on the $'...' quoting of bash to write the byte in hex.
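The same replacement can be written without bash-specific quoting by giving tr the octal code of the byte. A minimal sketch follows; the deletion with tr -d is a further illustration that is not from the text above.

```shell
printf '1234567890' > file.bin

# Octal 040 is the space character (hex 20), so every 2 becomes a space.
replaced=$(tr 2 '\040' < file.bin)
echo "$replaced"

# tr -d deletes bytes; here the digit 0 (octal 060) is removed.
deleted=$(tr -d '\060' < file.bin)
echo "$deleted"
```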

iconv can be used to convert text from one encoding to another. For example, one can convert UTF-8 text to four-byte little-endian Unicode (UCS-4LE) with:
echo αβγ | iconv -f utf8 -t UCS-4LE | hd
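To see what the conversion produces without hd, the bytes can be listed with od. This sketch assumes that the script itself is saved as UTF-8 and that the local iconv knows the encoding names UTF-8 and UCS-4LE; printf is used instead of echo so that no trailing newline is converted along with the text.

```shell
# Convert three Greek letters (code points U+03B1 to U+03B3) to UCS-4LE
# and list the resulting bytes as hex pairs.
ucs4=$(echo $(printf 'αβγ' | iconv -f UTF-8 -t UCS-4LE | od -An -v -t x1))
echo "$ucs4"
```

Each character takes four bytes with the least significant byte first, so α comes out as b1 03 00 00.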

The grep program can be used to search for bytes. The following command searches for byte pairs that begin with either byte \x31 or \x32:
grep -bUPo "[\x31\x32]." file2.bin
Unfortunately, as usual, grep does not report overlapping matches, so the above is not completely satisfactory. The options used above have the following meanings: -b prints the offset of each match (0-based), -U treats the file as binary (in GNU grep this only has an effect on Windows), -P switches to Perl-style regular expressions, which allow bytes to be specified in hex notation, and -o prints only the match instead of the whole line. For files that contain non-text bytes, GNU grep additionally needs -a to print the matches rather than a "Binary file matches" notice.
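One way around the overlap limitation is a lookahead, which checks the second byte without consuming it. This is a sketch assuming GNU grep built with PCRE support (-P); note that it prints only the first byte of each pair together with its offset, not the pair itself. The -a option and LC_ALL=C are additions here so that arbitrary bytes are treated as data rather than as text in the current locale.

```shell
printf '12345678901234567890\n12345678901234567890\n' > file2.bin

# Match a byte \x31 or \x32 that is followed by any byte other than a newline.
# The lookahead (?=.) does not consume the second byte, so adjacent pairs
# such as "12" and "23" are both reported.
matches=$(LC_ALL=C grep -boaP '[\x31\x32](?=.)' file2.bin)
echo "$matches"
```

For file2.bin this reports both offset 0 and offset 1 at the start of the file, where the plain two-byte pattern reports only offset 0.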

Random bytes can be read from the device /dev/urandom:
head -c 5 /dev/urandom | hd
One can also create random bytes with openssl. The following will produce 10 bytes
openssl rand 10 | hd
and the following will output them as hex pairs:
openssl rand -hex 10
And the following will output them in base64 encoding, which uses only printable characters, each carrying 6 bits of information:
openssl rand -base64 10
The base64 output is padded with = characters so that its length is divisible by four.
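The padding rule can be checked with the coreutils base64 tool, used here in place of openssl purely as an assumption about what is installed; openssl rand -base64 10 should give output of the same shape. Ten bytes make three full groups of three bytes plus one leftover byte, so the result is 16 characters ending in ==.

```shell
# Encode 10 random bytes; tr removes the trailing newline before counting.
encoded=$(head -c 10 /dev/urandom | base64 | tr -d '\n')
echo "$encoded"

# Length of the encoded text in characters.
length=${#encoded}
echo "$length"
```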

The following converts the file to C code, namely an array holding the bytes and a variable holding its length:
xxd -i file.bin

Finally, I mention some other useful tools for binary files, including ones that are not command-line tools. Okteta is quite a good GUI-based hex editor with many ways to view data, copy and paste it, and otherwise manipulate it. Radare2 has many tools for handling (executable) binary files, such as a disassembler, an executable structure viewer, hash tools, and binary diffs. The Radare2 tools are scriptable but can also be used through text-based user interfaces. t2b is a mini language for generating binary files. It is quite nice, but unfortunately does not allow conversion between different endian formats.
