Perl - pack & unpack

pack TEMPLATE,LIST

Converts values to a byte sequence containing representations according to a given specification, the “template” argument.

unpack TEMPLATE,EXPR

Derives some values from the contents of a string of bytes. The string is broken into chunks described by the TEMPLATE. Each chunk is converted separately to a value.

TEMPLATE

Packing hexadecimal strings

# packing byte contents from a list of ten 2-digit hexadecimal strings
# the pack template contains ten pack codes
print pack( 'H2' x 10, 30..39 ); #prints 0123456789 on a computer with ASCII character coding

# unpacking byte contents to a list of ten 2-digit hexadecimal strings
print join ' ', unpack( 'H2' x 10, '0123456789' ); #prints 30 31 32 33 34 35 36 37 38 39

Packing text (fixed-width data)

        1         2         3         4         5
1234567890123456789012345678901234567890123456789012345678

Date      |Description                | Income|Expenditure
01/24/2001 Zed's Camel Emporium                    1147.99
01/28/2001 Flea spray                                24.99
01/29/2001 Camel rides to tourists      235.00
my ($date, $desc, $income, $expend) = unpack("A10xA27xA7A*", $_);

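If the unpack template doesn’t match the incoming data, the result is garbage or an error: a field that runs short just returns fewer characters, but an x that skips past the end of the data makes Perl die.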

print pack("A10xA27xA7xA*", $today, "Totals", $tot_income, $tot_expend);
01/24/2001 Zed's Camel Emporium                    1147.99
01/28/2001 Flea spray                                24.99
01/29/2001 Camel rides to tourists      235.00
03/23/2001Totals                     1235.001172.98

The fields run together because, in pack, x writes a null byte rather than a space. What we actually need to do is expand the width of the fields.

# spaces in the template make it more readable; they produce no output
print pack("A11 A28 A8 A*", $today, "Totals", $tot_income, $tot_expend);
01/24/2001 Zed's Camel Emporium                    1147.99
01/28/2001 Flea spray                                24.99
01/29/2001 Camel rides to tourists      235.00
03/23/2001 Totals                      1235.00 1172.98

The last column still needs to be moved further over. Format the values with sprintf before packing them.

$tot_income = sprintf("%.2f", $tot_income);
$tot_expend = sprintf("%12.2f", $tot_expend);
01/24/2001 Zed's Camel Emporium                    1147.99
01/28/2001 Flea spray                                24.99
01/29/2001 Camel rides to tourists      235.00
03/23/2001 Totals                      1235.00     1172.98
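
The steps above can be put together in one short script. This is only a sketch: the three rows are sample data built with sprintf so that the column widths are exact, and the sprintf widths for the totals (7 and 11) are chosen here so the totals line up with those rows.

```perl
use strict;
use warnings;

# Sample ledger rows (assumed data), built with sprintf so the widths are exact:
# date (10), space, description (27), space, income (7), space, expenditure (11).
my @rows = map { sprintf "%-10s %-27s %7s %11s", @$_ } (
    [ '01/24/2001', "Zed's Camel Emporium",    '',       '1147.99' ],
    [ '01/28/2001', 'Flea spray',              '',       '24.99'   ],
    [ '01/29/2001', 'Camel rides to tourists', '235.00', ''        ],
);

my ( $tot_income, $tot_expend ) = ( 0, 0 );
my @parsed;
for my $row (@rows) {
    # A strips trailing spaces; x skips the one-byte separator.
    my ( $date, $desc, $income, $expend ) = unpack 'A10 x A27 x A7 x A*', $row;
    push @parsed, [ $date, $desc ];
    $tot_income += $income || 0;    # an empty field counts as 0
    $tot_expend += $expend || 0;
}

# Widen each field by one instead of using x, which would pack a null byte.
my $totals = pack 'A11 A28 A8 A*', '03/23/2001', 'Totals',
    sprintf( '%7.2f', $tot_income ), sprintf( '%11.2f', $tot_expend );

print "$_\n" for @rows, $totals;
```

Because the numbers are right-aligned by sprintf before they reach pack, the A codes only have to pad on the right.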

Packing integers

Packing and unpacking numbers implies conversion to and from some specific binary representation:

my $ps = pack( 's', 20302 ); # pack 20302 to a signed 16 bit integer in your computer's representation

The result is a string, now containing 2 bytes. If you print this string you might see ON or NO (depending on your system’s byte ordering) - or something entirely different if your computer doesn’t use ASCII character encoding.

my ( $s ) = unpack( 's', $ps ); # returns the original integer value

ATTENTION: if the packed value exceeds the allotted byte capacity, high order bits are silently discarded, and unpack certainly won’t be able to pull them back. When you pack using a signed template code such as s, an excess value may result in the sign bit getting set, and unpacking this will smartly return a negative value.
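
A small sketch of that warning, using values picked for illustration: 70000 does not fit in 16 bits, and 40000 fits only if the value is read back as unsigned.

```perl
use strict;
use warnings;
no warnings 'pack';    # keep quiet in case this perl warns about wrapped values

# 70000 needs 17 bits; only the low 16 bits survive.
my ($wrapped) = unpack 's', pack 's', 70000;
print "$wrapped\n";     # 70000 - 65536 = 4464

# 40000 fits in 16 bits unsigned, but sets the sign bit of a signed short.
my ($negative) = unpack 's', pack 's', 40000;
print "$negative\n";    # 40000 - 65536 = -25536

# The unsigned code S reads the same two bytes back as 40000.
my ($unsigned) = unpack 'S', pack 's', 40000;
print "$unsigned\n";
```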

Packing integers for “networking”

The pack code for big-endian (high order byte at the lowest address) is n for 16 bit and N for 32 bit integers.

You should also use these pack codes if you exchange binary data, across the network, with some system that you know next to nothing about. The simple reason is that this order has been chosen as the network order, and all standard-fearing programs ought to follow this convention.

# send a message by sending the length first, followed by just so many bytes:
my $buf = pack( 'N', length( $msg ) ) . $msg;
#or...
my $buf = pack( 'NA*', length( $msg ), $msg );

Some protocols demand that the count should include the length of the count itself: then just add 4 to the data length. (But make sure to read the “Lengths and Widths” section of perlpacktut before you really code this!)
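
A round trip for both conventions might look like the sketch below. The message text is made up, and the receiving side is assumed to already hold the complete buffer; a real receiver would first read 4 bytes, then the number of bytes they announce.

```perl
use strict;
use warnings;

my $msg = 'hello, camel';    # sample payload (assumption)

# Convention 1: the count covers only the payload.
my $buf = pack( 'N', length($msg) ) . $msg;
my $len     = unpack 'N', $buf;              # first 4 bytes, big-endian
my $payload = substr $buf, 4, $len;

# Convention 2: the count also covers its own 4 bytes.
my $buf2     = pack( 'N', 4 + length($msg) ) . $msg;
my $total    = unpack 'N', $buf2;
my $payload2 = substr $buf2, 4, $total - 4;

print "$len: $payload\n";
print "$total: $payload2\n";
```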

Byte-order modifiers

# unpack a sequence of signed big-endian 16-bit integers in a platform-independent way
my @data = unpack 's*', pack 'S*', unpack 'n*', $buf;

This unpacks each unsigned short (16-bit) in “network” (big-endian) order, packs the values as native unsigned shorts, and finally unpacks them as signed short (16-bit) values.

As of Perl 5.9.2, there’s a much nicer way to express your desire for a certain byte-order:

my @data = unpack 's>*', $buf;
# the "big end" of the arrow touches the s, which is a nice way to remember that > is the big-endian modifier.
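
To check that the two forms agree, here is a sketch with a hand-written buffer of three big-endian 16-bit values (the bytes are an arbitrary example). It needs a perl that supports the > modifier.

```perl
use strict;
use warnings;

# Three big-endian 16-bit words: 0xFFFE, 0x0001, 0x8000 (sample bytes).
my $buf = "\xFF\xFE\x00\x01\x80\x00";

# Byte-order modifier: signed 16-bit, big-endian, on any platform.
my @modern = unpack 's>*', $buf;

# The older three-step idiom gives the same list.
my @portable = unpack 's*', pack 'S*', unpack 'n*', $buf;

print "@modern\n";      # -2 1 -32768
print "@portable\n";    # -2 1 -32768
```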

Unicode

The UTF-8 encoding stores the most common (from a western point of view) characters in a single byte while encoding the rarer ones in two to four bytes. Perl uses UTF-8, internally, for most Unicode strings.

# Equivalent to: $UTF8{Euro} = "\x{20ac}"; #Unicode codepoint number
$UTF8{Euro} = pack( 'U', 0x20AC );
# $UTF8{Euro} contains 3 bytes: "\xe2\x82\xac"
# However, it contains only 1 character, number 0x20AC.
$Unicode{Euro} = unpack( 'U', $UTF8{Euro} );

Usually you’ll want to pack or unpack UTF-8 strings:

# pack and unpack the Hebrew alphabet
my $alefbet = pack( 'U*', 0x05d0..0x05ea );
my @hebrew = unpack( 'U*', $alefbet );

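In most cases, though, you should use the decode and encode functions of the Encode module to convert between UTF-8 byte strings and Perl Unicode strings. These functions provide means of handling invalid byte sequences and generally have a friendlier interface.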
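
As an illustration, the sketch below uses decode and encode from the core Encode module on the euro sign. The invalid sample string and the choice of Encode::FB_CROAK as the error policy are assumptions made for the example; other check values exist.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $bytes = "\xe2\x82\xac";               # UTF-8 encoding of the euro sign
my $char  = decode( 'UTF-8', $bytes );    # one character, code point 0x20AC
printf "%d character, U+%04X\n", length($char), ord($char);

my $again = encode( 'UTF-8', $char );     # back to the three bytes

# With FB_CROAK, invalid input raises an exception instead of being replaced.
my $broken = "abc\xffdef";
my $ok = eval { decode( 'UTF-8', $broken, Encode::FB_CROAK() ); 1 };
print $ok ? "decoded\n" : "invalid UTF-8 rejected\n";
```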

Template Grouping

# return a string consisting of the first character from each string
join( '', map( substr( $_, 0, 1 ), @str ) )
pack( '(A)'.@str, @str )

# a repeat count * means "repeat as often as required"
pack( '(A)*', @str )

Note that the template A* would only have packed $str[0] in full length.
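
A quick comparison of the three templates, with a made-up word list:

```perl
use strict;
use warnings;

my @str = qw(alpha beta gamma);    # sample strings (assumption)

# One A (width 1) per string, repeated for as many items as there are.
my $initials = pack '(A)*', @str;

# The same with an explicit repeat count: the template becomes '(A)3'.
my $explicit = pack '(A)' . @str, @str;

# A* consumes a single item and packs it in full length.
my $first = pack 'A*', @str;

print "$initials $explicit $first\n";    # abg abg alpha
```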

String Lengths

Packing a length followed by so many bytes of data is a frequently used recipe, since appending a null byte won’t work if a null byte may be part of the data.

# pack a short message: ASCIIZ, ASCIIZ, length, string
my $msg = pack( 'Z*Z*CA*', $src, $dst, length( $sm ), $sm );
( $src, $dst, $len, $sm ) = unpack( 'Z*Z*CA*', $msg );

Adding another field after the Short Message (in variable $sm) is all right when packing, but this cannot be unpacked naively. To solve this:

# pack a short message: ASCIIZ, ASCIIZ, length/string, byte
my $msg = pack( 'Z* Z* C/A* C', $src, $dst, $sm, $prio );
( $src, $dst, $sm, $prio ) = unpack( 'Z* Z* C/A* C', $msg );

Combining two pack codes with a slash (/) associates them with a single value from the argument list.

The pack code preceding / may be anything that’s fit to represent a number: all the numeric binary pack codes, and even text codes such as A4 or Z*.

# pack/unpack a string preceded by its length in ASCII
my $buf = pack( 'A4/A*', "Humpty-Dumpty" );
# unpack $buf: '13  Humpty-Dumpty'
my $txt = unpack( 'A4/A*', $buf );
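
The difference between the slash form and the naive template can be seen in a round trip. The field values below are invented for the sketch.

```perl
use strict;
use warnings;

# Sample field values (assumptions).
my ( $src, $dst, $sm, $prio ) = ( 'alice', 'bob', 'Hello there', 7 );

# C/A* packs the length of $sm as one byte, then the string itself.
my $msg = pack 'Z* Z* C/A* C', $src, $dst, $sm, $prio;

# With the slash, unpack reads exactly that many bytes for the message.
my @fields = unpack 'Z* Z* C/A* C', $msg;

# Without it, A* swallows the priority byte and the final C finds nothing.
my @naive = unpack 'Z* Z* C A* C', $msg;

print scalar(@fields), " fields with C/A*, ", scalar(@naive), " without\n";
```

In the naive case the length byte comes back as an ordinary field, and the message field ends with the stray priority byte.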

Dynamic Templates

If the list of pack items doesn’t have fixed length, an expression constructing the template is required (whenever, for some reason, ()* cannot be used).

To store named string values in a way that can be conveniently parsed by a C program, we create a sequence of names and null terminated ASCII strings, with = between the name and the value, followed by an additional delimiting null byte.

my $env = pack( '(A*A*Z*)' . keys( %Env ) . 'C',
          map( { ( $_, '=', $Env{$_} ) } keys( %Env ) ), 0 );

For the reverse operation, we’ll have to determine the number of items in the buffer before we can let unpack rip it apart:

my $n = $env =~ tr/\0// - 1;
my %env = map( split( /=/, $_ ), unpack( "(Z*)$n", $env ) );

The tr counts the null bytes. The unpack call returns a list of name-value pairs, each of which is taken apart by the split inside map.
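
Both directions can be tried together, as in this sketch. The contents of %Env are sample data; the values must not contain = or null bytes, or the split and the tr count would go wrong.

```perl
use strict;
use warnings;

# Sample data (assumption); values must not contain '=' or null bytes.
my %Env = ( HOME => '/home/camel', SHELL => '/bin/sh', TERM => 'vt100' );

# name, '=', null-terminated value for each pair, then one closing null byte.
my $env = pack( '(A*A*Z*)' . keys(%Env) . 'C',
    map( { ( $_, '=', $Env{$_} ) } keys(%Env) ), 0 );

# Each pair ends in one null byte and the buffer ends in one more,
# so the number of pairs is the null count minus 1.
my $n = ( $env =~ tr/\0// ) - 1;
my %env = map( split( /=/, $_ ), unpack( "(Z*)$n", $env ) );

print "$_=$env{$_}\n" for sort keys %env;
```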

Counting Repetitions

Precede the data with a count. Again, we pack keys and values of a hash, preceding each with an unsigned short length count, and up front we store the number of pairs:

my $env = pack( 'S(S/A* S/A*)*', scalar keys( %Env ), %Env );
my %env = unpack( 'S/(S/A* S/A*)', $env );
# you cannot use the same template for pack and unpack because pack can't determine a repeat count for a ()-group.
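
The same sample hash makes a convenient test for the counted variant. In this sketch the hash contents are assumptions; note that A* strips trailing spaces on unpack, so values ending in spaces would not survive.

```perl
use strict;
use warnings;

# Sample data (assumption).
my %Env = ( HOME => '/home/camel', SHELL => '/bin/sh', TERM => 'vt100' );

# Pair count first, then each key and value preceded by its own length.
my $env = pack( 'S(S/A* S/A*)*', scalar keys(%Env), %Env );

# On unpack, the leading S supplies the repeat count for the group.
my %env = unpack( 'S/(S/A* S/A*)', $env );

print "$_=$env{$_}\n" for sort keys %env;
```

Unlike the null-terminated layout, this one has no trouble with = or null bytes inside the strings.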

Pack Recipes

# Convert IP address for socket functions
pack( "C4", split /\./, "123.4.5.6" );

# Count the number of set bits in a bit vector
unpack( '%32b*', $mask );

# Determine the endianness of your system
$is_little_endian = unpack( 'c', pack( 's', 1 ) );
$is_big_endian = unpack( 'xc', pack( 's', 1 ) );

# Determine the number of bits in a native integer
$bits = unpack( '%32b*', pack( 'I!', ~0 ) ); # pack all-ones into a native unsigned int, then count its set bits

# Prepare argument for the nanosleep system call
my $timespec = pack( 'L!L!', $secs, $nanosecs );
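
Some of these recipes are easy to check. The sketch below compares the packed IP address with inet_aton from the core Socket module, counts bits in a two-byte sample vector, and confirms that exactly one of the two endianness tests fires; the sample values are assumptions.

```perl
use strict;
use warnings;
use Socket qw(inet_aton);

# The packed address is the same 4-byte string that inet_aton returns.
my $packed = pack 'C4', split /\./, '123.4.5.6';
print $packed eq inet_aton('123.4.5.6') ? "same\n" : "different\n";

# 0xFF has 8 set bits and 0x01 has 1.
my $mask     = "\xFF\x01";
my $set_bits = unpack '%32b*', $mask;
print "$set_bits bits set\n";    # 9 bits set

# A 16-bit 1 is stored as 01 00 (little-endian) or 00 01 (big-endian).
my $is_little_endian = unpack 'c',  pack 's', 1;
my $is_big_endian    = unpack 'xc', pack 's', 1;
print $is_little_endian ? "little-endian\n" : "big-endian\n";
```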

Hexadecimal to/from bytes

# H: A hex string (high nybble first).
my $buf = "\x12\x34\x56\x78";
print unpack('H*', $buf); # prints 12345678

# NOT RECOMMENDED!!!
# h: A hex string (low nybble first).
print unpack('h*', $buf); # prints 21436587

Hexadecimal pack/unpack + RC4

# RC4 encryption of the plaintext with a passphrase
my $encrypted = RC4($passphrase, $plaintext);

# treat the ciphertext as a byte string and read it as hexadecimal characters
my $encrypted_hex = unpack('H*', $encrypted);

# convert the hex-encoded ciphertext back into a byte string
my $encrypted_from_hex = pack('H*', $encrypted_hex);

# the byte string is the RC4 ciphertext again, from which the plaintext is recovered
my $decrypted = RC4($passphrase, $encrypted_from_hex);

crypt-rc4.pl

Divide content in binary data blocks of the same size

my $size = 4096; # block size in bytes
my $blocks = int( length($data) / $size ); # number of full 4096-byte blocks

# split the content string into binary data blocks; the final a* takes the
# remainder (an empty string when the length is an exact multiple of $size)
my @groups = unpack "a$size" x $blocks . "a*", $data;
print $_ for ( @groups );
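
An alternative sketch, not taken from the script above, lets a ()-group with a * repeat count do the counting. The tiny block size and the sample string are only there to make the result easy to read.

```perl
use strict;
use warnings;

my $size = 4;               # tiny block size for illustration (assumption)
my $data = 'abcdefghij';    # sample content (assumption)

# The group repeats until the data is used up; the last a$size simply
# returns whatever is left, so no block count has to be computed.
my @groups = unpack "(a$size)*", $data;

print join( '|', @groups ), "\n";    # abcd|efgh|ij
```

When the length is an exact multiple of the block size, this form returns no trailing empty block.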

IPv4 to/from decimal

IPv4 address (4 octets separated by .) <-> decimal

An IPv4 address consists of 32 bits. The generated integer must therefore be 32 bits wide, otherwise information would be lost!

unpack N => pack CCCC => split /\./ => shift; #ip2dec

join '.', unpack CCCC, pack N, shift; #dec2ip

ip2dec split into documented parts

my $ip = shift;
my @octets = split /\./, $ip;

# pack converts values to a byte sequence
my $bytes = pack 'CCCC', @octets; # C  An unsigned char (octet) value.

# unpack derives value from the contents of a string of bytes
my ( $dec ) = unpack 'N', $bytes; # N  An unsigned long (32-bit) in "network" (big-endian) order.

ip2dec - one-liner variants

# give the number of values in the template as a repeat count
unpack N => pack C4 => split /\./ => shift;

# => is the fat comma operator, which can be replaced by a plain comma
unpack N , pack C4 , split /\./ , shift;

# add parentheses for clarity
unpack(N , pack (C4 , split(/\./ , shift)));

# the template is a string, so add quotes for clarity
unpack('N' , pack ('C4' , split(/\./ , shift)));

# the repeat count in the template can be replaced by *, meaning "up to the end"
# Note: if the template is not quoted, * causes a compile error!
unpack('N' , pack ('C*' , split(/\./ , shift)));

ip2dec - command line

perl -e "print unpack N => pack CCCC => split /\./ => shift;" 192.168.0.1 #prints 3232235521

perl -e "print join '.', unpack CCCC, pack N, shift;" 3232235521 #prints 192.168.0.1
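
For use inside a program with strict enabled, the two one-liners can be wrapped in small subroutines. This is a sketch; the names ip2dec and dec2ip follow the labels used above, and no input validation is done.

```perl
use strict;
use warnings;

# Dotted quad -> 32-bit integer: four octets read back as one big-endian long.
sub ip2dec {
    my ($ip) = @_;
    return unpack( 'N', pack( 'C4', split( /\./, $ip ) ) );
}

# 32-bit integer -> dotted quad: the reverse operation.
sub dec2ip {
    my ($dec) = @_;
    return join( '.', unpack( 'C4', pack( 'N', $dec ) ) );
}

print ip2dec('192.168.0.1'), "\n";    # 3232235521
print dec2ip(3232235521), "\n";       # 192.168.0.1
```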

Image obtained from Converting IP Addresses To And From Integer Values With ColdFusion

ip2dec2ip.pl

Linked Sources