Network Automation and Endian

Posted On Feb-01
If you are in the business of architecting and automating networks, then some level of programming proficiency is mandatory. Sixteen years ago, when I started in networking, this was barely considered necessary. Interviews for Network Architects zeroed in on protocols, network types, and design methodologies, and safely sidestepped automation and scripting.

These days, however, my first few interview questions are usually about automation, and about conceptualizing a network as a data model. After all, routing protocols are just software implementations of algorithms on steroids. If you are a Network Automator, you really need to embrace programming and learn a lot more than a few quick Python scripts.

So what is Endian-ness?

Network automation, more often than not, involves moving data between computers and the network. Whether you're crafting a simple ICMP echo request (type 8) packet and sending it out the NIC, or sending a NETCONF XML RPC that performs some operation on a router, data has to leave the machine, traverse the network, and finally reach a remote entity.

Unfortunately, the bytes that make up a multi-byte value aren't always ordered the same way on the network as they are when stored in a computer's memory. This ordering is called Endian-ness. If you are setting yourself up to get neck-deep into the coding side of automation, Endian-ness is a very important part of the learn-code-debug cycle.
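To make this concrete, here is a minimal Python 3 sketch that lays out the same integer in both byte orders using the built-in int.to_bytes. The 32-bit value 0x0A0B0C0D is just an arbitrary example chosen so each byte is easy to spot.

```python
value = 0x0A0B0C0D  # arbitrary 32-bit example value

big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first

print(big.hex())     # 0a0b0c0d
print(little.hex())  # 0d0c0b0a

# Either layout decodes back to the same number, as long as the reader
# uses the same byte order the writer used.
assert int.from_bytes(big, "big") == int.from_bytes(little, "little") == value
```

The number itself never changes; only the order in which its four bytes are laid out does.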

Big-Endian and Little-Endian

Not really… These aren't our favorite Indians. These are 'End-ians', a term that says which end of a multi-byte value (its most or least significant byte) is stored first.
Let me elaborate.

Everything in a computer is stored in binary (1s and 0s). When you type ‘XYZ’ in a file, you don’t really think those characters are stored as XYZ on a disk, do you?

These characters are mapped to binary codes (see ASCII, Unicode and UTF-8, etc.). When those codes are read back 'in the context of text', they translate back to 'XYZ'. I emphasize 'context of text' because similar encodings exist for other formats too. JPEG has its own encoding, and so do audio files. When a JPEG or an audio file is read back, its codes are interpreted as pixel colors and sound samples, respectively.
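For text, the character-to-code mapping is easy to see from Python 3. This is a small sketch that assumes ASCII; for these three characters UTF-8 produces exactly the same bytes.

```python
text = "XYZ"

encoded = text.encode("ascii")  # characters -> byte values
print(list(encoded))            # [88, 89, 90]
print(encoded.hex())            # 58595a

# Reading the same bytes back "in the context of text" restores the string.
print(encoded.decode("ascii"))  # XYZ
```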

Ultimately, what's important is that all these codes are stored as binary in memory.

Seeing the Raw data

You can use the hexdump command that ships with most Linux distributions to read the raw contents of a file. Let's look at an example: create a file containing the string 'XYZ'.

echo 'XYZ' > test1

If you cat this file, sure enough, you see 'XYZ'. But let's dump the file as it is actually stored, using hexdump:

ajaysdesk@dev1:~$ cat test1
XYZ
ajaysdesk@dev1:~$ hexdump -C test1
00000000  58 59 5a 0a

The -C (canonical) option displays the file one byte at a time, in hex. So 0x58 corresponds to X, 0x59 to Y, 0x5a to Z, and the trailing 0x0a is the newline that echo appends.
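If you want the same byte-by-byte view from inside an automation script, a rough Python 3 stand-in for hexdump -C is to read the file in binary mode and format each byte as two hex digits. In this sketch, hex_bytes is a hypothetical helper name, and test1 is recreated in a temporary directory rather than assumed to exist.

```python
import os
import tempfile

def hex_bytes(path):
    """Return the file's contents as space-separated hex, one byte at a time."""
    with open(path, "rb") as f:  # binary mode: no decoding, no newline translation
        data = f.read()
    return " ".join(f"{b:02x}" for b in data)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test1")
    with open(path, "wb") as f:
        f.write(b"XYZ\n")  # the same bytes `echo 'XYZ' > test1` writes
    dump = hex_bytes(path)

print(dump)  # 58 59 5a 0a
```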

So where does Endian-ness factor in?

In the previous example we read the file byte by byte. A single byte is the basic building block, and if we store and read data only one byte at a time, it looks the same whether it is on a computer or on the network.

Passing -C to hexdump made it display the data byte by byte. Let's now drop that option: by default, hexdump groups the data into 2-byte units and prints each unit as a 16-bit number in the host's byte order. This is where Endian-ness shows up.

ajaysdesk@dev1:~$ cat test1
XYZ
ajaysdesk@dev1:~$ hexdump test1
0000000 5958 0a5a

Notice what happened? '58' and '59' got flipped around and became '5958', and the next 2-byte unit, '5a 0a', became '0a5a'.

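The bytes in the file haven't moved; what changed is how they are interpreted. A 16-bit value occupies two bytes, and an x86 computer stores the least significant byte (LSB) of such a value first, at the lower address. So when hexdump reads '58 59' as one 16-bit number, it treats 0x58 as the low-order byte and 0x59 as the high-order byte, and prints 5958. This is how modern x86 computers store multi-byte values, and it is called Little-Endian because the little end comes first.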
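hexdump's default output can be reproduced with Python's struct module. In this sketch the format '<2H' means "two unsigned 16-bit integers, little-endian", which matches what hexdump does on an x86 host.

```python
import struct
import sys

data = b"XYZ\n"  # bytes on disk: 58 59 5a 0a

# Interpret each pair of bytes as a 16-bit little-endian integer:
# the first byte of each pair becomes the least significant byte.
words = struct.unpack("<2H", data)
print(" ".join(f"{w:04x}" for w in words))  # 5958 0a5a

print(sys.byteorder)  # little on x86 hosts
```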

Other architectures, IBM mainframes for example, store data in the opposite order, with the most significant byte (MSB) coming first. This also matches how we write numbers, most significant digit first. This type of ordering is called Big-Endian.
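Interpreting the same four bytes as big-endian 16-bit words (the '>' prefix in struct) keeps them in the order they appear in the file. A minimal sketch:

```python
import struct

data = b"XYZ\n"  # 58 59 5a 0a

# '>' selects big-endian: the first byte of each pair is the most significant.
words = struct.unpack(">2H", data)
print(" ".join(f"{w:04x}" for w in words))  # 5859 5a0a
```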

And here's the fun part: network byte order, the convention for multi-byte fields in protocol headers such as IP, TCP, and UDP, is always Big-Endian.
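In practice this means multi-byte header fields must be converted to big-endian before they go on the wire. As a sketch (not a complete ping tool: actually sending it would need a raw socket and, usually, root privileges), the code below packs an ICMP echo request (type 8, code 0) using struct's '!' prefix, which means network byte order, and fills in the RFC 1071 Internet checksum. The function names internet_checksum and icmp_echo_request are hypothetical, and the identifier, sequence number, and payload are arbitrary example values.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):  # '!' = network (big-endian) order
        total += word
    while total >> 16:  # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def icmp_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request with every field in network byte order."""
    # type (1 byte), code (1 byte), checksum, identifier, sequence (2 bytes each)
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence)
    return header + payload

packet = icmp_echo_request(identifier=0x1234, sequence=1, payload=b"XYZ")
print(packet.hex())  # 080033711234000158595a
```

Swapping '!' for '<' on an x86 host would put the identifier on the wire as 34 12, and the receiver would read a different ID than the one you meant to send.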

So next time you're debugging an RPC, and 'UN IX' on the computer appears as 'NU XI' in the network capture (after you decode the hex, of course), you've just met an Endian!! 🙂
By the way, I didn't choose UN-IX by sheer coincidence. Look up the famous 'NUXI' problem!
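The NUXI problem is easy to reproduce: read "UNIX" as two big-endian 16-bit words, then lay those same words out in little-endian order. A minimal sketch with struct:

```python
import struct

raw = b"UNIX"

# A big-endian machine reads the four bytes as two 16-bit words...
words = struct.unpack(">2H", raw)  # (0x554e, 0x4958)

# ...and if those words are written out in little-endian order,
# the two bytes inside each word trade places.
swapped = struct.pack("<2H", *words)
print(swapped.decode("ascii"))  # NUXI
```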

Fun fact

Now, let's try to translate this hex back to text using the Python interpreter, reading the 2-byte units left to right exactly as hexdump printed them:

Python 2.7.10 (default, Jan 21 2016, 22:36:23)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\x59\x58\x0a\x5a')
YX
Z

So 'XY' becomes 'YX', and 'Z\n' becomes '\nZ'.
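The fix is to undo the interpretation, not the bytes: treat each 16-bit word that hexdump printed as a little-endian value when turning it back into bytes. A short Python 3 sketch, with the words copied from the hexdump output above:

```python
words = [0x5958, 0x0A5A]  # as printed by `hexdump test1` on a little-endian host

# Reading the words left to right as big-endian reproduces the scrambled text.
naive = b"".join(w.to_bytes(2, "big") for w in words)
print(repr(naive))  # b'YX\nZ'

# Writing each word back in little-endian order restores the file's bytes.
restored = b"".join(w.to_bytes(2, "little") for w in words)
print(repr(restored))  # b'XYZ\n'
```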

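But this isn't going to happen to you while reading data from files on the same machine. If you store data in 2-byte units, you read it back in 2-byte units, so whatever is flipped around for storage gets flipped back on retrieval, within the confines of the same computer. The trouble starts when those bytes leave the machine, for example onto the network, and are read by something that expects the other byte order.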
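A quick way to convince yourself is to pack a value in the host's native order and unpack it with the same order, then unpack it with the opposite order, which is effectively what a reader with the other endianness sees. A sketch, with 0x1234 as an arbitrary example value:

```python
import struct
import sys

value = 0x1234  # arbitrary 16-bit example

stored = struct.pack("=H", value)  # '=' : native byte order, standard size

# Same machine, same order: the value survives the round trip.
same = struct.unpack("=H", stored)[0]

# Opposite order: what a reader with the other endianness would see.
other = ">" if sys.byteorder == "little" else "<"
foreign = struct.unpack(other + "H", stored)[0]

print(hex(same), hex(foreign))  # 0x1234 0x3412
```

The result is the same on little- and big-endian hosts, because the code always picks the order opposite to the one it stored with.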