These days, however, my first few interview questions are usually about automation, and about conceptualizing a network as a data model. After all, routing protocols are just software implementations of algorithms on steroids. If you are a network automator, then you really need to embrace programming, and learn a lot more than just a few quick scripts in Python.
So what is Endian-ness?
Network automation, more often than not, involves moving data between computers and the network. Whether you’re crafting a simple ICMP echo request (type 8) packet and sending it out the NIC, or dealing with a NETCONF XML RPC that needs to perform some operation on a router, the data has to leave the machine, traverse the network and finally reach a remote entity.
Unfortunately, the bytes that make up a multi-byte value (a 16-bit port number, say, or a 32-bit IP address) aren’t always ordered the same way on the network as they are in a computer’s memory. This byte ordering is called Endian-ness. If you are setting yourself up to get neck deep into the coding side of automation, Endian-ness is a very important part of the learning, coding and debugging cycle.
Big-Endians and Little-Endians
Not really… These aren’t our favorite Indians. These are ‘End-ians’, and the term tells you which end of a multi-byte value, the big end or the little end, comes first.
Let me elaborate.
Everything in a computer is stored in binary (1s and 0s). When you type ‘XYZ’ in a file, you don’t really think those characters are stored as XYZ on a disk, do you?
So these characters are mapped to some binary equivalent (see ASCII, Unicode and UTF-8, etc.). When these codes are read back ‘in the context of text’, they translate back to ‘XYZ’. I emphasize ‘context of text’ because other formats have codes of their own. JPG uses its own encoding, and so do audio files. So when an image viewer or an audio player reads such codes back, it interprets them as pixel colors and sound samples respectively.
Ultimately, what’s important is that all these codes are stored as binary, whether in memory or on disk.
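To make this concrete, here is a small Python 3 sketch (Python being the language used later in this post) that shows the codes behind ‘XYZ’. It assumes ASCII as the text encoding; for these three characters, UTF-8 produces exactly the same bytes.

```python
# Map the characters of a string to their ASCII codes and back again.
text = "XYZ"

raw = text.encode("ascii")          # str -> bytes: b'XYZ'
codes = list(raw)                   # each byte as an integer: [88, 89, 90]
hex_codes = [hex(b) for b in raw]   # the same codes in hex: ['0x58', '0x59', '0x5a']

print(codes)
print(hex_codes)

# Reading the bytes back "in the context of text" recovers the original string.
print(raw.decode("ascii"))          # XYZ
```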
Seeing the Raw data
You can use the hexdump utility that comes with Linux to look at the raw contents of a file. Let’s look at an example. Create a file with the string ‘XYZ’ in it, for example with: echo XYZ > test1 (echo also appends a newline).
If you cat this file, sure enough, you see ‘XYZ’. But let’s dump the file as it is actually stored, using hexdump.
ajaysdesk@dev1:~$ cat test1
XYZ
ajaysdesk@dev1:~$ hexdump -C test1
00000000 58 59 5a 0a
The ‘-C’ option displays the file one byte at a time, in hex. So 0x58 corresponds to X, 0x59 to Y, 0x5a to Z, and the final 0x0a is the newline character at the end of the line.
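The same byte-by-byte view can be sketched in a few lines of Python 3. The helper name hex_bytes is an illustrative choice, not a standard function, and the data is hard-coded as the four bytes hexdump showed rather than read from test1 (open("test1", "rb").read() would do that).

```python
def hex_bytes(data: bytes) -> str:
    """Rough stand-in for the byte column of `hexdump -C`:
    every byte rendered as two hex digits, one byte at a time."""
    return " ".join(f"{b:02x}" for b in data)

# The four bytes in test1: 'XYZ' plus the trailing newline.
data = b"XYZ\n"
print(hex_bytes(data))   # 58 59 5a 0a
```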
So where does Endian-ness factor in?
In the previous example, we read the file byte by byte. A single byte is the basic building block, and if we store and read data only one byte at a time, it will look the same whether it is on a computer or on the network.
By passing the -C option to hexdump, we forced it to display the data byte by byte. Let’s now drop that option. By default, hexdump displays the data as 2-byte (16-bit) values, interpreted in the machine’s own byte order. This is where Endian-ness can be seen.
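A quick way to convince yourself of this, sketched with Python’s standard struct module: when every value is a single unsigned byte (format code B), packing in little-endian (<) or big-endian (>) order produces identical output.

```python
import struct

data = b"XYZ\n"

# Treat the data as four independent 1-byte values ('B' = unsigned char).
as_bytes = struct.unpack("4B", data)   # (88, 89, 90, 10)

# With only one byte per value there is nothing to reorder,
# so both byte orders give back exactly the original bytes.
little = struct.pack("<4B", *as_bytes)
big = struct.pack(">4B", *as_bytes)
assert little == big == data
print(little, big)
```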
ajaysdesk@dev1:~$ hexdump test1
0000000 5958 0a5a
Notice what happened? ‘58’ and ‘59’ got flipped around and became ‘5958’, and ‘5a’ and ‘0a’ likewise became ‘0a5a’.
When two bytes are read as a single 16-bit value, as above, the computer treats the first byte in memory as the LSB (the least significant byte, which is the rightmost one when the number is written out) and the next byte as the more significant one. That is why the bytes 58 59 are printed as 5958. This is how modern x86 computers store information. We call this Little-Endian, because the little end comes first.
Many other machines, such as IBM mainframes, store data in the opposite order, with the MSB (the most significant, leftmost byte) coming first. Coincidentally, this is also how humans write numbers, with the most significant digit on the left. This type of ordering is called Big-Endian.
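You can reproduce what hexdump did with Python’s struct module. This sketch forces little-endian interpretation with the < prefix, so the result matches an x86 machine no matter where you run it; the format code H means an unsigned 16-bit integer.

```python
import struct

data = b"XYZ\n"   # bytes on disk: 58 59 5a 0a

# Interpret the 4 bytes as two unsigned 16-bit integers, little-endian.
words = struct.unpack("<2H", data)

# Print each 16-bit value the way humans write numbers (MSB first).
print(" ".join(f"{w:04x}" for w in words))   # 5958 0a5a
```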
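Python’s int.to_bytes makes the two orders easy to compare side by side. This sketch lays out the number 0x5958 (the first value hexdump printed) both ways; sys.byteorder reports which order the machine running the script uses.

```python
import sys

value = 0x5958   # the 16-bit number hexdump printed

little = value.to_bytes(2, "little")   # least significant byte first
big = value.to_bytes(2, "big")         # most significant byte first

print(little)   # b'XY'  (0x58, 0x59)
print(big)      # b'YX'  (0x59, 0x58)

# Byte order of the machine running this script ('little' on x86).
print(sys.byteorder)
```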
And here’s the fun part: network byte order, the order used for multi-byte fields in protocol headers such as IP, TCP and UDP, is always Big-Endian.
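In Python, the struct format prefix ! selects network (big-endian) byte order explicitly, and the socket module provides the classic htons/ntohs conversions. This is a minimal sketch; port 8080 is just an arbitrary example value.

```python
import socket
import struct

port = 8080   # 0x1f90, a 16-bit value such as a TCP port number

# '!' means network byte order, which is big-endian on every platform.
wire = struct.pack("!H", port)
print(wire.hex())   # 1f90: most significant byte first

# The receiver converts back from network order to a Python int.
(received,) = struct.unpack("!H", wire)
assert received == port

# socket.htons/ntohs do the same job for raw integers: on a little-endian
# host htons swaps the two bytes, on a big-endian host it changes nothing.
print(hex(socket.htons(port)))
```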
So the next time you’re debugging an RPC, and ‘UN IX’ on the computer appears as ‘NU XI’ in the network capture (after you decode the hex, of course), then you’ve just encountered an Endian!! 🙂 Somewhere, 2-byte values were written in one byte order and read back in the other.
Btw, I didn’t choose UN-IX by sheer coincidence. See the great ‘NUXI’ problem!
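The NUXI effect can be simulated in a few lines. This is only an illustration under a simplifying assumption: one side treats ‘UNIX’ as two big-endian 16-bit words, and the other side writes those words out in little-endian order without converting them.

```python
import struct

text = b"UNIX"   # bytes: 55 4e 49 58

# Side A: interpret the text as two big-endian 16-bit words.
words = struct.unpack(">2H", text)        # (0x554e, 0x4958)

# Side B: write the same words back out in little-endian order,
# without converting, so each pair of bytes gets swapped.
garbled = struct.pack("<2H", *words)

print(garbled.decode("ascii"))   # NUXI
```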
Now, let’s try to translate this hex back to text using the Python interpreter.
>>> print('\x59\x58\x0a\x5a')
YX
Z
So ‘XY’ becomes ‘YX’, and ‘Z\n’ becomes ‘\nZ’.
Well, as expected, since we now know about the Endians!
In practice, though, this won’t happen to you when a program reads back files it wrote itself. If you store data in 2-byte units, you’ll read it back in 2-byte units, so whatever is flipped around for storage gets flipped back on retrieval, within the confines of the same computer.
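For Python 3 readers, here is a sketch of the same translation, plus one way to undo the flip: treat each group from the hexdump output as a 16-bit number and lay it back out in little-endian order, which is how it sat in the file.

```python
import struct

# The two 16-bit words exactly as the default hexdump printed them.
dump = "5958 0a5a"

# Naive approach: read the hex digits left to right as bytes.
naive = bytes.fromhex(dump.replace(" ", ""))
print(repr(naive.decode("ascii")))       # 'YX\nZ'  (pairs flipped)

# Undo the flip: parse each group as a 16-bit number and write it back
# in little-endian order, the way it was laid out on disk.
words = [int(w, 16) for w in dump.split()]
restored = struct.pack(f"<{len(words)}H", *words)
print(repr(restored.decode("ascii")))    # 'XYZ\n'
```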
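The round trip can be sketched with struct, using an in-memory byte string as a stand-in for the file. The = prefix means the machine’s native byte order; the second half shows what goes wrong when the writer and the reader (say, two different machines) disagree.

```python
import struct

values = [0x5958, 0x0A5A]

# Write and read back in the machine's native byte order ('='):
# whatever is flipped on the way out is flipped back on the way in.
stored = struct.pack("=2H", *values)
assert list(struct.unpack("=2H", stored)) == values

# Trouble starts only when writer and reader disagree on the byte order,
# e.g. a little-endian writer and a big-endian reader.
written_le = struct.pack("<2H", *values)
read_be = struct.unpack(">2H", written_le)
print([hex(v) for v in read_be])   # ['0x5859', '0x5a0a']
```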