The Era of Network-Modeling

Posted On Mar-12

It’s 2016. In contrast to the Networks of yesteryear, Enterprise Networks can no longer be summarized as just a couple of rows of cabinets in a datacenter.

Today, when we say large-scale enterprises, we’re referring to a few hundred rows, multiple datacenters, several clouds, routers providing connectivity between different VRFs, and WAN connections to a few remote sites. At these scales, the need to represent the network in its entirety as a data model quickly becomes obvious.

State vs. Intent

In its simplest form, a model is nothing more than a unified representation of the state of all protocols and neighborships in the Network. The idea is to keep this ‘network state’ always up to date.

Then you’d subscribe to events that are triggered whenever the above state model updates. To each event, you’d react by verifying that the latest network state aligns with the expected state. This necessitates maintaining a separate ‘expected-state’ topology. Vendors, in 2016, have decided to name the expected state the ‘Intent’, and the active state the ‘State’.

The ‘intent’ and the ‘state’ together make up the core of Network-Modeling.

Monitor the State, Provision the Intent

In a stable Network, the state and intent are converged. From the perspective of Monitoring, the State is always the source of truth. Anything that should be monitored pertaining to the Network is made part of the State collection – be it OSPF adjacencies, BGP summaries, LLDP neighbors, Multicast state, just about anything.

Monitoring scripts then work on this state, diffing it against the Intent. Any divergence between the two indicates instability in the Network.
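As a minimal sketch of this diffing step (the nested-dict model shape and the function name here are invented for illustration, not any vendor’s schema), comparing a collected State snapshot against the Intent might look like:

```python
def diff_state_against_intent(intent, state):
    """Compare the expected (Intent) model against collected State.

    Both arguments are nested dicts, e.g.
    {"bgp": {"peer_count": 4}, "ospf": {"area0_adjacencies": 6}}.
    Returns a list of (path, expected, actual) divergences.
    """
    divergences = []
    for key, expected in intent.items():
        actual = state.get(key)
        if isinstance(expected, dict) and isinstance(actual, dict):
            # Recurse into nested sections, prefixing the path.
            for path, exp, act in diff_state_against_intent(expected, actual):
                divergences.append((f"{key}.{path}", exp, act))
        elif expected != actual:
            divergences.append((key, expected, actual))
    return divergences

intent = {"bgp": {"peer_count": 4}, "ospf": {"area0_adjacencies": 6}}
state  = {"bgp": {"peer_count": 3}, "ospf": {"area0_adjacencies": 6}}

# A non-empty diff signals instability: here, BGP lost a peer.
print(diff_state_against_intent(intent, state))
```

A real monitoring loop would run this on every state-collection event and page someone (or trigger remediation) on a non-empty result.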

Similarly, Provisioning tools make changes only to the Intent. They don’t directly touch Network devices. The modified Intent then propagates out to the Network – and while doing so, automatically picks the best transport mechanism for propagation, falling back from most favorable to least favorable transport. (Think of it as cascading down from NETCONF XML RPCs, through YANG/Tail-f models, down to plain old SSH CLI commands.) The idea is that all of that dirty transport work is abstracted from the engineer, who only deals with writing tools that interact with the Intent.
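The cascading-transport idea can be sketched as an ordered list of push functions, tried most-favorable first (the transport names and stub `push_*` callables below are purely illustrative; a real implementation would wrap an actual NETCONF client, a vendor API, or an SSH library):

```python
def push_netconf(device, config):
    """Stand-in for a NETCONF edit-config RPC; pretend it's unsupported."""
    raise ConnectionError("NETCONF not supported on this device")

def push_ssh_cli(device, config):
    """Stand-in for pushing plain CLI commands over SSH."""
    return f"pushed to {device} via ssh-cli"

# Ordered from most to least favorable transport.
TRANSPORTS = [("netconf", push_netconf), ("ssh-cli", push_ssh_cli)]

def propagate(device, config):
    """Try each transport in order of preference, cascading on failure."""
    for name, push in TRANSPORTS:
        try:
            return name, push(device, config)
        except ConnectionError:
            continue  # fall back to the next transport
    raise RuntimeError(f"no transport could reach {device}")

print(propagate("tor1", "router bgp 65000"))
```

The tool author never sees this list; they just modify the Intent, and the propagation layer picks whatever transport the device actually supports.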

What does abstraction provide?

Until recently, Provisioning tools interacted directly with the Network. But there are several gaps in that approach. For starters, the lack of any abstraction between provisioning and the actual propagation of candidate config out to the network makes it nearly impossible to predict what impact a config change might have on the Network topology as a whole. Maintaining a separate ‘Intent’ lets you run algorithms to try to predict this.

Also, any modification made to the Intent is reflected in the following state collection. This forms the basis of a feedback system, to check whether changes have propagated out to the network successfully, with the intended results. All of this is critical in automating the functions involved in managing an Enterprise Network.

The future of Network Models

Provisioning used to mean the CLI. And Monitoring was synonymous with SNMP. Welcome to 2016 – the CLI is plain painful. SNMP’s gotta go. And we’re starting to run agents directly on Network equipment to facilitate both Provisioning and Monitoring, in smart and efficient ways.

Cisco has made onePK publicly available, and NX-API improves the accessibility and provisioning of Nexus devices. Arista’s EOS comes with excellent EEM functionality, along with an eAPI that can do wonderful things. With the newest EOS release, they even stream such events to remote workstations, which can then react to sysdb changes. Juniper lets you run agents on Junos as well, and subscribe to events.

Think about it for a second – you don’t have to ‘poll’ for MAC-table changes, or LANZ messages, or counters anymore. You just subscribe to these events, and react only when you receive one. Provisioning takes place through RPCs with well-defined YANG models. Perhaps OpenConfig will gain more steam. Eventually, Networks will provision themselves, monitor themselves, and hopefully heal themselves (at least in part). This is the way we all envisioned Network Automation to be!
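The subscribe-and-react pattern boils down to registering a callback instead of running a polling loop. This toy event bus is an invented stand-in (real systems would subscribe via Arista’s streamed sysdb events, a gNMI subscription, or a vendor agent SDK):

```python
from collections import defaultdict

class EventBus:
    """Toy stand-in for an on-box event stream (e.g. sysdb changes)."""
    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callback; it fires only when an event arrives."""
        self._handlers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._handlers[topic]:
            handler(event)

bus = EventBus()
seen = []

# React only when a MAC-table change arrives -- no polling loop needed.
bus.subscribe("mac-table", lambda ev: seen.append(ev))
bus.publish("mac-table", {"mac": "00:1c:73:aa:bb:cc", "port": "Et12"})
print(seen)
```

The design win is that the subscriber does zero work between events, instead of burning cycles (and device CPU) re-walking tables on a timer.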

ajay divakaran

Thinking beyond Network Automation

Posted On Mar-03
Network architects often seem to be at a loss for words when asked to describe Network Automation. This is understandable, though. Automation is a broad subject: from provisioning to telemetry and monitoring, there are several faces to automation.

What is Network Automation?

For many small to medium enterprises, automation is nothing more than scripting a few repetitive tasks. A bunch of YAML files, a cookbook or two, and the deed is done. Sure, this covers parts of provisioning, but is that all there is to automation?

Bigger enterprises typically automate a lot more. There is usually a baseline automation piece that gets called when any piece of physical hardware goes into the network. The job of this piece is to ensure that the hardware is initialized in a way that lets the rest of the automation systems communicate with it later.

Then there is a provisioning automation piece, which takes over from that point onwards. Anytime a device requires modification – whether you need to reconfigure a switch port, turn up a protocol, or slap a snippet of BGP configuration on a ToR switch – the provisioning piece takes care of it.

Need for unified automation frameworks

Larger enterprises see network automation in several areas. However, there is sometimes a lack of forward thought and, more importantly, a lack of a well-structured framework that enables code reuse, negating several benefits of said automation.

Here is a very common scenario in an enterprise network. A new Architect walks in, finds a problem area that lacks automation, and writes a tool to address it. Another Architect comes by the following week, identifies another area of need, and builds a second tool. A year later, you have an array of tools that do different tasks in the network. The network is chugging along fine…

And then the vendor identifies a security vulnerability, and a code upgrade is required. But the new code alters a few things – say it changes the way the device responds to a particular request. This breaks every one of your tools that relies on that particular response.

Not only did we just break all our automation tools, but we’ve also got to invest developer time and resources (read: COST) in fixing each of those individual tools. How can we solve this?

This is where the ‘forethought’ that I mentioned earlier comes into play. If the suite of automation tools had used common building blocks and shared the framework/code, there would have been just one network-facing module to fix. All the remaining tools would have leveraged these network-facing modules as building blocks, so fixing it automatically addresses all the remaining tools. That’s it – one place to modify, one bug to fix, and everything else falls into place. See, wasn’t that easy?


Automation Modules as Building-Blocks?

Let me give you an example to clarify this. You have one tool that queries LLDP neighbors, parses the output, and uses it to drive some cabling automation. You have a second tool that queries LLDP neighbors and uses the output to feed those MAC addresses into a DHCP server. Instead of having your architects write two separate libraries, you first write just one network-facing piece, responsible for querying LLDP – call this your lldp-module.

Then you have your architects go about writing their tools, but the difference is that their tools interface with this lldp-module whenever they need any LLDP information. Now if a code upgrade alters the way your network node responds to LLDP, the changes you make to accommodate that will be in just the one lldp-module. And as a side effect, everything else is fixed. Period.
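Concretely, the pattern reduces to one network-facing parser that both tools import. All the module and function names below, and the line-oriented LLDP output format, are made up for the example:

```python
# lldp_module: the single network-facing piece. If a code upgrade
# changes the device's LLDP response format, only this parser changes.
def get_lldp_neighbors(raw_output):
    """Parse hypothetical 'local_port neighbor mac' lines into dicts."""
    neighbors = []
    for line in raw_output.strip().splitlines():
        local_port, neighbor, mac = line.split()
        neighbors.append({"local_port": local_port,
                          "neighbor": neighbor,
                          "mac": mac})
    return neighbors

# Tool 1: cabling automation, built on the shared module.
def check_cabling(raw_output, cable_plan):
    actual = {(n["local_port"], n["neighbor"])
              for n in get_lldp_neighbors(raw_output)}
    return actual == cable_plan

# Tool 2: DHCP reservation feed, reusing the same module.
def macs_for_dhcp(raw_output):
    return [n["mac"] for n in get_lldp_neighbors(raw_output)]

raw = "Et1 spine1 00:1c:73:00:00:01\nEt2 spine2 00:1c:73:00:00:02"
print(check_cabling(raw, {("Et1", "spine1"), ("Et2", "spine2")}))
print(macs_for_dhcp(raw))
```

When the device’s response format changes, `get_lldp_neighbors` is the only function touched; both tools keep working untouched.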

Beyond Network Automation?

The difficult part lies in developing the internal culture to re-use code… To build upon common modules… and to avoid future costs.

It’s hard to provide a convincing answer to a new engineer who says, “Why do I have to use modules from that tool? It’s better and quicker if I write my own stuff from the ground up.” Some of these questions are justified. Everyone has projects – and those come with deadlines.

My answer usually starts with “True. But you are thinking of just THE PRESENT. When a new platform comes along, we now have to find engineers to modify both your tool, and the existing tools to work with the new platform.”

I agree this takes time for engineers to realize, and explaining it is laborious, but eventually fruitful. We have a network to run, don’t we?

So I’ve decided to work on my CCIE (again!)

Posted On Nov-04
Given that this is the year of SDN, and very few people can refrain from tossing some kind of SDN-related acronym into every meeting, I decided to take a break from the SDN chit-chat and delve into pure networking for a bit (if that’s even possible).
Perhaps I’ll go for a certification, or two.

So then the obvious question becomes – CCIE or JNCIE? Perhaps I’ll go for both. I’ve always set my aspirations pretty high. And if Arista comes up with a certification, you bet I’ll be there as well.

But CCIE has become Cisco centric…

But that got me thinking – lots of things have changed in the way Cisco’s syllabus is laid out. The CCIE has started to become very Cisco-centric – something I’m not very happy about. Now, before yet another architect waves EIGRP in my face and starts elaborating on how Cisco is also working towards standards, let me make one thing clear: very few enterprise networks risk running their core homogeneously on a single vendor’s hardware. And in such multi-vendor networks, EIGRP is not the preferred choice. Come back with OSPF and BGP, and we’ll have a chat.

… and SDN is still taking baby steps (still!)…

On top of this, the SDN marketplace is still very immature. We’ve been talking, and talking, and talking, and getting in several RFCs, but that’s about where we are. There is a lot of work to be done in this area. And for the next couple of years, I don’t see any threat to the CCIE from SDN. Heck, I’d venture to say that with Cisco’s offerings in the SDN space, it’s only a matter of time before SDN gets covered broadly in the CCIE.

… Commodity hardware , it is!

Another thing that came to mind: in a conversation last week with another industry veteran, I repeatedly heard how most companies are developing an aversion to Cisco-centric networks. Understandable. Given a choice, I’d set up networks with commodity hardware (Cumulus, anyone?). I’d offload more intelligence to software, treat my network devices as white boxes that I can toss out and replace, and avoid paying that hefty maintenance fee. But I digress.

So we have the CCIE with its Cisco-centric syllabus, coupled with a networking industry gravitating towards SDN, and one can clearly see how Cisco certifications have lost some of their charm. At this point, the case for the CCIE starts getting more subjective.

In my opinion, as of now, there is still a lot of interesting coverage in the CCIE. The lab piece is great, and totally fun. The theory is broad, sometimes very deep, and annoying. But it gets the essentials across. And it builds perseverance. So I’m set on going for the CCIE and the JNCIE. Not necessarily in that order, perhaps. 🙂