Understanding OpenFlow: Packet-In is a Page Fault

One of the most fundamental and often controversial features of the OpenFlow protocol is the “packet-in”.  Recall from the specification that the OpenFlow controller pushes packet-forwarding rules down to an OpenFlow device.  Each rule is of the form “if a packet’s header matches pattern X, then apply policy list Y”.  When a packet arrives that matches an existing rule, the corresponding policy is applied, e.g., which interface to forward out of, which QoS queue to use, etc.  However, if a packet does not match any of the existing rules, OpenFlow’s default policy is to send a copy of that packet up to the controller.  This “packet sent to the controller” message is called, in OpenFlow parlance, a “packet-in”.
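To make the mechanism concrete, here is a minimal sketch of that lookup-and-punt behavior.  It is purely illustrative: the class and function names (FlowRule, Switch, packet_in_handler) are made up for this post and are not part of any real switch agent or controller API.

```python
# Minimal sketch of OpenFlow-style flow lookup with a packet-in fallback.
# All names here (FlowRule, Switch, packet_in_handler) are illustrative,
# not part of any real OpenFlow agent or controller API.

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class FlowRule:
    match: Dict[str, str]      # header pattern, e.g. {"ip_dst": "10.0.0.1"}
    actions: List[str]         # policy list, e.g. ["set_queue:2", "output:3"]
    priority: int = 0

    def matches(self, pkt_headers: Dict[str, str]) -> bool:
        # Every field in the match must equal the packet's field; absent fields are wildcards.
        return all(pkt_headers.get(k) == v for k, v in self.match.items())


@dataclass
class Switch:
    flow_table: List[FlowRule] = field(default_factory=list)
    # Called on a table miss -- this is the "packet-in" path up to the controller.
    packet_in_handler: Optional[Callable[[Dict[str, str]], None]] = None

    def handle_packet(self, pkt_headers: Dict[str, str]) -> List[str]:
        for rule in sorted(self.flow_table, key=lambda r: -r.priority):
            if rule.matches(pkt_headers):
                return rule.actions                  # apply the matching rule's policy
        if self.packet_in_handler is not None:
            self.packet_in_handler(pkt_headers)      # no rule matched: punt to the controller
        return []                                    # nothing forwarded until the controller reacts


# Example: the controller reacts to a packet-in by installing a forwarding rule.
sw = Switch()
sw.packet_in_handler = lambda pkt: sw.flow_table.append(
    FlowRule(match={"ip_dst": pkt["ip_dst"]}, actions=["output:1"], priority=10)
)
print(sw.handle_packet({"ip_dst": "10.0.0.1"}))   # [] -> miss, packet-in fired, rule installed
print(sw.handle_packet({"ip_dst": "10.0.0.1"}))   # ['output:1'] -> now matched in the flow table
```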

Now, if you work with large networks (either operate them or sell to people who do), the concept of _intentionally_ sending packets to the controller by default—and thus across the control plane—frankly probably has you horrified.  Data packets going onto the control plane is a denial of service (DoS) waiting to happen, and you’ve probably spent a good chunk of your career either debugging the resulting problems (“Why is my router’s CPU at 100%?  Aargh!”) or trying to ensure that this DoS can’t happen in the first place (“Sure, IP multicast (or unsampled NetFlow, IPv6 options parsing, etc.) would be useful, but I disable it because I worry that it could cause instabilities”).  Yes, equipment manufacturers spend a lot of time trying to ensure that data packets get punted up to the control plane only when absolutely necessary (e.g., for routing protocol updates), but the harsh reality is that even hardened “carrier-grade” production equipment still occasionally falls victim to a control-plane DoS[1].  This is not because vendors are incompetent but rather because it’s inherently a very complex engineering game of whack-a-mole.  Each new feature might impose a different requirement on data-to-control-plane interactions, and the sum of all features on the box has to result in a unified set of rules that decide which data packets can go to the control plane and how they are rate limited.  Being too permissive with these data-to-control-plane rules and rate limits risks a DoS on the control plane: routing processes become starved, CLIs hang, and networks can destabilize.  But on the other side, being too aggressive—that is, blocking or rate limiting too much—can break features or hurt performance.
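For readers who haven’t had to build one of these punt policies, a simple token bucket captures the shape of the problem.  This is a generic sketch, not any vendor’s actual punt-policing implementation, and the traffic classes and rates are made-up numbers chosen only to illustrate the trade-off described above.

```python
# Sketch of the kind of rate limiting that guards the data-to-control-plane path.
# Generic token bucket; classes and rates below are illustrative assumptions.

import time


class TokenBucket:
    def __init__(self, rate_pps: float, burst: float):
        self.rate = rate_pps          # sustained packets per second allowed up to the CPU
        self.burst = burst            # short-term burst allowance
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True               # packet may be punted to the control plane
        return False                  # over budget: drop rather than starve the CPU


# Hypothetical per-class punt policers: looser for routing protocol packets,
# much stricter for generic table-miss traffic.
punt_policers = {
    "routing_protocol": TokenBucket(rate_pps=1000, burst=200),
    "table_miss":       TokenBucket(rate_pps=50,   burst=20),
}


def maybe_punt(traffic_class: str) -> bool:
    policer = punt_policers.get(traffic_class)
    return policer.allow() if policer else False   # unknown classes never reach the CPU
```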

If the bad news is that the data-to-control-plane problem is well-known and hard, then the good news is that all of the same tools that vendors already apply to this problem are also available to OpenFlow controllers.  That is: an OpenFlow controller can block traffic, install rate limiters, or even change the default policy for an unmatched packet to drop it on the ground.  Depending on the applications loaded and the target network deployment, the OpenFlow controller will apply different drop rules, rate limits, and default policies.  For example, a Tier-1 service provider network will likely choose to only send routing updates (likely rate limited) to the controller and drop all unmatched traffic, whereas a security-sensitive enterprise network might choose to send as much unmatched traffic as possible to the controller for auditing and monitoring purposes.  The bottom line is that using OpenFlow does not fundamentally change this problem: all of the same dangers, solutions, and trade-offs exist.
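A rough sketch of those two deployment profiles might look like the following.  The profile names, fields, and numbers are assumptions invented for this post, not defaults of any real controller; a real controller would translate this intent into a lowest-priority table-miss rule plus punt rules and rate limiters pushed down to each switch.

```python
# Illustrative sketch of deployment-specific table-miss policies.
# Names and numbers are made-up examples, not real controller defaults.

from dataclasses import dataclass
from typing import Tuple


@dataclass
class MissPolicy:
    default_action: str          # what an unmatched packet gets: "drop" or "packet_in"
    packet_in_rate_pps: int      # ceiling on packets punted to the controller
    punt_classes: Tuple[str, ...]  # traffic explicitly allowed up to the controller


DEPLOYMENT_PROFILES = {
    # Tier-1 provider: only (rate-limited) routing updates reach the controller;
    # everything else that misses the flow table is dropped.
    "tier1_provider": MissPolicy(default_action="drop",
                                 packet_in_rate_pps=500,
                                 punt_classes=("bgp", "ospf")),
    # Security-sensitive enterprise: punt as much unmatched traffic as the budget
    # allows, so it can be audited and monitored.
    "enterprise_audit": MissPolicy(default_action="packet_in",
                                   packet_in_rate_pps=5000,
                                   punt_classes=("all_unmatched",)),
}


def select_profile(name: str) -> MissPolicy:
    # In a real controller this is where the table-miss and punt rules would be
    # compiled and pushed to the switches; here we simply return the intent.
    return DEPLOYMENT_PROFILES[name]


print(select_profile("tier1_provider"))
```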

Once you accept that packet-in—when used properly—is a safe and even necessary component of any functioning network, the real fun can start, because packet-in is the networking equivalent of a virtual memory page fault.  While they add a degree of overhead that must be managed, packet-ins, like page faults, also enable all sorts of new functionality.  They make it possible to offer applications a clean programming interface, create hooks for sophisticated resource management, and even enable advanced capabilities like virtualization.

Obviously, one could talk about this analogy at length, but I wanted to present it to the community to generate discussion and get people to step a little out of their comfort zone.  Does it work for you?  Feel free to chime in in the comments…

[1] To which, unfortunately, I can personally attest: some of my past network research work accidentally exposed holes in a vendor’s control-plane rate-limiting policy, much to the unhappy surprise of more than one operator.


An Open Source Foundation for SDN

Now that things have settled down since we launched Floodlight several weeks ago, I finally had a moment to step back and reflect on the state of the open source ecosystem for Software-Defined Networking (SDN).  It struck me that with Floodlight and Open vSwitch (OVS), a multi-layer virtual switch, developers and network administrators have all the tools they need to build an SDN for virtual machines based purely on open, Apache-licensed, production-quality components.  In fact, we just completed a full battery of integration tests between Floodlight and Open vSwitch to make sure this is possible.

Why am I so excited by this?  That question probably requires a bit of background.  One potential architecture for an SDN involves a central controller managing a number of Open vSwitch instances (one or more per virtualized host), allowing them to transmit traffic over a physical network using various tunneling technologies.  With a handful of caveats, this architecture can enable the kind of flexibility SDN requires and support a wide range of new applications.  And it’s possible to build it today.  Right now, in fact.  Using purely open source technology.  That’s a huge step forward from where SDN was a few years ago.
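To give a flavor of what wiring up one virtualized host in this architecture involves, here is a small sketch that drives the standard ovs-vsctl commands from Python: create an OVS bridge, point it at a Floodlight controller, and add a GRE tunnel toward a peer hypervisor.  The bridge name, controller address, and tunnel endpoint are example values, not anything prescribed by Floodlight or OVS.

```python
# Sketch: one hypervisor in the tunneled vSwitch architecture described above.
# Bridge name, controller address, and peer IP are example values.

import subprocess

BRIDGE = "br-int"
CONTROLLER = "tcp:192.0.2.10:6633"     # Floodlight's OpenFlow listener (example address)
PEER_HYPERVISOR_IP = "192.0.2.21"      # remote GRE tunnel endpoint (example address)


def ovs(*args: str) -> None:
    """Run an ovs-vsctl command and fail loudly if it does not succeed."""
    subprocess.run(["ovs-vsctl", *args], check=True)


# Create the integration bridge that the VMs' virtual NICs attach to.
ovs("--may-exist", "add-br", BRIDGE)

# Point the bridge at the central OpenFlow controller (Floodlight).
ovs("set-controller", BRIDGE, CONTROLLER)

# Add a GRE tunnel port so VM traffic can cross the physical network to the peer host.
ovs("--may-exist", "add-port", BRIDGE, "gre0",
    "--", "set", "interface", "gre0", "type=gre",
    f"options:remote_ip={PEER_HYPERVISOR_IP}")
```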

So, what’s next?  Well, the above architecture is a good solution for fully virtualized environments, but it does have a few drawbacks.  First, virtualization penetration is somewhere in the 40% range according to a study by Veeam.  It’s growing, but with the easiest workloads already virtualized, it will be a long, long road to 100%.  So, to cope with the reality of both physical servers and physical devices, SDN needs a few more pieces — most importantly, physical OpenFlow-enabled switches and support in Floodlight for these switches.  A number of vendors, most recently HP, are beginning to release hardware, and we’re working hard to support all of these variants in Floodlight as we get access to them.  In fact, there is an exciting project called Indigo, offering open source OpenFlow-enabled firmware to accelerate physical switch adoption.  Overall, we are making great progress here, but it’s going to be an ongoing process as the ecosystem evolves.

The second limitation of the tunneled vSwitch architecture is the physical network itself.  Someone still has to configure, manage, and maintain the physical network over which the virtual switches tunnel.  In fact, they would be doing so with even less visibility into network traffic because of the tunnels themselves, making things like traffic shaping difficult, if not impossible.  At the end of the day, all traffic, both tunneled and non-tunneled, traverses the same physical infrastructure and requires configuration and management.  In this case, it would seem that SDN makes the network admin’s job harder instead of easier.  That is obviously not the goal…

An optimal architecture, one that truly unlocks the promise of SDN, involves extending management beyond the virtual domain to the physical edge of the network.  This would allow Floodlight to better manage the network and provide ultimate flexibility to network applications.  The network administrator could work hand in hand with the virtualization administrator to control the datacenter infrastructure.

Obviously, this is a bit of a long view, but that’s the kind of future I’d love to see open source enable for SDN.  We’re glad we completed our testing with OVS and Floodlight — it’s a great incremental step, and now we’re on our way to tackling the entire network.