Understanding OpenFlow: Packet-In is a Page Fault

One of the most fundamental and often controversial features of the OpenFlow protocol is the “packet-in”.  Recall from the specification that the OpenFlow controller pushes packet forwarding rules down to an OpenFlow device.  Each rule is of the form “if a packet’s header matches pattern X then apply policy list Y”.  When a packet arrives that matches an existing rule, the corresponding policy is applied, e.g., what interface to forward out, what QoS queue to apply, etc.  However, if a packet does not match any of the existing rules, OpenFlow’s default policy is to send a copy of that packet up to the controller.  This “packet sent to the controller” message is called, in OpenFlow-parlance, a “packet-in”.

Now, if you work with large networks (either operate them or sell to people who do), the concept of _intentionally_ sending packets to controller by default—and thus across the control plane—frankly probably has you horrified.  Data packets going on the control plane is a denial of service (DoS) waiting to happen and you’ve probably spent a good chunk of your career either debugging the resulting problems (“Why is my router’s CPU at 100%?  Aargh!”) or trying to ensure that this DoS can’t happen in the first place (“Sure IP multicast (or unsampled netflow, IPv6 options parsing, etc.) would be useful, but I disable it because I worry that it could cause instabilities”).  Yes, equipment manufacturers spend a lot of time trying to ensure that data packets get punted up to the control plane only when absolutely necessary (e.g., for routing protocol updates), but the harsh reality is that even hardened “carrier-grade” production equipment still occasionally falls victim to a control-plane DoS[1].  This is not because vendors are incompetent but rather because it’s inherently a very complex engineering game of wack-a-mole.  Each new feature might impose a different requirement on data-to-control plane interactions, and the sum of all features on the box have to result in a unified set of rules that decide which data packets can go to the control plane and how they are rate limited.  Being too conservative with these data-to-control-plane rules and rate-limits risks DoS on the control plane: routing processes become starved, CLIs hang, and networks can destabilize.   But on the other side, being too aggressive—that is blocking or rate limiting too much—can break features or have a negative impact on performance.

If the bad news is that the data-to-control-plane problem is well-known and hard, then the good news is that all of the same tools that vendors already apply to this problem are also available to OpenFlow controllers.  That is: an OpenFlow controller can block traffic, install rate limiters, or even change the default policy for an unmatched packet to drop it on the ground.  Depending on the applications loaded and the target network deployment, the OpenFlow controller will apply different drop rules, rate limits, and default policies.  For example, a Tier-1 service provider network will likely choose to only send routing updates (likely rate limited) to the controller and drop all unmatched traffic, where a security-sensitive enterprise network might choose to send as much unmatched traffic as possible to the controller for auditing and monitoring purposes.  The bottom line is that using OpenFlow does not fundamentally change this problem: all of the same dangers, solutions, and trade-offs exist.

Once you accept that packet-in—when used properly—is a safe and even necessary component to any functioning network, the real fun can start because packet-in is the networking equivalent of a virtual memory page fault.   While they add a degree of overhead that must be managed, Packet-in’s, like page faults, also enable all sorts of new functionality.   They make it possible to offer applications a clean programming interface, create hooks for sophisticated resource management, and even enable advanced capabilities like virtualization.

Obviously, one could talk about this analogy at length but I wanted to present it to the community to generate discussion and get people to step a little out of their comfort zone.  Does it work for you – feel free to chime in on the comments…

[1] From which, unfortunately, I can personally attest: some of my past network research work accidentally exposed holes in the vendor’s control-plane rate limiting policy, much to the unhappy surprise of more than one operator.


Comments are closed.