SIP Network Operators Conferences (SIPNOC), SIPNOC US 2012, Presentations, Day One -- June 26, 2012

12. Let’s Make SIP Geographic Redundancy Actually Work Well.

  • Create Date July 2, 2012
  • Last Updated July 2, 2012

Presented by Mark Lindsey, ECG.

Under the weight of new US Government rules requiring outage reporting, many VoIP carriers are thinking harder about fault tolerance. Most carriers already have local redundancy, where mated pairs using an Active/Standby protocol provide protection against basic faults. But many are working toward geographic redundancy, where the active and the standby are intentionally decoupled in space and technology.

Yet geographic redundancy presents more challenges. While local redundancy allows the failover problem to be solved at Layers 2 and 3, some designers and vendors do not believe these network design patterns extend well to wide network separations. In fact, extending a local network technology to a wide area may not even provide the sort of disaster protection intended by geographic fault tolerance.

The result of these network limitations is that endpoint SIP devices and SIP peers need to implement failover within the application layer. SIP phones, IADs, SIP peers, and VoIP core network devices must each choose independently when and how to fail over to a standby component.

Unfortunately, the result has been a mishmash of efforts, ill-suited to rapid fault detection and recovery. Endpoint devices vary wildly in how capably and consistently they fail over and fail back. For example, a device may know that there is a primary SBC and a secondary SBC. But what timeout triggers a failover to the secondary SBC? How often should the primary SBC be retried? Will existing dialogs or sessions be retained upon failover? Inconsistent and unworkable behavior has led some service providers to abandon the goal of geographic fault tolerance entirely. Others have settled for slow failover, measured in minutes or hours, and only in the most dire circumstances.
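To make the timeout question concrete, here is a minimal sketch of one possible baseline: a device that relies only on the default RFC 3261 transaction timers over UDP before giving up on the primary SBC. This is an illustration, not a behavior described in the presentation. The function name is hypothetical, and real devices often use vendor-specific timers instead.

```python
# Sketch: INVITE retransmission schedule over UDP using RFC 3261 defaults.
# Assumption: the device waits for the full transaction timeout (Timer B)
# before trying a secondary SBC. Many devices are configured differently.

T1 = 0.5  # seconds, RFC 3261 default round-trip estimate


def invite_retransmit_times(t1=T1):
    """Return (retransmission times, Timer B) in seconds after the first send."""
    timer_b = 64 * t1      # INVITE transaction timeout
    times = []
    elapsed = 0.0
    interval = t1          # Timer A starts at T1 and doubles after each firing
    while elapsed + interval < timer_b:
        elapsed += interval
        times.append(elapsed)
        interval *= 2
    return times, timer_b


if __name__ == "__main__":
    times, timer_b = invite_retransmit_times()
    print("retransmissions at:", times)
    print("transaction timeout at:", timer_b)
```

With the default T1 of 500 ms, the request is retransmitted at 0.5, 1.5, 3.5, 7.5, 15.5 and 31.5 seconds and the transaction times out at 32 seconds. Under this assumption a single failed call attempt consumes about half a minute before any failover begins.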

In this session, Mr. Lindsey proposes an elegant new technique that moves SIP Registration from Stand-By Redundancy to Parallel Redundancy. This affords zero-downtime failover, and does so without modifications along the routing-NAT path, using existing SIP functionality. This is accomplished by ensuring that both the SIP phone and the core Call Server, the two endpoints in common VoIP Core Networks, maintain valid registrations through all of the available paths. In the common case of redundant session border controllers (SBCs), this would require the phone to register via both SBCs, and the core Call Server to record both these paths.

This would move explicit and current knowledge of multiple SIP signaling paths into the application layer, so the endpoints can make smarter decisions. When a failure occurs along one path, no re-registration needs to occur. New calls proceed with little or no failover time. With more work, existing calls can even be retained.
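The idea of a call server recording one registration per path can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the presenter's implementation: the class and method names (`PathRegistry`, `register`, `mark_path_failed`, `routes`) and the labels `sbc-a` and `sbc-b` are hypothetical, and a real call server would learn the path from the SIP signaling itself rather than from a string label.

```python
from dataclasses import dataclass


@dataclass
class Binding:
    path: str          # hypothetical label for the SBC the REGISTER traversed
    contact: str       # contact URI reachable through that path
    expires_at: float  # absolute expiry time, in seconds
    healthy: bool = True


class PathRegistry:
    """Keeps one binding per (address-of-record, path) instead of one per AoR."""

    def __init__(self):
        self._bindings = {}  # aor -> {path: Binding}

    def register(self, aor, path, contact, expires, now):
        # A registration through a second path adds a binding; it does not
        # replace the binding learned through the first path.
        self._bindings.setdefault(aor, {})[path] = Binding(
            path, contact, now + expires
        )

    def mark_path_failed(self, path):
        # Called when the server detects that a path (e.g. an SBC) is down.
        for per_aor in self._bindings.values():
            if path in per_aor:
                per_aor[path].healthy = False

    def routes(self, aor, now):
        # Usable paths for a new call: healthy and not yet expired.
        return [
            b for b in self._bindings.get(aor, {}).values()
            if b.healthy and b.expires_at > now
        ]


if __name__ == "__main__":
    reg = PathRegistry()
    aor = "sip:alice@example.com"
    reg.register(aor, "sbc-a", "sip:alice@192.0.2.10", 3600, now=0.0)
    reg.register(aor, "sbc-b", "sip:alice@192.0.2.10", 3600, now=0.0)
    reg.mark_path_failed("sbc-a")
    print([b.path for b in reg.routes(aor, now=10.0)])
```

In this sketch, marking `sbc-a` as failed leaves the binding through `sbc-b` immediately usable, so a new call can be routed without waiting for the phone to re-register.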

Mark will show how this new model of registration redundancy simply generalizes the current model to provide superior results and a better quality of life for network engineers. End users come out ahead too.